Flume Solr BlobDeserializer Configuration Options
Using SpoolDirectorySource, Flume can ingest data from files located in a directory on disk. Unlike other asynchronous sources, SpoolDirectorySource does not lose data even if Flume is restarted or fails. Flume watches the directory for new files and ingests them as they are detected.
By default, SpoolDirectorySource uses the newline (\n) delimiter to split input into Flume events. You can change this behavior by configuring the Solr BlobDeserializer to read binary large objects (BLOBs) from SpoolDirectorySource. Generally, each file is one BLOB (such as a PDF or image file). Because the entire BLOB is buffered in RAM, this usage is not generally appropriate for very large objects.
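The difference between the two splitting strategies can be illustrated with a short Python sketch. This is not Flume's implementation. The function names `split_lines` and `read_blob` and the parameter `max_blob_length` are hypothetical, chosen only to mirror the behavior described above, and the sketch assumes the file contents are already available as bytes.

```python
def split_lines(data: bytes) -> list[bytes]:
    """Line-oriented splitting: one event body per line of input."""
    # bytes.splitlines() breaks on \n (and also \r and \r\n) and drops the terminators.
    return data.splitlines()


def read_blob(data: bytes, max_blob_length: int = 100_000_000) -> bytes:
    """BLOB-style reading: the whole input becomes one event body.

    The body is held in memory in full and capped at max_blob_length bytes
    (an assumed stand-in for deserializer.maxBlobLength).
    """
    return data[:max_blob_length]


if __name__ == "__main__":
    payload = b"%PDF-1.4\nbinary\ncontent\n"
    print(len(split_lines(payload)), "line events")
    print(len(read_blob(payload)), "bytes in a single BLOB event")
```

With line-oriented splitting, a binary file such as a PDF would be broken apart wherever a newline byte happens to occur. Reading it as a single BLOB keeps the file intact, at the cost of buffering it in memory.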
The Solr BlobDeserializer supports the following configuration options (required options in bold):
Property Name | Default | Description |
---|---|---|
**deserializer** | | Must be set to the fully qualified class name (FQCN) org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder. |
deserializer.maxBlobLength | 100000000 (100 MB) | Specifies the maximum number of bytes to read and buffer per request. |

    agent.sources.spoolSrc.type = spooldir
    agent.sources.spoolSrc.spoolDir = /tmp/myspooldir
    agent.sources.spoolSrc.ignorePattern = \.
    agent.sources.spoolSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
    agent.sources.spoolSrc.deserializer.maxBlobLength = 2000000000
    agent.sources.spoolSrc.batchSize = 1
    agent.sources.spoolSrc.fileHeader = true
    agent.sources.spoolSrc.fileHeaderKey = resourceName
    agent.sources.spoolSrc.interceptors = uuidinterceptor
    agent.sources.spoolSrc.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
    agent.sources.spoolSrc.interceptors.uuidinterceptor.headerName = id
    #agent.sources.spoolSrc.interceptors.uuidinterceptor.preserveExisting = false
    #agent.sources.spoolSrc.interceptors.uuidinterceptor.prefix = flume01.example.com
    agent.sources.spoolSrc.channels = memoryChannel