Text Extractor

This stage uses Apache Tika to extract the text content of a binary, plain-text, or HTML document. Any metadata generated during text extraction is added to the document's existing metadata.

This stage does not have additional configuration parameters.
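
The extract-then-merge behaviour described above can be illustrated with a minimal sketch. It does not use Apache Tika and is not this stage's implementation: Python's standard html.parser stands in for Tika, it handles HTML input only, and the Document class, the extract_text function, and the metadata keys "title" and "content_length" are hypothetical names chosen for illustration. Leaving existing metadata keys untouched on a name clash is also an assumption, since the behaviour of the real stage is not specified here.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Document:
    """Hypothetical pipeline document: raw content, metadata map, extracted text."""
    content: bytes
    metadata: dict = field(default_factory=dict)
    text: str = ""


class _TextCollector(HTMLParser):
    """Stand-in for Tika: collects visible text and the <title> of an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.title = ""
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(doc: Document) -> Document:
    """Extract text from doc.content and merge extraction metadata into doc.metadata."""
    parser = _TextCollector()
    parser.feed(doc.content.decode("utf-8", errors="replace"))
    parser.close()

    doc.text = " ".join(parser.chunks)

    # Metadata produced during extraction (hypothetical keys).
    extracted = {
        "title": parser.title.strip(),
        "content_length": str(len(doc.content)),
    }
    # Assumption: keys already present on the document are not overwritten.
    for key, value in extracted.items():
        doc.metadata.setdefault(key, value)
    return doc
```

Calling extract_text on a Document whose content is an HTML page and whose metadata already holds, say, a "source" entry leaves that entry in place, fills in text, and adds the extraction metadata alongside it. The stage itself relies on Tika for content-type detection and parsing, so it covers far more formats than this HTML-only stand-in.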