Content Transformation

Before documents are indexed, they pass through the connector’s content transformation pipeline. The pipeline can be empty, or it can contain one or more transformation stages.

Content transformation is executed only for documents that were detected as new or changed. Unchanged and deleted documents do not trigger the content transformation pipeline.

The following stages are available:

  1. ACL Assigner

  2. Document Splitter

  3. Metadata Assigner

  4. Metadata Mapper

  5. Text Extractor

  6. Vectorizer

Matcher

Matchers let you decide whether a content transformation stage is executed for a given document. If a stage's list of matchers is empty, the stage is executed for every document that passes through the pipeline.

Configuration Parameters:

  1. Field name: the name of a document field. Refer to the metadata displayed in the State View for the available field names.

  2. Condition Type:

    1. Match regex. Enter a regular expression in the next field. The expression must be a valid Java regular expression.

    2. Match empty value. The matcher matches if the field given in 1. carries no value or if its value(s) are empty.

You can add multiple matchers per stage. In that case, all matchers must match the document; otherwise, the stage is not executed for that document.
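
The matcher rules above can be sketched in a few lines of Python. This is only an illustration of the described logic, not the connector's implementation: the names `Matcher`, `stage_applies`, `"regex"` and `"empty"` are hypothetical, Python's `re` module stands in for Java regular expressions (the two dialects differ in some details), and two behaviors are assumptions not stated in this documentation: the regex must match the whole value, and a multi-valued field matches if any of its values matches.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Matcher:
    """Hypothetical model of one matcher: a field name plus a condition type."""
    field_name: str
    condition_type: str            # "regex" or "empty" (illustrative identifiers)
    pattern: Optional[str] = None  # only used for the "regex" condition type

    def matches(self, document: dict) -> bool:
        value = document.get(self.field_name)
        # Normalize missing fields and single values to a list of values.
        if value is None:
            values = []
        elif isinstance(value, (list, tuple)):
            values = list(value)
        else:
            values = [value]

        if self.condition_type == "empty":
            # Matches if the field carries no value or all values are empty.
            return all(v is None or str(v) == "" for v in values)
        if self.condition_type == "regex":
            # Assumption: full match, and any matching value is sufficient.
            return any(
                re.fullmatch(self.pattern, str(v)) for v in values if v is not None
            )
        raise ValueError(f"unknown condition type: {self.condition_type}")


def stage_applies(matchers: list, document: dict) -> bool:
    """A stage runs only if ALL matchers match; an empty list always matches."""
    return all(m.matches(document) for m in matchers)
```

For instance, with a document `{"title": "report.pdf", "author": ""}`, a stage configured with a regex matcher on `title` (`.*\.pdf`) and an empty-value matcher on `author` would be executed in this sketch, while replacing the pattern with `.*\.docx` would skip the stage.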

Debugging and Visualization

The content transformation pipeline writes log traces. In addition, the State View shows how a document was transformed: for each document, each applied stage generates one set of metadata with the differences highlighted.

Execution Order

The order of the transformers in the pipeline’s configuration dialog determines the execution order of the stages. In the example below, the document is transformed as follows:

Field Mapping → Value Mapping

To change the execution order, click the stage you want to move, then click move left or move right.
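
To illustrate why the execution order matters, the following minimal sketch applies stages one after another, each stage receiving the output of the previous one. Everything in it is hypothetical: `run_pipeline`, the two stage functions, and the field names `dc_title` and `title` are made up for the example and do not reflect the connector's actual stages or configuration.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def run_pipeline(stages: List[Stage], document: Document) -> Document:
    """Apply the stages left to right; each stage sees the previous result."""
    current = dict(document)  # work on a copy, leave the input untouched
    for stage in stages:
        current = stage(current)
    return current


def field_mapping(doc: Document) -> Document:
    """Illustrative stage: rename the field 'dc_title' to 'title'."""
    mapped = dict(doc)
    if "dc_title" in mapped:
        mapped["title"] = mapped.pop("dc_title")
    return mapped


def value_mapping(doc: Document) -> Document:
    """Illustrative stage: replace the placeholder title 'n/a'."""
    mapped = dict(doc)
    if mapped.get("title") == "n/a":
        mapped["title"] = "Untitled"
    return mapped


source = {"dc_title": "n/a"}
in_order = run_pipeline([field_mapping, value_mapping], source)
reversed_order = run_pipeline([value_mapping, field_mapping], source)
```

In this sketch, running the field mapping first lets the value mapping see the renamed `title` field, so `in_order` ends up with the title `Untitled`. With the stages swapped, the value mapping runs before the field exists, and `reversed_order` keeps the title `n/a`.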