Retrieval Augmented Generation

Retrieval Augmented Generation is the combined use of large language models and (vector) search engines. It allows you to answer questions based on your corporate or organizational knowledge. Retrieval Augmented Generation thereby extends the capabilities of large language models, whose “knowledge” is limited to the corpus they were trained on, to your needs and your users’ use cases.

The RheinInsights Retrieval Suite offers everything you need to build your Retrieval Augmented Generation use cases and applications. Moreover, it offers Secure Retrieval Augmented Generation, as it enables security trimming for all supported search engines. For more information, please refer to Secure Retrieval Augmented Generation.

Retrieval Augmented Generation always comprises the following building blocks:

Indexing

  1. Enterprise search connectors. Each connector connects one content source to the (vector) search engine. Its purpose is to keep the index synchronized with the content source’s state.
    Furthermore, the connector must make sure that indexed document permissions stay up to date, and it must transfer user-group relationships to the search engine.

  2. Content transformation. Prior to indexing document contents into a vector search engine, they must be transformed into a vector representation. The same embedding algorithm is used here as in Step 2.1 of the query processing below.
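The content transformation step can be sketched as follows. This is a minimal illustration, not the suite’s actual implementation: `embed` is a toy stand-in for a real embedding model, and the field names `vectorTitle` and `vectorBody` follow the index fields mentioned later in this document.

```python
import hashlib
import math

EMBEDDING_DIM = 8  # toy dimension; real embedding models produce e.g. 384 or 1536

def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: deterministically maps text
    to a fixed-dimension vector. A production pipeline would call the same
    embedding model here as in the query pipeline."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:EMBEDDING_DIM]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # normalized, as many vector indexes expect

def transform_document(doc: dict) -> dict:
    """Content transformation stage: attach vector representations of title
    and body before handing the document to the search engine."""
    doc["vectorTitle"] = embed(doc["title"])
    doc["vectorBody"] = embed(doc["body"])
    return doc
```

The key point is that the document-side and query-side vectors must come from the same embedding algorithm, so that they are comparable in the vector index.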

Query Processing

  1. An input/output channel for user interaction. This is either a search interface, a chat interface, or something similar, such as a support questionnaire.
    It might be available to authenticated users only.

  2. A query transformation pipeline, which pre-processes and transforms the input and can detect intents.

    1. It transforms the input query into a vector representation, using a large language model’s embedding.

    2. For security trimming, an ACL filter is also constructed.

  3. Search engine. A vector or keyword search engine is queried with the result from Step 2.

  4. The answers are then transformed and combined using a large language model. Normally, the top results are put into relation with the input query as a prompt for the large language model, which generates an answer based on these inputs.

  5. This answer is presented as the response to the user’s input from Step 1.
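The query processing steps above can be condensed into a short sketch. All names here are placeholders, not part of the suite’s API: `embed`, `search`, and `llm` stand for your embedding model, search engine client, and large language model client.

```python
def answer_query(query: str, user_groups: list[str], embed, search, llm) -> str:
    """Sketch of the query processing flow: vectorize, security-trim,
    search, and summarize. `embed`, `search`, and `llm` are placeholder
    callables for the respective backend services."""
    query_vector = embed(query)                          # Step 2.1: vectorize the query
    acl_filter = {"allowed_groups": user_groups}         # Step 2.2: ACL filter for security trimming
    results = search(query_vector, acl_filter, top_k=5)  # Step 3: query the search engine
    context = "\n".join(r["snippet"] for r in results)   # Step 4: top results as context
    prompt = (
        "Answer the question based on the following context:\n"
        f"{context}\n\nQuestion: {query}"
    )
    return llm(prompt)                                   # Steps 4-5: generate and return the answer
```

Note that the ACL filter is applied inside the search engine, so the language model only ever sees documents the user is allowed to read.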

Getting Started

To get started with Retrieval Augmented Generation, please do the following.

Please choose an embedding model of the large language model you would like to use.
Make a note of this model’s vector dimension. It is needed when creating the index at the search engine.
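The vector dimension is a fixed property of the embedding model; for example, sentence-transformers’ all-MiniLM-L6-v2 produces 384-dimensional vectors, while OpenAI’s text-embedding-ada-002 produces 1536-dimensional ones. If the dimension is not documented, you can determine it by embedding a short probe text, as in this sketch (the `toy_embed` stand-in is hypothetical):

```python
def embedding_dimension(embed) -> int:
    """Determine the vector dimension of an embedding function by
    embedding a short probe text and measuring the result length."""
    return len(embed("dimension probe"))

# Toy 4-dimensional embedding standing in for a real model client:
toy_embed = lambda text: [0.5, 0.5, 0.5, 0.5]
```

This dimension must then be used verbatim in the search index schema below.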

Please note that Microsoft Search with Graph Connectors comes with its own natural language understanding. Therefore, you cannot index vectors into Microsoft Search with Graph Connectors directly, and the following does not fully apply to it.

Connector Configuration

  1. Managing Connectors: Please add the connector–search engine combination as described in Managing Connectors.
    To simplify setup, please use a Knowledge Search and Retrieval Augmented Generation connector template so that the respective content transformation pipeline is generated as a stub for you.

  2. Content source configuration: Configure the connector’s source system settings following the source-specific configuration instructions. For more information, see the detailed configuration steps under Enterprise Search Connectors.

  3. Content transformation configuration: Configure the content processing pipeline.
    In particular, configure the vectorization stage according to the embedding model you would like to use. For more information, see Content Transformation.

  4. Search engine configuration: Configure the search engine according to your needs. In particular, create a search index schema whose vector dimensions exactly match those produced by the embedding algorithm from Step 3. For more information, see Search Engines.
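A schema along these lines is what Step 4 amounts to. The exact schema format depends on your search engine, so the structure below is only an assumed illustration; the `vectorTitle` and `vectorBody` field names match the index fields referenced in the query configuration later in this document.

```python
EMBEDDING_DIM = 384  # must equal the dimension of the embedding model chosen in Step 3

# Illustrative index schema; the concrete syntax depends on the search engine.
index_schema = {
    "name": "rag-index",
    "fields": [
        {"name": "id", "type": "string", "key": True},
        {"name": "title", "type": "string"},
        {"name": "body", "type": "string"},
        {"name": "acl", "type": "string_collection"},  # permissions for security trimming
        {"name": "vectorTitle", "type": "vector", "dimensions": EMBEDDING_DIM},
        {"name": "vectorBody", "type": "vector", "dimensions": EMBEDDING_DIM},
    ],
}
```

If the schema’s dimensions and the embedding model’s dimensions differ, indexing or querying will fail, so this value is worth double-checking.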

  5. Index the content source’s documents and the user-group relationships, cf. Crawl Scheduling .

Query Configuration

You can use our Template Configuration Wizard to configure query pipelines and search experiences from a template. For more information, please have a look at Configuration Template Wizard.

  1. Add the search engine from Step 4 above as a search engine under Search UX > Search Engine (see Search Experiences).
    In the respective configuration dialogs, you can add the index fields which store the vector representations of the document titles and bodies.

  2. Add a new query pipeline under Query Pipelines for your Retrieval Augmented Generation.

    1. Within the query pipeline, choose the search engine from Step 1.

    2. Add at least a Vectorizer as a query transformer. Here too, please make sure that the embedding model is the same as in Step 3 of the content transformation. Moreover, the resulting vector dimension must exactly match the search index’s dimensions for the vectorTitle and vectorBody fields.

    3. As a result transformer, add at least a GPT summarizer (see Result Summarizer Transformer). This is the final step, which generates an answer based on the user’s input and the search results.

Please note that a search interface integration is optional. You can implement REST calls against the query pipeline following the same approach as the search interface. You can also realize the steps of the query configuration without using our query pipelines.
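Such a REST integration could look like the following sketch using only the Python standard library. The URL layout, payload shape, and bearer-token authentication are assumptions for illustration; adapt them to your deployment’s actual endpoint.

```python
import json
import urllib.request

def build_query_request(base_url: str, pipeline_id: str, query: str, token: str):
    """Build a POST request against a query pipeline endpoint.
    URL path, payload, and auth scheme are assumed, not the suite's API."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/query-pipelines/{pipeline_id}/query",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

def run_query(request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A client application would call `build_query_request(...)` and pass the result to `run_query(...)`, then render the returned answer in whatever interface it provides.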