General Crawl Settings

The following settings are available for all content source configurations.

  1. Maximum content size (MB): Limits the net document size (before text extraction) to the given value. The default is 100 MB.

  2. Number of crawl threads: Determines how many crawl threads run in parallel to detect new, changed, and deleted documents. The default is 10 threads.
    Please note the following:

    1. Choose a value that is gentle on the content source and does not overwhelm it.

    2. Multithreading is part of the crawl strategy and is used wherever possible. Normally, one crawl thread handles one folder, space, or project, or one page and its attachments. In some scenarios, crawling might degenerate to a single thread.

  3. Number of transformation threads: Determines how many threads handle content transformation. Each thread processes one document at a time and may wait on heavy-lifting operations such as LLM embeddings or image extraction. Configure enough threads to achieve reasonable crawl performance. The default is 10.

  4. Number of search engine submission threads: This value determines how many threads are used to push changes to the search engine. The default is 40.
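
The interplay of the four settings can be illustrated with a minimal Python sketch. This is not the product's implementation: the class `CrawlSettings`, the functions `within_size_limit` and `run_pipeline`, and the field names are hypothetical. Only the default values come from the list above. The sketch also assumes 1 MB = 1024 * 1024 bytes and runs the three stages one after another for simplicity, whereas a real crawler would likely overlap them.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlSettings:
    """Hypothetical settings container; defaults mirror the documented values."""
    max_content_size_mb: int = 100
    crawl_threads: int = 10
    transformation_threads: int = 10
    submission_threads: int = 40


def within_size_limit(size_bytes: int, settings: CrawlSettings) -> bool:
    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    return size_bytes <= settings.max_content_size_mb * 1024 * 1024


def run_pipeline(containers, settings, transform, submit):
    """Crawl, transform, and submit documents using three separate thread pools.

    containers maps a container name (folder, space, project) to a list of
    (doc_id, size_bytes) tuples. One crawl task handles one container.
    """
    def crawl(item):
        _name, docs = item
        # Documents above the size limit are skipped.
        return [doc for doc in docs if within_size_limit(doc[1], settings)]

    with ThreadPoolExecutor(settings.crawl_threads) as crawl_pool, \
         ThreadPoolExecutor(settings.transformation_threads) as transform_pool, \
         ThreadPoolExecutor(settings.submission_threads) as submit_pool:
        found = [doc for docs in crawl_pool.map(crawl, containers.items())
                 for doc in docs]
        # Each transformation task handles exactly one document.
        transformed = list(transform_pool.map(transform, found))
        return list(submit_pool.map(submit, transformed))


if __name__ == "__main__":
    demo = {
        "space-a": [("doc-1", 5_000), ("doc-2", 200 * 1024 * 1024)],
        "space-b": [("doc-3", 1)],
    }
    print(run_pipeline(
        demo,
        CrawlSettings(),
        transform=lambda doc: doc[0].upper(),
        submit=lambda text: f"indexed:{text}",
    ))
```

In this sketch, a source with only one container keeps a single crawl thread busy regardless of the configured thread count, which is one way the degeneration to a single thread described under setting 2 can arise.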