File Shares

For a general introduction to the connector, please refer to https://www.rheininsights.com/en/connectors/file-share.php .

File Share Configuration

The connector can crawl all local and mounted file shares, such as Windows file shares, SMB shares, Samba, NetApp, Azure File Shares and more. Because the connector reads these shares through the mount, it does not need to authenticate itself. However, each share must be accessible from the machine(s) on which the RheinInsights Retrieval Suite is deployed (cf. Deployment and Base Configuration).
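Before adding a share as a start folder, it can help to confirm that the mount is readable from the deployment machine. The following is a minimal sketch of such a check and is not part of the connector. The class name MountCheck and the path /mnt/fileshare are hypothetical placeholders.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the RheinInsights Retrieval Suite.
class MountCheck {

    /** Returns true if the path exists, is a directory and is readable by this process. */
    static boolean isCrawlable(Path root) {
        return Files.isDirectory(root) && Files.isReadable(root);
    }

    public static void main(String[] args) {
        // Placeholder mount point; pass your own share path as the first argument.
        Path share = Paths.get(args.length > 0 ? args[0] : "/mnt/fileshare");
        System.out.println(share + " crawlable: " + isCrawlable(share));
    }
}
```

Run the check under the same operating system user as the Retrieval Suite, since mount permissions are evaluated per user.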

Content Source Configuration

The content source configuration of the connector comprises the following configuration fields.

  1. Excluded files from crawling: here you can add file extensions. Files with these extensions are filtered and not sent to the search engine.

  2. Excluded files by regular expression: here you can add a list of directory names or file names. Each entry can be an exact name or a Java regular expression (cf. the Pattern class documentation for Java Platform SE 8 at oracle.com).
    Please note that a change to this list will trigger an incremental crawl that removes all filtered files and folders from the search index.

  3. Crawl hidden files: if turned on, hidden files and folders are included in crawling. If turned off, they are filtered out.

  4. File Systems

    1. Here you can add as many start folders for crawling as you like. Each folder is crawled in its whole depth. However, symbolic links will not be followed.

    2. If you want to remove a start folder from the list, click on remove.

  5. The general settings are described at General Crawl Settings. You can leave these at their default values.
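To illustrate how the two exclusion fields above might behave, here is a minimal sketch and not the connector's actual implementation. It assumes that extensions are compared case-insensitively and that a regular expression must match the whole file or directory name. Both are assumptions, as is the class name ExclusionRules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of extension and name-based exclusion rules.
class ExclusionRules {
    private final Set<String> extensions = new HashSet<>();
    private final Set<String> exactNames = new HashSet<>();
    private final List<Pattern> patterns = new ArrayList<>();

    ExclusionRules(List<String> excludedExtensions, List<String> nameRules) {
        for (String ext : excludedExtensions) {
            // Normalize ".TMP" and "tmp" to the same key (assumed behavior).
            String e = ext.startsWith(".") ? ext.substring(1) : ext;
            extensions.add(e.toLowerCase(Locale.ROOT));
        }
        for (String rule : nameRules) {
            exactNames.add(rule);                 // entry used as an exact name
            patterns.add(Pattern.compile(rule));  // entry used as a Java regular expression
        }
    }

    /** Returns true if the file or directory name is excluded by any rule. */
    boolean isExcluded(String name) {
        int dot = name.lastIndexOf('.');
        if (dot >= 0 && extensions.contains(name.substring(dot + 1).toLowerCase(Locale.ROOT))) {
            return true;
        }
        if (exactNames.contains(name)) {
            return true;
        }
        for (Pattern p : patterns) {
            // matches() requires the pattern to cover the whole name (assumption).
            if (p.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ExclusionRules rules = new ExclusionRules(
                Arrays.asList("tmp", ".bak"),
                Arrays.asList("Thumbs.db", "~\\$.*", "archive_\\d{4}"));
        for (String name : Arrays.asList("report.docx", "report.TMP", "~$budget.xlsx", "archive_2019")) {
            System.out.println(name + " excluded: " + rules.isExcluded(name));
        }
    }
}
```

Note that in a regular expression an unescaped dot matches any character, so an entry such as Thumbs.db also matches names like Thumbsxdb. Escape the dot (Thumbs\.db) if only the literal name should match.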

After entering the configuration parameters, click on validate. This validates the content crawl configuration directly against the content source. If there are issues when connecting, the validator will indicate these on the page. Otherwise, you can save the configuration and continue with Content Transformation configuration.
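The start folder behavior described under File Systems (each folder crawled in its whole depth, symbolic links not followed) can be approximated with the standard Java file tree walker. This is a sketch under that assumption and not the connector's code. The class name StartFolderWalk is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a full-depth traversal that does not follow symbolic links.
class StartFolderWalk {

    /** Lists all regular files below startFolder, at any depth. */
    static List<Path> listFiles(Path startFolder) throws IOException {
        final List<Path> found = new ArrayList<>();
        // No FOLLOW_LINKS option is passed, so linked directories are not entered.
        Files.walkFileTree(startFolder, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // A symbolic link is reported here with isSymbolicLink() == true
                // and isRegularFile() == false, so it is skipped.
                if (attrs.isRegularFile()) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable entries instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder start folder; pass your own as the first argument.
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listFiles(start)) {
            System.out.println(p);
        }
    }
}
```

Because links are not followed, content that is reachable only through a symbolic link needs its own start folder entry.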

Recommended Crawl Schedules

File shares do not offer a change log, but they usually return document metadata quickly. Thus, our File Share Connector supports Full Scans and Recrawls as crawl modes. In normal operations, Full Scans will be used to keep the search index up to date.

For secure search, you also need to configure a separate Active Directory Connector (cf. Active Directory and LDAP). This connector synchronizes the Active Directory’s security principals into the security store. Together, the two connectors enable secure search.

Because Full Scans keep the search index up to date, we recommend configuring them to run every day. For more information, see Crawl Scheduling.