Microsoft SharePoint Online Connector

For a general introduction to the connector, please refer to https://www.rheininsights.com/en/connectors/sharepoint-online.php .

Entra Id Configuration

Application Registration

The connector acts as an Entra Id application. This application must be registered as follows:

  1. Navigate to https://portal.azure.com

  2. Open Entra Id

  3. Open App registrations

  4. Click on New registration

  5. Give it a name

  6. Click on Register

  7. Click on API permissions

  8. Add a Permission

  9. Click on Microsoft Graph

  10. Choose Application Permissions

  11. Search for the following permissions and check the respective boxes:

    1. User.Read.All

    2. Group.Read.All

    3. Sites.FullControl.All (needed for accessing the SharePoint Online site permissions and for secure search)

    4. Sites.Read.All

  12. Click on Add permissions

  13. Grant admin consent for the added permissions

  14. Go to Certificates & secrets

  15. Generate a new Client Secret

  16. Give it a name and an expiration date

  17. Create the secret

  18. Then make a note of the value

  19. Click on Overview and make a note of client Id and tenant Id
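
With the tenant Id, client Id, and client secret noted, an app-only token can be requested through the OAuth 2.0 client credentials flow of the Microsoft identity platform. The following Python sketch only builds such a request so that the registration can be checked by hand. It is not the connector's own code; the function name build_token_request and the placeholder values are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request for Microsoft Graph."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" asks for the application permissions consented in the portal.
        "scope": "https://graph.microsoft.com/.default",
    }
    return Request(
        url,
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values (hypothetical); use the ones noted during registration.
request = build_token_request("my-tenant-id", "my-client-id", "my-client-secret")
print(request.full_url)
```

Sending the request with urllib.request.urlopen(request) should return a JSON body containing an access_token field, provided the secret is valid and consent was granted.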

User Configuration

Unfortunately, SharePoint Online is not fully covered by the Microsoft Graph APIs. This means that for the following operations, the legacy SharePoint APIs must be used:

  1. Getting site groups

  2. Getting members of site groups

  3. Getting site roles

  4. Getting site role assignments

Therefore, to implement secure search / enterprise search, the connector needs a crawl user.

This Entra Id user must have permission to read the site groups and site roles of the sites which should be crawled.

The user must not have two-factor authentication turned on.
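
To make the four legacy operations more concrete, the sketch below lists the SharePoint REST resources that expose site groups, group members, role definitions, and role assignments for a single site. Whether the connector calls exactly these endpoints is an assumption; the helper name legacy_permission_endpoints and the example site URL are hypothetical.

```python
def legacy_permission_endpoints(site_url: str, group_id: int) -> dict[str, str]:
    """Return SharePoint REST URLs for the permission data of one site."""
    base = site_url.rstrip("/") + "/_api/web"
    return {
        # 1. Getting site groups
        "site_groups": f"{base}/sitegroups",
        # 2. Getting members of one site group
        "group_members": f"{base}/sitegroups({group_id})/users",
        # 3. Getting site roles (role definitions)
        "site_roles": f"{base}/roledefinitions",
        # 4. Getting site role assignments
        "role_assignments": f"{base}/roleassignments",
    }


# Hypothetical site URL and group id, for illustration only.
for name, url in legacy_permission_endpoints(
    "https://contoso.sharepoint.com/sites/hr", 5
).items():
    print(name, url)
```

Requests to these URLs are made in the context of the crawl user, which is why that user needs read access to the groups and roles of every site to be crawled.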

Content Source Configuration

The content source configuration of the connector comprises the following mandatory configuration fields.

Within the connector’s configuration please add the following information:

  1. Tenant Id: the tenant Id from Step 19 above.

  2. Client Id: the client Id from Step 19 above.

  3. Client secret: the client secret value from Step 18 above.

  4. SharePoint Online user (needed for crawling SharePoint Online contents): the user which the connector uses to read site groups and roles, as well as members and role permissions, from SharePoint.

  5. SharePoint Online user's password (needed for crawling SharePoint Online contents): the corresponding password for this user.

  6. Rate limit: with this setting you can reduce the number of API calls the connector issues per second.

  7. Index One Drives: If turned on, OneDrives are crawled (cf. OneDrive connector).

  8. Index SharePoint Online sites: If turned on, SharePoint Online sites are crawled (cf. SharePoint Online connector).

  9. Index hidden lists: By default, the connector skips hidden SharePoint lists.

  10. Included Sites: here you can add site URLs. If given, only these sites will be crawled.
    All previously indexed sites which are no longer included will then be deleted from the search index.

  11. Excluded Sites: here you can add site URLs. If given, these sites will not be crawled.
    All previously indexed sites which are now excluded will then be deleted from the search index.

  12. Excluded attachments: documents whose file suffix appears in this list, such as images or executables, are not indexed.

After entering the configuration parameters, click on validate. This validates the content crawl configuration directly against the content source. If there are issues when connecting, the validator will indicate these on the page. Otherwise, you can save the configuration and continue with Content Transformation configuration.
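
The effect of the Included Sites, Excluded Sites, and Excluded attachments fields can be pictured with the following Python sketch. The matching rules are assumptions and not documented connector behavior: URLs are compared case-insensitively with trailing slashes ignored, an empty include list means all sites, and an exclusion takes precedence over an inclusion. All function names are hypothetical.

```python
def normalize(url: str) -> str:
    """Normalize a site URL for comparison (assumed rule, see lead-in)."""
    return url.strip().rstrip("/").lower()


def should_crawl_site(site_url: str, included: list[str], excluded: list[str]) -> bool:
    """Decide whether a site is crawled under the assumed include/exclude rules."""
    site = normalize(site_url)
    if site in {normalize(u) for u in excluded}:
        return False  # an excluded site is never crawled
    allowed = {normalize(u) for u in included}
    # An empty include list places no restriction on the sites.
    return not allowed or site in allowed


def is_excluded_attachment(file_name: str, excluded_suffixes: list[str]) -> bool:
    """Check whether a file name ends with one of the excluded suffixes."""
    name = file_name.lower()
    return any(
        name.endswith("." + suffix.lower().lstrip("."))
        for suffix in excluded_suffixes
    )


# Hypothetical configuration values, for illustration only.
included = ["https://contoso.sharepoint.com/sites/hr"]
excluded = ["https://contoso.sharepoint.com/sites/archive"]
print(should_crawl_site("https://contoso.sharepoint.com/sites/HR/", included, excluded))
print(is_excluded_attachment("setup.EXE", ["exe", ".png"]))
```

Under these assumptions, a site that drops out of the crawl set is also the one whose documents would be removed from the search index on the next crawl.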

Recommended Crawl Schedules

Content Crawls

The connector supports incremental crawls. These are based on the SharePoint change log and, depending on your tenant’s size, can run every few hours.

The change log might not be complete and might not reflect all permission changes. Therefore, depending on your requirements, we recommend running a Full Scan every week.

For more information see Crawl Scheduling.
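
The recommended combination of frequent incremental crawls and a weekly Full Scan can be expressed as a simple decision rule. The Python sketch below is only an illustration of that rule; the actual scheduling is configured in the product, and the function name next_crawl_type and the seven-day default are assumptions derived from the recommendation above.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_crawl_type(
    now: datetime,
    last_full_scan: Optional[datetime],
    full_scan_interval: timedelta = timedelta(days=7),
) -> str:
    """Return "full" when a Full Scan is due, otherwise "incremental"."""
    if last_full_scan is None or now - last_full_scan >= full_scan_interval:
        return "full"
    return "incremental"


# Hypothetical timestamps: the last Full Scan ran three days ago.
now = datetime(2024, 1, 10, 12, 0)
print(next_crawl_type(now, now - timedelta(days=3)))
```

A longer or shorter interval can be passed as full_scan_interval if permission changes need to reach the index faster or the tenant is too large for a weekly Full Scan.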

Principal Crawls

Depending on your requirements, we recommend running a Full Principal Scan every day or less often.

For more information see Crawl Scheduling.