Documentation
Microsoft SharePoint Online Connector
For a general introduction to the connector, please refer to https://www.rheininsights.com/en/connectors/sharepoint-online.php.
Entra ID Configuration
Application Registration
The connector acts as an Entra ID application. This application must be registered as follows:
Navigate to https://portal.azure.com
Open Entra ID
Open App registrations
Click on New registration
Give it a name
Click on Register
Click on API permissions
Add a Permission
Click on Microsoft Graph
Choose Application Permissions
Search for the following permissions and check the respective boxes:
User.Read.All
Group.Read.All
Sites.FullControl.All (needed for accessing SharePoint Online site permissions and secure search)
Sites.Read.All
Click on Add permissions
Grant admin consent for the permissions
Go to Certificates and secrets
Generate a new Client Secret
Give it a name and an expiration date
Create the secret
Then make a note of the secret's value (it is only shown once)
Click on Overview and make a note of the client ID and tenant ID
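With the registration complete, the connector can obtain an app-only Microsoft Graph token via the client credentials flow. The following sketch shows what that token request looks like, using only the Python standard library; the placeholder values are examples, not real credentials, and the connector's actual implementation may differ.

```python
import urllib.parse
import urllib.request

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Build the POST request for the Entra ID v2.0 token endpoint."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # App-only Graph access uses the .default scope, which resolves to
        # the application permissions granted above.
        "scope": "https://graph.microsoft.com/.default",
    }).encode()
    return urllib.request.Request(url, data=body, method="POST")

req = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
# Sending the request would return a JSON body containing "access_token":
# with urllib.request.urlopen(req) as resp:
#     token = json.load(resp)["access_token"]
```

If any of the permissions above are missing or consent was not granted, the token is still issued but Graph calls made with it will fail with authorization errors.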
User Configuration
Unfortunately, SharePoint Online does not fully expose its functionality through the Graph APIs. This means that for the following operations, the legacy SharePoint REST APIs must be used:
Getting site groups
Getting members of site groups
Getting site roles
Getting site role assignments
Therefore, to implement secure search / enterprise search, the connector needs a crawl user.
This Entra ID user must have permission to access the site groups and site roles of the sites which should be crawled.
The user must not have two-factor authentication enabled.
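The legacy operations listed above map onto the standard SharePoint REST API routes. The sketch below shows these endpoints for a given site; the site URL and the group-ID placeholder are illustrative, and the crawl user's credentials are what authorizes these calls.

```python
def permission_endpoints(site_url: str) -> dict:
    """Return the legacy SharePoint REST endpoints used for permission data."""
    base = site_url.rstrip("/") + "/_api/web"
    return {
        # Site groups and their members
        "site_groups": f"{base}/sitegroups",
        "group_members": f"{base}/sitegroups(<groupId>)/users",
        # Role definitions and their assignments on the site
        "site_roles": f"{base}/roledefinitions",
        "role_assignments": f"{base}/roleassignments",
    }

eps = permission_endpoints("https://contoso.sharepoint.com/sites/demo/")
```

Because these routes are site-relative, the crawl user needs access to every site that should be crawled, as stated above.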
Content Source Configuration
The content source configuration of the connector comprises the following mandatory fields.
Within the connector's configuration, please add the following information:
Tenant ID: the tenant ID noted on the app registration's Overview page above.
Client ID: the client ID noted on the Overview page above.
Client secret: the client secret value noted in the Certificates and secrets step above.
SharePoint Online user (needed for crawling SharePoint Online contents): this is a user which the connector uses to read site groups and roles, as well as members and role permissions from SharePoint.
SharePoint Online user's password (needed for crawling SharePoint Online contents): the corresponding password for this user.
Rate limit: optionally reduces the number of API calls per second.
Index OneDrives: if turned on, OneDrives are crawled (cf. the OneDrive connector).
Index SharePoint sites: if turned on, SharePoint Online sites are crawled (cf. the SharePoint Online connector).
Index hidden lists: by default, the connector skips hidden SharePoint lists; enable this option to index them as well.
Included sites: here you can add site URLs. If given, only these sites will be crawled, and all previously indexed sites which are no longer included will be deleted from the search index.
Excluded sites: here you can add site URLs. If given, these sites will not be crawled, and all previously indexed sites which are now excluded will be deleted from the search index.
Excluded attachments: the file suffixes in this list determine which documents should not be indexed, such as images or executables.
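The include, exclude, and attachment-suffix settings combine as follows. This is an illustrative sketch of that logic, not the connector's actual implementation; the function names are assumptions.

```python
def should_crawl_site(site_url: str, included: list, excluded: list) -> bool:
    """A site is crawled if it matches the include list (when one is given)
    and does not appear in the exclude list."""
    if included and site_url not in included:
        return False
    return site_url not in excluded

def should_index_file(name: str, excluded_suffixes: list) -> bool:
    """A document is skipped if its file name ends with an excluded suffix."""
    return not any(name.lower().endswith(s) for s in excluded_suffixes)
```

For example, with `Included sites` left empty and `Excluded attachments` set to `.png, .exe`, every site is crawled but image and executable files are skipped.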
After entering the configuration parameters, click on validate. This validates the content crawl configuration directly against the content source. If there are issues when connecting, the validator will indicate these on the page. Otherwise, you can save the configuration and continue with Content Transformation configuration.
Recommended Crawl Schedules
Content Crawls
The connector supports incremental crawls. These are based on the SharePoint change log and, depending on your tenant's size, can run every few hours.
The change log may not be complete and may not reflect all permission changes. Therefore, depending on your requirements, we recommend running a Full Scan every week.
For more information, see Crawl Scheduling.
Principal Crawls
Depending on your requirements, we recommend running a Full Principal Scan every day, or less often.
For more information, see Crawl Scheduling.