Google Mail (GMail) Connector

For a general introduction to our Google Mail Connector, please refer to https://www.rheininsights.com/en/connectors/google-mail.php .

Google Mail Configuration

Our Google Mail Connector is intended to index all documents from an organization. To do so, it needs the following:

  1. A service account user

  2. This service account authenticates using a JSON key. Thus, the organization policy disableServiceAccountKeyCreation must not be enforced

  3. The service account must use domain-wide delegation

  4. The Gmail API and the Admin SDK API must be enabled in the project of this service account

In order to set up the crawl user, please proceed as follows:

Create a new Project

  1. Open https://console.cloud.google.com/cloud-resource-manager

  2. Create a new project and give it a name such as “Google Mail Connector”

Enable the APIs

  1. Open the project

  2. Open APIs & Services

  3. Click on Enable APIs and Services

  4. Search for Gmail API and click on it

  5. Click on Enable

  6. Go back to Enable APIs and Services

  7. Search for Admin SDK API and click on Admin SDK API

  8. Click on Enable

Create a Service Account

  1. Within the project, use the search to open the Service Accounts dialog in IAM

  2. Click on Create Service Account

  3. Give it a name and click on Create and Continue

  4. At Grant this service account access to project, click Done

  5. At Grant users access to this service account, click Done

  6. In the next dialog, click on the newly created service account

  7. Click on Keys

  8. Click on Add Key

  9. Click on Create new key

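  10. Choose JSON and confirm. The browser then downloads the JSON key file, which you need for the content source configuration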
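
Before pasting the downloaded key into the connector configuration, it can help to check that the file is complete. The following is a minimal sketch, not part of the connector. It assumes the usual layout of a Google service account key file, and the function name check_service_account_key is hypothetical.

```python
import json

# Fields that a Google service account key file of type JSON normally contains.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email", "client_id")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key JSON (empty if none)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)]

    # A key for a service account carries the type "service_account".
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type}")
    return problems
```

An empty result means the key looks structurally complete. It does not prove that the key is still valid or that domain-wide delegation is set up. The client_id field in this file normally matches the Unique ID that is needed in the next section.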

Enable Domain-Wide Delegation

  1. Copy the service account’s Unique ID from the service account’s detail view

  2. Go to Domain-wide Delegation in the Google Workspace Admin Console

  3. Click on Add New

  4. Enter the Unique ID of the service account as the client ID and add the following scopes:
    https://www.googleapis.com/auth/admin.directory.group,
    https://www.googleapis.com/auth/admin.directory.user,
    https://www.googleapis.com/auth/admin.directory.group.member.readonly,
    https://www.googleapis.com/auth/gmail.labels,
    https://www.googleapis.com/auth/gmail.readonly
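
The scopes are entered as one comma-separated value. As a small sketch, the list above can be normalized into that form as follows. The helper name scopes_for_admin_console is hypothetical and only illustrates the formatting.

```python
# Scopes as listed in the step above.
DELEGATION_SCOPES = [
    "https://www.googleapis.com/auth/admin.directory.group",
    "https://www.googleapis.com/auth/admin.directory.user",
    "https://www.googleapis.com/auth/admin.directory.group.member.readonly",
    "https://www.googleapis.com/auth/gmail.labels",
    "https://www.googleapis.com/auth/gmail.readonly",
]


def scopes_for_admin_console(scopes):
    """Join scopes into one comma-separated string.

    Surrounding whitespace, trailing commas, empty entries and
    duplicates are removed; the original order is kept.
    """
    cleaned = []
    for scope in scopes:
        scope = scope.strip().rstrip(",")
        if scope and scope not in cleaned:
            cleaned.append(scope)
    return ",".join(cleaned)


if __name__ == "__main__":
    print(scopes_for_admin_console(DELEGATION_SCOPES))
```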

Customer ID

  1. Go to https://admin.google.com > Account > Account Settings > Profile and make a note of your customer ID

Content Source Configuration

The content source configuration of the connector comprises the following mandatory configuration fields.

  1. Service name: This identifier tells the Google APIs who is connecting to them. You will find this ID in the API metrics in your Google Cloud project.

  2. Service user certificate (as JSON): Paste the contents of the JSON key file of the crawl user here. You generated this key in one of the steps above.

  3. Organization's customer ID: Enter your organization's customer ID here. The section above describes where you can find this ID.

  4. Admin directory user: Enter a valid e-mail address of a Google Directory admin here. This admin needs permission to view users, groups and user-group relationships in your Admin Directory.

  5. Reindex message updates: This flag tells the connector whether label changes or other updates to messages in Google Mail trigger a reindexing. This flag is disabled by default.

  6. Excluded files by extension: Here you can add a list of file suffixes. Files with these suffixes are filtered out while crawling and are not indexed at all.

  7. Excluded users by regular expression: Here you can add regexes or individual user IDs to exclude these mailboxes from crawling.

  8. Included users by regular expression: Here you can add regexes or individual user IDs to include only these mailboxes in crawling.

  9. Include delegates in ACL: Enable this flag to include delegate users in the ACLs. Each mailbox comes with a group ACL. If the flag is enabled, delegate users are part of this group. Otherwise, the group contains only the mailbox owner.

  10. Maximum content size (MB): This is the file size limit. Attachments that exceed this size are not crawled.

  11. The general settings are described at General Crawl Settings. You can leave these at their default values.
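
To illustrate how the user and attachment filters above can be thought of, here is a minimal sketch. It is not the connector's implementation. It assumes that exclusions take precedence over inclusions, that an empty include list means all users, that patterns must match the complete user ID, and that extensions are compared case-insensitively. The function names are hypothetical.

```python
import re


def mailbox_is_crawled(user_id, included, excluded):
    """Decide whether a mailbox passes the user filters.

    Assumption: exclusions win over inclusions, and an empty include
    list means "all users". The connector may behave differently.
    """
    if any(re.fullmatch(pattern, user_id) for pattern in excluded):
        return False
    if not included:
        return True
    return any(re.fullmatch(pattern, user_id) for pattern in included)


def attachment_is_crawled(file_name, size_bytes, excluded_extensions, max_size_mb):
    """Decide whether an attachment passes the extension and size filters."""
    # Take the text after the last dot as the suffix, if there is one.
    suffix = file_name.rsplit(".", 1)[-1].lower() if "." in file_name else ""
    blocked = {ext.lower().lstrip(".") for ext in excluded_extensions}
    if suffix and suffix in blocked:
        return False
    # Assumption: one MB is counted as 1024 * 1024 bytes.
    return size_bytes <= max_size_mb * 1024 * 1024
```

For example, with the include pattern .*@example\.com and the exclude pattern svc-.*, the mailbox alice@example.com would be crawled, while svc-backup@example.com would be skipped.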

After entering the configuration parameters, click on validate. This validates the content crawl configuration directly against the content source. If there are issues when connecting, the validator will indicate these on the page. Otherwise, you can save the configuration and continue with Content Transformation configuration.

Recommended Crawl Schedules

Google Mail offers a complete change log, so the connector can efficiently detect new, updated and deleted mails and attachments. However, due to the vast amount of changes in an organization, the time the connector needs to process all changes may vary.

We recommend configuring incremental crawls to run every 60 minutes.

Principal scans should run twice per day.

Full content scans are normally not needed for Google Mail. They are only required if you change the content processing and need to reindex everything. For more information, see Crawl Scheduling.
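
As a small illustration, the recommended intervals translate into the following run times. This is a sketch only. The names RECOMMENDED_INTERVALS and next_runs are hypothetical and are not settings of the connector, and "twice per day" is assumed to mean a fixed 12-hour interval.

```python
from datetime import datetime, timedelta

# Intervals taken from the recommendations above.
RECOMMENDED_INTERVALS = {
    "incremental_crawl": timedelta(minutes=60),
    "principal_scan": timedelta(hours=12),  # assumed meaning of "twice per day"
}


def runs_per_day(interval):
    """Number of runs that fit into 24 hours for a fixed interval."""
    return timedelta(days=1) // interval


def next_runs(start, interval, count):
    """Return the next `count` run times after `start` for a fixed interval."""
    return [start + interval * step for step in range(1, count + 1)]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    for name, interval in RECOMMENDED_INTERVALS.items():
        print(name, runs_per_day(interval), next_runs(start, interval, 2))
```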