GMC (DocumentDB) Change Data Capture

Overview

Both the Connections and Recommendations features of the Revalidation Application rely on Elasticsearch to drive their list views and allow complex filtering and sorting. The Elasticsearch indices are aggregations of data from TIS, GMC and our own Recommendations service.

In order to keep these Elasticsearch indices up to date, we must capture changes in these data sources and propagate them to the various indices.

This page describes the process of capturing data changes from GMC and Recommendations via the tis-revalidation-recommendation service.

Process

  1. To capture data changes from DocumentDB, change streams must first be enabled on each DocumentDB collection. We did this by running a command in the Mongo Shell after logging in to the database.

    For Reval CDC, change streams are enabled on the recommendations and doctorsForDB collections.
    For example, for doctorsForDB:

    db.adminCommand({modifyChangeStreams: 1, database: "revalidation", collection: "doctorsForDB", enable: true});



  2. EventBridge scheduled rules trigger the Lambda function every 2 minutes to poll for new changelogs:
    Preprod: rule-scheduling-preprod-revalidation-documentdb-cdc-processor
    Prod: rule-scheduling-prod-revalidation-documentdb-cdc-processor


  3. The Lambda functions poll the changelogs from DocumentDB and publish them to SQS queues.
    Each run reads the resume token (the ID of the last processed changelog) from the revalidation.cdcResumeToken collection and polls only the changelogs that come after it.

    The resume token is passed to the change stream in the Lambda code:

    stream = collection.watch(full_document='updateLookup', resume_after=resumeToken)


    The Lambda then publishes the changelogs to the following SQS queues:
    Preprod queue for doctorsForDB: tis-revalidation-documentdb-cdc-preprod-doctorsfordb
    Preprod queue for recommendation: tis-revalidation-documentdb-cdc-preprod-recommendation
    Preprod dead letter queue: tis-revalidation-documentdb-cdc-preprod-dlq
    Prod queue for doctorsForDB: tis-revalidation-documentdb-cdc-prod-doctorsfordb
    Prod queue for recommendation: tis-revalidation-documentdb-cdc-prod-recommendation
    Prod dead letter queue: tis-revalidation-documentdb-cdc-prod-dlq


    The ID of the last published changelog is then stored in revalidation.cdcResumeToken as the new resume token, so the next run continues from that point.

  4. The messages in the queues are consumed by the Integration service, which applies the DocumentDB changes to the MasterIndex.
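
Step 1 above can also be scripted. The following Python sketch builds the same modifyChangeStreams command as the Mongo Shell example and runs it for both CDC collections through pymongo's `client.admin.command(...)`. It assumes that the recommendations collection lives in the same revalidation database as doctorsForDB (the page only shows the doctorsForDB command); the helper names are hypothetical, and the client is typed as `Any` so the sketch does not import pymongo itself.

```python
from typing import Any

# Database and collections enabled for Reval CDC, as listed on this page.
# Assumption: the recommendations collection is in the same "revalidation" database.
CDC_DATABASE = "revalidation"
CDC_COLLECTIONS = ("doctorsForDB", "recommendations")


def change_stream_command(database: str, collection: str, enable: bool = True) -> dict[str, Any]:
    """Build the modifyChangeStreams admin command from the Mongo Shell example.

    The command name must be the first key; Python dicts keep insertion order.
    """
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": enable,
    }


def enable_change_streams(client: Any) -> list[dict[str, Any]]:
    """Enable change streams on every CDC collection (hypothetical helper).

    client is expected to be a pymongo.MongoClient; client.admin.command(...)
    runs the command against the admin database. Returns the server replies
    in collection order.
    """
    return [
        client.admin.command(change_stream_command(CDC_DATABASE, name))
        for name in CDC_COLLECTIONS
    ]
```

For example, `enable_change_streams(MongoClient(uri))`, where `uri` is a placeholder for the cluster connection string.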
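
The 2-minute schedule in step 2 corresponds to the EventBridge schedule expression `rate(2 minutes)`. This page does not say how the rules were created, so the sketch below only shows the arguments a boto3 EventBridge `put_rule` call would take to define an equivalent rule. The rule names follow the pattern of the two names listed in step 2; the helper functions are hypothetical.

```python
from typing import Any

# Polling interval from step 2, written as an EventBridge schedule expression.
SCHEDULE_EXPRESSION = "rate(2 minutes)"


def rule_name(env: str) -> str:
    """Rule name following the preprod/prod naming pattern on this page."""
    if env not in ("preprod", "prod"):
        raise ValueError(f"unknown environment: {env!r}")
    return f"rule-scheduling-{env}-revalidation-documentdb-cdc-processor"


def put_rule_kwargs(env: str) -> dict[str, Any]:
    """Keyword arguments for boto3.client("events").put_rule(...) (sketch only)."""
    return {
        "Name": rule_name(env),
        "ScheduleExpression": SCHEDULE_EXPRESSION,
        "State": "ENABLED",
    }
```

A rule created this way still needs the Lambda function attached as a target (`put_targets`) and a resource-based permission that allows `events.amazonaws.com` to invoke the function.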
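
The polling in step 3 can be sketched as follows. It assumes a pymongo `Collection` and uses the same `watch(full_document='updateLookup', resume_after=...)` call as the Lambda snippet, plus `ChangeStream.try_next()`, which returns `None` once no further change is currently available, so a scheduled run reads what is pending and then exits. The `max_events` cap and the function name are hypothetical and not taken from the real Lambda.

```python
from typing import Any, Optional


def poll_changes(
    collection: Any,
    resume_token: Optional[dict],
    max_events: int = 500,
) -> tuple[list[dict], Optional[dict]]:
    """Read the change events currently available after resume_token.

    collection is expected to be a pymongo Collection. Returns the events and
    the token the next scheduled run should resume after.
    """
    events: list[dict] = []
    with collection.watch(full_document="updateLookup", resume_after=resume_token) as stream:
        while len(events) < max_events:
            # try_next() returns None when no further change is available right
            # now, so the run ends instead of blocking until the Lambda times out.
            change = stream.try_next()
            if change is None:
                break
            events.append(change)
        # Cached token for resuming after the last change read; keep the old
        # token if the stream has not produced one.
        next_token = stream.resume_token or resume_token
    return events, next_token
```

The returned token is the value to store back in revalidation.cdcResumeToken once the events have been published.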
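
Routing each changelog to its queue in step 3 could look like the sketch below. A change event's `ns.coll` field names the source collection, and the queue names are derived from the list in step 3. The sketch assumes a boto3 SQS client and serialises events with `json.dumps(..., default=str)`, which turns BSON values such as `ObjectId` and `datetime` into plain strings; this is an assumption, and the real Lambda may serialise differently (for example with `bson.json_util.dumps`). The sketch does not write to the dead letter queues, whose wiring is not described on this page.

```python
import json
from typing import Any

# Queue-name suffix per source collection, from the queue names on this page.
QUEUE_SUFFIX = {
    "doctorsForDB": "doctorsfordb",
    "recommendations": "recommendation",
}


def queue_name_for(env: str, collection: str) -> str:
    """Map an environment and a source collection to its CDC queue name."""
    if collection not in QUEUE_SUFFIX:
        raise ValueError(f"no CDC queue for collection {collection!r}")
    return f"tis-revalidation-documentdb-cdc-{env}-{QUEUE_SUFFIX[collection]}"


def publish_change(sqs: Any, env: str, change: dict) -> dict:
    """Send one change event to the queue of its source collection.

    sqs is expected to be a boto3 SQS client. default=str stringifies BSON
    types that json cannot encode natively (assumed serialisation format).
    """
    queue_name = queue_name_for(env, change["ns"]["coll"])
    queue_url = sqs.get_queue_url(QueueName=queue_name)["QueueUrl"]
    return sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(change, default=str))
```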
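
The resume token round trip through revalidation.cdcResumeToken can be sketched as below. The actual document layout of that collection is not documented on this page, so keying one document per watched collection by `_id` and the `resumeToken` field name are hypothetical. The collection argument is expected to be a pymongo `Collection`.

```python
from typing import Any, Optional

# Hypothetical layout: {"_id": <watched collection name>, "resumeToken": <token>}.
# The real schema of revalidation.cdcResumeToken is not shown on this page.


def load_resume_token(token_collection: Any, watched: str) -> Optional[dict]:
    """Return the stored token, or None so the change stream starts from now."""
    doc = token_collection.find_one({"_id": watched})
    return doc["resumeToken"] if doc else None


def save_resume_token(token_collection: Any, watched: str, token: Optional[dict]) -> None:
    """Upsert the token; intended to run only after publishing succeeded."""
    if token is None:
        return
    token_collection.replace_one(
        {"_id": watched},
        {"_id": watched, "resumeToken": token},
        upsert=True,
    )
```

Saving the token only after the events have been published means a failed run re-reads the same changelogs on the next schedule, so downstream consumers may see a changelog more than once.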
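
Step 4 happens in the Integration service, whose code is not shown on this page. As an illustration only, the Python sketch below shows how a consumer could turn one queued change event into an index action, assuming each message body is a JSON-encoded change event. Because the stream uses `full_document='updateLookup'`, insert, update and replace events carry the current document in `fullDocument`, while delete events carry only `documentKey`. The function and the action names are hypothetical.

```python
import json
from typing import Any, Optional

UPSERT_OPS = ("insert", "update", "replace")


def to_index_action(message_body: str) -> Optional[tuple[str, Any, Optional[dict]]]:
    """Map one change event to ("upsert" | "delete", document id, document)."""
    change = json.loads(message_body)
    op = change.get("operationType")
    doc_id = (change.get("documentKey") or {}).get("_id")
    if op in UPSERT_OPS:
        full_document = change.get("fullDocument")
        if full_document is None:
            # With updateLookup, fullDocument is null if the document was deleted
            # before the lookup; the later delete event covers that case.
            return None
        return ("upsert", doc_id, full_document)
    if op == "delete":
        return ("delete", doc_id, None)
    # Other event types (e.g. drop, invalidate) are ignored in this sketch.
    return None
```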

 

Reference:
Capture changes from Amazon DocumentDB via AWS Lambda and publish them to Amazon MSK
https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html