GMC (DocumentDB) Change Data Capture


Overview

Both the Connections and Recommendations features of the Revalidation Application rely on Elasticsearch to drive their list views and allow complex filtering and sorting. The Elasticsearch indices are aggregations of data from TIS, GMC and our own Recommendations service.

In order to keep these Elasticsearch indices up to date, we must capture changes in these data sources and propagate them to the various indices.

This page describes the process of capturing data changes from GMC and Recommendations via the tis-revalidation-recommendation service.

Process

  1. To start capturing data changes from DocumentDB, we first had to enable change streams on the relevant DocumentDB collections. We did this by running a command in the Mongo shell after logging in to the DB.

    For Reval CDC, change streams are enabled on the recommendations and doctorsForDB collections. For example:

    db.adminCommand({modifyChangeStreams: 1, database: "revalidation", collection: "doctorsForDB", enable: true});
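
The same step could also be scripted. Below is a minimal Python sketch, assuming a pymongo MongoClient that is already connected to the cluster. The helper names are hypothetical; only the command fields and collection names come from the example above.

```python
REVAL_COLLECTIONS = ("recommendations", "doctorsForDB")


def modify_change_streams_command(database, collection, enable=True):
    """Build the modifyChangeStreams admin command document."""
    # The command name must be the first key; Python dicts keep insertion order.
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": enable,
    }


def enable_change_streams(client, database="revalidation", collections=REVAL_COLLECTIONS):
    """Run the command once per collection.

    `client` is assumed to be a connected pymongo MongoClient.
    """
    return [
        client.admin.command(modify_change_streams_command(database, name))
        for name in collections
    ]
```

Passing enable=False to modify_change_streams_command builds the command that switches the change stream off again.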



  2. EventBridge rules (schedules) trigger the Lambda every 2 minutes to poll for changelogs:
    Preprod: rule-scheduling-preprod-revalidation-documentdb-cdc-processor
    Prod: rule-scheduling-prod-revalidation-documentdb-cdc-processor
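
As an illustration, a schedule of this kind could be described with the boto3 EventBridge put_rule call as sketched below. The rate expression and the helper names are assumptions; the real rules may be defined differently (for example with a cron expression, or in infrastructure code).

```python
def schedule_rule_params(environment, minutes=2):
    """Build keyword arguments for an EventBridge put_rule call."""
    # Rate expressions use the singular unit for a value of 1, plural otherwise.
    unit = "minute" if minutes == 1 else "minutes"
    return {
        "Name": f"rule-scheduling-{environment}-revalidation-documentdb-cdc-processor",
        "ScheduleExpression": f"rate({minutes} {unit})",
        "State": "ENABLED",
    }


def create_schedule(events_client, environment):
    """Create or update the rule.

    `events_client` is assumed to be boto3.client("events").
    """
    return events_client.put_rule(**schedule_rule_params(environment))
```

A rule created this way does nothing until the Lambda is attached as a target, which is a separate put_targets call not shown here.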


  3. The Lambda functions are responsible for polling the changelogs from DocumentDB and publishing them to the SQS queues.
    Each run reads the resume token (the ID of the last changelog) from the revalidation.cdcResumeToken collection and starts polling the changelogs that follow that token.

    This is visible in the Lambda code:

    stream = collection.watch(full_document='updateLookup', resume_after=resumeToken)


    The Lambda then publishes the changelogs to the following SQS queues:
    Preprod queue for doctorsForDB: tis-revalidation-documentdb-cdc-preprod-doctorsfordb
    Preprod queue for recommendation: tis-revalidation-documentdb-cdc-preprod-recommendation
    Preprod dead letter queue: tis-revalidation-documentdb-cdc-preprod-dlq
    Prod queue for doctorsForDB: tis-revalidation-documentdb-cdc-prod-doctorsfordb
    Prod queue for recommendation: tis-revalidation-documentdb-cdc-prod-recommendation
    Prod dead letter queue: tis-revalidation-documentdb-cdc-prod-dlq


    The ID of the changelog from the DB revalidation.cdcResumeToken
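
The polling step can be sketched as follows. This is an illustration under assumptions, not the Lambda's actual code: load_token, save_token and publish are hypothetical callables standing in for reading and writing revalidation.cdcResumeToken and for sending a message to SQS, and saving the token after each publish is assumed rather than confirmed by this page.

```python
def poll_changes(collection, load_token, save_token, publish, max_events=100):
    """Publish up to max_events changelogs that follow the stored resume token.

    `collection` is assumed to be a pymongo Collection with change streams enabled.
    """
    resume_token = load_token()  # None means no token is stored yet
    published = 0
    with collection.watch(full_document="updateLookup",
                          resume_after=resume_token) as stream:
        while published < max_events:
            # try_next() does not block indefinitely; it returns None
            # when no change is currently available.
            event = stream.try_next()
            if event is None:
                break
            publish(event)
            # The _id of a change event is its resume token.
            save_token(event["_id"])
            published += 1
    return published
```

Saving the token only after a successful publish means a failed run re-reads the same changelogs on the next schedule, so the consumer should tolerate duplicate messages.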

  4. The messages in the queues are consumed by the Integration service, which populates the DocumentDB changes into the MasterIndex.

 

Reference:
Capture changes from Amazon DocumentDB via AWS Lambda and publish them to Amazon MSK
Using change streams with Amazon DocumentDB - Amazon DocumentDB

 
