GMC (DocumentDB) Change Data Capture
Overview
Both the Connections and Recommendations features of the Revalidation Application rely on Elasticsearch to drive their list views and allow complex filtering and sorting. The Elasticsearch indices are aggregations of data from TIS, GMC and our own Recommendations service.
In order to keep these Elasticsearch indices up to date, we must capture changes in these data sources and propagate them to the various indices.
This page describes the process of capturing data changes from GMC and Recommendations via the tis-revalidation-recommendation service.
Process
To start capturing data changes from DocumentDB, we first needed to enable change streams on the relevant DocumentDB collections. We did this with a command in the Mongo shell after logging in to the DB.
For Reval CDC, change streams are enabled on the collections recommendations and doctorsForDB.
For example:
db.adminCommand({modifyChangeStreams: 1, database: "revalidation", collection: "doctorsForDB", enable: true});
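The same command can also be issued from Python rather than the Mongo shell. The following is a minimal sketch, assuming a connected pymongo MongoClient is passed in; the helper names build_change_stream_command and enable_change_streams are hypothetical and are not taken from the Reval codebase.

```python
from typing import Iterable


def build_change_stream_command(database: str, collection: str, enable: bool = True) -> dict:
    """Build the modifyChangeStreams admin command document."""
    # The command name must be the first key; Python dicts keep insertion order.
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": enable,
    }


def enable_change_streams(client, database: str, collections: Iterable[str]) -> None:
    """Run the command once per collection against the admin database.

    `client` is assumed to be a connected pymongo.MongoClient.
    """
    for name in collections:
        client.admin.command(build_change_stream_command(database, name))


if __name__ == "__main__":
    # Print the command documents without connecting to anything.
    for name in ("recommendations", "doctorsForDB"):
        print(build_change_stream_command("revalidation", name))
```

Building the command document separately from sending it makes it possible to inspect or log exactly what will be run before touching the database.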
EventBridge rules (schedules) trigger the Lambda to poll for changelogs every 2 minutes:
Preprod: rule-scheduling-preprod-revalidation-documentdb-cdc-processor
Prod: rule-scheduling-prod-revalidation-documentdb-cdc-processor
The Lambda functions are responsible for polling the changelogs from DocumentDB and publishing them to the SQS queues.
The Lambda checks the resume token (the ID of the last changelog) in the DB collection revalidation.cdcResumeToken and starts polling the changelogs after that resume token.
We can see this setting in the Lambda code:
stream = collection.watch(full_document='updateLookup', resume_after=resumeToken)
The Lambda then publishes the changelogs to the SQS queues, which are named as follows:
Preprod queue for doctorsForDB: tis-revalidation-documentdb-cdc-preprod-doctorsfordb
Preprod queue for recommendation: tis-revalidation-documentdb-cdc-preprod-recommendation
Preprod dead letter queue: tis-revalidation-documentdb-cdc-preprod-dlq
Prod queue for doctorsForDB: tis-revalidation-documentdb-cdc-prod-doctorsfordb
Prod queue for recommendation: tis-revalidation-documentdb-cdc-prod-recommendation
Prod dead letter queue: tis-revalidation-documentdb-cdc-prod-dlq
The resume token, i.e. the ID of the last changelog, is held in the DB collection revalidation.cdcResumeToken.
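The polling steps above can be sketched roughly as follows. This is not the actual Lambda code: the function name drain_change_stream, the publish and save_token callbacks, and the max_events cap are all hypothetical, and writing the token back after publishing is an assumption about how the resume token is maintained. The stream argument is assumed to behave like a pymongo ChangeStream, whose try_next() returns the next change document or None when nothing is pending.

```python
import json
from typing import Any, Callable, Optional


def drain_change_stream(
    stream: Any,
    publish: Callable[[str], None],
    save_token: Callable[[Any], None],
    max_events: int = 100,
) -> int:
    """Publish pending change events, then persist the last resume token.

    Returns the number of events published in this invocation.
    """
    published = 0
    last_token: Optional[Any] = None
    while published < max_events:
        event = stream.try_next()
        if event is None:
            break  # nothing pending; stop until the next scheduled run
        # default=str is a simplification for non-JSON types such as ObjectId.
        publish(json.dumps(event, default=str))
        last_token = event["_id"]  # a change event's _id is its resume token
        published += 1
    if last_token is not None:
        # Assumption: the token is written back only after publishing succeeds.
        save_token(last_token)
    return published
```

In a real Lambda, stream would come from the collection.watch(...) call shown earlier, publish would wrap an SQS send (for example boto3's send_message), and save_token would update the revalidation.cdcResumeToken collection. Because the token is saved only after the events are published, a failure part-way through leads to events being re-sent on the next run rather than lost, so consumers should tolerate duplicates under this design.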
The messages in the queues are consumed by the Integration service, which populates the DocumentDB changes into MasterIndex.
Reference:
Capture changes from Amazon DocumentDB via AWS Lambda and publish them to Amazon MSK
https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
Related pages
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213