Elastic Search Rebuild Sync Job
Past Issues:
04/07/2022 Post deployment review / release washup
Running the sync job
The reval sync job can be run manually from the API Gateway for prod and preprod at /api/sync. Clicking the “test” button may show a timeout error or similar; this is usually fine. To confirm that the job is working, check the integration logs or use ElasticVue (the first step of the job is emptying masterdoctorindex). Do not trigger the sync multiple times!
IMPORTANT! THE DISCREPANCIES ALIAS CURRENTLY NEEDS MANUALLY RE-ADDING
This can be done with the request shown below, which should be available to re-run in the REST “History” section of ElasticVue.
Services used
tis-revalidation-integration (ECS)
tis-revalidation-recommendation (ECS)
tis-revalidation-connection (ECS)
TIS-TCS (ECS)
ElasticSearch (Amazon OpenSearch Service)
RabbitMQ (Amazon MQ)
SQS (SQS)
Revalidation DocumentDB
Sync job data flow
When the /sync endpoint is called in the Integration service, the Master index is cleared and rebuilt with the alias current_connections (which is used by the connection summary page). (Note: the discrepancies alias needs adding manually, as described above.)
A “syncStart” message is sent to TCS to start extracting data.
Rabbit Queue: exchange: reval.exchange queue: reval.queue.connection.syncstart routingKey: reval.connection.syncstart
In https://hee-tis.atlassian.net/browse/TIS21-4555, we decided to filter out doctors with UNKNOWN or null GMC numbers in the tcs-persistence SQL, so that they are not sent to Reval ES.
TCS sends each trainee's data, with their latest programme/curriculum information, one by one via the queue. A “syncEnd” signal is sent after the last trainee.
Rabbit Queue: exchange: reval.exchange queue: reval.queue.connection.syncdata routingKey: reval.connection.syncdata
The Integration service processes these messages and inserts the trainee data from TCS into the Master Index.
When the “syncEnd” message is received by Integration, a “gmcSyncStart” message is sent to Recommendation to get GMC data from DocumentDB.
Rabbit Queue: exchange: reval.exchange queue: reval.queue.recommendation.syncstart routingKey: reval.recommendation.syncstart
Recommendation gets trainee data from “DoctorsForDB” and sends it to Integration one by one via the queue. A “syncEnd” signal is sent after the last trainee.
The Integration service inserts the trainee data from Recommendation (DoctorsForDB data + the latest recommendation’s GMC outcome) into the Master Index, or updates the record if it already exists.
When this “syncEnd” message is received by Integration, it backs up the current Elasticsearch recommendationindex and reindexes from masterdoctorindex to a new recommendationindex, which ensures that all updates in masterdoctorindex are synchronised to recommendationindex.
For how to trigger the reindexing, refer to “Reindex from masterdoctorindex to recommendation index”.
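The flow above can be summarised as a small state machine. The Python sketch below is illustrative only and is not the Integration service's code: the class, method, phase and field names are hypothetical, and the GMC number and field values in the usage lines are made-up examples. The message names (syncStart, syncEnd, gmcSyncStart) and index names come from this page. How the real services tell the TCS “syncEnd” apart from the Recommendation “syncEnd” is an assumption here; the sketch simply tracks which phase it is in.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    TCS_SYNC = "tcs_sync"   # receiving trainee data from TCS
    GMC_SYNC = "gmc_sync"   # receiving DoctorsForDB data from Recommendation
    DONE = "done"


class SyncFlowSketch:
    """Hypothetical in-memory model of the sync flow; not the real service."""

    def __init__(self):
        self.phase = Phase.IDLE
        self.master_index = {}  # stands in for masterdoctorindex, keyed by GMC number
        self.sent = []          # messages this model would publish
        self.actions = []       # index-level actions, in order

    def start_sync(self):
        # /sync called: clear the master index and ask TCS to start extracting.
        self.master_index.clear()
        self.actions.append("clear masterdoctorindex")
        self.sent.append("syncStart")
        self.phase = Phase.TCS_SYNC

    def on_doctor(self, gmc_number, fields):
        # Insert the doctor, or update the record if it already exists.
        if self.phase not in (Phase.TCS_SYNC, Phase.GMC_SYNC):
            raise RuntimeError("doctor data received outside a sync")
        self.master_index.setdefault(gmc_number, {}).update(fields)

    def on_sync_end(self):
        if self.phase is Phase.TCS_SYNC:
            # TCS has finished: ask Recommendation for the GMC data.
            self.sent.append("gmcSyncStart")
            self.phase = Phase.GMC_SYNC
        elif self.phase is Phase.GMC_SYNC:
            # Recommendation has finished: back up, then reindex.
            self.actions.append("backup recommendationindex")
            self.actions.append("reindex masterdoctorindex -> recommendationindex")
            self.phase = Phase.DONE
        else:
            raise RuntimeError("syncEnd received outside a sync")


# Usage with made-up example data.
flow = SyncFlowSketch()
flow.start_sync()
flow.on_doctor("1234567", {"programme": "Example programme"})  # from TCS
flow.on_sync_end()                                             # TCS syncEnd
flow.on_doctor("1234567", {"gmcOutcome": "APPROVED"})          # from Recommendation
flow.on_sync_end()                                             # Recommendation syncEnd
```

The point of the sketch is the ordering: the reindex into recommendationindex only happens after both the TCS and the Recommendation phases have signalled “syncEnd”.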
To be continued
Currently, we use the aliases current_connections and discrepancies, based on masterdoctorindex, for the connection summary page.
However, this sync process deletes and then rebuilds masterdoctorindex entirely. So when the sync process starts, Connections won’t be available for almost 6 hours. We need to resolve this in the next step.
Sync job RabbitMQ configuration
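The three bindings quoted in the data flow section can be collected in one place. This is a minimal Python sketch using only the exchange, queue and routing key names that appear on this page; the dictionary labels and the helper function are hypothetical, and the real configuration lives in the services themselves. The queue that carries data from Recommendation back to Integration is not named on this page, so it is not listed.

```python
EXCHANGE = "reval.exchange"

# hypothetical label -> (queue, routing key), as quoted in the data flow section
BINDINGS = {
    "tcs_sync_start": ("reval.queue.connection.syncstart", "reval.connection.syncstart"),
    "tcs_sync_data": ("reval.queue.connection.syncdata", "reval.connection.syncdata"),
    "gmc_sync_start": ("reval.queue.recommendation.syncstart", "reval.recommendation.syncstart"),
}


def queue_for(routing_key):
    """Return the queue bound to a routing key, or None if it is not listed above."""
    for queue, key in BINDINGS.values():
        if key == routing_key:
            return queue
    return None
```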
Related pages
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213