Elastic Search Rebuild Sync Job

Past Issues:

04/07/2022 Post deployment review / release washup

Running the sync job

The reval sync job can be run manually from the API Gateway for prod and preprod at /api/sync. (This may show a timeout error or similar when clicking the “test” button; this is usually fine.) You can check the integration logs or use ElasticVue to confirm that it is working (the first step is emptying masterdoctorindex). Do not trigger the sync multiple times!

IMPORTANT! THE DISCREPANCIES ALIAS CURRENTLY NEEDS MANUALLY RE-ADDING

This can be done with the request shown below, which should be available to re-run in the Rest “History” section of ElasticVue.
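The original request is not reproduced in this text, so the following is only a minimal sketch of the general shape of an Elasticsearch alias-add request, not the team's saved request. It assumes the alias is attached to masterdoctorindex with no extra options. If the real discrepancies alias carries a filter or other settings, those are not shown here and must be taken from the ElasticVue history. The helper name build_add_alias_request is hypothetical.

```python
import json


def build_add_alias_request(index: str, alias: str) -> dict:
    """Build a body for POST /_aliases that attaches `alias` to `index`.

    Assumption: a plain alias with no filter. The real discrepancies
    alias may need additional settings that are not shown in this page.
    """
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


# Index and alias names are taken from this page.
body = build_add_alias_request("masterdoctorindex", "discrepancies")

# The method and path to use in the ElasticVue Rest tab, then the JSON body.
print("POST /_aliases")
print(json.dumps(body, indent=2))
```

Building the body as data rather than sending it keeps the sketch safe to run anywhere; paste the printed JSON into ElasticVue only after checking it against the saved request.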

 

Services used

  • tis-revalidation-integration (ECS)

  • tis-revalidation-recommendation (ECS)

  • tis-revalidation-connection (ECS)

  • TIS-TCS (ECS)

  • ElasticSearch (Amazon OpenSearch Service)

  • RabbitMQ (Amazon MQ)

  • SQS (SQS)

  • Revalidation DocumentDB

Sync job data flow

  1. When the /sync endpoint is called in the Integration service, the Master index is cleared and rebuilt with the alias current_connections (which is used by the connection summary page). (Note: the discrepancies alias needs adding manually, as described above.)

  2. A “syncStart” message is sent to TCS to start extracting data.

    Rabbit Queue: exchange: reval.exchange, queue: reval.queue.connection.syncstart, routingKey: reval.connection.syncstart

    In https://hee-tis.atlassian.net/browse/TIS21-4555, we decided to filter out doctors with UNKNOWN or null GMC numbers in the tcs-persistence SQL, so they are not sent to Reval ES.

  3. TCS sends trainee data, including each trainee's latest programme/curriculum information, one by one via the queue. A “syncEnd” signal is sent following the last trainee.

    Rabbit Queue: exchange: reval.exchange, queue: reval.queue.connection.syncdata, routingKey: reval.connection.syncdata

    The Integration service inserts the trainee data from TCS into the Master Index.

  4. When the “syncEnd” message is received by Integration, a “gmcSyncStart” message is sent to Recommendation to get GMC data from DocumentDB.

    Rabbit Queue: exchange: reval.exchange, queue: reval.queue.recommendation.syncstart, routingKey: reval.recommendation.syncstart

     

  5. Recommendation gets trainee data from “DoctorsForDB” and sends it to Integration one by one via the queue. A “syncEnd” signal is sent following the last trainee.

    The Integration service inserts the trainee data from Recommendation (DoctorsForDB data plus the latest recommendation’s GMC outcome) into the Master Index, or updates the record if it already exists.

  6. When the “syncEnd” message is received by Integration, it backs up the current Elasticsearch recommendationindex and reindexes from masterdoctorindex to a new recommendationindex, which ensures all updates in masterdoctorindex are synchronised to recommendationindex.
    For how to trigger the reindexing, please refer to “Reindex from masterdoctorindex to recommendation index”.
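The steps above can be sketched as a small in-memory simulation. This is an illustration of the ordering and of the insert-then-insert/update behaviour only, not the services' actual code: the queues are replaced by plain Python lists, and the record fields (gmcNumber, programmeName, gmcOutcome), the use of the GMC number as the document key, and all function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List

# GMC numbers that TCS filters out before sending (see TIS21-4555).
SKIPPED_GMC_NUMBERS = {None, "UNKNOWN"}

Doc = Dict[str, object]


def tcs_extract(trainees: Iterable[Doc]) -> Iterator[Doc]:
    """Steps 2-3: TCS sends trainees one by one, skipping bad GMC numbers."""
    for trainee in trainees:
        if trainee.get("gmcNumber") not in SKIPPED_GMC_NUMBERS:
            yield trainee


def upsert(index: Dict[str, Doc], record: Doc) -> None:
    """Insert the record, or merge it into the existing document."""
    # Assumption: documents are keyed by GMC number.
    index.setdefault(str(record["gmcNumber"]), {}).update(record)


def run_sync(tcs_trainees: List[Doc], doctors_for_db: List[Doc]) -> Dict[str, Dict[str, Doc]]:
    master: Dict[str, Doc] = {}  # Step 1: the master index starts empty.

    for record in tcs_extract(tcs_trainees):  # Step 3: TCS data is inserted.
        upsert(master, record)

    # Steps 4-5: after TCS "syncEnd", Recommendation data is inserted/updated.
    for record in doctors_for_db:
        upsert(master, record)

    # Step 6: after the second "syncEnd", the recommendation index is
    # rebuilt as a copy of the master index.
    recommendation = {key: dict(doc) for key, doc in master.items()}
    return {"masterdoctorindex": master, "recommendationindex": recommendation}


result = run_sync(
    tcs_trainees=[
        {"gmcNumber": "1234567", "programmeName": "Cardiology"},
        {"gmcNumber": "UNKNOWN", "programmeName": "Surgery"},
        {"gmcNumber": None, "programmeName": "Radiology"},
    ],
    doctors_for_db=[
        {"gmcNumber": "1234567", "gmcOutcome": "APPROVED"},
        {"gmcNumber": "7654321", "gmcOutcome": "UNDER_REVIEW"},
    ],
)
print(sorted(result["masterdoctorindex"]))
```

In this sketch a doctor present in both sources ends up as one merged document, while a doctor known only to Recommendation is inserted as a new one, mirroring the insert/update rule in step 5.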

To be continued

Currently, we use the aliases current_connections and discrepancies, both based on masterdoctorindex, for the connection summary page.

However, this sync process deletes and then rebuilds masterdoctorindex entirely, so once the sync process starts, Connections won’t be available for almost 6 hours. We need to resolve this in the next step.

 

[Old version 1]

[Old version 2]

Sync job RabbitMQ configuration