Date

Authors

Reuben Roberts

Status

In Progress

Summary

Duplicate Trainees

Impact

Resolved

Summary

PersonElasticSearchSyncJob ran simultaneously on both blue and green production servers (HEE-TIS-VM-PROD-APPS-BLUE and HEE-TIS-VM-PROD-APPS-GREEN). This caused duplicate Person entries in Elasticsearch.

Manually rerunning the PersonElasticSearchSyncJob resolved the issue.

Impact

 Users observed duplicate entries in the list of people.

  • Root Cause(s)

  • Trigger

  • Resolution

  • Detection

  • Action Items

  • Timeline

Root Cause(s)

...

  • The job ran in parallel, once on each of the servers.

  • The ‘locking’ meant to prevent this only takes account of scheduled runs.

  • A container restarted in the ~10 minute window in which an overlapping run causes a problem.

  • Open question: why did the container restart? The assumption is that all the jobs overlapped during that time, but any data changes overwrote the same records in the same way; only the Elasticsearch job creates new records in the Elasticsearch index.
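To illustrate why an overlapping run can add records rather than overwrite them, here is a minimal in-memory sketch. It assumes, without confirmation from the job's code, that documents were indexed without a stable document ID. `FakeIndex`, `sync_with_generated_ids`, `sync_with_person_ids` and the `person-<id>` key format are hypothetical names used only for this illustration; they are not part of TIS or of any Elasticsearch client.

```python
import uuid


class FakeIndex:
    """In-memory stand-in for a search index: maps document ID -> document."""

    def __init__(self):
        self.docs = {}

    def index(self, doc, doc_id=None):
        # Without an explicit ID a fresh one is generated, so every call adds a document.
        key = doc_id if doc_id is not None else uuid.uuid4().hex
        self.docs[key] = doc


def sync_with_generated_ids(index, people):
    # Assumed behaviour of the failing case: no stable ID per person.
    for person in people:
        index.index(person)


def sync_with_person_ids(index, people):
    # Hypothetical alternative: derive the document ID from the person's own ID.
    for person in people:
        index.index(person, doc_id=f"person-{person['id']}")


people = [{"id": 1, "name": "A. Trainee"}, {"id": 2, "name": "B. Trainee"}]

duplicated = FakeIndex()
sync_with_generated_ids(duplicated, people)  # run on "blue"
sync_with_generated_ids(duplicated, people)  # overlapping run on "green"

idempotent = FakeIndex()
sync_with_person_ids(idempotent, people)  # run on "blue"
sync_with_person_ids(idempotent, people)  # overlapping run on "green"

print(len(duplicated.docs), len(idempotent.docs))  # prints "4 2"
```

If the real job already uses stable IDs, the duplication must come from another step (for example how the index is rebuilt), and this sketch does not apply.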

Trigger

  • Teams notification

Resolution

  • Manually rerunning the PersonElasticSearchSyncJob resolved the issue.

Detection

  • A user reported in Teams Support Channel. The issue was also raised in the TIS tis-dev-team Slack channel.

  • The overlapping jobs could be seen in the server logs

    [Image]

    and also in the monitoring-prod Slack channel (started 1:29 AM and 1:33 AM):

    [Image]

Timeline

  • 08:21 - Notification on Teams

  • 09:25 - Job run again

Action Items

Action Items

Owner

Lessons Learned

  • The ‘locking’ to prevent the job running in parallel only takes account of scheduled runs. Any container restart or manual run of the job can cause duplication if it overlaps with the job running on the other server instance.
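One way to act on this lesson is to take the lock inside the job itself, so that every trigger goes through the same check. The sketch below is only an illustration under stated assumptions: `SharedLockStore` is a hypothetical in-memory stand-in for a store that both servers can reach (for example a database table), and `run_person_sync` is a hypothetical single entry point. Neither reflects how the existing TIS locking is implemented.

```python
import threading
import time


class SharedLockStore:
    """Hypothetical stand-in for a lock store reachable from both servers."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}  # lock name -> (owner, expiry timestamp)

    def try_acquire(self, name, owner, ttl_seconds):
        now = time.monotonic()
        with self._guard:
            held = self._locks.get(name)
            if held is not None and held[1] > now:
                return False  # someone else holds an unexpired lock
            self._locks[name] = (owner, now + ttl_seconds)
            return True

    def release(self, name, owner):
        with self._guard:
            held = self._locks.get(name)
            if held is not None and held[0] == owner:
                del self._locks[name]


def run_person_sync(store, server, trigger, work):
    """Single entry point for every trigger: schedule, container start-up or manual run."""
    if not store.try_acquire("PersonElasticSearchSyncJob", server, ttl_seconds=3600):
        return f"{server}: skipped ({trigger}), job already running elsewhere"
    try:
        work()
        return f"{server}: completed ({trigger})"
    finally:
        store.release("PersonElasticSearchSyncJob", server)


store = SharedLockStore()
results = []


def green_starts_during_blue_run():
    # Simulates the green container restarting while blue is still syncing.
    results.append(run_person_sync(store, "green", "container restart", lambda: None))


results.append(run_person_sync(store, "blue", "schedule", green_starts_during_blue_run))
results.append(run_person_sync(store, "green", "manual", lambda: None))

for line in results:
    print(line)
```

In this run the green attempt made during blue's run is skipped, while the later manual run on green proceeds because blue has released the lock. The expiry time (an assumed 3600 seconds here) is there so that a crashed holder does not block the job forever; it would need to exceed the job's longest expected runtime.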