Date |
|
Authors | Reuben Roberts, Joseph (Pepe) Kelly, Marcello Fabbri |
Status | Resolved |
Summary |
Manually rerunning the |
Impact | Users observed duplicate entries in the list of people. |
...
Non-technical Description
An edge-case sequence of events caused the 'Person' data synchronisation to run on both load-balanced servers during the overnight jobs, creating duplicates. A locking mechanism is meant to prevent this by ensuring the jobs run on only one of the servers. The team are investigating what contributed to the edge case, in order to reduce the risk of it recurring, and are strengthening the logic governing the locking mechanism.
...
Trigger
Teams notification
Slack notification
...
Detection
A user reported the issue in the Teams Support Channel. The issue was also raised in the TIS tis-dev-team Slack channel. The overlapping jobs could be seen in the server logs and in the monitoring-prod Slack channel (started 1:29 AM and 1:33 AM):
...
Action Items | Owner |
---|---|
After a review of previous incidents? | |
...
The ‘locking’ that prevents the job running in parallel only takes account of scheduled runs. A container restart or a manual run of the job can cause duplication if it overlaps with the job running on the other server instance.
We need a more robust way of preventing the same job from running more than once at a time.
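One possible shape for a more robust lock is sketched below. It is an illustration under stated assumptions, not the TIS implementation: the lock is taken when the job itself starts, so it applies whether the run was triggered by the schedule, by a container restart, or manually. `SharedLockStore`, `run_job` and the job name `PersonSync` are hypothetical names, and the in-memory store stands in for whatever store both servers can reach (for example a database table or a cache), where the check-and-set would need to be atomic.

```python
import threading
import time


class SharedLockStore:
    """In-memory stand-in for a lock store shared by both servers (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}  # job name -> (owner, expiry timestamp)

    def try_acquire(self, job, owner, ttl_seconds, now=None):
        """Atomically take the lock unless an unexpired lock already exists."""
        now = time.monotonic() if now is None else now
        with self._guard:
            holder = self._locks.get(job)
            if holder is not None and holder[1] > now:
                return False  # another run still holds the lock
            self._locks[job] = (owner, now + ttl_seconds)
            return True

    def release(self, job, owner):
        """Release the lock, but only if this owner still holds it."""
        with self._guard:
            holder = self._locks.get(job)
            if holder is not None and holder[0] == owner:
                del self._locks[job]


def run_job(store, job, owner, work, ttl_seconds=3600):
    """Run work() only if this instance wins the lock, whatever triggered the run."""
    if not store.try_acquire(job, owner, ttl_seconds):
        return False  # skipped: the job is already running elsewhere
    try:
        work()
        return True
    finally:
        store.release(job, owner)
```

Because the check lives inside `run_job` rather than in the scheduler, an overlapping manual run on the second server would return `False` instead of starting a duplicate sync. The expiry (`ttl_seconds`) is there so that a crashed run cannot hold the lock indefinitely; the value would need to exceed the longest expected run time.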