...

Non-technical Description

An edge-case sequence of events meant that the overnight synchronisation job for 'Person' data ran on both load-balanced servers, creating duplicates. A locking mechanism is meant to prevent this by ensuring that each job runs on only one of the servers. The team are investigating what contributed to the edge case, in order to guard against it recurring, and are strengthening the logic governing the locking mechanism.

...

Trigger

  • Teams notification

  • Slack notification


...

Detection

  • A user reported the issue in the Teams support channel. It was also raised in the TIS tis-dev-team Slack channel.

  • The overlapping jobs could be seen in the server logs and also in the monitoring-prod Slack channel (started at 1:29 AM and 1:33 AM):

...

Action Items

Owner

  • Investigate and resolve Out of Memory errors

Reuben Roberts

After a review of previous incidents?

  • Improve locking mechanism to make it more robust, i.e. locking that includes runs that aren’t part of the @Scheduled configuration

Reuben Roberts

Marcello Fabbri (Unlicensed)
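
One way the locking action item above might be approached is to route every run of a job, whether scheduled or triggered manually, through a single lock-guarded entry point. The following is a minimal sketch under stated assumptions, not the team's implementation: the names `LockStore`, `InMemoryLockStore`, `ExclusiveJobRunner`, `person-sync`, `server-a` and `server-b` are all hypothetical, and the in-memory store is only a stand-in for storage shared by both load-balanced servers (for example a database table), which a real deployment would need. Each lock carries an expiry so that a server that dies mid-run, for instance from an Out of Memory error, does not hold the lock indefinitely.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class JobLockSketch {

    /** Hypothetical lock store; production code would back this with storage shared by both servers. */
    interface LockStore {
        boolean tryAcquire(String jobName, String owner, Duration ttl);
        void release(String jobName, String owner);
    }

    /** In-memory stand-in for illustration only; it cannot coordinate separate servers. */
    static class InMemoryLockStore implements LockStore {
        private static final class Lease {
            final String owner;
            final Instant expiresAt;
            Lease(String owner, Instant expiresAt) {
                this.owner = owner;
                this.expiresAt = expiresAt;
            }
        }

        private final Map<String, Lease> leases = new ConcurrentHashMap<>();

        @Override
        public boolean tryAcquire(String jobName, String owner, Duration ttl) {
            Instant now = Instant.now();
            Lease candidate = new Lease(owner, now.plus(ttl));
            // compute() is atomic per key: take the lock only if it is free or its lease has expired.
            Lease winner = leases.compute(jobName, (name, current) ->
                    (current == null || current.expiresAt.isBefore(now)) ? candidate : current);
            return winner == candidate;
        }

        @Override
        public void release(String jobName, String owner) {
            // Only the current owner may release, so a late finisher cannot free another holder's lock.
            leases.computeIfPresent(jobName, (name, current) ->
                    current.owner.equals(owner) ? null : current);
        }
    }

    /** Single entry point intended for both scheduled and manually triggered runs. */
    static class ExclusiveJobRunner {
        private final LockStore store;
        private final String serverId;
        private final Duration ttl;

        ExclusiveJobRunner(LockStore store, String serverId, Duration ttl) {
            this.store = store;
            this.serverId = serverId;
            this.ttl = ttl;
        }

        /** Returns true if the job ran, false if another holder already has the lock. */
        boolean runExclusive(String jobName, Runnable job) {
            if (!store.tryAcquire(jobName, serverId, ttl)) {
                return false;
            }
            try {
                job.run();
                return true;
            } finally {
                store.release(jobName, serverId);
            }
        }
    }

    public static void main(String[] args) {
        LockStore store = new InMemoryLockStore();
        ExclusiveJobRunner serverA = new ExclusiveJobRunner(store, "server-a", Duration.ofMinutes(30));
        ExclusiveJobRunner serverB = new ExclusiveJobRunner(store, "server-b", Duration.ofMinutes(30));

        // Server B attempts the same job while server A is still running it.
        boolean ranOnA = serverA.runExclusive("person-sync", () -> {
            boolean ranOnB = serverB.runExclusive("person-sync", () -> { });
            System.out.println("ran on B while A holds the lock: " + ranOnB); // false
        });
        System.out.println("ran on A: " + ranOnA); // true
    }
}
```

Under this design the @Scheduled method and any manual trigger would both call `runExclusive`, so the lock check cannot be bypassed by the way a run is started. The lease duration is a trade-off: too short and a slow run could be overlapped, too long and a crashed run blocks the next one.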

...