...

  • The job ran in parallel, one instance on each of the servers.

  • The ‘locking’ intended to prevent this only takes scheduled runs into account.

  • The container restarted in the ~10 minute window where there would be a problem (it is unclear why it restarted). All the jobs were presumably overlapping during that time, but the overlapping data changes were overwriting the same records in the same way; only the Elasticsearch job creates new records in the Elasticsearch index, so that is where this problem would occur.

  • The restart was triggered by an OutOfMemoryError:

    Code Block
    2021-02-04 00:21:19.677  INFO 1 --- [onPool-worker-2] s.j.PersonPlacementEmployingBodyTrustJob : Querying with lastPersonId: [49403] and lastEmployingBodyId: [287]
    2021-02-04 00:21:27.760  INFO 1 --- [onPool-worker-2] u.n.t.s.job.TrustAdminSyncJobTemplate    : Time taken to read chunk : [12.95 s]
    java.lang.OutOfMemoryError: Java heap space
    Dumping heap to /var/log/apps/hprof/sync-2021-01-19-11:29:40.hprof ...
    Unable to create /var/log/apps/hprof/sync-2021-01-19-11:29:40.hprof: File exists
    Terminating due to java.lang.OutOfMemoryError: Java heap space
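The locking gap above is that only scheduled runs consult the lock, so a manual trigger on a second server can overlap a scheduled run. A minimal sketch of the idea (hypothetical class and job names, in-memory only; a real implementation would back this with a shared store such as a database row or Redis key so both servers see the same lock):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: a single lock guard consulted by BOTH scheduled and manually
// triggered runs, so two instances cannot start the same job concurrently.
public class JobLockGuard {
    private final ConcurrentMap<String, Long> locks = new ConcurrentHashMap<>();
    private final long ttlMillis; // safety expiry in case a holder dies mid-run

    public JobLockGuard(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if this caller acquired the lock and may run the job. */
    public boolean tryAcquire(String jobName) {
        long now = System.currentTimeMillis();
        long newExpiry = now + ttlMillis;
        // compute is atomic per key: claim the lock only if absent or expired.
        Long result = locks.compute(jobName, (k, expiry) ->
                (expiry == null || expiry < now) ? newExpiry : expiry);
        return result == newExpiry;
    }

    /** Release after the job finishes, scheduled or manual. */
    public void release(String jobName) {
        locks.remove(jobName);
    }
}
```

Every entry point (the scheduler and any manual trigger endpoint) would call `tryAcquire` first and skip the run if it returns false.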
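The "Unable to create … File exists" line shows the heap dump was lost because the configured dump file name collided with an earlier dump. One way to avoid this (a sketch, assuming the path from the log and a hypothetical jar name) is to point `-XX:HeapDumpPath` at a directory rather than a file; the JVM then writes a uniquely named `java_pid<pid>.hprof` inside it:

```shell
# Sketch, assumed paths: keep heap dumps on OOM, but let the JVM pick a
# unique file name (java_pid<pid>.hprof) inside the directory so a second
# dump does not fail with "File exists".
java \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/apps/hprof \
  -jar sync.jar
```

Old `.hprof` files in that directory would still need periodic cleanup, since heap dumps can be as large as the configured heap.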

Trigger

  • Teams notification

  • Slack notification


Resolution

Detection

  • A user reported the issue in the Teams Support channel. It was also raised in the tis-dev-team Slack channel.

  • The overlapping jobs could be seen in the server logs, and also in the monitoring-prod Slack channel (runs started at 1:29 AM and 1:33 AM).

Timeline

  • 08:21 - Notification on Teams

  • 09:25 - Job was run again

...