Date |
Authors | |
Status | Resolved |
Summary |
Manually rerunning the |
Impact | Users observed duplicate entries in the list of people. |
Table of Contents
Root Cause(s)
Trigger
Resolution
Detection
Action Items
Trigger
Teams notification
Slack notification
Detection
A user reported the issue in the Teams Support Channel, and it was also raised in the TIS tis-dev-team Slack channel. The overlapping jobs could be seen in the server logs and in the monitoring-prod Slack channel (started at 1:29 AM and 1:33 AM).
Resolution
TIS Team manually re-ran the Person Sync job from the Sync administration panel: https://apps.tis.nhs.uk/sync/
Timeline
00:21 - Out of Memory Error on HEE-TIS-VM-PROD-APPS-BLUE
01:29 - PersonElasticSearchSyncJob : Sync [Person sync job] started on HEE-TIS-VM-PROD-APPS-GREEN
01:33 - PersonElasticSearchSyncJob : Sync [Person sync job] started on HEE-TIS-VM-PROD-APPS-BLUE
08:21 - Notification on Teams
09:25 - Person Sync job manually re-run from the Sync administration panel
Root Cause(s)
The Person Sync job ran in parallel, once on each of the two servers.
The ‘locking’ intended to prevent this only takes scheduled runs into account.
The container restarted within the ~10-minute window in which this problem can occur.
The restart was triggered by an OutOfMemoryError:
```
2021-02-04 00:21:19.677 INFO 1 --- [onPool-worker-2] s.j.PersonPlacementEmployingBodyTrustJob : Querying with lastPersonId: [49403] and lastEmployingBodyId: [287]
2021-02-04 00:21:27.760 INFO 1 --- [onPool-worker-2] u.n.t.s.job.TrustAdminSyncJobTemplate : Time taken to read chunk : [12.95 s]
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/log/apps/hprof/sync-2021-01-19-11:29:40.hprof ...
Unable to create /var/log/apps/hprof/sync-2021-01-19-11:29:40.hprof: File exists
Terminating due to java.lang.OutOfMemoryError: Java heap space
```
Action Items
Action Items | Owner |
---|---|
|
Lessons Learned
The ‘locking’ intended to prevent the job from running in parallel only takes scheduled runs into account. A container restart or a manual run can therefore cause duplication if it overlaps with the job running on the other server instance.
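The lesson above can be illustrated with a minimal Java sketch. It assumes a lock store shared by both server instances and routes every trigger (scheduled, manual, or after a restart) through the same guard. The guard takes a lease with an expiry, so a lock left behind by an instance that died still becomes free eventually. `SyncLockSketch`, `SharedLockStore`, the `person-sync` lock name, the `STARTUP` trigger and the two-hour lease are all hypothetical. This is not the TIS sync service's actual locking code. The in-memory map only stands in for storage that both servers would really need to share, such as a database table.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: illustrates the idea only, not the TIS sync service's code.
public class SyncLockSketch {

    /** A lock entry: who holds it and when the lease runs out. */
    static final class Lease {
        final String owner;
        final Instant expiry;

        Lease(String owner, Instant expiry) {
            this.owner = owner;
            this.expiry = expiry;
        }
    }

    /**
     * Stand-in for a lock store shared by BOTH servers (assumption: a real
     * deployment would keep this outside the JVMs, e.g. in a database).
     */
    static final class SharedLockStore {
        private final Map<String, Lease> leases = new ConcurrentHashMap<>();

        /** Take the lock if it is free or the previous lease has expired. */
        boolean tryAcquire(String lock, String owner, Instant now, Duration ttl) {
            Lease mine = new Lease(owner, now.plus(ttl));
            // compute() is atomic per key, so two callers cannot both win.
            Lease winner = leases.compute(lock, (k, current) ->
                    (current == null || !current.expiry.isAfter(now)) ? mine : current);
            return winner == mine;
        }

        /** Release only if the caller still owns the lock. */
        void release(String lock, String owner) {
            leases.computeIfPresent(lock, (k, current) ->
                    current.owner.equals(owner) ? null : current);
        }
    }

    enum Trigger { SCHEDULED, MANUAL, STARTUP }

    /** One application instance, e.g. BLUE or GREEN. */
    static final class SyncInstance {
        static final String LOCK = "person-sync";         // hypothetical lock name
        static final Duration TTL = Duration.ofHours(2);  // assumed upper bound on one run

        private final String name;
        private final SharedLockStore store;

        SyncInstance(String name, SharedLockStore store) {
            this.name = name;
            this.store = store;
        }

        /** Every trigger passes through the same guard, not only scheduled runs. */
        boolean startPersonSync(Trigger trigger, Instant now) {
            boolean acquired = store.tryAcquire(LOCK, name, now, TTL);
            System.out.println(name + " " + trigger + (acquired ? ": started" : ": skipped, lock held"));
            return acquired;
        }

        void finishPersonSync() {
            store.release(LOCK, name);
        }
    }

    public static void main(String[] args) {
        SharedLockStore store = new SharedLockStore();
        SyncInstance green = new SyncInstance("GREEN", store);
        SyncInstance blue = new SyncInstance("BLUE", store);

        // Replays the overlap from the timeline; BLUE's trigger type is assumed for illustration.
        green.startPersonSync(Trigger.SCHEDULED, Instant.parse("2021-02-04T01:29:00Z"));
        blue.startPersonSync(Trigger.STARTUP, Instant.parse("2021-02-04T01:33:00Z"));
        green.finishPersonSync();
        // A later manual re-run (cf. 09:25) succeeds once the lock is free.
        blue.startPersonSync(Trigger.MANUAL, Instant.parse("2021-02-04T09:25:00Z"));
    }
}
```

Running `main` prints `GREEN SCHEDULED: started`, then `BLUE STARTUP: skipped, lock held`, then `BLUE MANUAL: started`. The lease is the key design choice. Without it, a lock held by an instance that was terminated by an OutOfMemoryError would never be released.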