...

  • The job ran in parallel, one instance on each of the servers.

  • The ‘locking’ intended to prevent this only takes scheduled runs into account.

  • The container restarted in the ~10 minute window where there would be a problem (it is unclear why it restarted). All the jobs were presumably overlapping during that time, but the overlapping data changes were overwriting the same records in the same way; only the Elasticsearch job creates new records in the Elasticsearch index, so that is where this problem would occur.

  • The restart was triggered by an OutOfMemoryError:

    Code Block
    2021-02-04 00:21:19.677  INFO 1 --- [onPool-worker-2] s.j.PersonPlacementEmployingBodyTrustJob : Querying with lastPersonId: [49403] and lastEmployingBodyId: [287]
    2021-02-04 00:21:27.760  INFO 1 --- [onPool-worker-2] u.n.t.s.job.TrustAdminSyncJobTemplate    : Time taken to read chunk : [12.95 s]
    java.lang.OutOfMemoryError: Java heap space
    Dumping heap to /var/log/apps/hprof/sync-2021-01-19-11:29:40.hprof ...
    Unable to create /var/log/apps/hprof/sync-2021-01-19-11:29:40.hprof: File exists
    Terminating due to java.lang.OutOfMemoryError: Java heap space
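The locking gap above is that only scheduled runs consult the lock, so a manual trigger on a second server can overlap a scheduled run. A minimal sketch of the idea (hypothetical class and job names, in-memory only; a real implementation would back this with a shared store such as a database row or Redis key so both servers see the same lock):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: a single lock guard consulted by BOTH scheduled and manually
// triggered runs, so two instances cannot start the same job concurrently.
public class JobLockGuard {
    private final ConcurrentMap<String, Long> locks = new ConcurrentHashMap<>();
    private final long ttlMillis; // safety expiry in case a holder dies mid-run

    public JobLockGuard(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if this caller acquired the lock and may run the job. */
    public boolean tryAcquire(String jobName) {
        long now = System.currentTimeMillis();
        long newExpiry = now + ttlMillis;
        // compute is atomic per key: claim the lock only if absent or expired.
        Long result = locks.compute(jobName, (k, expiry) ->
                (expiry == null || expiry < now) ? newExpiry : expiry);
        return result == newExpiry;
    }

    /** Release after the job finishes, scheduled or manual. */
    public void release(String jobName) {
        locks.remove(jobName);
    }
}
```

Every entry point (the scheduler and any manual trigger endpoint) would call `tryAcquire` first and skip the run if it returns false.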
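The "Unable to create … File exists" line shows the heap dump was lost because the configured dump file name collided with an earlier dump. One way to avoid this (a sketch, assuming the path from the log and a hypothetical jar name) is to point `-XX:HeapDumpPath` at a directory rather than a file; the JVM then writes a uniquely named `java_pid<pid>.hprof` inside it:

```shell
# Sketch, assumed paths: keep heap dumps on OOM, but let the JVM pick a
# unique file name (java_pid<pid>.hprof) inside the directory so a second
# dump does not fail with "File exists".
java \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/apps/hprof \
  -jar sync.jar
```

Old `.hprof` files in that directory would still need periodic cleanup, since heap dumps can be as large as the configured heap.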

Trigger

  • Teams notification

  • Slack notification


Resolution

Detection

  • A user reported the issue in the Teams Support channel. It was also raised in the tis-dev-team Slack channel.

  • The overlapping jobs could be seen in the server logs, and also in the monitoring-prod Slack channel (runs started at 1:29 AM and 1:33 AM).

Timeline

  • 08:21 - Notification on Teams

  • 09:25 - Job was run again

...