Date | |
Authors | |
Status | In Progress |
Summary | Suspected dependency change. The overnight jobs failed to run, so some information in TIS and downstream systems (NDW) was one day stale. |
Impact | For a couple of hours on the morning of 26th Jan, some data reflected the position as of 25th Jan. |
Non-technical Description
An automatically generated Pull Request was approved to update a component used in the reference service.
Trigger
Detection
Routine check at 07:40 on 26th Jan
Resolution
Re-ran the sync jobs, then the NDW ETL
Timeline
?
07:40 - Unusual number of notifications in the Slack monitoring channel
08:00 - Investigation revealed that the triggering mechanism within a synchronization service had failed.
08:50 - Users informed that the jobs had completed and TIS was operating as normal
09:17-09:34 - Breaking change reverted
09:30 - NDW ETL finishes. NDW team informed.
Root Cause(s)
No messages were received in Slack
Major version upgrade of a dependency passed CI tests
Action Items
Action Items | Owner |
---|---|
Introduce testing of the scheduled components / tests that verify the job runs | |
? Use an external scheduler / verifier | |
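
A job that never starts produces no failure message, so one way to approach the "verifier" action item is to check the age of each job's last successful run rather than wait for an error notification. The following is a minimal sketch of that idea, not the TIS implementation: the function name `find_stale_jobs`, the job names, the 25-hour threshold, and the assumption that each job records a completion timestamp somewhere readable are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Mapping, Optional


def find_stale_jobs(
    last_success: Mapping[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(hours=25),  # hypothetical threshold for a nightly job
) -> List[str]:
    """Return the names of jobs with no successful run within max_age of now."""
    stale = []
    for name, finished_at in last_success.items():
        # A job with no recorded success (None) is treated as stale.
        if finished_at is None or now - finished_at > max_age:
            stale.append(name)
    return sorted(stale)


# Illustrative data only; the job names and dates are made up.
now = datetime(2000, 1, 2, 7, 40, tzinfo=timezone.utc)
runs = {
    "sync-a": datetime(2000, 1, 2, 1, 0, tzinfo=timezone.utc),  # ran overnight
    "sync-b": datetime(2000, 1, 1, 1, 0, tzinfo=timezone.utc),  # missed last night
    "sync-c": None,                                             # never recorded
}
print(find_stale_jobs(runs, now))
```

A check like this would need to run outside the scheduler it is verifying, otherwise the same failure could silence both the jobs and the check.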