Date

Authors

Joseph (Pepe) Kelly

Status

In Progress

Summary

Possibly caused by a dependency change. The overnight jobs failed to run, so some information in TIS and downstream systems (NDW) was one day stale.

Impact

For a couple of hours on the morning of 26th Jan, some data reflected the position as of 25th Jan.

Non-technical Description

  • An automatically generated Pull Request was approved to update a component used in the reference service.


Trigger

Detection

  • Routine check at 07:40


Resolution

  • Re-ran the sync jobs and then the NDW ETL


Timeline

  • ?

  • 07:40 - Unusual number of notifications in the Slack monitoring channel

  • 08:00 - Investigation revealed that the triggering mechanism within a synchronization service failed.

  • 08:50 - Users informed that the jobs had completed and that TIS was operating as normal

  • 09:17-09:34 - Breaking change reverted

  • 09:30 - NDW ETL finishes. NDW team informed.


Root Cause(s)

  • No messages were received in Slack

  • A major version upgrade of a dependency passed CI tests, so the breaking change was not caught before it was merged
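The second root cause suggests that CI never exercised the triggering mechanism itself. The sketch below shows one way such a test could look. It is illustrative only: the report does not name the scheduling framework used by the synchronization service, so Python's standard-library `sched` module stands in for it, and `FakeClock`, `register_nightly_sync` and `run_trigger_check` are hypothetical names.

```python
import sched


class FakeClock:
    """Manually advanced clock so the test never really sleeps."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        # The scheduler calls this while waiting; jump forward instead.
        self.now += seconds


def register_nightly_sync(scheduler, job, delay_seconds):
    """Hypothetical wiring code: schedule `job` to fire after `delay_seconds`."""
    return scheduler.enter(delay_seconds, 1, job)


def run_trigger_check(delay_seconds=3600):
    """Register the job, let the scheduler run, and report when the job fired."""
    clock = FakeClock()
    scheduler = sched.scheduler(timefunc=clock.time, delayfunc=clock.sleep)
    fired_at = []

    register_nightly_sync(scheduler, lambda: fired_at.append(clock.time()), delay_seconds)

    scheduler.run()  # drains the queue, advancing the fake clock as needed
    return fired_at
```

A test would assert that `run_trigger_check()` returns exactly one firing time. If a dependency upgrade changed how jobs are registered or triggered, a check of this kind would be expected to fail in CI rather than overnight.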


Action Items

Action Items

Owner

Introduce tests for the scheduled components that verify the jobs actually run

? Use an external scheduler / verifier
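An external verifier could be as simple as a staleness check that runs independently of the jobs it watches, so that a job which never starts still raises an alert. The following is a minimal sketch under stated assumptions: `is_stale` is a hypothetical helper, the 26-hour threshold is an arbitrary allowance for a daily job, and how the last successful run time is recorded is not specified in this report.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_success, now, max_age=timedelta(hours=26)):
    """Return True if no successful run is recorded within `max_age` of `now`.

    `last_success` is the timestamp of the most recent successful run,
    or None if the job has never reported success.
    """
    if last_success is None:
        return True
    return now - last_success > max_age


# Illustrative values only (the year is arbitrary): the last good run was
# the previous night and the check runs the following morning.
last_good_run = datetime(2000, 1, 25, 1, 0, tzinfo=timezone.utc)
morning_check = datetime(2000, 1, 26, 7, 40, tzinfo=timezone.utc)
needs_alert = is_stale(last_good_run, morning_check)
```

Because the check looks at the absence of a recent success rather than waiting for a failure message, it would not depend on the failed job sending anything to Slack.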


Lessons Learned

  •  
