Date | |
Authors | Joseph (Pepe) Kelly (plus those mentioned on the page) |
Status | Verifying in Stage & Prod / Done |
Summary | A dependency change broke the scheduling. The overnight jobs failed to run, so some information was several days “stale” in TIS and downstream systems (NDW). |
Impact | For several days, some data reflected the state as of 21st Jan. |
...
Action Items | Owner |
---|---|
| A ticket for enhancing the tests has been created here |
Look at how we do scheduling across all the TIS stuff, possibly:
| Reuben Roberts |
Review responsibilities around checking jobs/slack, e.g.:
| Marcello Fabbri, Yafang Deng, Reuben Roberts, Jayanta Saha |
Has the daily check for “completed” messages stopped running? | This Ansible tool is probably not worth resuscitating: it was apparently not very polished and would need to be extended to cover missed messaging. |
| |
| |
Tidy up definitions for ECS clusters (services with instance count = 0) |
...
Lessons Learned
Group review for RCA and identifying action items from the root causes is very useful.