Action Items | Owner |
---|
| Jayanta Saha |
Review how we do scheduling across all of TIS. Possible options: use an external scheduler / verifier (e.g. CloudWatch Events); send a “start/failed to start” Slack message early and a “completed/errored” Slack message later, with specific codes, to pick up more exceptions.
| Reuben Roberts |
Review responsibilities around checking jobs/Slack, e.g.: sharing what people look out for; reminding the team of norms / expectations about checking application health; how to quickly find what is running where; whether to have named people per week to check.
| Marcello Fabbri (Unlicensed) Yafang Deng Reuben Roberts Jayanta Saha |
Has the daily check for “completed” messages stopped running? | Reuben Roberts This Ansible tool is probably not worth resuscitating, as it was apparently not very polished and would need to be extended to cover missed messaging. Discussions with John Simmons (Deactivated) led to this ticket: https://hee-tis.atlassian.net/browse/TIS21-2621 |
Move all logging / ship all logs to CloudWatch
| |
Have a documented place for where everything runs, e.g. handbook, Infra diagrams?
| |
Tidy up definitions for ECS clusters (services with instance count = 0) | Marcello Fabbri (Unlicensed) |
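
As a starting point for the scheduling and “completed” message items above, here is a minimal Python sketch of the two-message idea, not an agreed design. It assumes each job can be wrapped in a function, that messages go to a Slack incoming webhook, and that an external verifier can read back the messages that were sent. The status codes (`JOB-START`, `JOB-DONE`, `JOB-ERROR`) and all function names are hypothetical. A job that never starts sends nothing at all, which is why the separate `missing_completions` check is needed.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Iterable, List

# Hypothetical status codes; the action item only asks for "specific codes".
STARTED, COMPLETED, ERRORED = "JOB-START", "JOB-DONE", "JOB-ERROR"


def post_to_slack(webhook_url: str, text: str) -> None:
    """Post a plain-text message to a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request, timeout=10).close()


def run_with_notifications(
    name: str, job: Callable[[], None], send: Callable[[str], None]
) -> str:
    """Run `job`, sending a start message first and a completed/errored one after."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    send(f"[{STARTED}] {name} started at {stamp}")
    try:
        job()
    except Exception as exc:
        # Report the exception type so different failures are distinguishable.
        send(f"[{ERRORED}] {name} failed: {type(exc).__name__}: {exc}")
        return ERRORED
    send(f"[{COMPLETED}] {name} completed")
    return COMPLETED


def missing_completions(
    expected_jobs: Iterable[str], messages: Iterable[str]
) -> List[str]:
    """Return the expected jobs that have no "completed" message.

    This is the check an external verifier would run on a schedule.
    """
    prefix, suffix = f"[{COMPLETED}] ", " completed"
    done = {
        m[len(prefix):-len(suffix)]
        for m in messages
        if m.startswith(prefix) and m.endswith(suffix)
    }
    return [job for job in expected_jobs if job not in done]
```

To send real messages, pass `lambda text: post_to_slack(webhook_url, text)` as `send`, where `webhook_url` is the team's own webhook. Keeping `send` as a parameter means the wrapper can be tested without touching Slack.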