Date |
|
Authors | |
Status | Done |
Summary | Some CoJs signed in TIS Self-Service between the evening of 23rd July and the morning of 24th July were not correctly loaded into TIS, and hence not visible to Local Offices. |
Impact | Local Offices saw some trainees as not having signed their CoJs when they had in fact signed them. |
...
Action Items | Owner |
---|---|
CoJ audit to identify data discrepancies | DONE: |
Manual patch to restore data integrity | DONE: |
Improve TIS Self-Service messaging code to detect failures | IN PROGRESS: |
Add monitoring for idle queues (where messages are available, but not being consumed or without a listener) | TODO |
Add monitoring for Rabbit broker health | |
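As an illustration of the idle-queue action item, the following is a minimal sketch of the check itself, not the monitoring that was or will be implemented. It assumes queue statistics are available as dictionaries with `name`, `messages_ready` and `consumers` fields (modelled on the queue listing of the RabbitMQ management API); the queue names in the sample data are hypothetical.

```python
from typing import Any


def find_idle_queues(queues: list[dict[str, Any]]) -> list[str]:
    """Return the names of queues holding ready messages with no consumer attached."""
    idle = []
    for queue in queues:
        ready = queue.get("messages_ready", 0)
        consumers = queue.get("consumers", 0)
        # Messages are waiting, but nothing is listening to the queue.
        if ready > 0 and consumers == 0:
            idle.append(queue["name"])
    return idle


# Hypothetical sample data; real values would come from the broker.
sample = [
    {"name": "coj.signed", "messages_ready": 12, "consumers": 0},
    {"name": "other.queue", "messages_ready": 3, "consumers": 2},
    {"name": "empty.queue", "messages_ready": 0, "consumers": 0},
]

print(find_idle_queues(sample))
```

A scheduled job could run a check like this and raise an alert when the returned list is non-empty. Detecting queues that have a listener but are still not draining would additionally need a delivery rate or backlog age, which this sketch does not cover.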
...
Lessons Learned
We need to check and handle errors from ‘infrastructure’ more carefully.
The lack of alerting on failures and/or automated data consistency checks meant we were not aware of the problem until notified by users, which is poor.
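An automated consistency check of the kind mentioned above could be as simple as comparing the identifiers of CoJs signed in TIS Self-Service with those loaded into TIS. This is a sketch under stated assumptions: the function name, the identifier format and the way each set is obtained are all hypothetical and not taken from the actual audit.

```python
def find_missing_cojs(signed_in_self_service: set[str], loaded_in_tis: set[str]) -> list[str]:
    """Return identifiers signed in Self-Service that are absent from TIS, sorted."""
    # Set difference: everything signed that never arrived in TIS.
    return sorted(signed_in_self_service - loaded_in_tis)


# Hypothetical identifiers, for illustration only.
signed = {"coj-001", "coj-002", "coj-003"}
loaded = {"coj-001", "coj-003"}

missing = find_missing_cojs(signed, loaded)
if missing:
    # In a real check this would raise an alert rather than print.
    print(f"{len(missing)} CoJ(s) missing from TIS: {missing}")
```

Run on a schedule, a check like this would surface a discrepancy without waiting for users to report it.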