Non-technical Description
Notifications are usually sent to ESR every day, describing changes and updates to the “now & next” people in positions.
The service did not generate the notification files for 11th to 19th August.
ESR and TIS data would have been exchanged, so this was not impacted.
No warnings received as RabbitMQ not doing
Trigger
Detection
We noticed a lack of “Confirmation” messages from ESR
Resolution
Upsized the resources to enable the exporting jobs to run.
Timeline
A shovel was set up, but it was persistent rather than temporary.
14:10 Last successful notification file generation.
14:00 Repeated failures prevent .
We noticed the delay to notification confirmations beyond ‘normal’ delays
11:30 Resized the cluster, which took 12 minutes.
11:50 Deleted the shovel, which was sending errors to a temporary queue.
12:10 (approx.) Metabase indicated that the notification files had been created.
15:27 - Received confirmation files
5 Whys (or other analysis of Root Cause)
We didn’t receive DCC confirmation files because we didn’t send any files for ESR to confirm receipt of.
The attempts to build notification files failed (errors were logged as warnings but not reported via Sentry).
Database transactions timed out.
The database didn’t have the resources to complete the transaction within the time limit.
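The second "why" above (failures logged as warnings but never surfaced as alerts) can be illustrated with a minimal sketch. This is not the service's actual code and does not use the Sentry SDK: the logger name, the handler class and the `alerts` list are all hypothetical stand-ins, with the list taking the place of whatever alerting channel is in use.

```python
import logging


class WarningEscalationHandler(logging.Handler):
    """Forward every record at WARNING or above to an alerting callback."""

    def __init__(self, alert):
        # The handler's own level filters out anything below WARNING.
        super().__init__(level=logging.WARNING)
        self._alert = alert

    def emit(self, record):
        self._alert(record.levelname, record.getMessage())


# Hypothetical stand-in for an alerting channel (e.g. an error tracker).
alerts = []

logger = logging.getLogger("notification-export")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output isolated
logger.addHandler(
    WarningEscalationHandler(lambda level, msg: alerts.append((level, msg)))
)

logger.info("Export started")  # below WARNING: not escalated
logger.warning("Notification file generation failed: transaction timed out")
```

With a handler like this attached, the warning would reach the alerting channel while routine informational messages would not.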
Action Items
Action Items | Owner | Comments |
---|---|---|
See also:
Lessons Learnt
RabbitMQ awareness.