Date | |
Authors | |
Status | In progress (documenting) |
Summary | Database used for exchanging information with ESR failed |
Impact | Files with information from ESR weren’t processed for several hours |
Non-technical Description
ESR had another period of failing to send files on the day they were generated. As a result, a larger than usual number of files, generated between Friday 29th July and Monday 1st August, were all sent in a short space of time.
The application usually handles this, but this time the database stopped responding. The services that store information failed, and a number of files were not processed. The built-in alerting notified the team. After verifying the status of a number of individual failed transactions, we resolved the immediate problem and resent the instructions to process the files listed below.
...
Trigger
Exceptions reported via Slack
...
Sentry alerting
...
Resolution
Force-stopped and restarted the Prod MongoDB server, then requested processing of the failed and missed files
...
Timeline
BST unless otherwise stated
2022-08-01 16:11 "ESR processing failed" messages start appearing in the Slack #monitoring-esr channel
2022-08-01 16:30ish ESR processes on Prod blue and green stopped
2022-08-01 16:32ish Prod MongoDB server stopped
2022-08-01 18:24 Prod MongoDB server started
2022-08-01 20:43 All ESR processes restarted in defined order
2022-08-01 20:36-21:21 Failed and missed RMC files processed in order defined below
...
Root Cause(s)
...
Action Items
...