Date | |
Authors | |
Status | In progress |
Summary | |
Impact | |
Non-technical Description
Trigger
Exceptions reported via Slack
Detection
Sentry alerting
Resolution
Timeline
All times BST unless otherwise stated.
2022-08-01 16:11 ESR "processing failed" messages started appearing in the Slack #monitoring-esr channel
2022-08-01 approx. 16:30 ESR processes stopped on Prod (blue and green)
2022-08-01 approx. 16:32 Prod MongoDB server stopped
Root Cause(s)
Action Items
Action Items | Owner |
---|---|
Review the list of CSV files received, as listed in esr.queue.csv.invalid, plus any received after the ESR services were stopped (these will not have been processed at all) | |
Review the messages on the RabbitMQ esr.dlq.all queue to identify any issues (which issues to look for is still to be defined) | |
Generate | |
Once MongoDB has been restarted, check which files were exported to ESR on the day of the incident and confirm that they were processed before the 16:11 failure | |
Manually resend RabbitMQ messages for reprocessing (Neo4j can also be searched for the message IDs to check whether they were already processed) | |
Restart the ESR services (the required start-up sequence should be documented in the dev-ops repo) | |
Consider: on the position, placement and post queues, create and delete messages may be processed in the wrong order. Is there a way to check this? | |
Work out how to retrigger file processing for the CSV files to be reprocessed (found in the S3 bucket esr-sftp-prod) and for any files received after 16:11 that were not processed at all. Files to reprocess: DE_EMD_RMC_20220801_00003116.DAT, DE_SEV_RMC_20220729_00001156.DAT, DE_SEV_RMC_20220730_00001157.DAT, DE_SEV_RMC_20220731_00001158.DAT, DE_WES_RMC_20220729_00003589.DAT, DE_WES_RMC_20220730_00003590.DAT, DE_WES_RMC_20220731_00003591.DAT, DE_WMD_RMC_20220729_00003421.DAT, DE_WMD_RMC_20220730_00003422.DAT, DE_WMD_RMC_20220731_00003423.DAT, DE_YHD_RMC_20220729_00003512.DAT, DE_YHD_RMC_20220730_00003513.DAT, DE_YHD_RMC_20220731_00003514.DAT | |
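For the ordering question and the replay of the files in the action items above, a minimal Python sketch follows. It assumes the file names follow the pattern DE_<code>_RMC_<YYYYMMDD>_<sequence>.DAT (inferred from the listed names, not from any ESR specification), and it uses a hypothetical Message type; the real queue payloads will differ. replay_order sorts files so that each prefix's files can be resubmitted oldest first; delete_before_create scans messages in the order they were actually processed and flags entities whose DELETE was seen before any CREATE.

```python
import re
from typing import Iterable, NamedTuple

# Assumed naming pattern, inferred from the file list in this report:
# DE_<code>_RMC_<YYYYMMDD>_<sequence>.DAT
FILE_PATTERN = re.compile(r"^DE_([A-Z]+)_RMC_(\d{8})_(\d+)\.DAT$")


def replay_order(filenames: Iterable[str]) -> list[str]:
    """Sort files by prefix, then date, then sequence number (oldest first)."""
    def key(name: str) -> tuple[str, str, int]:
        match = FILE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        code, date, seq = match.groups()
        return code, date, int(seq)

    return sorted(filenames, key=key)


class Message(NamedTuple):
    # Hypothetical message shape, NOT the real ESR/RabbitMQ payload.
    entity_id: str
    action: str  # "CREATE" or "DELETE"


def delete_before_create(processed: Iterable[Message]) -> set[str]:
    """Return IDs whose DELETE was processed before any CREATE for that ID.

    `processed` must be in the order the messages were actually consumed.
    A legitimate create -> delete -> create sequence is not flagged.
    """
    created: set[str] = set()
    orphan_deletes: set[str] = set()  # deletes seen with no prior create
    flagged: set[str] = set()
    for msg in processed:
        if msg.action == "DELETE" and msg.entity_id not in created:
            orphan_deletes.add(msg.entity_id)
        elif msg.action == "CREATE":
            if msg.entity_id in orphan_deletes:
                flagged.add(msg.entity_id)
            created.add(msg.entity_id)
    return flagged
```

In practice the processed order would need to come from somewhere like consumer logs or the Neo4j records mentioned above; the sketch only shows the check itself once that order is available.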