Date | |
Authors | |
Status | Production no longer impacted |
Summary | |
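Impact | The recommendations search page was not updated for most of the day (from roughly 02:26 until the backlog cleared at about 18:40 BST), and some data discrepancies remained until the reindex the following morning |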
...
All times are BST unless otherwise stated.
02:26 - Earliest identifiable point at which something started going wrong (cause still unknown)
02:26 to 08:07 - The queue feeding ‘doctor view’ updates to the recommendation service built up steadily to ~83K messages
08:21 - First report in the user channel
12:07 - Picked up for investigation
12:07 to 14:00ish - Checked the database and the ElasticSearch index
13:00 - Checked the GMC return list for the North West
14:00ish - Found that messages in reval.queue.masterdoctorview.updated.recommendation were not being consumed
16:00ish - Forced a restart of the recommendation service
16:00ish - Identified that RabbitMQ was reporting 0 consumers
17:00ish to 18:00ish - Identified an error in the queue declaration; raised, merged and pushed a PR to fix it
18:00ish - Noticed that messages were briefly consumed on startup, but the number of consumers quickly dropped to 0
18:40ish - A final redeploy of the recommendation service cleared the backlog and appeared to restore the consumers stably
18:40ish - Identified that there were still some discrepancies in the data between masterdoctorindex and the recommendation index; decided to wait until after the overnight doctor sync to do a quick reindex
09:08 (next day) - Informed users of the reindex (brief downtime expected)
11:09 - Reindex complete, service restored
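The discrepancy check between masterdoctorindex and the recommendation index could be scripted so that it can be rerun after each overnight sync. This is a minimal sketch. It assumes both indices have already been exported as dicts keyed by GMC number. The export shape, the `reconcile` function and the compared field names are all hypothetical illustrations and do not reflect the services' actual schema.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical export shape: {gmc_number: {field_name: value, ...}}
Index = Dict[str, Dict[str, Any]]


def reconcile(master: Index, recommendation: Index, fields: Iterable[str]) -> Dict[str, List[str]]:
    """Compare a masterdoctorindex export against a recommendation index export.

    Returns sorted GMC numbers that are missing from the recommendation index,
    present only in the recommendation index, or present in both but with
    differing values in any of the compared fields.
    """
    compared = list(fields)
    common = master.keys() & recommendation.keys()  # dict key views support set operations
    return {
        "missing": sorted(master.keys() - recommendation.keys()),
        "extra": sorted(recommendation.keys() - master.keys()),
        "mismatched": sorted(
            gmc
            for gmc in common
            if any(master[gmc].get(f) != recommendation[gmc].get(f) for f in compared)
        ),
    }


# Illustrative data only (hypothetical GMC numbers and field values).
master = {"1111111": {"status": "A"}, "2222222": {"status": "B"}, "3333333": {"status": "C"}}
recommendation = {"1111111": {"status": "A"}, "2222222": {"status": "X"}, "4444444": {"status": "D"}}
report = reconcile(master, recommendation, ["status"])
print(report)  # {'missing': ['3333333'], 'extra': ['4444444'], 'mismatched': ['2222222']}
```

An empty report across all three lists would suggest that a reindex is not needed.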
...
Doctors were reported as not showing in the search list
The ElasticSearch index for the recommendation service was not being updated
A large backlog of messages was stuck on the queue used to update the index
Message consumers disappeared. After the final aws ecs update-service --force-new-deployment, the number of consumers dropped to 1 before going back up to 3 (to be confirmed)
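The symptom in the last point can be checked mechanically. That symptom is messages waiting while the consumer count is 0, or below the 3 seen after the final redeploy. This is a minimal sketch. It assumes the input is the JSON object that the RabbitMQ management HTTP API returns for a single queue, for example from GET /api/queues/%2F/reval.queue.masterdoctorview.updated.recommendation if the queue lives on the default "/" vhost. That object includes integer "messages" and "consumers" fields. The function name and the status labels are hypothetical. Treating 3 as the expected consumer count is also an assumption.

```python
from typing import Any, Mapping


def queue_status(queue_info: Mapping[str, Any], expected_consumers: int) -> str:
    """Classify a queue from a RabbitMQ management API queue object.

    Only the "messages" (total queue depth) and "consumers" (current
    consumer count) fields are read; anything else is ignored.
    """
    messages = int(queue_info.get("messages", 0))
    consumers = int(queue_info.get("consumers", 0))
    if consumers == 0 and messages > 0:
        return "stalled"   # backlog with nothing consuming it
    if consumers < expected_consumers:
        return "degraded"  # fewer consumers than expected, e.g. 1 instead of 3
    return "ok"


# Illustrative snapshots; 83000 mirrors the ~83K backlog in the timeline.
print(queue_status({"messages": 83000, "consumers": 0}, expected_consumers=3))  # stalled
print(queue_status({"messages": 120, "consumers": 1}, expected_consumers=3))    # degraded
print(queue_status({"messages": 0, "consumers": 3}, expected_consumers=3))      # ok
```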
...
Action Items
Action Items | Comments | Owner |
---|---|---|
Monitoring of queue depth, consumption rate or another combined metric that indicates whether messages are being processed ‘acceptably’ | | |
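One way to make ‘acceptably’ concrete for this action item is to combine absolute queue depth with its rate of growth. This is a rough sketch. It assumes queue depth is sampled periodically, for example from the management API "messages" field. The `Sample` type, the function names and the thresholds are hypothetical. The sketch can be fed the timeline figures, assuming the queue was roughly empty at 02:26. On that basis it gives an average build-up of about 243 messages per minute, which is ~83K over 341 minutes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    minute: float  # minutes since midnight when the depth was sampled
    depth: int     # queue depth ("messages") at that time


def hhmm_to_minutes(hhmm: str) -> int:
    """Convert an "HH:MM" clock time to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def growth_per_minute(samples: Sequence[Sample]) -> float:
    """Average change in queue depth per minute between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    elapsed = last.minute - first.minute
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (last.depth - first.depth) / elapsed


def processing_unacceptable(samples: Sequence[Sample], max_depth: int, max_growth_per_minute: float) -> bool:
    """Combined metric: alert if the backlog is too deep OR is growing too fast."""
    return samples[-1].depth > max_depth or growth_per_minute(samples) > max_growth_per_minute


# Approximating the incident: assume ~0 messages at 02:26 and ~83K at 08:07.
incident = [Sample(hhmm_to_minutes("02:26"), 0), Sample(hhmm_to_minutes("08:07"), 83_000)]
print(round(growth_per_minute(incident), 1))  # 243.4 (messages per minute)
# Thresholds below are hypothetical and would need tuning against normal traffic.
print(processing_unacceptable(incident, max_depth=5_000, max_growth_per_minute=50))  # True
```

The growth-rate term is there because depth alone could stay under a static threshold for hours while a queue with no consumers fills up slowly. The depth term catches a large backlog that has stopped growing but is not draining.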
...