Date |
Authors |
Status | Production no longer impacted (Done) |
Summary |
Impact | The recommendations search page was not being updated for a number of hours during the day |
...
BST unless otherwise stated
02:26 - Earliest identifiable point of “something going wrong”; cause still unknown
02:26 to 08:07 - Queue of ‘doctor view’ updates for the recommendation service built steadily to ~83K messages
08:21 - First report in user channel
12:07 - Picked up for investigation
12:07 to 14:00ish - Checked database & ElasticSearch index
13:00 - Checked the list returned by GMC for North West
14:00ish - Found that messages in reval.queue.masterdoctorview.updated.recommendation were not being consumed
14:14ish - Having found that there were “recommendation status checks” being processed from overlapping runs, we reduced the frequency of checking for updates on submitted recommendations
16:00ish - Forced a restart of the recommendation service
16:00ish - Identified that RabbitMQ was reporting 0 consumers
17:00ish to 18:00ish - Identified error in queue declaration. Raised, merged and pushed a PR
18:00ish - Noticed that messages were briefly consumed on startup but the number of consumers quickly dropped to 0
18:40ish - Final redeploy of recommendation service cleared out backlog and appeared to restore consumers stably
18:40ish - Identified that there were still some discrepancies in the data between masterdoctorindex and the recommendation index. Decided to wait until after the overnight doctor sync to do a quick reindex
~07:30 (following morning) - Checked for the doctors that had been reported missing and found both were appearing
09:08 - Informed users of reindex (brief downtime expected)
11:09 - Reindex complete, service restored
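
The 14:14ish entry notes that recommendation status checks were being processed from overlapping runs. Besides reducing the check frequency, one possible safeguard is to skip a run while the previous one is still in progress. The sketch below illustrates that idea with a non-blocking lock. It is not the service's actual code: `run_status_check` and `check_fn` are hypothetical names, and an in-process lock only protects a single instance, so a multi-instance deployment would need a shared lock instead.

```python
import threading

# Hypothetical guard: one lock shared by every scheduled invocation in this process.
_status_check_lock = threading.Lock()


def run_status_check(check_fn):
    """Run check_fn unless a previous run is still in progress.

    Returns True if the check ran, False if it was skipped.
    """
    # acquire(blocking=False) returns immediately rather than waiting
    # behind the run that currently holds the lock.
    if not _status_check_lock.acquire(blocking=False):
        return False
    try:
        check_fn()
        return True
    finally:
        # Always release, even if check_fn raises, so later runs are not blocked.
        _status_check_lock.release()
```

A scheduler would call `run_status_check(...)` on every tick and log the skipped runs, which also makes overlapping runs visible rather than silent.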
...
Doctors reported as not showing in the search list
ElasticSearch Index for Recommendation Service isn’t being updated
Large backlog of messages stuck on a queue for updating the index
Message consumers disappeared, but after the final
aws ecs update-service --force-new-deployment
they dropped to one before going back up to 3. This may have been related to load & thread starvation.
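
The combination seen here, a large backlog with no consumers, is something a monitor could flag early. The sketch below shows one way to express that check, assuming queue statistics are read from the RabbitMQ management API, whose queue objects report `messages` and `consumers`. The function name `queue_stalled` and the threshold of 1000 are illustrative assumptions, and the sample values only approximate the figures in the timeline.

```python
def queue_stalled(queue_info, backlog_threshold=1000):
    """Return True when a queue has a backlog and nothing is consuming it.

    queue_info is a dict shaped like a queue object from the RabbitMQ
    management API; only the "messages" and "consumers" keys are read.
    backlog_threshold is an assumed value and should be tuned per queue.
    """
    messages = queue_info.get("messages", 0)
    consumers = queue_info.get("consumers", 0)
    return consumers == 0 and messages >= backlog_threshold


# Illustrative sample resembling the state at roughly 08:07 in the timeline.
sample = {
    "name": "reval.queue.masterdoctorview.updated.recommendation",
    "messages": 83000,
    "consumers": 0,
}
stalled = queue_stalled(sample)
```

Alerting on this condition, rather than on queue depth alone, distinguishes a slow consumer from one that has gone away entirely.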
...