...
A small number of doctors*** were connected outside of TIS Revalidation but did not appear as connected in TIS Revalidation until Friday 3rd November. //TODO Check log; this was the first sync after releasing a fix. While investigating the problem on 2nd November, we came across an intermittent issue loading pages. This was partially resolved at first, and fully resolved within 3 hours of being noticed by the team and 2 hours of becoming apparent to users.
...
The connections service sometimes didn’t respond within the allowed time. Possible causes are that it couldn’t be reached, that it failed, or that it ran out of resources (connections, CPU/memory, or threads); the cause has not yet been confirmed. We found no indication of an HTTP 504 (Gateway Timeout) in our services, though this still needs checking. We note a correlation with the “Unhealthy Routing Flow Count” metric in 2 of our 3 availability zones. This is likely why some actions still succeeded: requests handled in the unaffected zone would have completed normally.
...
Action Items
Action Items | Owner | Notes |
---|---|---|
Alert on “Unhealthy Routing Flow Count” | Story | |
Could error messages be friendlier? E.g. for timeouts: “Please retry, and contact us if it keeps happening” | Conversation facilitated by catherine.odukale | Refine / possible story |
Extend/improve the coverage of the X-Ray service to better locate failures | Story | |
Review Sentry and mark issues appropriately so we are alerted | Now… | |
Use MapStruct throughout Reval services | Story | |
...