Date

Authors

Joseph (Pepe) Kelly, Cai Willis

Status

Resolved

Summary

Unable to see connected doctors on the “current connections” page

Impact

  1. A small number of doctors were not being updated from the GMC when they were connected outside of TIS Revalidation between 6th October and 2nd November

  2. Errors were sometimes being returned for search lists and the details page for a doctor for up to 3 hours on Thursday 2nd November

Non-technical Description

A small number of doctors*** were connected outside of TIS Revalidation but did not appear as connected until Friday 3rd November. //TODO Check log. While investigating the problem on 2nd November, we came across an intermittent issue loading pages. This was partly resolved at first, and fully resolved within 3 hours of being noticed by the team and 2 hours of becoming apparent to users.

e.g. what are we doing to fix it.

...

Trigger

  • Updating under notice

  • Service Degradation?

Detection

  • User query on Teams

Resolution

  • Bugfix released 2nd Nov pm

  • Redeploying ?recommendation? service: tasks that were taking a long time to respond

...

Timeline

All times in BST unless indicated

  • : Change released to correct data and processing for whether doctors are “under notice”

  • ~00:10 : Logs show approx. 2 “Null Pointer Exception”s, which meant we did not mark 2 doctors as connected.

  • : 18:03 User queries connection status - Trainee is showing as connected to us on GMC Connect but not in TIS Revalidation.

  • : 10:47 First responder asked when they were last connected via GMC Connect.

  • : 10:57 Reported that the trainee is not even in the connections list on TIS but is on GMC Connect - 18/10/2023

  • : 11:04 Reported that 7562969 was connected yesterday but remains on the discrepancy list; asked whether this is a different issue to the one listed above.

  • : 11:40 Responder notified that their records in Revalidation were last updated on 18/10 and yesterday (1/11/23)

  • : Manually getting doctors for debugging failed: the request was blocked, possibly because of earlier bad requests.

  • : Manual verification of GMC responses showed 1 doctor was not in the list of connected doctors but the other was.

  • : Further debugging identified the cause.

  • : While investigating, errors in the Reval app pointed to another issue: gateway timeouts some of the time

  • : Investigation

  • 11:35 : replaced recommendation tasks in production

  • 11:53 : replaced connection tasks in production

  • 12:06 : replaced integration & core tasks in production

  • :

Root Cause(s)

Doctors weren’t appearing as Connected because they were marked as existsInGmc=false with no connected Designated Body, even though GMC Connect showed them as connected.

The data remained without a connection because the nightly sync could not update information from the GMC.

Updates from the GMC failed because there was a NullPointerException.

There was a NullPointerException because our service didn’t handle null values in the under notice field, and possibly in other fields.

It was unhandled because we had relied on the fields always being populated. We changed this as the data available when it changes externally can be outdated.
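To illustrate the kind of null handling described above, here is a minimal sketch in Java. It is not the released bugfix: the names `UnderNotice` and `UnderNoticeMapper`, and the accepted values "YES" and "NO", are assumptions made up for this example and are not taken from the TIS Revalidation code. The idea is only that a missing under notice value is mapped to an explicit "unknown" state rather than being dereferenced.

```java
import java.util.Locale;

// Hypothetical names: UnderNotice and UnderNoticeMapper are illustrative only
// and do not come from the TIS Revalidation codebase.
enum UnderNotice { YES, NO, UNKNOWN }

final class UnderNoticeMapper {

    private UnderNoticeMapper() {}

    // Maps the raw value received from the GMC to an enum without assuming
    // the field is populated. Null, blank or unrecognised values become UNKNOWN.
    static UnderNotice fromGmcValue(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return UnderNotice.UNKNOWN;
        }
        // The accepted strings "YES" and "NO" are an assumption for this sketch.
        switch (raw.trim().toUpperCase(Locale.ROOT)) {
            case "YES":
                return UnderNotice.YES;
            case "NO":
                return UnderNotice.NO;
            default:
                return UnderNotice.UNKNOWN;
        }
    }
}
```

With a mapping like this, a doctor whose under notice value is missing can still be processed by the sync, and the unknown state can be reported separately.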

---

We (and then users) were unable to load pages consistently because requests sometimes returned errors.

Errors were caused by “timeouts”.

The timeouts were reported by the API gateway because a downstream service did not respond within a maximum allowable time. API Gateway logs showed timeouts between 09:45 and 11:54 (HTTP 504) and latency

The connections service sometimes didn’t respond in the allowable time because ??? it couldn’t be reached ??? it broke ??? it ran out of resources (connections, CPU/mem, thread)
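The timeout behaviour described above can be sketched as follows. This is only an illustration of the general mechanism (a caller gives up on a downstream call after a maximum allowable time and reports HTTP 504), not how our API gateway is implemented or configured. The class `GatewayTimeoutSketch`, the method `statusFor` and the timeout values are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: models a gateway waiting a bounded time for a downstream service.
final class GatewayTimeoutSketch {

    static final int OK = 200;
    static final int BAD_GATEWAY = 502;
    static final int GATEWAY_TIMEOUT = 504;

    private GatewayTimeoutSketch() {}

    // Runs the downstream call asynchronously and waits at most timeoutMillis for it.
    static int statusFor(Supplier<String> downstream, long timeoutMillis) {
        CompletableFuture<String> call = CompletableFuture.supplyAsync(downstream);
        try {
            call.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return OK;
        } catch (TimeoutException e) {
            // Marks the future as cancelled; this does not interrupt the running task.
            call.cancel(true);
            return GATEWAY_TIMEOUT;
        } catch (ExecutionException e) {
            // The downstream call itself failed.
            return BAD_GATEWAY;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return BAD_GATEWAY;
        }
    }
}
```

In this model, a slow but otherwise healthy downstream task produces the same 504 as one that cannot be reached at all, which is why the gateway logs alone do not distinguish between the candidate causes listed above.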

...

Action Items

Action Items

Owner

...