2018-01-11 - 2018-01-19 Ongoing GMC issues

Date: 2018-01-11
Authors:
Status: In progress
Summary

Issues with the GMC API are preventing revalidation users from submitting to GMC, and the daily import data is missing.

Impact

Production systems that interact with GMC services.

Root Cause

GMC have had significant, ongoing problems with their API, and first became aware of this issue when we reported it to them.

Separately, we requested that specific IPs be added to the whitelist GMC hold on their backend services, in an attempt to move away from the proxy. At the time, however, traffic was not egressing through the whitelisted IP but through a public address attached to the virtual machine. After this was corrected the failures continued, so we put it down as a non-issue next to the bigger problem at GMC.
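A mismatch between the whitelisted address and the address traffic actually leaves from can be caught with a quick pre-flight check. The following is a minimal sketch, assuming the whitelisted entries are known and the observed public address has already been obtained by some other means. The function name is hypothetical, and the addresses are documentation-range placeholders rather than the real GMC whitelist or our production IPs.

```python
import ipaddress


def egress_is_whitelisted(observed_ip, whitelist):
    """Return True if observed_ip falls inside any whitelisted address or CIDR range."""
    address = ipaddress.ip_address(observed_ip)
    for entry in whitelist:
        # strict=False lets an entry with host bits set still be read as a network;
        # a bare address such as "203.0.113.10" is treated as a single-host network.
        network = ipaddress.ip_network(entry, strict=False)
        if address in network:
            return True
    return False


# Placeholder values (RFC 5737 documentation ranges), not real addresses.
WHITELIST = ["203.0.113.10", "198.51.100.0/28"]

print(egress_is_whitelisted("203.0.113.10", WHITELIST))  # stands in for the load balancer address
print(egress_is_whitelisted("192.0.2.55", WHITELIST))    # stands in for a VM's own public address
```

Running a check like this from each VM before requesting or relying on a whitelist change would show whether traffic is leaving via the expected address.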

Trigger

The trigger for the issue is unknown, as third-party API stability is completely out of our control. At first, however, we suspected the issue was ours and made all necessary changes on our side in order to debug it properly.

Resolution

Resolution is currently ongoing. We have been in touch with GMC multiple times about the issues we have had and have received various feedback. We also fixed the issue with ingress and egress via the load balancer.

Detection

DevOps requested whitelisting changes, which were made. We were then unable to access GMC via the proxy, which should still have worked, so we tried again after correcting the issue with traffic flow. As we received the same response (Body Code 99: Bad username/password/IP), we got in touch with GMC, who confirmed that there was indeed an issue and said they would look into its resolution.



Action Items

Below are suggested Action Items:

Action Item | Type | Owner | Issue
Discuss SLA with GMC | mitigate/prevent | |
Discuss requirement of GMC on our application in production | | |
Mitigate the risk of GMC going down by not relying indefinitely on a third party, which gives the application a single point of failure | | |
Establish better contact procedure | | |
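One possible way to reduce the single point of failure is to park submissions locally when GMC is unreachable and replay them once the API recovers. The following is a minimal sketch only, not a description of our application. It assumes a hypothetical `send` callable that wraps the GMC submission call and raises an exception on failure; the class name, parameters and defaults are all illustrative.

```python
import time
from collections import deque


class SubmissionQueue:
    """Holds submissions that could not be delivered so they can be replayed later."""

    def __init__(self, send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
        self._send = send                  # hypothetical client call; must raise on failure
        self._max_attempts = max_attempts
        self._base_delay = base_delay      # seconds before the first retry
        self._sleep = sleep                # injectable so tests need not actually wait
        self.pending = deque()

    def submit(self, payload):
        """Try to send now with exponential backoff; park the payload if every attempt fails."""
        for attempt in range(self._max_attempts):
            try:
                self._send(payload)
                return True
            except Exception:
                # Wait 1x, 2x, 4x ... the base delay, but not after the final attempt.
                if attempt < self._max_attempts - 1:
                    self._sleep(self._base_delay * 2 ** attempt)
        self.pending.append(payload)
        return False

    def replay(self):
        """Retry each parked payload once, in order; return how many were delivered."""
        delivered = 0
        for _ in range(len(self.pending)):
            payload = self.pending.popleft()
            try:
                self._send(payload)
                delivered += 1
            except Exception:
                self.pending.append(payload)
        return delivered
```

With something like this in place, users could still record a submission during an outage, and `replay()` could be run on a schedule or after GMC confirm a fix. Whether queued submissions are acceptable for revalidation deadlines is a question for the SLA discussion.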


Timeline

2018-01-11 - DevOps requested whitelisting of IP address

2018-01-12 - Users and staff noticed an issue with GMC after the whitelist had been implemented. We began attempting to fix it while also contacting GMC to check that the whitelist was correctly implemented and had not overwritten the old proxied address.

We also fixed an issue we saw with the outgoing IPs on the VMs. Typically egress traffic should pass through the load balancer, as ingress traffic does. However, the way it had been implemented meant that each VM had its own public IP on which it could communicate. See diagram below:

We still had issues after this hotfix.

We then checked whether the issue was related to an outdated cipher, which has caused us problems before because GMC use old TLS ciphers. We tried the following, which requests a response using a specific cipher:

curl --cipher 'TLS_RSA_WITH_3DES_EDE_CBC_SHA' -i -XPOST -H 'Content-Type: text/xml' -H 'SOAPAction: https://webcache.gmc-uk.org/GMCWebServices/GetDoctorsForDB' https://webcache.gmc-uk.org/GMCWebServices/WebService.asmx -d @get_all_drs_live.xml >output.xml


This did not work, meaning we were out of debugging options on our side and needed a response from GMC.

We spoke to Max Wellingham, Chris Evans and Imram (no last name given) who said that their engineers were looking into the issue.

2018-01-15 - After the weekend the issues remained, so we contacted GMC again via their helpdesk, by phone and by email. We got this response:


Hi Chris
 
I will ask Chris Evans, one of our Helpdesk Team Leaders, to look into this and get back to you to try and get to the bottom of this for you.
 
I should also make you aware that we have been experiencing some technical difficulties over the weekend, which have meant our API services are unavailable. Our technical team are working to resolve these as their top priority.
 
Thank you for your patience .
 
Max
 
 
Max Wellingham
IS Project and Portfolio Coordinator
General Medical Council

After speaking to Chris Evans, who is the head of the help desk, we found out that other trusts were seeing the same response issues and had reported them later than we did. He said that it would be corrected by the end of the day, and rang me later to confirm that we should test it.

2018-01-18 - We contacted GMC again regarding failed calls:

Hey Chris,
Apologies for contacting you directly.
I wondered if you were still seeing any issues with the GMC API? We're getting a number of failed submissions today but nothing has changed within our application.
We're still using the same endpoint and IP addresses and it was working after we last spoke.
Cheers

We received this in response at the end of the day:

Hi Chris, we found an issue on one of our servers. Can you try again and let me know how it looks please?

Kind regards
Chris

Supporting Information


e.g. monitoring dashboards