2018-01-11 - 2018-01-19 Ongoing GMC issues
Date | 2018-01-11 |
Authors | |
Status | In progress |
Summary | Issues with the GMC API prevented revalidation users from submitting to GMC and caused daily import data to be missing. |
Impact | Production systems that interact with GMC services. |
Root Cause
GMC have had significant, ongoing problems with their API, and only became aware of the issue once we reported it to them.
Separately, we had requested that specific IPs be added to the whitelist GMC hold on their backend services, in an attempt to move away from the proxy. However, at the time our traffic was not egressing through the whitelisted IP but through a public address attached to each virtual machine. Once this was corrected, we ruled it out as the cause and concluded that the failures were part of a bigger problem at GMC.
Trigger
The trigger for the issue is unknown, as the stability of a third-party API is outside our control. Initially we suspected the issue was on our side and made the changes necessary to debug it properly.
Resolution
Resolution is currently ongoing. We have contacted GMC multiple times about the issues we have seen and have received various responses. We also fixed the issue with ingress and egress via the load balancer.
Detection
DevOps requested whitelisting changes, which were made. We were then unable to access GMC via the proxy, which should still have worked, so we tried again after correcting the traffic-flow issue. Because we received the same response (Body Code 99: Bad username/password/IP), we contacted GMC, who said they would look into it. They confirmed that there was indeed an issue on their side and that they would work on resolving it.
Action Items
Suggested action items:
Action Item | Type | Owner | Issue |
---|---|---|---|
Discuss SLA with GMC | mitigate/prevent | | |
Discuss our production application's dependency on GMC | | | |
Reduce the risk of outages by not relying indefinitely on a third party, so that GMC is not a single point of failure for our application | | | |
Establish a better contact procedure with GMC | | | |
Timeline
2018-01-11 - DevOps requested whitelisting of our IP address.
2018-01-12 - Users and staff noticed an issue with GMC after the whitelist had been implemented. We began working on a fix while also contacting GMC to check that the whitelist had been implemented correctly and had not overwritten the old proxied address.
We also fixed an issue with the outgoing IPs on the VMs. Egress traffic should normally pass through the load balancer, just as ingress traffic does. However, the way it had been implemented meant that each VM had its own public IP that it could communicate on. See diagram below:
The problems persisted after we hotfixed this.
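A quick way to confirm which address a VM actually egresses from, and whether GMC's whitelist covers it, is to ask an external service what source IP it sees and compare that against the whitelisted addresses. The sketch below assumes Python 3.9+; `WHITELIST` holds a documentation placeholder address (not our real one), `checkip.amazonaws.com` is just one example of a plain-text IP echo service, and the function names are hypothetical.

```python
import ipaddress
import urllib.request

# Hypothetical placeholder: replace with the load balancer address(es) GMC whitelisted.
WHITELIST = ["203.0.113.10/32"]


def egress_allowed(observed_ip: str, whitelist: list[str]) -> bool:
    """Return True if observed_ip falls inside any whitelisted address or CIDR range."""
    ip = ipaddress.ip_address(observed_ip.strip())
    # strict=False lets bare addresses such as "203.0.113.10" be treated as /32 networks.
    return any(ip in ipaddress.ip_network(entry, strict=False) for entry in whitelist)


def observed_egress_ip(echo_url: str = "https://checkip.amazonaws.com",
                       timeout: float = 10.0) -> str:
    """Ask an external echo service which source address our traffic arrives from.

    Assumption: the service replies with the caller's IP as plain text.
    """
    with urllib.request.urlopen(echo_url, timeout=timeout) as resp:
        return resp.read().decode().strip()
```

Running `egress_allowed(observed_egress_ip(), WHITELIST)` on each VM would have shown that the VMs were leaving via their own public IPs rather than the load balancer's address.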
We then checked whether the issue was related to an outdated cipher, which has caused problems for us before because GMC use old TLS ciphers. We ran the following command, which restricts the client to a single specific cipher suite during the TLS handshake:
curl --cipher 'TLS_RSA_WITH_3DES_EDE_CBC_SHA' -i -XPOST -H 'Content-Type: text/xml' -H 'SOAPAction: https://webcache.gmc-uk.org/GMCWebServices/GetDoctorsForDB' https://webcache.gmc-uk.org/GMCWebServices/WebService.asmx -d @get_all_drs_live.xml >output.xml
This did not work either, so we had exhausted our debugging options and needed a response from GMC.
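For repeatable debugging, the curl request above can also be scripted in Python. This is a sketch, not the team's tooling: it assumes Python 3.9+, the function names are hypothetical, and the SOAP envelope is passed in by the caller because the contents of `get_all_drs_live.xml` are not reproduced here. `DES-CBC3-SHA` is OpenSSL's name for the IANA suite `TLS_RSA_WITH_3DES_EDE_CBC_SHA`; recent OpenSSL builds may no longer offer 3DES, in which case `legacy_tls_context` raises `ssl.SSLError`.

```python
import ssl
import urllib.request

# Endpoint and SOAPAction taken from the curl command above.
ENDPOINT = "https://webcache.gmc-uk.org/GMCWebServices/WebService.asmx"
SOAP_ACTION = "https://webcache.gmc-uk.org/GMCWebServices/GetDoctorsForDB"

# OpenSSL name for the IANA suite TLS_RSA_WITH_3DES_EDE_CBC_SHA.
LEGACY_CIPHER = "DES-CBC3-SHA"


def build_request(envelope: bytes) -> urllib.request.Request:
    """Build the same SOAP POST that the curl command sends."""
    return urllib.request.Request(
        ENDPOINT,
        data=envelope,
        method="POST",
        headers={"Content-Type": "text/xml", "SOAPAction": SOAP_ACTION},
    )


def legacy_tls_context() -> ssl.SSLContext:
    """Offer only the legacy 3DES suite, similar to curl's --cipher option.

    Raises ssl.SSLError if the local OpenSSL build cannot select that cipher.
    """
    ctx = ssl.create_default_context()
    # set_ciphers() does not affect TLS 1.3 suites, so cap the protocol at TLS 1.2.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers(LEGACY_CIPHER)
    return ctx


def send(envelope: bytes, timeout: float = 30.0) -> tuple[int, bytes]:
    """POST the envelope over the restricted TLS context and return (status, body)."""
    req = build_request(envelope)
    with urllib.request.urlopen(req, context=legacy_tls_context(), timeout=timeout) as resp:
        return resp.status, resp.read()
```

Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so a SOAP fault returned with HTTP 500 surfaces as an exception whose body can be read from the error object.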
We spoke to Max Wellingham, Chris Evans and Imram (no last name given), who said that their engineers were looking into the issue.
2018-01-15 - The issues persisted after the weekend, so we contacted the GMC helpdesk again by phone and email and received this response:
General Medical Council
Supporting Information
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213