2021-07-23 TIS-GMC ETL failure
Date | Jul 23, 2021 |
Authors | @Adewale Adekoya @John Simmons (Deactivated) @Jayanta Saha |
Status | Done |
Summary | An SSL/TLS certificate at the GMC had been incorrectly applied to a server, blocking secure connections. |
Impact | No users reported any problems during the incident. |
Non-technical Description
We were unable to connect to the GMC servers to get the revalidation data we require.
Trigger
GMC replaced an SSL/TLS certificate on their server the previous night, and it either had not been applied correctly or had not been tested correctly. This blocked anybody from making secure connections to the service.
Detection
Slack notification from Ade at 08:39 on 23/07/2021, pointing out that the GMC-sync-prod job had failed again.
Resolution
Sent an email to GMC stating that we had performed tests from our servers and from personal devices and that the results were the same for all connections.
Sent:
Reply:
Timeline
Jul 23, 2021: 01:05 - An alert in the #monitoring channel reported ETL success, but in the wrong time slot
Jul 23, 2021: 08:39 - Ade alerted the dev team
Jul 23, 2021: 08:51 - John started investigation
Jul 23, 2021: 10:53 - Ade emailed GMC about the problem
Jul 23, 2021: 11:15 - GMC responded that the issue had been raised with their service desk
Jul 23, 2021: 12:25 - GMC confirmed that a certificate change was made on 22/07/2021, which may be the reason for the issue
Root Cause(s)
A replacement SSL/TLS certificate had been applied incorrectly to the GMC server that we connect to in order to get the GMC data into TIS.
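A fault of this kind can usually be told apart from a network outage by attempting a TLS handshake and looking at which error comes back. The following is a minimal sketch of such a check, not the test that was actually run during the incident. The function names `check_tls` and `classify_tls_error`, the category strings, and the host name in the usage note are all hypothetical; only Python standard-library calls are used.

```python
import socket
import ssl


def classify_tls_error(exc: Exception) -> str:
    """Map a connection exception to a short category (hypothetical labels)."""
    # Order matters: SSLCertVerificationError is a subclass of SSLError,
    # and SSLError and socket.timeout are both subclasses of OSError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate-verification-failed"
    if isinstance(exc, ssl.SSLError):
        return "tls-handshake-failed"
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, OSError):
        return "network-error"
    return "unknown-error"


def check_tls(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Attempt a verified TLS handshake and report the outcome as a string."""
    # The default context verifies the certificate chain and the host name.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"ok ({tls.version()})"
    except Exception as exc:
        return classify_tls_error(exc)
```

Calling `check_tls("gmc.example.invalid")` (a placeholder host, not the real GMC endpoint) from both a server and a personal device would show whether the failure category is the same from every network, which is the comparison described under Resolution.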
Lessons Learned
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213