2021-05-07 TIS not accessible due to expired certificates

Date

May 7, 2021

Authors

@Andy Dingley @Liban Hirey (Unlicensed)

Status

Done

Summary

The SSL certificates used by TIS were accidentally overwritten with outdated certificates that had already expired.

Impact

Users were unable to access TIS for roughly 35 minutes, from 14:55 to 15:30 BST on May 7, 2021.

Non-technical Description

SSL certificates are used by websites to ensure that communication to and from the website is secure. Each certificate is valid for a limited length of time.

The certificates used for TIS were due to expire on the 15th April and were replaced on the 6th April.

While some unrelated changes were being made, the outdated certificates were accidentally deployed back to TIS.


Trigger

  • The api-gateway Ansible playbook ran and overwrote the manually updated certificates with the outdated ones


Detection

  • Detected by dev team and then reported by users shortly afterwards
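
Detection here depended on people noticing the outage. A scheduled check along the following lines could flag a failing or soon-to-expire certificate earlier. This is a minimal sketch using only the Python standard library; the function names, the 14-day warning threshold and whatever hostname is passed in are illustrative assumptions, not part of the existing TIS tooling.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_remaining(not_after: str, now: datetime) -> float:
    """Days from `now` (timezone-aware) until a certificate's notAfter time.

    `not_after` uses the format returned by getpeercert(),
    for example "Apr 15 12:00:00 2021 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def check_endpoint(host: str, port: int = 443, warn_days: int = 14) -> str:
    """Return a one-line status for the certificate served by host:port."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # An expired certificate fails the handshake, so it is reported here.
        return f"CRITICAL: {host}: {exc.verify_message}"
    except OSError as exc:
        # Connection refused, timeout, DNS failure or another TLS error.
        return f"UNKNOWN: {host}: {exc}"

    left = days_remaining(cert["notAfter"], datetime.now(timezone.utc))
    if left < warn_days:
        return f"WARNING: {host}: certificate expires in {left:.1f} days"
    return f"OK: {host}: certificate valid for {left:.1f} more days"
```

Such a check would be run from a scheduler, alerting on any result other than OK. `check_endpoint` needs network access to the host being checked.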




Resolution

  • @Liban Hirey (Unlicensed) recovered the new certificates from the stage environment and copied them to the prod blue/green servers

  • The forgotten branch containing the new certificates was pushed to GitHub and merged

  • Prod was tested and found to be working


Timeline

  • May 7, 2021: 14:55 BST - Ansible playbook ran to apply changes to api-gateway

  • May 7, 2021: 14:59 BST - Detected by dev team and users

  • May 7, 2021: 15:30 BST - Fix deployed

  • May 10, 2021: Discovered failures in reading files from ESR. Three attempts were made to trigger processing of the file DE_NWN_APC_20210506_00002693.DAT, all of which failed with an SSL exception.

Root Cause(s)

  • The Ansible playbook replaced the latest certificates with outdated ones

  • The updated certificates were added manually, so the playbook didn’t know about them

  • The branch containing the new certificates had not been pushed to GitHub
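
One way to guard against this kind of overwrite is to compare expiry dates before a certificate is replaced and to refuse any candidate that expires earlier than the one already deployed. The sketch below assumes the `openssl` command-line tool is installed and relies on its `x509 -enddate -noout` output (a line of the form `notAfter=...`). The function names are hypothetical, and how such a check would be wired into the playbook is left open.

```python
import ssl
import subprocess


def enddate_seconds(enddate_line: str) -> int:
    """Parse an `openssl x509 -enddate` line, such as
    "notAfter=Apr 15 12:00:00 2021 GMT", into seconds since the epoch."""
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"unexpected enddate line: {enddate_line!r}")
    return ssl.cert_time_to_seconds(line[len(prefix):])


def read_enddate(pem_path: str) -> str:
    """Ask the openssl CLI for the notAfter line of a PEM certificate."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", pem_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def is_downgrade(candidate_line: str, deployed_line: str) -> bool:
    """True if the candidate certificate expires before the deployed one."""
    return enddate_seconds(candidate_line) < enddate_seconds(deployed_line)
```

A deployment step could call `is_downgrade(read_enddate(candidate), read_enddate(deployed))` and abort when it returns `True`, leaving the newer certificate in place until someone confirms the change.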


Action Items

  • Ensure certificates can be applied automatically by playbook. Owner: @John Simmons (Deactivated). Status: Done

  • Improve alerting from Lambdas (ESR). Owner: @Joseph (Pepe) Kelly. Ticket: https://hee-tis.atlassian.net/browse/TIS21-1564


Lessons Learned

  • Make sure anything applied manually will also be handled by the automation, or automate it from the start
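
Manual changes that the automation does not know about can be surfaced by comparing the files the automation manages with what is actually deployed. The following is a rough sketch of such a drift check, assuming both sets of files can be read as local directories; the directory layout and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(managed_dir: Path, deployed_dir: Path) -> list[str]:
    """Names of managed files that are missing from, or differ in, deployed_dir."""
    drifted = []
    for managed in sorted(managed_dir.iterdir()):
        if not managed.is_file():
            continue
        deployed = deployed_dir / managed.name
        if not deployed.is_file() or sha256_of(managed) != sha256_of(deployed):
            drifted.append(managed.name)
    return drifted
```

A non-empty result before a playbook run means the run would overwrite something that was changed by hand, which is a prompt to commit that change first.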