2021-03-10 NIMDTA reference data not available
Date | Mar 10, 2021 |
Authors | @Andy Dingley |
Status | Done |
Summary | |
Impact | |
Non-technical Description
There was an issue with the part of the TIS application that provides reference data (e.g. Gender, Title, Site), which left many dropdowns within TIS and User Management empty. A script intended to update some of the PermitToWork reference data failed, which prevented that part of the application from starting until the data could be corrected.
Trigger
Reference service deployed to prod with a failing Flyway migration script included.
Detection
Reported by a NIMDTA user on Slack.
Resolution
Quick fix: Flag migration as successful and restart services to get application running again.
Full fix: Pre-existing values removed from the NIMDTA PermitToWork migration script, and the migration's entry deleted from the schema version table so that the fixed migration could rerun.
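The two fixes above can be sketched against a stand-in for the schema version table. This is a minimal illustration only: the table name `schema_history`, its columns, the version numbers and the helper functions are all hypothetical, and an in-memory SQLite database is used here in place of the real Reference database.

```python
import sqlite3

# Hypothetical stand-in for the migration tool's schema version table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schema_history ("
    "version TEXT PRIMARY KEY, description TEXT, success INTEGER NOT NULL)"
)
conn.execute("INSERT INTO schema_history VALUES ('1.0', 'baseline', 1)")
conn.execute("INSERT INTO schema_history VALUES ('1.1', 'add PermitToWork values', 0)")


def failed_versions(conn):
    """Return the versions currently recorded as failed."""
    rows = conn.execute(
        "SELECT version FROM schema_history WHERE success = 0 ORDER BY version"
    )
    return [version for (version,) in rows]


def mark_successful(conn, version):
    """Quick fix: flag the failed migration as successful so the service can start."""
    conn.execute("UPDATE schema_history SET success = 1 WHERE version = ?", (version,))
    conn.commit()


def forget_migration(conn, version):
    """Full fix: delete the entry so the corrected script is applied on the next run."""
    conn.execute("DELETE FROM schema_history WHERE version = ?", (version,))
    conn.commit()
```

Note that the quick fix leaves the database without the migrated data while recording the migration as applied, which is why the entry then had to be deleted before the corrected script could run.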
Timeline
Mar 10, 2021 - 12:33 - Reference service deployed to NIMDTA production with a failing Flyway migration.
Mar 10, 2021 - 15:06 - Report from NIMDTA user of issues with TIS and User Management.
Mar 10, 2021 - 16:15 - Confirmed by POs/BAs and devs notified.
Mar 10, 2021 - 16:23 - Failed Flyway migration identified as the cause.
Mar 10, 2021 - 16:34 - Reference service brought back up without migrated data and user management service restarted.
Mar 10, 2021 - 16:38 - NIMDTA users notified of issue being resolved.
Mar 10, 2021 - 17:25 - Fix submitted to correct the migration script.
Mar 10, 2021 - 17:32 - Fix deployed to NIMDTA production.
Root Cause(s)
Flyway migration failed when trying to insert new PermitToWork values.
Some values already existed in the NIMDTA database and the script had no fallback for pre-existing values.
The script was copied from the one applied to HEE, but additional values added by the Intrepid to TIS migration (which did not exist in HEE) were not accounted for.
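One way to give such a script a fallback for pre-existing values is to insert each value only when it is not already present. The sketch below shows the idea with an `INSERT ... SELECT ... WHERE NOT EXISTS` statement. The column names, the sample values and the `insert_missing` helper are assumptions made for illustration, and an in-memory SQLite database is used here in place of the real Reference database; it is not the fix that was actually deployed.

```python
import sqlite3


def insert_missing(conn, values):
    """Insert each (code, label) pair only if its code is not already in the table."""
    for code, label in values:
        conn.execute(
            "INSERT INTO PermitToWork (code, label) "
            "SELECT ?, ? WHERE NOT EXISTS "
            "(SELECT 1 FROM PermitToWork WHERE code = ?)",
            (code, label, code),
        )
    conn.commit()


# Hypothetical table layout and data: one value already exists, as it did in NIMDTA.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PermitToWork (code TEXT PRIMARY KEY, label TEXT NOT NULL)")
conn.execute("INSERT INTO PermitToWork VALUES ('EXISTING', 'Value already present')")

new_values = [
    ("EXISTING", "Value already present"),
    ("NEW", "Newly added value"),
]
insert_missing(conn, new_values)  # a plain INSERT would fail on 'EXISTING'
```

Written this way, the same script can run against databases that hold different subsets of the values, and it can be rerun safely.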
Action Items
Action Items | Owner |
---|---|
Set up monitoring on Reference service | |
Create NIMDTA stage environment | |
Lessons Learned
Having monitoring on the reference service would have warned us about the issue ~2.5 hours before users reported it.
NIMDTA data does not match HEE data, even in unused reference tables!
A stage environment would have detected this issue before we broke the live application.
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213