2021-03-10 NIMDTA reference data not available

Date

Mar 10, 2021

Authors

@Andy Dingley

Status

Done

Summary

https://hee-tis.atlassian.net/browse/TIS21-1314

Impact

  • NIMDTA admins unable to use many aspects of the TIS application

Non-technical Description

There was an issue with the part of the TIS application which provides reference data (e.g. Gender, Title, Site), which left many dropdowns within TIS and User Management empty. A script intended to update some of the PermitToWork reference data failed, and that part of the application could not start until the data had been corrected.


Trigger

  • Reference service deployed to prod with a failing Flyway migration script included.

 


Detection

  • Reported by a NIMDTA user on Slack

 


Resolution

  • Quick fix: Flag migration as successful and restart services to get application running again.

  • Full fix: Pre-existing values removed from the NIMDTA PermitToWork migration script, and the migration's entry deleted from the schema version table so that the fixed migration could be rerun.
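
The two fixes above can be sketched against a simplified stand-in for the schema version table. This is an illustration only: it uses an in-memory SQLite database, and the table name, the reduced column set, the version numbers and the script names are all assumptions, not the values used in the incident.

```python
import sqlite3

HISTORY_TABLE = "schema_version"  # assumed name for the migration history table
FAILED_VERSION = "1.0.1"          # hypothetical version of the failed migration


def failed_versions(conn):
    """Return the versions currently recorded as failed."""
    rows = conn.execute(
        f"SELECT version FROM {HISTORY_TABLE} WHERE success = 0 ORDER BY version"
    )
    return [row[0] for row in rows]


def flag_as_successful(conn, version):
    """Quick fix: mark the failed migration as successful so the service can start."""
    conn.execute(
        f"UPDATE {HISTORY_TABLE} SET success = 1 WHERE version = ?", (version,)
    )


def forget_migration(conn, version):
    """Full fix, second half: delete the entry so the corrected script is applied again."""
    conn.execute(f"DELETE FROM {HISTORY_TABLE} WHERE version = ?", (version,))


conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {HISTORY_TABLE} "
    "(version TEXT PRIMARY KEY, script TEXT, success INTEGER)"
)
conn.executemany(
    f"INSERT INTO {HISTORY_TABLE} (version, script, success) VALUES (?, ?, ?)",
    [
        ("1.0.0", "V1.0.0__baseline.sql", 1),         # hypothetical earlier migration
        ("1.0.1", "V1.0.1__permit_to_work.sql", 0),   # hypothetical failed migration
    ],
)

failed_before = failed_versions(conn)

flag_as_successful(conn, FAILED_VERSION)
failed_after_quick_fix = failed_versions(conn)

forget_migration(conn, FAILED_VERSION)
recorded_after_full_fix = [
    row[0] for row in conn.execute(f"SELECT version FROM {HISTORY_TABLE}")
]
```

After the quick fix the table claims the migration ran even though its data was never inserted, which is why the entry had to be removed again before the corrected script could be applied.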


Timeline

  • Mar 10, 2021 - 12:33 - Reference service deployed to NIMDTA production with a failing Flyway migration.

  • Mar 10, 2021 - 15:06 - Report from NIMDTA user of issues with TIS and User Management.

  • Mar 10, 2021 - 16:15 - Confirmed by POs/BAs and devs notified.

  • Mar 10, 2021 - 16:23 - Failed Flyway migration identified as the cause.

  • Mar 10, 2021 - 16:34 - Reference service brought back up without migrated data and user management service restarted.

  • Mar 10, 2021 - 16:38 - NIMDTA users notified of issue being resolved.

  • Mar 10, 2021 - 17:25 - Fix submitted to correct the migration script.

  • Mar 10, 2021 - 17:32 - Fix deployed to NIMDTA production.


Root Cause(s)

  • Flyway migration failed when trying to insert new PermitToWork values.

  • Some values already existed in the NIMDTA database and the script had no fallback for pre-existing values.

  • Script was copied from the one applied to HEE, but it did not account for additional values that the Intrepid to TIS migration had added to NIMDTA and that did not exist in HEE.
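
The failure mode described above, and one possible fallback for pre-existing values, can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the single-column table layout and the value names are hypothetical, and the real migration script and database engine may need a different form of guard.

```python
import sqlite3


def insert_missing(conn, values):
    """Insert each value only if it is not already present; return the values added."""
    before = {row[0] for row in conn.execute("SELECT code FROM PermitToWork")}
    for value in values:
        # The NOT EXISTS guard turns the insert into a no-op for existing values.
        conn.execute(
            "INSERT INTO PermitToWork (code) "
            "SELECT ? WHERE NOT EXISTS "
            "(SELECT 1 FROM PermitToWork WHERE code = ?)",
            (value, value),
        )
    return [value for value in values if value not in before]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PermitToWork (code TEXT PRIMARY KEY)")
# Simulates a value that already exists in one environment but not in the other.
conn.execute("INSERT INTO PermitToWork (code) VALUES ('EXISTING_VALUE')")

# An unguarded insert of the same value fails, as the migration did.
try:
    conn.execute("INSERT INTO PermitToWork (code) VALUES ('EXISTING_VALUE')")
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

# The guarded insert skips the existing value and adds only the new one.
added = insert_missing(conn, ["EXISTING_VALUE", "NEW_VALUE"])
row_count = conn.execute("SELECT COUNT(*) FROM PermitToWork").fetchone()[0]
```

A guarded script of this kind can be applied to environments whose reference data has diverged without failing on the values one of them already holds.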


Action Items

Action Items

Owner

Set up monitoring on Reference service

https://hee-tis.atlassian.net/browse/TIS21-1321

Create NIMDTA stage environment

https://hee-tis.atlassian.net/browse/TIS21-1322


Lessons Learned

  • Having monitoring on the reference service would have warned us about the issue ~2.5 hours before users reported it.

  • NIMDTA data does not match HEE data, even in unused reference tables!

  • A stage environment would have detected this issue before we broke the live application.
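
As an illustration of the first lesson, a minimal availability check might look like the sketch below. It assumes the Reference service exposes an HTTP health endpoint that returns JSON with a "status" field set to "UP" when healthy; the URL, the endpoint path and the response shape are assumptions rather than details confirmed by this report.

```python
import json
from urllib.request import urlopen


def is_up(payload):
    """Return True only if the payload is a JSON object reporting status "UP"."""
    try:
        return json.loads(payload).get("status") == "UP"
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False


def check_service(url, fetch=None, timeout=5):
    """Fetch the health endpoint and report whether the service looks healthy.

    `fetch` is injectable so the check can be exercised without a network.
    """
    def default_fetch(target):
        with urlopen(target, timeout=timeout) as response:
            return response.read()

    fetch = fetch or default_fetch
    try:
        return is_up(fetch(url))
    except OSError:
        # Covers connection failures, timeouts and HTTP error responses.
        return False


# Example with a stubbed response; the URL is a placeholder, not a real endpoint.
def stub_fetch(target):
    return b'{"status": "UP"}'


healthy = check_service("http://reference.example/health", fetch=stub_fetch)
```

Run on a schedule with an alert on a False result, a check like this would flag a service that failed to start at deployment time rather than when users notice.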