2021-03-16 ESR APP generation and TIS sync job failures

Date

Mar 16, 2021

Authors

@Andy Dingley

Status

Done

Summary

https://hee-tis.atlassian.net/browse/TIS21-1340

Impact

  • ESR APP generation and the TIS overnight sync jobs failed

Non-technical Description

New functionality was released to allow Permit to Work to use values from the reference tables instead of a fixed, outdated set of values. Some of our other applications/services, namely ESR and our overnight sync jobs, could not handle that change and began failing. Those projects were updated so that they correctly process the new Permit to Work data, and we manually made “no change updates” to TIS records to generate the missing applicants.


Trigger

  • TCS deployed with changes to the PermitToWork field, changing it from an enumeration to a string.


Detection

  • ESR failures detected by Sentry and notified on Slack

  • TIS-SYNC failures notified on Slack during overnight sync jobs


Resolution

  • The tcs-client and tcs-persistence dependency versions were updated in the ESR and TIS-SYNC projects.


Timeline

  • Mar 16, 2021 - 14:30 - TCS service deployed to production

  • Mar 16, 2021 - 14:42 - Error in ESR APP generation notified in Slack

  • Mar 16, 2021 - 14:44 - Issue picked up by devs

  • Mar 16, 2021 - 16:16 - Fix deployed for EsrAppRecordGeneratorService

  • Mar 16, 2021 - 16:32 - Fix deployed for EsrNotificationGeneratorService

  • Mar 16, 2021 - 17:39 - Fix deployed for EsrInboundDataWriterService

  • Mar 16, 2021 - 18:48 - Fix deployed for TIS-EsrReconciliationService

  • Mar 17, 2021 - 00:09 - TIS overnight sync jobs failed

  • Mar 17, 2021 - 02:30 - Issue picked up by devs

  • Mar 17, 2021 - 03:55 - Fix deployed to stage for TIS-SYNC - decision made not to merge to prod at this time due to auto-start logic before 05:00

  • Mar 17, 2021 - 05:20 - TIS-SYNC fix deployed to production environment and sync jobs triggered - NIMDTA jobs didn’t fully trigger, possibly due to permissions (?)

  • Mar 17, 2021 - 10:45 - NIMDTA jobs re-ran and completed successfully

  • Mar 19, 2021 - 18:20 - Retriggered applicant generation based on trainees reported in Sentry errors

Root Cause(s)

  • TCS deployed with changes to the PermitToWork field, changing it from an enumeration to a string.

  • The (de)serialization of the RightToWork object in ESR and TIS-SYNC projects began to fail.

  • Outdated tcs-client and tcs-persistence versions were in use, so those projects still tried to treat PermitToWork as an enumeration.

  • Updating the dependencies in those projects was missed because a search for usages of PermitToWorkType did not find them.

  • Those projects do not directly use PermitToWork, but do (de)serialize the Person object which has it nested.
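
The failure mode described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the services' actual code: the JSON keys (rightToWork, permitToWork), the enum values and the two parse functions are assumptions made for this example. It shows how a consumer that still maps the field to a fixed enumeration fails on a new reference-table value, even though it only handles the outer Person object.

```python
import json
from dataclasses import dataclass
from enum import Enum


class PermitToWorkType(Enum):
    # Hypothetical fixed values, as compiled into an outdated client library.
    TIER_2 = "TIER_2"
    REFUGEE = "REFUGEE"


@dataclass
class RightToWork:
    permit_to_work: object  # enum member in the old client, plain string in the new one


@dataclass
class Person:
    id: int
    right_to_work: RightToWork


def parse_person_old_client(payload: str) -> Person:
    """Old client: the nested permitToWork value must match an enum value."""
    data = json.loads(payload)
    # Raises ValueError when the API sends a value the enum does not know.
    permit = PermitToWorkType(data["rightToWork"]["permitToWork"])
    return Person(data["id"], RightToWork(permit))


def parse_person_new_client(payload: str) -> Person:
    """New client: the nested permitToWork value is kept as a string."""
    data = json.loads(payload)
    return Person(data["id"], RightToWork(str(data["rightToWork"]["permitToWork"])))


# A payload carrying a (made-up) reference-table value the old enum never knew about.
payload = json.dumps(
    {"id": 1, "rightToWork": {"permitToWork": "Some new reference value"}}
)

try:
    parse_person_old_client(payload)
    old_client_failed = False
except ValueError:
    old_client_failed = True

new_person = parse_person_new_client(payload)
```

Note that the old parser never references the permit type at its call site; the failure surfaces only because the Person payload is parsed as a whole, which is consistent with the usage search not finding these projects.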


Action Items

Action Items

Owner

Ensure all devs have adequate permissions to run sync jobs

https://hee-tis.atlassian.net/browse/TIS21-1343


Lessons Learned

  • Need to be more wary of breaking changes and the effect on services calling the affected API.

  • Devs not having the correct permissions may have slowed down the full resolution.