2020-06-22 ESR ETL failed on applicant-export

Date: 2020-06-22
Authors: Joseph (Pepe) Kelly
Status: Missing applicant file uploaded to ESR on 2020-06-23
Summary

The ESR applicant-export job failed. The ETL received a connection exception while TCS was unavailable.

Impact: New applicants for Yorkshire & Humberside were not received by ESR until


Jira Live Defect

TISNEW-4856

Root Causes

  • The ETL did not retry the failed call.
  • TCS was unavailable.
  • TCS was redeployed by a Jenkins job.

Trigger

  • TCS was redeployed mid-job.

Resolution

  • Re-ran the Jenkins job after 20:00 (the last scheduled job).
  • Moved the missing file to the 'outbound' folder in Azure for today (23rd), ready to be processed at 18:00.

Detection / Timeline

  • 2020-06-22 16:31: Job failed.
  • 2020-06-22 up to 20:00: Further attempts to re-run the job failed because later steps in the chain were running.
  • 2020-06-22 18:00: FTP Sync runs and picks up all but 1 file.
  • 2020-06-22 20:01: Files produced again.
  • 2020-06-23 up to 13:15: Found that there was no file for "Yorkshire & Humber Deanery" (YHD) created in Azure before the FTP sync ran. Copied the file to the next day's outbound folder.

Action Items

  • Review ticket to include retries (as one type of service resilience) for connection issues.
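
As an illustration of the retry behaviour this action item refers to, here is a minimal sketch in Python. It is not the ETL's actual code: the function name call_with_retries, its parameters and the choice of exponential backoff are all assumptions made for the example. It retries a call only on connection errors, waits longer between each attempt, and re-raises the error once the attempts are used up.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on ConnectionError with exponential backoff.

    Hypothetical helper: `attempts` is the total number of tries and
    `base_delay` is the wait (in seconds) before the first retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ConnectionError:
            # Give up after the final attempt so the job still fails loudly.
            if attempt == attempts:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With attempts=3 and base_delay=1.0, a call that fails twice and then succeeds would wait 1 second and then 2 seconds before returning. A redeployment that takes longer than the total wait would still fail the job, so the attempt count and delay would need to be sized against how long a TCS redeploy typically takes.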

Lessons Learned

  • 401 (Unauthorized) HTTP Responses are not always due to the service or profile being down/restarting.
  • Slack notifications can be configured for keywords.

What went well

  • This particular instance of the issue was simple to rectify.

What went wrong

  • A transient connection problem caused the job to fail.
  • The jobs are scheduled very close together, so the restarted job did not complete before the next job in the chain ran.