Date | |
Authors | Joseph (Pepe) Kelly |
Status | Missing applicant file uploaded to ESR on 2020-06-23 |
Summary | ESR applicant-export failed. The ETL received a connection exception. |
Impact | New Applicants for Yorkshire & Humberside were not received by ESR until |
Jira reference
- TISNEW-3001
Root Causes
- The ETL didn't re-attempt to make the call.
- TCS was unavailable.
- TCS was redeployed by a Jenkins job.
Trigger
- TCS was redeployed mid-job.
Resolution
- Re-ran the Jenkins job after 8pm, once the last scheduled job had run.
- Moved the missing file to the 'outbound' folder in Azure for today (23rd), ready to be processed at 18:00.
Detection / Timeline
- 2020-06-22 16:31: Job failed.
- 2020-06-22 18:00: FTP sync ran and picked up all but 1 file.
- 2020-06-22 up to 20:00: Further attempts to re-run the job failed because later steps in the chain were running.
- 2020-06-22 20:01: Files produced again.
- 2020-06-23 up to 13:15: Found that no file for "Yorkshire & Humber Deanery" (YHD) had been created in Azure before the FTP sync ran. Copied the file to the next day's outbound folder.
Action Items
- Review ticket to include retries (as one type of service resilience) for connection issues.
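As a starting point for that review, the sketch below shows one common shape for retrying a call that fails with a connection error, using exponential backoff between attempts. It is only an illustration and not the ETL's actual code: the function name `call_with_retries`, its parameters, and the choice of `ConnectionError` as the retryable exception are all assumptions.

```python
import time


def call_with_retries(call, attempts=3, base_delay=1.0,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Run call() and retry on connection errors with exponential backoff.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the original error reach the caller.
                raise
            # Wait base_delay, then 2x, then 4x ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A redeployment like the one in this incident may take longer than a few seconds, so the number of attempts and the delay would need to be sized against how long TCS is typically unavailable during a deploy.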
Lessons Learned
- 401 (Unauthorized) HTTP responses are not always due to the service or profile being down/restarting.
- Slack notifications can be configured for keywords.
What went well
- It was simple to rectify the issue with this particular instance.
What went wrong
- A transient connection problem caused the job to fail.
- The jobs are very close to each other. The restarted job didn't complete before the next in the chain ran.
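Because the missing file was only noticed the following day, a check that runs just before the sync could flag an incomplete export sooner. The sketch below is one possible form of such a check, under stated assumptions: it treats the outbound folder as a local directory (the real folder is in Azure, so the listing step would differ), and the function name `missing_exports` and the file names in the usage comment are hypothetical.

```python
from pathlib import Path


def missing_exports(outbound_dir, expected_names):
    """Return the expected file names that are absent from outbound_dir, sorted.

    Hypothetical helper: assumes the outbound folder can be listed like a
    local directory, which is a simplification of the Azure setup.
    """
    present = {p.name for p in Path(outbound_dir).iterdir() if p.is_file()}
    return sorted(set(expected_names) - present)


# Example usage (file names are made up for illustration):
# missing = missing_exports("outbound", ["applicants_A.dat", "applicants_B.dat"])
# if missing:
#     print("Export incomplete, missing:", missing)
```

A non-empty result could be used to hold the sync or to post an alert, rather than letting the next step in the chain run on a partial set of files.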