2019-07-13 ESR ETL failed on applicant-load
Date | |
Authors | Joseph (Pepe) Kelly, Ashley Ransoo |
Status | Awaiting clean file from ESR |
Summary | ESR applicant-load failed on 13th July 2019 due to bad data in an RMF file received from ESR. The ETL received a 500 error. |
Impact | New positions expected for WMD Derby Hospitals NHS Foundation Trust were not successfully loaded as a result. James Harris reported that there were 85 new positions expected to be received |
Jira reference
- TISNEW-3152Getting issue details... STATUS
- TISNEW-3153Getting issue details... STATUS
Impact
Applicants for West Midlands have not been fully processed by TIS; the PositionReconciliationRecord has not been created/updated for all of the relevant positions. This has the impact of not having the correct status in TIS, which may cause a partial set of notifications/applicants being sent to ESR.
Details of the scheduled jobs are here: ESR Schedules.
Root Causes
- A 500 (Internal Server Error) response was received for a HTTP request from ESR-ETL to ESR.
- A MySQL error
Hibernate: insert into EsrOutboundPositionReconciliationRecord (createdDate, deaneryNumber, deleteChangeIndicator, esrLocation, esrOrganisation, hostLeadEmployerIndicator, lastModifiedDate, managingLocalOffice,
matched, odsEmployerCode, positionId, positionNumber, positionTitle, recordType, tisStatus, vpdCode) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2019-07-13 14:33:07.333 WARN 1 — [ XNIO-2 task-14] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 1406, SQLState: 22001
2019-07-13 14:33:07.334 ERROR 1 — [ XNIO-2 task-14] o.h.engine.jdbc.spi.SqlExceptionHelper : Data truncation: Data too long for column 'deaneryNumber' at row 1 Logging shows numerous whole rows included as a national post number 2019-07-13 14:30:43.540 INFO 1 --- [main] c.t.hee.tis.esr.processor.Processor : 1.1: Fetching Posts by NPN:...
- The applicant file 'DE_WMD_RMF_20190713_00002309.DAT' included specialties with a backslash (\) character.
Trigger
- File from ESR containing escape characters.
Resolution
- Check which files (if any) weren't processed (looking in logs for number of records saved, queries against the ESR database)
Requested ESR clean and reproduce the file.
- Once a clean file is received and TIS is notified of the same, it is expected to be processed along with other files BAU by the ETL.
Detection / Timeline
- 2019-07-13 1433: Ansible message to #esr_operations channel reporting failure.
- 2019-07-15 15:22: Flagged for further investigation.
- 2019-07-16 : Identified the problem was limited to the West Midlands file. Notified ESR about the problem with data.
Action Items
- Raise ticket to validate file contents before processing. (Joseph (Pepe) Kelly - need your help on this - TISNEW-3153Getting issue details... STATUS )
- RMF (Full) files are known to have bad data from experience. RMF Files are only sent on day 1 of a LO going live or upon request. Although ESR informed and asked for confirmation before sending this File to us, they did not wait for confirmation from our side. Comms need to be managed properly with ESR and making sure we are aware when RMF files are sent to TIS. ESR to ensure their data is validated for what they send in in the interface down to TIS. (cc Nazia AKHTAR)
Lessons Learned
- More than a reason to re-build a proper ESR interface from scratch with best-in-class interface technology
- It is intricate and time consuming to work out impacts on data and resolve accordingly
What went well
- It was simple to rectify the issue with this particular instance after a lot conversations to work it out
What went wrong
- The verbosity of the logs makes using them more difficult.
- We didn't start further investigation until Monday afternoon as the issue happened on a non-working day and more recent slack messages on #esr_operations had already made it not apparent.
Where we got lucky
- We did not have to re-run the applicant-load as the remaining RMC files processed successfully other than the WMD RMF file.
Supporting information
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213