ESR applicant-export load failed on 13th July 2019 due to bad data in an RMF file received from ESR. The ETL received a 500 error.
Impact
New positions expected for WMD Derby Hospitals NHS Foundation Trust were not successfully loaded as a result. James Harris reported that there were 85 new positions expected to be received.
Applicants for West Midlands have not been fully processed by TIS; the PositionReconciliationRecord has not been created/updated for all of the relevant positions. As a result, these positions do not have the correct status in TIS, which may in turn cause a partial set of notifications/applicants to be sent to ESR.
...
Check which files (if any) weren't processed (looking in logs for number of records saved, queries against the ESR database)
Requested ESR clean and reproduce the file.
...
Validate file contents before processing.
Once a clean file is received and TIS is notified, it is expected to be processed by the ETL along with other files as BAU.
Detection / Timeline
2019-07-13 14:33: Ansible message to #esr_operations channel reporting failure.
2019-07-15 15:22: Flagged for further investigation.
2019-07-16: Identified that the problem was limited to the West Midlands file. Notified ESR about the problem with the data.
Action Items
Raise ticket to include retries (as one type of service resilience) for connection issues, e.g. a configurable list of HTTP status codes.
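As a starting point for that ticket, the retry idea could look roughly like the sketch below. It is illustrative only and not taken from the ETL code: the function name, the default status list, and the backoff values are all assumptions, and the real list would come from configuration. Non-retryable statuses (for example 401) are returned immediately, since repeating the same request is unlikely to help in that case.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical default; in practice this list would be read from ETL configuration.
DEFAULT_RETRYABLE_STATUSES = frozenset({500, 502, 503, 504})


def call_with_retries(
    request_fn: Callable[[], Tuple[int, str]],
    retryable_statuses: Iterable[int] = DEFAULT_RETRYABLE_STATUSES,
    max_attempts: int = 3,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request_fn until it returns a non-retryable status or attempts run out.

    request_fn is any zero-argument callable returning (status_code, body).
    The last response is returned either way, so the caller decides how to
    handle a final failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retryable = frozenset(retryable_statuses)
    for attempt in range(1, max_attempts + 1):
        status, body = request_fn()
        if status not in retryable:
            break
        if attempt < max_attempts:
            # Exponential backoff: base, 2 * base, 4 * base, ...
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    return status, body
```

The `sleep` parameter is injectable so the behaviour can be checked in a unit test without real delays. Retries of this kind address transient connection problems only; they would not have helped with a file containing bad data.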
Lessons Learned
Validate file contents before processing. (Joseph (Pepe) Kelly - need your help on this: TISNEW-3153)
RMF (Full) files are known from experience to contain bad data. RMF files are only sent on day 1 of a LO going live or upon request. Although ESR informed us and asked for confirmation before sending this file, they did not wait for confirmation from our side. Comms with ESR need to be managed properly so that we are aware when RMF files are sent to TIS. ESR to ensure that the data they send down the interface to TIS is validated. (cc Nazia AKHTAR)
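For the "validate file contents before processing" item, a very basic structural pre-check might look like the sketch below. It makes no claim about the actual RMF layout: the delimiter and the expected field count are parameters the caller must supply, and the function name is hypothetical. The idea is only to reject a file with obviously malformed lines before the ETL starts loading it.

```python
from typing import Iterable, List, Tuple


def find_malformed_lines(
    lines: Iterable[str],
    expected_fields: int,
    delimiter: str,
) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for each line that fails the basic checks.

    An empty result means every line is non-empty and has the expected
    number of delimited fields; it says nothing about the field values.
    """
    problems: List[Tuple[int, str]] = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            problems.append((line_number, "empty line"))
            continue
        field_count = len(line.split(delimiter))
        if field_count != expected_fields:
            problems.append(
                (line_number, f"expected {expected_fields} fields, found {field_count}")
            )
    return problems
```

A caller could pass an open file object as `lines` and skip processing (and notify ESR) when the returned list is non-empty, quoting the line numbers in the request for a clean file.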
Lessons Learned
One more reason to rebuild a proper ESR interface from scratch with best-in-class interface technology.
It is intricate and time-consuming to work out the impact on data and resolve it accordingly.
What went well
It was simple to rectify the issue in this particular instance, after a lot of conversations to work it out.
What went wrong
The verbosity of the logs makes using them more difficult.
We didn't start further investigation until Monday afternoon, as the issue happened on a non-working day and more recent Slack messages on #esr_operations had pushed the failure message out of view.
Where we got lucky
We did not have to re-run the applicant-load, as all of the RMC files processed successfully; only the WMD RMF file failed.