...

Date
 
Authors: Joseph (Pepe) Kelly, Ashley Ransoo
Status: Awaiting clean file from ESR
Summary

The ESR applicant-export load failed on 13th July 2019 due to bad data in an RMF file received from ESR. The ETL received a 500 error.

Impact: New positions expected for WMD Derby Hospitals NHS Foundation Trust were not successfully loaded as a result. James Harris reported that 85 new positions were expected to be received.

Jira reference

...

  • TISNEW-3152
  • TISNEW-3153

Impact

Applicants for West Midlands have not been fully processed by TIS; the PositionReconciliationRecord has not been created/updated for all of the relevant positions. As a result, these positions do not have the correct status in TIS, which may in turn cause only a partial set of notifications/applicants to be sent to ESR.

...

  • Check which files (if any) weren't processed (looking in logs for number of records saved, queries against the ESR database)
  • Requested that ESR clean and reproduce the file.

  • ...
  • Once a clean file is received and TIS is notified of this, the ETL is expected to process it along with the other files as BAU.

Detection / Timeline

  • 2019-07-13 14:33: Ansible message to #esr_operations channel reporting failure.
  • 2019-07-15 15:22: Flagged for further investigation.
  • 2019-07-16: Identified that the problem was limited to the West Midlands file. Notified ESR about the problem with the data.

Action Items

  • Raise ticket to include retries (as one type of service resilience) for connection issues, e.g. a configurable list of HTTP status codes.
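
A minimal sketch of what retries driven by a configurable list of HTTP status codes might look like. It is illustrative only: the function name, the default status codes, the attempt count and the backoff values are assumptions, not part of the ETL.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical default: status codes treated as transient and worth retrying.
RETRYABLE_STATUS_CODES = frozenset({500, 502, 503, 504})


def call_with_retries(
    request: Callable[[], Tuple[int, str]],
    retryable_statuses: Iterable[int] = RETRYABLE_STATUS_CODES,
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `request` until it returns a non-retryable status or attempts run out.

    `request` is any zero-argument callable returning (status_code, body).
    The last response is returned even if it is still a retryable failure,
    so the caller decides how to report it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retryable = set(retryable_statuses)
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if status not in retryable or attempt == max_attempts:
            return status, body
        # Exponential backoff: 1x, 2x, 4x ... the base delay.
        sleep(backoff_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Keeping the status list as a parameter means a response such as 401 is returned immediately rather than retried, while the set of transient codes can be tuned per environment.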

Lessons Learned

  • Validate file contents before processing. (Joseph (Pepe) Kelly - need your help on this: TISNEW-3153)
  • RMF (Full) files are known from experience to contain bad data. RMF files are only sent on day 1 of a LO going live or upon request. Although ESR informed us and asked for confirmation before sending this file, they did not wait for confirmation from our side. Comms with ESR need to be managed properly so that we are aware when RMF files are sent to TIS. ESR should ensure that the data they send down the interface to TIS is validated. (cc Nazia AKHTAR)
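
As a rough illustration of validating file contents before processing, the sketch below checks each record for a fixed number of delimited fields and reports the lines that fail. The `~` delimiter, the expected field count and the function name are assumptions made for the example; they do not describe the actual RMF layout, and real validation would need the rules from the ESR interface specification.

```python
from typing import Iterable, List, Tuple


def validate_records(
    lines: Iterable[str],
    expected_fields: int,
    delimiter: str = "~",  # assumed delimiter, for illustration only
) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that fails basic checks."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        if not record:
            problems.append((number, "empty line"))
            continue
        field_count = len(record.split(delimiter))
        if field_count != expected_fields:
            problems.append(
                (number, f"expected {expected_fields} fields, found {field_count}")
            )
    return problems
```

A file would only be handed to the load if the returned list is empty; otherwise the line numbers can be sent back to ESR with the request for a clean file.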

Lessons Learned

  • This is more than enough reason to rebuild a proper ESR interface from scratch with best-in-class interface technology.
  • It is intricate and time-consuming to work out the impacts on data and resolve them accordingly.

What went well

  • It was simple to rectify the issue in this particular instance, after a lot of conversations to work it out.

What went wrong

  • The verbosity of the logs makes using them more difficult.
  • We didn't start further investigation until Monday afternoon, as the issue happened on a non-working day and more recent Slack messages on #esr_operations had made the failure report less apparent.

Where we got lucky

  • We did not have to re-run the applicant-load: apart from the WMD RMF file, the remaining RMC files were processed successfully.

Supporting information