Date

Authors

Joseph (Pepe) Kelly, Ashley Ransoo, Andy Dingley

Status

In Progress

Resolved

Summary

ESR sent a large number of files and it initially looked as though we hadn’t captured everything from them. In fact we had, but a discrepancy between ESR’s specification and their implementation meant that they rejected the notifications we sent them.

Jira: TIS21-1265

Impact

St Helens & Knowsley are unable to make the correct changes to their pay systems.

Non-technical Description

ESR sent through a number of FULL FILES on 1st of March. There was a false assumption that they didn’t fully load or “reconcile”, which would have prevented us from subsequently sending applicants against them for St Helens & Knowsley Trust (StH&K). The data had in fact all been processed, but the VPD (Virtual Private Directory) identifier for StH&K was sent as 96 rather than the expected “096”. This meant a delay in sending across Applicant Files to ESR, that the Medical Rotation (MEDROT) spreadsheet was not correct, and that there is a significant risk that trainee pay has not been updated to reflect those starting/finishing placements.

This Live defect was originally raised on 1st March in response to the overwhelming number of files ESR sent to us. ESR had previously told us that the maximum number of full files they could send us was 3 per week. In the w/c 1st of March, they sent through 9 files for 8 distinct “Deaneries”.

...

Trigger

An unexpectedly large number of FULL FILES (RMF) overloaded the TIS-ESR interface.

DE_EMD_RMF_20210301_00002598.DAT
DE_KSS_RMF_20210301_00002788.DAT
DE_EOE_RMF_20210301_00002798.DAT
DE_MER_RMF_20210301_00002766.DAT
DE_OXF_RMF_20210301_00002759.DAT
DE_NWN_RMF_20210301_00002766.DAT
DE_LDN_RMF_20210301_00003186.DAT
DE_WMD_RMF_20210301_00002906.DAT
DE_WMD_RMF_20210302_00002907.DAT
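The file count can be checked directly from the names above. The following is only a sketch: the layout DE_<deanery>_RMF_<date>_<sequence>.DAT is inferred from this listing rather than taken from ESR's specification, and the names RMF_NAME and summarise are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the listing above (not from the ESR spec):
# DE_<3-letter deanery>_RMF_<YYYYMMDD>_<8-digit sequence>.DAT
RMF_NAME = re.compile(
    r"^DE_(?P<deanery>[A-Z]{3})_RMF_(?P<date>\d{8})_(?P<seq>\d{8})\.DAT$"
)

FILES = [
    "DE_EMD_RMF_20210301_00002598.DAT",
    "DE_KSS_RMF_20210301_00002788.DAT",
    "DE_EOE_RMF_20210301_00002798.DAT",
    "DE_MER_RMF_20210301_00002766.DAT",
    "DE_OXF_RMF_20210301_00002759.DAT",
    "DE_NWN_RMF_20210301_00002766.DAT",
    "DE_LDN_RMF_20210301_00003186.DAT",
    "DE_WMD_RMF_20210301_00002906.DAT",
    "DE_WMD_RMF_20210302_00002907.DAT",
]


def summarise(filenames):
    """Count full files per deanery, rejecting names that don't fit the assumed layout."""
    per_deanery = Counter()
    for name in filenames:
        match = RMF_NAME.match(name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name}")
        per_deanery[match["deanery"]] += 1
    return per_deanery


counts = summarise(FILES)
print(sum(counts.values()), "files for", len(counts), "deaneries")
print("More than one file:", [d for d, n in counts.items() if n > 1])
```

Run against this listing it reports 9 files for 8 deaneries, with WMD the only deanery that sent two.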

...

Detection

  • Alerting in our monitoring channel

  • An issue raised on Teams by Liam Lofthouse (NWM Data Lead) regarding missing applicants

...

Resolution

  • Filtered the notification files for those that St Helens needed (VPD: 96) and modified the VPD to be 3 digits (“096”)

  • Modified the part of the system that writes to CSV to ensure the VPD is at least 3 digits.
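As an illustration of the padding fix, here is a minimal sketch. It is not the actual TIS code: the function name format_vpd and its width parameter are hypothetical, and the only behaviour taken from this report is that the VPD must be written with at least 3 digits.

```python
def format_vpd(vpd, width=3):
    """Return the VPD as text with at least `width` digits, left-padded with zeros.

    Hypothetical helper for illustration; values already `width` digits or
    longer are returned unchanged.
    """
    text = str(vpd).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"VPD must be numeric, got {vpd!r}")
    return text.zfill(width)


print(format_vpd(96))     # 096
print(format_vpd("096"))  # 096
print(format_vpd(123))    # 123

# Compared as text, the unpadded value does not match the padded one,
# which is why ESR treated 96 and 096 as different.
print("96" == "096")      # False
```

Padding at the point where the CSV is written keeps the value numeric elsewhere in the system while guaranteeing the width ESR matches on.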

...

Timeline

  • - 17:28-17:38 - 7 RMF files received (EMD, KSS, EOE, MER, OXF, NWN, LDN)

  • - 17:45 - MongoDB down

  • - 18:09 - MongoDB manually restarted

  • - 18:10 - MongoDB up

  • - 19:10 - MongoDB down

  • - 19:20 - MongoDB up

  • - 21:36 - MongoDB down

  • - 22:31 - MongoDB up

  • - 15:19 - Live defect https://hee-tis.atlassian.net/browse/ (Jira: TIS21-1265) created

  • - 16:57 - WMD RMF received

  • - 18:30 - WMD RMF received

  • - 15:21 - Query raised on Teams about unreceived data

  • - 13:28-16:52 - One-off processes to generate and send files with altered “VPD”.

  • - 11:00 - Daily call identifying that the MEDROT has not been updated.

  • - 11:00 - Daily call identifying that the MEDROT has now been updated.

  • - 11:00 - Daily call identifying that West Midlands (WMD) positions are not showing.

  • - 11:00 - Daily call where we raised that WMD positions hadn’t been received by TIS.

...

Root Cause(s)

  • We sized Mongo on the basis we wouldn’t get many full files.

  • There isn’t anything to signal that Mongo is struggling before a fatal VM failure.

  • The VM doesn’t stop and start in a reasonable time.

  • The replica set isn’t as ‘highly available’ as it should be (not on separate VMs)

  • For the MEDROT we were sending the “Virtual Private Directory” value as “96” rather than the “096” they wanted

  • ESR treats the number 96 as being different to 096

  • The TIS integration followed ESR’s specification that this was a “variable length number”, up to 3 digits long.

...

Action Items

Action Items

Owner

n/a

...

Test loading full files with a larger instance

Joseph (Pepe) Kelly … good evidence that the larger instance is processing 6 files, steadily and at a reasonable rate

[not critical] Clean up Applicant records (for past placements) marked as TO_EXPORT, e.g. ESRExporter - ProdGeneratedapprecord•60117906459cf5418067a1e8

Jira: TIS21-1324

Calendar of releases/milestones would be useful.

Andy Nash has started this in the team calendar

...

Lessons Learned

  • Time to crack on with Tech Improvement work (Switch to a DBaaS)