Date	03 Mar 2021
Authors	Joseph (Pepe) Kelly , Ashley Ransoo, Andy Dingley
Status	Resolved
Summary	ESR sent a large number of files and it looked as though we hadn’t captured everything from them. In fact we had, but a discrepancy between ESR’s specification and their implementation meant that they rejected the notifications we sent them. https://hee-tis.atlassian.net/browse/TIS21-1265
Impact	It appeared that applicants were missing or had not been sent to ESR.

Non-technical Description

On 1st March ESR sent through a number of FULL FILES that did not load, or did not fully load and reconcile, so applicants for St Helens & Knowsley Trust could not subsequently be sent against them. This delayed sending Applicant Files to ESR.

ESR had previously told us that they would send us at most 3 full files per week. In the week commencing 1st March they sent 9 files: one for each of 8 distinct regions, plus 1 additional file for WMD.

Trigger

An unexpected number of FULL FILES (RMF) overloaded the TIS-ESR interface.

DE_EMD_RMF_20210301_00002598.DAT
DE_KSS_RMF_20210301_00002788.DAT
DE_EOE_RMF_20210301_00002798.DAT
DE_MER_RMF_20210301_00002766.DAT
DE_OXF_RMF_20210301_00002759.DAT
DE_NWN_RMF_20210301_00002766.DAT
DE_LDN_RMF_20210301_00003186.DAT
DE_WMD_RMF_20210301_00002906.DAT
DE_WMD_RMF_20210302_00002907.DAT
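The file names above appear to follow a fixed pattern, so a check against ESR’s stated limit of 3 full files per week could be automated. This is a minimal sketch, assuming the pattern `DE_<region>_RMF_<YYYYMMDD>_<sequence>.DAT` (inferred from these nine names, not taken from ESR’s specification) and counting files per ISO week; `parse_rmf_name` and `weeks_over_limit` are hypothetical names, not part of TIS.

```python
import re
from collections import Counter
from datetime import date

# Assumed naming convention, inferred from the RMF file names in this report:
# DE_<region>_RMF_<YYYYMMDD>_<8-digit sequence>.DAT
RMF_PATTERN = re.compile(r"^DE_([A-Z]{3})_RMF_(\d{8})_(\d{8})\.DAT$")

# ESR's stated maximum number of full files per week.
MAX_FULL_FILES_PER_WEEK = 3


def parse_rmf_name(name):
    """Return (region, file_date, sequence) for an RMF file name, else None."""
    match = RMF_PATTERN.match(name)
    if match is None:
        return None
    region, ymd, sequence = match.groups()
    file_date = date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:]))
    return region, file_date, int(sequence)


def weeks_over_limit(names, limit=MAX_FULL_FILES_PER_WEEK):
    """Count RMF files per ISO (year, week) and return the weeks above the limit."""
    per_week = Counter()
    for name in names:
        parsed = parse_rmf_name(name)
        if parsed is None:
            continue  # not an RMF (full) file, so it does not count
        iso_year, iso_week, _ = parsed[1].isocalendar()
        per_week[(iso_year, iso_week)] += 1
    return {week: n for week, n in per_week.items() if n > limit}


received = [
    "DE_EMD_RMF_20210301_00002598.DAT",
    "DE_KSS_RMF_20210301_00002788.DAT",
    "DE_EOE_RMF_20210301_00002798.DAT",
    "DE_MER_RMF_20210301_00002766.DAT",
    "DE_OXF_RMF_20210301_00002759.DAT",
    "DE_NWN_RMF_20210301_00002766.DAT",
    "DE_LDN_RMF_20210301_00003186.DAT",
    "DE_WMD_RMF_20210301_00002906.DAT",
    "DE_WMD_RMF_20210302_00002907.DAT",
]
print(weeks_over_limit(received))  # {(2021, 9): 9}
```

1st March 2021 was a Monday in ISO week 9, so all nine files land in the same week, three times the expected maximum. Alerting on this count would have flagged the unusual load as the files arrived rather than after MongoDB went down.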

Detection

Alerting in our monitoring channel

An issue raised on Teams by Liam Lofthouse (NWM Data Lead) regarding missing applicants

Resolution

Filtered the notification files to those St Helens needed (VPD: 96) and modified the VPD to be 3 digits.
Modified the part of the system that writes to CSV so that the VPD is always at least 3 digits.
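The CSV fix can be illustrated with a short sketch. It assumes ESR expects the VPD left-padded with zeros (so 96 is written as 096), and the helper names and CSV columns (`vpd`, `position`) are hypothetical; the real TIS writer and the full ESR notification layout are not shown in this report.

```python
import csv
import io


def format_vpd(vpd):
    """Return the VPD as a string of at least 3 digits, left-padded with zeros.

    Assumption: zero-padding is what ESR's implementation expects.
    Values that already have 3 or more digits are returned unchanged.
    """
    return str(vpd).zfill(3)


def filter_by_vpd(rows, vpd):
    """Keep only the rows for one VPD, e.g. 96 for St Helens."""
    return [row for row in rows if int(row["vpd"]) == int(vpd)]


def write_notifications(rows):
    """Write rows to CSV text, normalising the (hypothetical) vpd column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["vpd", "position"], lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "vpd": format_vpd(row["vpd"])})
    return buffer.getvalue()


rows = [{"vpd": 96, "position": "P1"}, {"vpd": 123, "position": "P2"}]
print(write_notifications(filter_by_vpd(rows, 96)))
```

Padding at the point where the CSV is written, rather than in each caller, means every notification file gets the same treatment regardless of how the VPD was stored upstream.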

Timeline

01 Mar 2021 - 17:28-17:38 - 7 RMF files received (EMD, KSS, EOE, MER, OXF, NWN, LDN)
01 Mar 2021 - 17:45 - MongoDB down
01 Mar 2021 - 18:09 - MongoDB manually restarted
01 Mar 2021 - 18:10 - MongoDB up
01 Mar 2021 - 19:10 - MongoDB down
01 Mar 2021 - 19:20 - MongoDB up
01 Mar 2021 - 21:36 - MongoDB down
01 Mar 2021 - 22:31 - MongoDB up
02 Mar 2021 - 15:19 - Live defect https://hee-tis.atlassian.net/browse/TIS21-1265 created
02 Mar 2021 - 16:57 - WMD RMF received
02 Mar 2021 - 18:30 - WMD RMF received
03 Mar 2021 - 15:21 - Query raised on Teams about data not received
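The down/up entries in the timeline give the total MongoDB outage on 1st March. This small sketch only does the arithmetic on those times; the 18:09 manual restart is left out because it is not an up/down transition.

```python
from datetime import datetime

# (down, up) pairs for MongoDB on 01 Mar 2021, taken from the timeline.
OUTAGES = [
    ("17:45", "18:10"),
    ("19:10", "19:20"),
    ("21:36", "22:31"),
]


def outage_minutes(start, end):
    """Minutes between two same-day HH:MM times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


durations = [outage_minutes(down, up) for down, up in OUTAGES]
total = sum(durations)
print(durations, total)  # [25, 10, 55] 90
```

MongoDB was therefore unavailable for about 90 minutes across three separate outages in the evening the RMF files arrived.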

Root Cause(s)

We sized MongoDB on the assumption that we wouldn’t receive many full files.
Nothing signals that MongoDB is struggling before a fatal VM failure.
The VM doesn’t stop and start in a reasonable time.
The replica set isn’t as ‘highly available’ as it should be (its members are not on separate VMs).
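The last root cause could be guarded against with a check on the replica set configuration. This is a minimal sketch, assuming the configuration has the shape of the document returned by `rs.conf()` in the mongo shell (a `members` list whose `host` values are `hostname:port`); the hostnames in the example are hypothetical, and distinct hostnames do not by themselves prove the VMs are on separate physical hardware.

```python
def hosts_shared_by_members(rs_config):
    """Return hostnames that run more than one replica-set member.

    rs_config is shaped like rs.conf():
    {"members": [{"_id": 0, "host": "hostname:port"}, ...]}
    """
    counts = {}
    for member in rs_config["members"]:
        # Strip the port so members on the same machine group together.
        hostname = member["host"].rsplit(":", 1)[0]
        counts[hostname] = counts.get(hostname, 0) + 1
    return sorted(host for host, n in counts.items() if n > 1)


# Hypothetical configuration with all three members on one VM.
single_vm = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "mongo-vm-1:27017"},
        {"_id": 1, "host": "mongo-vm-1:27018"},
        {"_id": 2, "host": "mongo-vm-1:27019"},
    ],
}
print(hosts_shared_by_members(single_vm))  # ['mongo-vm-1']
```

An empty result means each member has its own host, which is the precondition for the replica set surviving the loss of a single VM.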

Action Items

Action Items	Owner
Test loading full files with a larger instance	Joseph (Pepe) Kelly
[not critical] Clean up Applicant records (for past placements) marked as `TO_EXPORT`, e.g. ESRExporter - Prod•Generatedapprecord•60117906459cf5418067a1e8
Calendar of releases/milestones would be useful.	Andy Nash has started this in the team calendar

Lessons Learned

Time to crack on with Tech Improvement work

2021-03-03 Missing Applicants and unreconciled positions