2023-11-20 Form R NDW load failure
Date | Nov 21, 2023 |
Authors | @Andy Dingley, @Nazia AKHTAR |
Status | Done |
Summary | A data issue caused the NDW’s downstream Form R load operation to fail |
Impact | Tableau’s Form R data was stale for a total of 21 days |
Non-technical Description
A Form R Part A was submitted with no WTE value. This is usually picked up by validation, but in this case the validation was somehow bypassed.
As the WTE is a mandatory field, the downstream NDW load process expects there to be a value. As there was no value given, the data transformation failed and the overnight sync job was disabled.
The job could not be re-enabled until the data issue was resolved, as each attempt would otherwise have simply failed at the same point.
It took some time to liaise with the LO to contact the PGDiT and ask them to re-submit their form. Due to the lack of response, the decision was made to delete the form to unblock the NDW job. The PGDiT will need to resubmit their form from scratch.
Trigger
A PGDiT submitted a Form R Part A with no WTE value.
Detection
The overnight failure was flagged by the data team’s monitoring. They emailed us the following morning to report the failure.
Resolution
The problematic form was deleted and the NDW job re-enabled.
Timeline
All times in BST unless indicated
Nov 20, 2023: 09:02 - PGDiT submits Form R Part A with no WTE value.
Nov 20, 2023: ??:?? - Overnight NDW job fails.
Nov 21, 2023: 11:23 - Email received informing Trainee Team (Lead Dev + Data Analyst) of the failure.
Nov 22, 2023: 10:51 - Having identified the cause, PM takes the action to liaise with LO.
Nov 28, 2023: 16:23 - Stale report data first reported by Admins.
Nov 30, 2023: 10:34 - Form unsubmitted by LO and PGDiT contacted
Dec 5, 2023: 16:41 - PGDiT resubmits the form with a valid WTE value. The LO was either not aware of this or did not pass the information on to the TSS team.
Dec 5, 2023: 16:41 - Decision taken to delete the form, as PGDiT has not been responsive.
Dec 11, 2023: ??:?? - Confirmation received that the form had been deleted. We now know that this was not the case; the form had been re-submitted.
Dec 11, 2023: 09:47 - NDW team asked to re-enable the job.
Dec 11, 2023: 11:51 - NDW team confirmed the job ran successfully.
Root Cause(s)
A PGDiT submitted a form with a mandatory field missing.
The validation was somehow skipped.
The validation is front-end only, so unexpected client-side behaviour can let invalid data through and cause data quality issues.
Backend validation was skipped as draft forms can be essentially blank. However, it is needed upon submission, and that check was not in place.
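The distinction between saving a draft and submitting can be sketched as follows. This is a minimal illustration only, not the TIS implementation: the `FormRPartA` model, its field names, the `lifecycle_state` values and the `save` function are all hypothetical. It assumes a draft may be saved with blank fields, while a submission is rejected server-side if the WTE is missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormRPartA:
    """Hypothetical, heavily simplified form model; field names are assumptions."""
    trainee_id: str
    wte: Optional[float] = None  # left as None when the user has not entered a value
    lifecycle_state: str = "DRAFT"


def validate_for_submission(form: FormRPartA) -> list[str]:
    """Return validation errors; an empty list means the form may be submitted."""
    errors = []
    if form.wte is None:
        errors.append("wte: a value is required on submission")
    return errors


def save(form: FormRPartA, submit: bool) -> FormRPartA:
    """Save drafts without checks; reject submissions that fail validation."""
    if submit:
        errors = validate_for_submission(form)
        if errors:
            # Rejecting here keeps the incomplete record away from downstream loads.
            raise ValueError("; ".join(errors))
        form.lifecycle_state = "SUBMITTED"
    else:
        form.lifecycle_state = "DRAFT"
    return form
```

With a check of this kind on the server, a submission without a WTE would be refused regardless of what the front end does, while blank drafts can still be saved.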
Action Items
Action Items | Owner | |
---|---|---|
Backend validation | Done | |
| Done | |
Lessons Learned
We need better communications between the TSS team and LO Admins, in both directions. LOs were not informed about the stale data issue until after they reported the problem themselves (over a week after the initial event).
Front-end validation is not sufficient on its own; mandatory fields must also be validated on the backend at submission.
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213