2023-11-20 Form R NDW load failure

Date

Nov 21, 2023

Authors

@Andy Dingley, @Nazia AKHTAR

Status

Done

Summary

A data issue (a Form R Part A submitted with no WTE value) caused the NDW’s downstream Form R load operation to fail.

Impact

Tableau’s Form R data was stale for a total of 21 days

Non-technical Description

A Form R Part A was submitted with no WTE value. This is usually picked up by validation, but in this case the validation was somehow bypassed.

As the WTE is a mandatory field, the downstream NDW load process expects there to be a value. As there was no value given, the data transformation failed and the overnight sync job was disabled.

The job could not be re-enabled until the data issue was resolved, as each attempt would otherwise have failed at the same point.

It took some time to liaise with the LO, who contacted the PGDiT to request that they re-submit their form. Due to the lack of response, the decision was made to delete the form to unblock the NDW job. The PGDiT will need to resubmit their form from scratch.


Trigger

  • A PGDiT submitted a Form R Part A with no WTE value.


Detection

  • The overnight failure was flagged by the data team’s monitoring. They emailed us the following morning to report it.


Resolution

  • The problematic form was deleted and the NDW job re-enabled.


Timeline

All times in BST unless indicated

  • Nov 20, 2023: 09:02 - PGDiT submits Form R Part A with no WTE value.

  • Nov 20, 2023: ??:?? - Overnight NDW job fails.

  • Nov 21, 2023: 11:23 - Email received informing Trainee Team (Lead Dev + Data Analyst) of the failure.

  • Nov 22, 2023: 10:51 - Having identified the cause, PM takes the action to liaise with LO.

  • Nov 28, 2023: 16:23 - Stale report data first reported by Admins.

  • Nov 30, 2023: 10:34 - Form unsubmitted by LO and PGDiT contacted.

  • Dec 5, 2023: 16:41 - PGDiT resubmits the form with a valid WTE value. LO either not aware or did not pass info on to TSS team.

  • Dec 5, 2023: 16:41 - Decision taken to delete the form, as PGDiT has not been responsive.

  • Dec 11, 2023: ??:?? - Confirmation received that the form had been deleted. We now know that was not the case; it had been re-submitted.

  • Dec 11, 2023: 09:47 - NDW team asked to re-enable the job.

  • Dec 11, 2023: 11:51 - NDW team confirmed the job ran successfully.

Root Cause(s)

  • A PGDiT submitted a form with a mandatory field missing.

  • The validation was somehow skipped.

  • The validation is front end only, so client side weirdness can cause data quality issues.

  • Backend validation was skipped because draft forms can be essentially blank. Validation is still needed when a form is submitted, but none was in place.
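
The gap described in the last root cause could be closed along the following lines. This is a minimal Python sketch, not the TIS implementation: the FormRPartA class, its field names, the lifecycle states and the 0 to 1 range for WTE are all assumptions made for illustration. The point it shows is that the backend can stay permissive for drafts while still rejecting a submission that lacks a mandatory value, regardless of what the front end did.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LifecycleState(Enum):
    # Hypothetical states; only the draft/submitted distinction matters here.
    DRAFT = "DRAFT"
    SUBMITTED = "SUBMITTED"


@dataclass
class FormRPartA:
    # Hypothetical, heavily reduced model of the form.
    trainee_id: str
    whole_time_equivalent: Optional[float]  # WTE; may be empty while drafting
    lifecycle_state: LifecycleState


def validate(form: FormRPartA) -> List[str]:
    """Return validation errors; an empty list means the form may be saved."""
    if form.lifecycle_state is LifecycleState.DRAFT:
        # Drafts can be essentially blank, so nothing is enforced yet.
        return []

    errors: List[str] = []
    wte = form.whole_time_equivalent
    if wte is None:
        errors.append("wholeTimeEquivalent is mandatory on submission")
    elif not 0 < wte <= 1:
        # Assumed range: WTE expressed as a fraction of full time.
        errors.append("wholeTimeEquivalent must be greater than 0 and at most 1")
    return errors
```

With a check like this on the save path, a submission with an empty WTE would be rejected before it is stored, so it would never reach the NDW load.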


Action Items

  • Backend validation - https://hee-tis.atlassian.net/browse/TIS21-5507 - Done


Lessons Learned

  • We need better communications between the TSS team and LO Admins, in both directions. LOs were not informed about the stale data issue until after they reported the problem themselves (over a week after the initial event).

  • Front end validation is useless on its own.