Date

Authors

Joseph (Pepe) Kelly

Status

Documenting

Summary

ETL failed after modifying the configuration

Impact

Data in NDW was 1 day out of date for a short period out-of-hours.

Non-technical Description

...

Notifications in the #monitoring-ndw channel.

Notification in the #monitoring-prod channel confirmed the ETL ran at the wrong time and took too long to complete.

Data lead and NDW team member enquiries in the shared #tis-ndw-etl Slack channel.

...

TIS Data Manager confirmed no downstream processes were affected.

...

Resolution

Re-ran the ETL in a fully configured part of the network.

Long-term:

  1. Altered configuration to run on the same subnet as the database.

  2. Removed the unused/partially functional subnet.
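The long-term fix depends on the ETL and the database sharing a subnet. A minimal sketch of a pre-deployment check for that, assuming the addresses and subnet range are known up front; the function name `same_subnet` and all addresses below are hypothetical and are not taken from the TIS infrastructure:

```python
import ipaddress


def same_subnet(task_ip: str, db_ip: str, subnet_cidr: str) -> bool:
    """Return True when both addresses fall inside the given subnet."""
    subnet = ipaddress.ip_network(subnet_cidr, strict=True)
    return (
        ipaddress.ip_address(task_ip) in subnet
        and ipaddress.ip_address(db_ip) in subnet
    )


# Hypothetical values for illustration only.
DB_SUBNET = "10.0.1.0/24"
print(same_subnet("10.0.1.25", "10.0.1.10", DB_SUBNET))  # ETL inside the database subnet
print(same_subnet("10.0.2.25", "10.0.1.10", DB_SUBNET))  # ETL in a different subnet
```

A check like this could run before a configuration change is applied, so that a definition which only looks right is caught before the scheduled ETL run.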

...

Timeline

02:30 - Failure message in the #monitoring-ndw channel

...

The change was applied on the assumption that the IaC definitions were correct.

Lessons Learned

  • Just because it looks right doesn’t mean it is.

Action Items

Action Items

Owner

Status

Update the ETL timings page. Use the wider list of ETLs, i.e. the swim-lane image above.

Ashley Ransoo

Page updated. The swim-lane diagram showing the full dependency graph still needs to be updated.