Date | |
Authors | |
Status | Documenting |
Summary | ETL failed after its configuration was modified |
Impact | Data in NDW was 1 day out of date for a short period out-of-hours |
Non-technical Description
...
Notifications in the #monitoring-ndw channel.
Notification in the #monitoring-prod channel confirmed the ETL ran at the wrong time and took too long to complete.
Data lead and NDW team member enquiries in the shared #tis-ndw-etl Slack channel.
...
TIS Data Manager confirmed no downstream processes were affected.
...
Resolution
Re-ran the ETL in a fully configured part of the network.
Long-term:
Altered configuration to run on the same subnet as the database.
Removed the unused, partially functional subnet.
...
Timeline
: 02:30 - Failure message in the #monitoring-ndw channel
...
The change was applied on the assumption that the IaC definitions were correct.
Lessons Learned
Just because it looks right doesn’t mean it is.
Action Items
Action Items | Owner | Status |
---|---|---|
Update the ETL timings page, using a wider list of ETLs, i.e. the image with swimlanes above. | | Page updated. The diagram (with swimlanes/showing the full dependency graph) still needs to be updated. |