Date

Authors

Joseph (Pepe) Kelly Jayanta Saha

Status

In Progress

Summary

The ETL failed after an initial attempt to move it into our new AWS ECS environment.

Impact

Downstream NDW and Hicom processes failed, so data in NDW was 1 day out of date (this has now been fixed).

Non-technical Description

...

No alerts in the #monitoring-ndw channel (alerts had been moved to the #monitoring-prod channel).

Alerts in the #monitoring-prod channel confirmed the ETL ran at the wrong time and took too long to complete.

Enquiries from the Data lead and NDW team members in the shared #tis-ndw-etl Slack channel.

...

Resolution

Revert to running the Jenkins ETL manually so that the downstream processes can be completed later today.

Re-work the new ECS ETLs to start after the TIS Sync jobs they depend on have finished (i.e. after 02:40) and to complete before the NDW and Hicom processes that depend on the ETLs begin (i.e. before 03:45).
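As a minimal sketch, assuming a run's start and end times are available as Python `datetime` values (how they are read from ECS is not covered here), the 02:40 to 03:45 timing window from the resolution could be checked like this. The function `run_fits_window` and its parameters are hypothetical names for illustration; only the two default bounds come from this page.

```python
from datetime import datetime, time

# Bounds from the resolution: the TIS Sync jobs finish by 02:40 and the
# dependent NDW and Hicom processes begin at 03:45.
SYNC_FINISHED = time(2, 40)
DOWNSTREAM_START = time(3, 45)


def run_fits_window(start: datetime, end: datetime,
                    earliest: time = SYNC_FINISHED,
                    latest: time = DOWNSTREAM_START) -> bool:
    """Hypothetical check: True if a run started at or after `earliest`
    and finished strictly before `latest`, both on the same day."""
    if end < start:
        raise ValueError("end must not be before start")
    same_day = start.date() == end.date()
    return same_day and start.time() >= earliest and end.time() < latest


# Hypothetical run times, not taken from this incident.
ok = run_fits_window(datetime(2024, 1, 6, 2, 50), datetime(2024, 1, 6, 3, 30))
late = run_fits_window(datetime(2024, 1, 6, 3, 15), datetime(2024, 1, 6, 4, 5))
print(ok, late)  # True False
```

Keeping the bounds as parameters means the same check can be reused with the 02:45 start used for the weekend monitoring process.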

Put a process in place to ensure the TIS NDW ETLs run successfully after 02:45 and before 03:45 each morning over the weekend and on Monday morning.

...

Timeline

03:15 - No alert in the #monitoring-ndw channel at the expected time (c. 03:15 - 03:35)

...