...

Non-technical Description

There was an interruption that stopped TIS data being sent to Hicom via the NDW.

...

Trigger

...

Detection

  • Alerting in our monitoring channel

...

  • 04:41 & 05:02 - Alerts in the NDW monitoring channel

  • 09:06 - NDW successfully re-run on stage and prod

  • ~midday - Agreed with Hicom that waiting until tomorrow would cause less disruption than re-running jobs

...

Root Cause(s)

  • The Ansible job timed out

  • The ETL kept retrying a failed chunk

  • All connections to the NDW were closed

...

Action Items

Action Items

Owner

...

Improve Alerting from NDW ETLs (probably using Sentry)

Marcello Fabbri, Edward Barclay

Configure persistent logs from NDW ETLs

Edward Barclay, Marcello Fabbri

Improve Connection Pools: validation, eviction etc.

...

Lessons Learned