Date | |
Authors | |
Status | |
Summary | ndw-etl-prod task was not run on Apr 11 |
Impact | Caused a failure of the User Refresh job on NDW. |
Non-technical Description
The push of TIS data to NDW failed on Apr 11.
Trigger
Detection
Email from the NDW team on the morning of 11/04/2023
In the #monitoring-ndw channel on Slack, no notifications were found for the ndw-etl-prod task.
Resolution
Contacted GMC support and technical contact at the GMC
Resolved by GMC
Timeline
BST unless otherwise stated
04:03 - From this point, no further notifications were received for the overnight NDW ETL jobs.
12:07 - James forwarded the email reporting that the push of TIS data into NDW had failed.
12:40 - Yafang found no logs for ndw-etl-prod from the overnight run.
13:02 - Andy D found that the EventBridge rule had triggered the task and the service considered itself healthy, but there was no record of the task on prod.
14:26 - Yafang & Jay triggered the ndw-etl-prod task manually.
14:56 - The task ran on etl-prod and exited successfully.
15:11 - James let Guy know and NDW started their jobs.
Root Cause(s)
We expect the ndw-etl-prod job to be triggered by the AWS EventBridge rule every day at 2am UTC. From the metrics, the EventBridge rule was triggered on Apr 11, but no logs were found in CloudWatch. The ECSStoppedTasksEvent also shows that the ndw-etl-prod task was not started.
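The check described above (rule fired, but no task start was recorded) could be automated along these lines. This is only a sketch: the function name `missed_runs`, the 30-minute grace period, and the sample timestamps are assumptions for illustration, not values taken from our AWS setup. In practice the start times would come from ECS task events rather than a hard-coded list.

```python
from datetime import date, datetime, time, timedelta, timezone

# From the root cause above: the rule is expected to fire daily at 02:00 UTC.
SCHEDULED_TIME_UTC = time(2, 0)
# Hypothetical tolerance: how late a task may start and still count as the scheduled run.
GRACE = timedelta(minutes=30)


def missed_runs(task_starts, first_day, last_day):
    """Return the dates in [first_day, last_day] that have no task start
    between 02:00 UTC and 02:00 UTC + GRACE."""
    missed = []
    day = first_day
    while day <= last_day:
        window_open = datetime.combine(day, SCHEDULED_TIME_UTC, tzinfo=timezone.utc)
        window_close = window_open + GRACE
        if not any(window_open <= start <= window_close for start in task_starts):
            missed.append(day)
        day += timedelta(days=1)
    return missed


# Illustrative data only: task starts on 10 and 12 Apr, none on 11 Apr.
starts = [
    datetime(2023, 4, 10, 2, 0, 40, tzinfo=timezone.utc),
    datetime(2023, 4, 12, 2, 1, 5, tzinfo=timezone.utc),
]
print(missed_runs(starts, date(2023, 4, 10), date(2023, 4, 12)))
```

With the sample data, only 11 Apr is reported as missed. A manual run later in the day (such as the 14:26 BST one in the timeline) falls outside the window, so the date would still be flagged as a missed scheduled run.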
Action Items
Action Items | Comments | Owner |
---|---|---|