Date | 13 May 2021 |
Authors | |
Status | In Progress (Documenting) |
Summary | |
Impact | NDW data was a day out of date for some records. Revalidation and GMC Connections data was also out of date. |
...
???
: 15:16 BST - Jenkins started raising exceptions
: 00:17 BST - Jenkins stopped logging
: 06:55 BST - Super-Scrum Master flagged the downtime and its knock-on consequences
An unrelated failure (the STAGE PersonSync job) was also reported
: 07:25 BST - Jenkins restarted
: 07:30 BST - NDW jobs restarted
: 08:24 BST - Question about Revalidation jobs raised on MS Teams
: 08:25 BST - NDW jobs finished
: 08:50 BST - gmc-sync jobs rerun
: 09:27 BST - Confirmed with Revalidation users that the data had been refreshed
: 10:27 BST - Downstream NDW ETLs finished
...
Action Items | Owner | Ticket ref |
---|---|---|
Establish how to manage Dependabot PRs:
This might best be covered in the Dev Handbook. | ||
Change the Dependabot config so that it doesn’t auto-rebase PRs (existing Dependabot Tech Improvement) | ||
Move nightly jobs off Jenkins/the build server, e.g. onto ECS | ||
...
Lessons Learned
We carried out a more thorough root cause analysis (RCA) than we did when a similar outage happened previously: https://hee-tis.atlassian.net/wiki/spaces/NTCS/pages/1936687204/2020-08-03+TIS+NDW+ETLs+didn+t+run