2023-10-10 Failing Tableau refreshes on TIS PID Data site
Date | Oct 13, 2023 |
Authors | @Stan Ewenike (Unlicensed) @James Harris |
Status | Documenting |
Summary | |
Impact | Data in some TIS reports was stale, reflecting what was on TIS at the last successful refresh |
Non-technical Description
On 11 Oct 2023 we discovered that refreshes of the TIS data extracts had been failing for the previous two days. As a result, data in some TIS reports was stale, reflecting what was on TIS at the last successful refresh.
Trigger
Refer to timeline point:
Oct 11, 2023 09:42
Detection
Email notifications in Site Owner’s inbox
Resolution
Re-application of the missing permissions on the NDW service account used to refresh data extracts on Tableau Server.
Re-run refresh post-fix
Timeline
BST unless otherwise stated
# | DateTime | Activity | Supporting documentation (if any) |
---|---|---|---|
1 | Oct 11, 2023 09:42 | Site Owner raised the alarm about the number of email notifications in their inbox reporting failed Tableau refreshes on the TIS PID Data site | Sample of one of the email notifications |
2 | Oct 11, 2023 10:15 | PO and team informed. Brief team discussion on priority and delegation of the ‘coordinator’ role for the incident | |
3 | Oct 11, 2023 11:00 | Quick analysis established the number of data sources with failed refreshes and the dates of their last successful refresh | |
4 | Oct 11, 2023 11:56 | Tableau Administrator/Data Service team notified | |
5 | Oct 11, 2023 13:00 | Further analysis to establish the scope of the problem and the extent of stale data | |
6 | Oct 11, 2023 14:09 | Tableau Administrator/Data Service team confirmed that the root cause of the problem had been identified | |
7 | Oct 12, 2023 07:21 | Requested progress update on the permanent fix from Tableau Administrator/Data Service team | |
8 | Oct 12, 2023 07:29 | Updated PO on the current state of affairs via an MS Teams call | |
9 | Oct 12, 2023 09:33 | Received confirmation from Tableau Administrator/Data Service team that the permanent fix was in progress | |
10 | Oct 12, 2023 11:35 | Communication sent out on MS Teams channels to stakeholders. This was sent out on the following channels: | |
11 | Oct 12, 2023 14:23 | Received notification that the permanent fix was complete. Verified that data sources had refreshed | |
12 | Oct 12, 2023 14:38 | Updated stakeholders on MS Teams channels | |
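The quick analysis in timeline point 3 (which data sources are stale, and since when) can be sketched in a few lines. This is a minimal illustration only: the data source names, timestamps and the 24-hour threshold are hypothetical, and the function `find_stale_sources` is not part of Tableau or of the tooling used during the incident. It assumes the last successful refresh time of each data source has already been collected by hand or from an export.

```python
from datetime import datetime, timedelta


def find_stale_sources(last_success, as_of, max_age=timedelta(hours=24)):
    """Return (name, age) pairs for data sources whose last successful
    refresh is older than max_age, ordered from most to least stale."""
    stale = [
        (name, as_of - refreshed_at)
        for name, refreshed_at in last_success.items()
        if as_of - refreshed_at > max_age
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical inputs: names and timestamps are made up for illustration.
last_success = {
    "extract_a": datetime(2023, 10, 9, 6, 0),
    "extract_b": datetime(2023, 10, 11, 6, 0),
    "extract_c": datetime(2023, 10, 8, 18, 0),
}
as_of = datetime(2023, 10, 11, 11, 0)

for name, age in find_stale_sources(last_success, as_of):
    print(f"{name}: last successful refresh was {age} ago")
```

Sorting by age puts the most out-of-date data sources first, which helps when deciding what to tell stakeholders about the extent of stale data.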
Root Cause(s)
Modifications made to the permissions on the NDW service account used to refresh data extracts on Tableau Server.
Refer to timeline point:
Oct 11, 2023 14:09
Action Items
Action Items | Owner | Status |
---|---|---|
Best practice suggestion: the Tableau Administrator/Data Service team should consider extending the distribution list for error/failure notifications. Potential additions: PO, Lead developer, one analyst from each TIS team | | |
Best practice suggestion: reduce the frequency of refreshes for non-critical data where possible, and adjust refresh schedules so that refreshes are spread out | All Tableau Creators and Report Developers | |
Lessons Learned
We were fortunate that the Site Owner was working at the time, since the failure notifications went to the Site Owner's inbox.
Consider extending the distribution list for error/failure notifications. Suggested additions: PO, Lead developer, one analyst from each team
Reduce the frequency of refreshes for non-critical data where possible, and adjust refresh schedules so that refreshes are spread out
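As an illustration of what spreading out a refresh schedule could look like, the sketch below assigns evenly spaced start times to a set of extracts within a time window. It is a planning aid under stated assumptions, not a description of how Tableau Server schedules work: the extract names, the window and the function `stagger_refreshes` are all hypothetical, and the resulting times would still need to be applied manually to the real schedules.

```python
from datetime import datetime, timedelta


def stagger_refreshes(names, window_start, window_length):
    """Assign each extract an evenly spaced start time inside the window,
    so that refreshes do not all begin at the same moment."""
    if not names:
        return {}
    step = window_length / len(names)
    return {name: window_start + i * step for i, name in enumerate(names)}


# Hypothetical extracts and overnight window, for illustration only.
extracts = ["extract_a", "extract_b", "extract_c", "extract_d"]
plan = stagger_refreshes(
    extracts,
    window_start=datetime(2023, 10, 13, 1, 0),
    window_length=timedelta(hours=2),
)

for name, start in plan.items():
    print(f"{name}: start at {start:%H:%M}")
```

Even spacing is the simplest choice. If some extracts take much longer than others, giving them proportionally larger gaps would be a reasonable refinement.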
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213