2023-10-10 Failing Tableau refreshes on TIS PID Data site

Date

Oct 13, 2023

Authors

@Stan Ewenike @James Harris

Status

Documenting

Summary

https://hee-tis.atlassian.net/browse/TIS21-5196

Impact

Data in some TIS reports was stale and reflected the state of TIS at the last successful refresh.

Non-technical Description

On 11 Oct 2023 it was discovered that refreshes of TIS data extracts had been failing since two days earlier. As a result, data in some TIS reports was stale and reflected the state of TIS at the last successful refresh.

Trigger

Refer to timeline point:

  • Oct 11, 2023 09:42


Detection

Email notifications in Site Owner’s inbox


Resolution

  • Re-application of the missing permissions on the NDW service account used to refresh data extracts on Tableau Server.

  • Re-run refresh post-fix


Timeline

BST unless otherwise stated

DateTime

Activity

Supporting documentation (if any)

1

Oct 11, 2023 09:42

Site Owner raises the alarm about the number of email notifications in their inbox reporting failed Tableau refreshes on the TIS PID Data site

Sample of one of the email notifications

2

Oct 11, 2023 10:15

PO and team informed. Brief team discussion on priority and delegation of the ‘coordinator’ role for the incident

 

3

Oct 11, 2023 11:00

Quick analysis established the number of data sources with failed refreshes and the dates of their last successful refresh.

 

4

Oct 11, 2023 11:56

Tableau Administrator/Data Service team notified

 

5

Oct 11, 2023 13:00

Further analysis to establish scope/coverage of the problem and extent of stale data

 

6

Oct 11, 2023 14:09

Tableau Administrator/Data Service team confirms identification of the root cause of the problem

7

Oct 12, 2023 07:21

Requested progress update on permanent fix from Tableau Administrator/Data Service team

 

8

Oct 12, 2023 07:29

Updated PO on the state of affairs at the time via an MS Teams call

 

9

Oct 12, 2023 09:33

Received confirmation from Tableau Administrator/Data Service team that a permanent fix was in progress

 

10

Oct 12, 2023 11:35

Communication sent to stakeholders on the following MS Teams channels:

  • TIS Reporting, WTE Data & Analytics

  • General, TIS Support Channel

 

11

Oct 12, 2023 14:23

Received notification of completion of permanent fix. Verified data sources now refreshed

 

12

Oct 12, 2023 14:38

Updated stakeholders on MS Teams channels

Root Cause(s)

Modifications made to the permissions on the NDW service account used to refresh data extracts on Tableau Server.

Refer to timeline point:

  • Oct 11, 2023 14:09


Action Items

Action Items

Owner

Status

Best practice suggestion: Tableau Administrator/Data Service should consider updating the distribution list to include more recipients of error/failure notifications. Potential inclusions are: PO, Lead developer, one analyst from each TIS team

 

 

Best practice suggestion: Reduce the frequency of refreshes for non-critical data where possible and adjust refresh schedules so that refreshes are spread out

All Tableau Creators and Report Developers

 


Lessons Learned

  • We were fortunate that the Site Owner was working.

  • Possible update of the distribution list to include more recipients of error/failure notifications. Suggestions for potential inclusions are: PO, Lead developer, one analyst from each team

  • Reduce the frequency of refreshes for non-critical data where possible and adjust refresh schedules so that refreshes are spread out
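
As a rough illustration of the last point, the sketch below shows one possible way to spread refresh start times evenly across a window. It is plain Python and does not call any Tableau API; the extract names, the window start and the window length are hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta


def stagger_refreshes(extract_names, window_start, window_minutes):
    """Give each extract an evenly spaced start time within the window."""
    if not extract_names:
        return {}
    # Divide the window into equal slots, one per extract.
    step = timedelta(minutes=window_minutes / len(extract_names))
    return {
        name: window_start + i * step
        for i, name in enumerate(extract_names)
    }


# Hypothetical extract names and a hypothetical 2-hour overnight window.
extracts = ["placements", "programmes", "assessments", "posts"]
schedule = stagger_refreshes(extracts, datetime(2023, 10, 16, 1, 0), 120)

for name, start in schedule.items():
    print(f"{name}: {start:%H:%M}")
```

With four extracts and a 120-minute window, each refresh starts 30 minutes after the previous one, so the refreshes do not all start at the same time. The actual schedules would still need to be configured on Tableau Server by whoever owns each extract.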