Date |
Authors | Rob Pink Joseph (initialPepe) Kelly
Status |
Summary | Bulk upload not uploading; users saw a timeout message while the underlying API calls returned 401 (Unauthorized) errors
Impact | Users were unable to use bulk upload. It is unclear whether more than one user was affected. Bulk upload was unavailable for c. 3-4 working hours before resolution.
A user attempting a bulk upload saw a timeout message; the underlying API calls were returning 401 (Unauthorized) errors. On investigation, it was found that an out-of-date piece of configuration information had been released shortly before the issue was reported, and this broke the bulk upload process. Once the up-to-date configuration was loaded, the problem was resolved.
Trigger
Deploying / approving a deployment.
Detection
A user reported the problem via Teams.
Resolution
Synchronised the infrastructure definition from the IaC repository into the copy used by the build process, then reran the CI/CD pipeline.
Timeline
All times BST unless otherwise indicated.
Before the incident: “Infrastructure definitions” left out of date.
~12:02-13:30 The configuration used for deploying was manually edited and the pipeline executed. It was then released to production.
14:44 User reported the problem on Teams: “Bulk Upload Module - Hi Team, the bulk uploader page keeps reloading with server took too long message”.
Next day:
09:04 Acknowledged.
09:10-09:26 The infrastructure code definitions were updated where they are used by the build process, the pipeline was run and users were notified.
09:26 Reported on Teams by TIS: “There was unfortunately an out-of-date piece of configuration information that was released yesterday afternoon, this meant that the bulk upload application was unable to check your permissions to use it were valid. I have deployed the latest information and it is accessible again.”
5 Whys (or other analysis of Root Cause)
The problem is that users received 401 errors when attempting to upload data via bulk upload.
The page was refreshing because API calls returned 401 errors.
The 401 errors were probably being returned because bulk upload could not communicate with other services (for example, to check that the user's permissions were valid). A lack of logs from the service prevents us from saying so with certainty.
The bulk upload service was using out-of-date configuration information.
Actions from an earlier Live Defect had not been completed, so builds were using an earlier copy of our infrastructure definition.
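The root cause above is drift between the IaC repository and the copy of the infrastructure definition that the build process uses. A check like the following, run as a pipeline step before deployment, could catch that drift. It is a minimal sketch, assuming both definitions are available as directories on the build agent; the function names and directory layout are hypothetical and not part of our current pipeline.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 hex digest of its contents."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_digests(source: dict[str, str], build_copy: dict[str, str]) -> list[str]:
    """Describe every file that differs between the IaC repo and the build's copy."""
    problems = []
    for rel in sorted(source.keys() | build_copy.keys()):
        if rel not in build_copy:
            problems.append(f"missing in build copy: {rel}")
        elif rel not in source:
            problems.append(f"only in build copy: {rel}")
        elif source[rel] != build_copy[rel]:
            problems.append(f"out of date: {rel}")
    return problems


def find_drift(iac_repo_dir: Path, build_copy_dir: Path) -> list[str]:
    """Hypothetical pipeline check: an empty list means the two copies match."""
    return compare_digests(tree_digests(iac_repo_dir), tree_digests(build_copy_dir))
```

A pipeline step could call `find_drift` with the two (hypothetical) directories and fail the build if the returned list is non-empty. Comparing content hashes rather than timestamps means a recently copied but stale file is still reported.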
Action Items
Action item | Owner
---|---
“Unresolve” the Build Server card until it has been fully resolved |
Run bulk upload on a Cloud Native service |
Repair persistent logging for bulk upload |
Investigate modifying response codes when services are unavailable |
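The last action item turns on separating two cases that, in this incident, probably both surfaced as 401: the user's credentials or permissions were rejected, versus bulk upload could not check them at all. A minimal sketch of that decision follows, assuming a hypothetical `check_permissions` callable and `DependencyUnavailable` exception; neither is taken from the bulk upload service.

```python
from enum import Enum
from http import HTTPStatus
from typing import Callable


class PermissionResult(Enum):
    ALLOWED = "allowed"
    DENIED = "denied"                    # authenticated, but not permitted to bulk upload
    UNAUTHENTICATED = "unauthenticated"  # no valid credentials presented


class DependencyUnavailable(Exception):
    """Hypothetical: raised when the permission-checking service cannot be reached."""


def status_for_upload(
    check_permissions: Callable[[str], PermissionResult], user_id: str
) -> HTTPStatus:
    """Pick the HTTP status for a bulk upload request.

    check_permissions is a hypothetical stand-in for whatever call bulk
    upload makes to verify a user's permissions.
    """
    try:
        result = check_permissions(user_id)
    except DependencyUnavailable:
        # Permissions could not be checked at all: a server-side outage,
        # not a problem with the user's credentials.
        return HTTPStatus.SERVICE_UNAVAILABLE  # 503
    if result is PermissionResult.UNAUTHENTICATED:
        return HTTPStatus.UNAUTHORIZED  # 401
    if result is PermissionResult.DENIED:
        return HTTPStatus.FORBIDDEN  # 403
    return HTTPStatus.OK  # 200
```

Returning 503 for the unreachable-dependency case marks the failure as a server-side outage, which could let the front end show an outage message rather than treating it as an authentication problem. Whether that would stop the page reloading depends on client behaviour this review does not cover.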
See also:
...