...
Non-technical Description
TIS is made up of multiple “microservices”, small components with individual responsibilities, which work together to provide the full TIS application. One such example is the “bulk upload” microservice which provides all of TIS’s bulk upload/update/create functionality.
These microservices connect to each other in order to perform their tasks; for example, the bulk upload microservice extracts data from the spreadsheet and sends it to another microservice capable of handling the person/placement/assessment/etc. data.
Before a microservice can connect to another microservice it must authenticate (log in) to gain access, in a similar way to how users log in to Admins UI.
We experienced a configuration issue which stopped those authentication requests from being sent. As a result, the affected microservices could not “log in” and any subsequent connection to other microservices would have been denied.
In the case of bulk upload this meant that we were unable to process the uploaded spreadsheets as the extracted data could not be sent to a microservice capable of handling it.
...
Trigger
Apache/Docker updated on the Stage and Prod VMs.
Detection
Generic Upload: reported by a user on Teams
ESR: detected by Sentry?
...
Resolution
Fixed the hosts files for each service in each environment.
...
Timeline
BST unless otherwise stated
14:xx - Message to James Harris querying whether it was possible to do bulk uploads, no indication that they were encountering an issue.
09:44 - TIS Admin reported, via TIS Support Channel, that they were getting an “Unknown Server Error” when performing bulk uploads.
10:47 - Users informed on TIS Support Channel that we were aware of a bulk upload issue affecting all users and were investigating.
12:16 - Users informed on TIS Support Channel that we had deployed a fix for bulk upload.
TODO: insert ESR timeline entries from Sentry etc.
...
Root Cause(s)
The services were no longer able to access Keycloak via the public URL
The hosts file was not configured correctly
The Apache/Docker upgrade caused some unexpected configuration changes
???
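Because the failure came down to hosts file entries for the Keycloak hostname, a quick check of each service's hosts file after an Apache/Docker upgrade could catch a regression before users do. The sketch below assumes that each service is expected to carry an explicit hosts entry for the Keycloak hostname and that the correct address is known in advance; the hostname `auth.example.com` and the IP addresses are hypothetical placeholders, not TIS's real values, and `parse_hosts`/`check_host` are illustrative helpers rather than existing TIS tooling.

```python
import ipaddress
from typing import Dict, Optional


def parse_hosts(text: str) -> Dict[str, str]:
    """Parse hosts-file text into a {hostname: address} mapping.

    Follows the usual hosts(5) layout: an IP address followed by one or
    more names, with '#' starting a comment. If a name appears more than
    once, the first entry is kept.
    """
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: address with no names
        address, names = fields[0], fields[1:]
        try:
            ipaddress.ip_address(address)
        except ValueError:
            continue  # first field is not a valid IP address
        for name in names:
            mapping.setdefault(name.lower(), address)
    return mapping


def check_host(text: str, hostname: str, expected: str) -> Optional[str]:
    """Return a description of the problem, or None if the entry matches."""
    actual = parse_hosts(text).get(hostname.lower())
    if actual is None:
        return f"{hostname} has no hosts entry"
    if actual != expected:
        return f"{hostname} maps to {actual}, expected {expected}"
    return None


if __name__ == "__main__":
    # Hypothetical hosts file contents for a single service container.
    hosts = (
        "127.0.0.1   localhost\n"
        "10.0.0.5    auth.example.com  # hypothetical Keycloak host\n"
    )
    print(check_host(hosts, "auth.example.com", "10.0.0.5"))  # None
    print(check_host(hosts, "auth.example.com", "10.0.0.9"))
    # auth.example.com maps to 10.0.0.5, expected 10.0.0.9
```

In practice the text would be read from each container's `/etc/hosts`, however those entries are provisioned (for example via Docker's `--add-host` option). Comparing against an expected address, rather than only checking that the name resolves, is the design choice here, since a stale entry can still resolve while pointing at the wrong place.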
...
Action Items
Action Items | Owner |
---|---|
...