
...

This was resolved by giving the servers a way to find the authentication service (and any other service), whether it runs on the same server or on another one. Requests should therefore always be able to find the correct route.
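The page does not say which mechanism was used to locate services. As a purely illustrative sketch, assuming a hypothetical in-memory registry (`REGISTRY`, `Endpoint` and `resolve` are invented names, and the addresses are made up), a lookup could prefer an instance on the same server and otherwise fall back to one on another server:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


# Hypothetical registry: service name -> endpoints known to the servers.
REGISTRY: dict[str, list[Endpoint]] = {
    "auth": [Endpoint("10.0.1.10", 8080), Endpoint("10.0.2.10", 8080)],
}


def resolve(service: str, local_host: str,
            registry: dict[str, list[Endpoint]] = REGISTRY) -> Endpoint:
    """Return an endpoint for `service`, preferring one on the local server."""
    endpoints = registry.get(service)
    if not endpoints:
        raise LookupError(f"no endpoints registered for {service!r}")
    for ep in endpoints:
        if ep.host == local_host:
            return ep  # same server: no need to go out via the public URL
    return endpoints[0]  # otherwise use an instance on another server
```

For example, `resolve("auth", "10.0.2.10")` returns the local instance, while `resolve("auth", "10.0.3.10")` falls back to `Endpoint("10.0.1.10", 8080)`. A real deployment would more likely rely on DNS or a container network than on a hard-coded table.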

...

Generic Upload: reported by a user on Teams
ESR: detected by Alertmanager, but noticed by Joseph (Pepe) Kelly while pausing alerts for work that was already under way

...

29 Apr 2022 - Blue stage server upgraded. TIS continued to function normally after this upgrade, which was used to develop a sequence for upgrading the other EC2 instances.
05 May 2022 10:30 to 15:13 - Green stage and monitoring servers were upgraded. The build server was partially upgraded.
05 May 2022 14:xx - Message to James Harris asking whether it was possible to do bulk uploads; there was no indication that they were encountering an issue.
05 May 2022 17:15 - Paused production monitoring and began applying upgrades to blue production.
05 May 2022 17:20 - Hit similar issues on the production servers with packages left over from images migrated to AWS; these were resolved.
05 May 2022 18:20 - Prod inaccessible while being upgraded.
05 May 2022 19:04 - Validated that the upgrade was working on the blue server; upgrade of the green server began.
05 May 2022 19:42 - Prod appeared fully accessible with upgraded components, monitoring re-enabled.
06 May 2022 09:44 - TIS Admin reported, via TIS Support Channel, that they were getting an “Unknown Server Error” when performing bulk uploads.
06 May 2022 10:47 - Users were informed on TIS Support Channel that we were aware of a bulk upload issue affecting all users and were investigating.
06 May 2022 12:16 - Users were informed on TIS Support Channel that we had deployed a fix for bulk upload.
06 May 2022 15:06 - Networking change applied to production and workaround hotfix removed (after validating in stage environment).
06 May 2022 17:06 to 18:40 - ESR integration services re-enabled and monitored for processing. One container definition required modified networking.
09 May 2022 11:55 - Verified that there were no files pending export and that files had been produced.

...

The services were no longer able to access Keycloak via the public URL, which resolved to a loopback address.
The hosts file was not configured correctly (not so much a root cause, but definitely something that needed to be corrected once found).
The Apache/Docker upgrade caused or required some unexpected configuration changes.
The change in major OS version (Ubuntu 16.04 to Ubuntu 18.04) appears to have reset the custom DNS settings we were originally using; re-applying these has made the apps run as expected.
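Since the Keycloak URL resolved to a loopback address and the hosts file was misconfigured, one way to catch this class of problem is to scan the hosts file for the public hostname. The following is a minimal sketch, not the check that was actually run; `loopback_entries` and the sample hosts content (including `keycloak.example.org` and `tis-prod`) are hypothetical. It reports any hosts-file entry that maps a given hostname to a loopback address (anything in 127.0.0.0/8, or ::1):

```python
import ipaddress

# Hypothetical hosts-file content, for illustration only.
SAMPLE_HOSTS = """\
# static entries
127.0.0.1   localhost
127.0.1.1   tis-prod keycloak.example.org
10.0.1.10   db.internal
"""


def loopback_entries(hosts_text: str, hostname: str) -> list[str]:
    """Return hosts-file entries that map `hostname` to a loopback address."""
    wanted = hostname.lower()  # hostnames are case-insensitive
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # blank or comment-only line
        address, *names = line.split()
        try:
            addr = ipaddress.ip_address(address)
        except ValueError:
            continue  # first field is not an IP address
        if addr.is_loopback and wanted in (n.lower() for n in names):
            hits.append(line)
    return hits
```

Here `loopback_entries(SAMPLE_HOSTS, "keycloak.example.org")` flags the `127.0.1.1` entry. On a live server, the same function could be called with the contents of `/etc/hosts`.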

...
