...

  • Apache/Docker updated on the Stage, Prod and Prod management VMs.

Detection

  • Generic Upload: reported by user on Teams

  • ESR: detected by Sentry?

...

BST unless otherwise stated

  • - Blue stage server upgraded. TIS continued to function normally after this upgrade, which was used to develop a sequence for upgrading the other EC2 instances.

  • 10:30 to 15:13 - Green stage and monitoring server were upgraded. The build server was partially upgraded.

  • 14:xx - Message to James Harris querying whether it was possible to do bulk uploads; there was no indication that they were encountering an issue.

  • 17:15 - Paused production monitoring and began applying upgrades to blue production.

  • 17:20 - Hit similar issues on the production servers, caused by packages left over from images migrated to AWS; these were sorted out.

  • 18:20 - Prod inaccessible while being upgraded.

  • 19:04 - Validated that the upgrade was working on the blue server; upgrade of the green server began.

  • 19:42 - Prod appeared fully accessible with upgraded components, monitoring re-enabled.

  • 09:44 - TIS Admin reported, via TIS Support Channel, that they were getting an “Unknown Server Error” when performing bulk uploads.

  • 10:47 - Users informed on TIS Support Channel that we were aware of a bulk upload issue affecting all users and were investigating.

  • 12:16 - Users informed on TIS Support Channel that we had deployed a fix for bulk upload.

  • TODO: insert ESR timeline entries from Sentry etc.

  • 15:06 - Networking change applied to production and workaround hotfix removed (after validating in stage environment).

  • 17:06 to 18:40 - ESR integration services re-enabled and monitored for processing. One container definition required modified networking.

...

Root Cause(s)

  • The services were no longer able to access Keycloak via the public URL, which resolved to a loopback address

  • The hosts file was not configured correctly

  • The Apache/Docker upgrade caused/required some unexpected configuration changes

  • ???
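
The loopback symptom in the first root cause can be checked from the affected host or container. The following is a minimal sketch, not the check used during the incident: the helper names and the example hostname are hypothetical. It assumes only that the check runs in the same environment as the failing service, so that name resolution goes through the same hosts file and resolver configuration.

```python
import ipaddress
import socket
from typing import List


def is_loopback(address: str) -> bool:
    """Return True if the IP address string is in a loopback range (127.0.0.0/8 or ::1)."""
    # Drop any IPv6 zone suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


def resolved_addresses(hostname: str) -> List[str]:
    """Resolve hostname with the system resolver, which typically consults the hosts file."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def loopback_addresses(hostname: str) -> List[str]:
    """Return the subset of resolved addresses that point back at the local machine."""
    return [addr for addr in resolved_addresses(hostname) if is_loopback(addr)]


if __name__ == "__main__":
    # Hypothetical hostname; substitute the public Keycloak hostname.
    host = "keycloak.example.org"
    try:
        bad = loopback_addresses(host)
    except socket.gaierror as exc:
        print(f"{host} did not resolve: {exc}")
    else:
        if bad:
            print(f"{host} resolves to loopback address(es): {bad}")
        else:
            print(f"{host} does not resolve to a loopback address")
```

Inside a container, a loopback address refers to the container itself rather than the host, so a public hostname mapped to 127.0.0.1 would be unreachable from services running in other containers.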

...

Action Items

Owner

...

Lessons Learned

  • Service Dependencies