
Date

10 May 2021

Authors

Reuben Roberts

Status

Implementing

Summary

https://hee-tis.atlassian.net/browse/TIS21-1555

Impact

Some users received a ‘Service unavailable’ message from TIS, preventing them from accessing any of the site's functionality. Clearing cookies or the browser cache, or simply waiting a few minutes, resolved the issue.

Non-technical Description

Access to TIS fails for some users, some of the time, with the browser message ‘Service unavailable’. TIS recovers without intervention a few minutes later; clearing the browser cache would also restore access.


Trigger


Detection

  • User reports on Teams from 7 May 2021 16:17 onwards.



Resolution

  • Reset of OIDC (OpenID Connect) on both production servers.


Timeline

  • 7 May 2021 16:17 User reports on Teams that TIS is giving a ‘Service unavailable’ error

  • 8 May 2021 01:20 Marcello reports noticing the issue while checking that the nightly sync job has completed successfully

  • 10 May 2021 08:48 Various user reports of the same issue on Teams

  • 10 May 2021 10:19 HEE-TIS-VM-PROD-APPS-GREEN removed from EC2 load balancing cluster

  • 10 May 2021 10:24 HEE-TIS-VM-PROD-APPS-GREEN rebooted

  • 10 May 2021 10:33 HEE-TIS-VM-PROD-APPS-GREEN added back to EC2 load balancing cluster

  • 10 May 2021 10:46 HEE-TIS-VM-PROD-APPS-GREEN docker logging observed

Root Cause(s)

  • Users were seeing an error from Apache webserver ‘Service unavailable’

  • Logs showed that Apache was rejecting user requests because the user had too many session authentication tokens [TODO: get log message]

  • Apache is configured to allow one token, but inspection of the user's machine showed they had three.

  • The extra tokens arose from multiple simultaneous authentication attempts.

  • A configuration change was rolled out just before the issue was first observed.

  • The limit on tokens is set on the API Gateway with OIDCStateMaxNumberOfCookies 1 true ('true' flushes out any excess tokens). This setting had to be added manually because the infrastructure configuration tool (Ansible) couldn't handle it, so users who logged in while it was not set would create multiple cookies. It is also possible that multiple logins across different browser sessions (within the same browser) would create multiple cookies. If a user's session expires due to inactivity and the user then logs in again, the new login also creates a duplicate cookie.
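To make the effect of the second argument of OIDCStateMaxNumberOfCookies (a mod_auth_openidc directive) more concrete, the Python sketch below is a toy model of one browser's outstanding authentication (state) cookies under a limit. It is an illustration under stated assumptions, not mod_auth_openidc's code: the class and method names are hypothetical, a rejected request is modelled as HTTP 503 to mirror the ‘Service unavailable’ error users saw, and 'true' is modelled as discarding the oldest cookies to make room for the new one.

```python
from dataclasses import dataclass, field


@dataclass
class StateCookieJar:
    """Hypothetical toy model of one browser's outstanding OIDC state cookies."""

    max_cookies: int          # e.g. 1, as in "OIDCStateMaxNumberOfCookies 1 true"
    flush_oldest: bool        # models the optional 'true' second argument
    cookies: list[str] = field(default_factory=list)  # oldest first

    def start_login(self, state: str) -> int:
        """Start an authentication attempt and return a simulated HTTP status.

        Assumption: without flushing, an attempt that would exceed the limit
        is rejected with 503; with flushing, the oldest cookies are dropped.
        """
        if len(self.cookies) >= self.max_cookies:
            if not self.flush_oldest:
                return 503  # simulated 'Service unavailable'
            # Drop the oldest cookies so exactly one slot is free for the new one.
            del self.cookies[: len(self.cookies) - self.max_cookies + 1]
        self.cookies.append(state)
        return 302  # simulated redirect to the identity provider


# A browser left holding three state cookies from earlier login attempts.
strict = StateCookieJar(max_cookies=1, flush_oldest=False, cookies=["a", "b", "c"])
flushing = StateCookieJar(max_cookies=1, flush_oldest=True, cookies=["a", "b", "c"])
```

With the limit but no flushing, the browser with three leftover cookies is locked out until those cookies expire or are cleared, which matches users recovering after a few minutes or after clearing their cache. With flushing, the stale cookies are simply replaced.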


Action Items

  • Investigate an Ansible upgrade, or recheck the current version, to permit full OIDCStateMaxNumberOfCookies configuration without manual changes. Owner: John Simmons

  • Add a comment to the Ansible script to highlight any required manual amendments. Owner: John Simmons

  • Check the NI Apache configuration template (OAuth2.conf.j2) for consistency. Owner: John Simmons
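One way to stop a manual amendment from silently going missing again is a post-deployment check that the rendered Apache configuration actually contains the directive. The Python sketch below is hypothetical: the function name and the idea of running it against the deployed configuration file are assumptions, not part of the current Ansible set-up. It only checks that OIDCStateMaxNumberOfCookies is present, uncommented, with the arguments 1 true recorded in the root cause above.

```python
# Hypothetical expected arguments, taken from the root cause above.
EXPECTED_ARGS = ["1", "true"]


def check_state_cookie_limit(conf_text: str) -> list[str]:
    """Return the problems found in an Apache config; an empty list means OK."""
    found = []
    for line in conf_text.splitlines():
        parts = line.split()
        # Apache directive names are case-insensitive; commented-out lines
        # start with '#' and therefore never match here.
        if parts and parts[0].lower() == "oidcstatemaxnumberofcookies":
            found.append(parts[1:])
    if not found:
        return ["OIDCStateMaxNumberOfCookies is not set"]
    problems = []
    for args in found:
        if [a.lower() for a in args] != EXPECTED_ARGS:
            problems.append(
                "unexpected OIDCStateMaxNumberOfCookies arguments: " + " ".join(args)
            )
    return problems
```

Run after each deployment on both production servers, a non-empty result would flag the missing 'true' argument before users hit the ‘Service unavailable’ error.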


Lessons Learned

  • Not all of our infrastructure as code is actually coded: some settings still have to be applied by hand

  • It is not always possible to be certain that a problem will not arise again; this needs to be weighed up against the effort involved
