Date

Authors

Andy Dingley

Status

Done

Summary

TIS services could not access reference data

Impact

Some TIS functionality was unavailable during the downtime

Non-technical Description

Several of the TIS functions rely on access to reference data, such as Site, Title, Gender.
The service which provides the reference data was migrated to a new location, but not all of the existing TIS functions were informed of the new location.


Trigger

  • The TIS-REFERENCE service was stopped on stage, prod and NIMDTA environments (with the expectation that the ECS instance was already being used).

Detection

  • Sentry error received in #sentry-esr reporting that the ESR-NOTIFICATIONGENERATOR service could not access trust codes.


Resolution

  • TIS-REFERENCE restarted on prod blue to restore functionality.

  • Affected services re-deployed (ESR-NOTIFICATIONGENERATOR, TIS-CONCERNS, TIS-PROFILE and TIS-SYNC).

  • TIS-REFERENCE stopped again on prod blue.


Timeline

  • 16:00 - TIS-REFERENCE stopped on all environments

  • 16:03 - ESR notification generator exception picked up by Sentry and sent as a Slack notification.

  • 16:16 - TIS-REFERENCE restarted on prod blue.

  • 16:19 - Redeployments underway for ESR-NOTIFICATIONGENERATOR, TIS-CONCERNS, TIS-PROFILE and TIS-SYNC

  • 18:15 - Last of the deployments completed

  • 12:30 - TIS-REFERENCE stopped on prod blue


Root Cause(s)

  • The load balancer config was updated to point to the new ECS instance of TIS-REFERENCE

  • The load balancer changes were not deployed to ESR-NOTIFICATIONGENERATOR, TIS-CONCERNS, TIS-PROFILE and TIS-SYNC

  • Stopping the TIS-REFERENCE service on blue/green therefore meant that those services, which were still pointing at the old location, could not find TIS-REFERENCE at all rather than using the ECS instance.
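
The root causes above suggest a simple pre-shutdown check: before stopping the old TIS-REFERENCE instance, confirm that every dependent service already resolves reference data via the new location. The following is a minimal sketch of that idea only, not the team's actual tooling. The `ServiceDeployment` type, the function names and the upstream labels (`reference-ecs`, `reference-blue-green`) are all hypothetical and would need to be replaced with however the deployed load balancer config is actually inspected.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServiceDeployment:
    """Hypothetical record of where a deployed service looks up TIS-REFERENCE."""
    name: str
    reference_upstream: str


def services_blocking_shutdown(
    deployments: Iterable[ServiceDeployment], new_upstream: str
) -> List[str]:
    """Return the sorted names of services not yet pointing at new_upstream."""
    return sorted(
        d.name for d in deployments if d.reference_upstream != new_upstream
    )


def safe_to_stop_old_instance(
    deployments: Iterable[ServiceDeployment], new_upstream: str
) -> bool:
    """The old instance may be stopped only when no service still depends on it."""
    return not services_blocking_shutdown(deployments, new_upstream)


# Illustrative data only: the upstream labels are assumptions, not real config values.
deployments = [
    ServiceDeployment("ESR-NOTIFICATIONGENERATOR", "reference-blue-green"),
    ServiceDeployment("TIS-CONCERNS", "reference-blue-green"),
    ServiceDeployment("TIS-PROFILE", "reference-ecs"),
    ServiceDeployment("TIS-SYNC", "reference-ecs"),
]

blocking = services_blocking_shutdown(deployments, "reference-ecs")
if blocking:
    print("Do not stop the old instance; redeploy first:", ", ".join(blocking))
```

Running a check like this as an explicit step on a migration ticket would turn the assumption that the ECS instance was already being used into something verified before the old service is stopped.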


Action Items

  • Include more descriptive sub-task(s) for deploying load balancer changes on future ECS migration tickets (Owner: Andy Dingley)


Lessons Learned

  • Ensure no dependent services are missed when deploying shared configuration changes, such as load balancer updates
