Date

Authors

Status

Resolved

Summary

Impact

TCS down

Non-technical Description

...

Trigger

...

Detection

...

Resolution

...

Timeline

...

  • 14:16 BST - First AuthenticationFailureException thrown

  • 14:18 BST - Notification of TCS Health Check failure on Slack (#monitoring-prod)

  • 14:18 BST - Users start flagging the problem on Teams

  • 14:24 BST - Issue identified as a RabbitMQ authentication error

  • 14:30 BST - Typo in password rectified and TCS redeployed

  • 14:30 BST - TCS stable again

Root Cause(s)

...

Action Items

Document how to investigate Flyway migration issues.

Action Items

Owner

...

Lessons Learned