Date |
|
Authors | |
Status | Resolved / Done |
Summary | TCS service down due to a RabbitMQ config error |
Impact | TCS down |
Non-technical Description
The TCS service fell over due to an authentication error on RabbitMQ, which was caused by an incorrect configuration value.
...
Trigger
A typing error when saving the password of the Reval RabbitMQ user in our parameter store.
...
Detection
Notification sent to #monitoring-prod.
...
Resolution
Updated the value of the Reval RabbitMQ user’s password in the parameter store.
...
Timeline
...
: 14:16 BST - First AuthenticationFailureException thrown.
: 14:18 BST - Notification of TCS Health Check failure on Slack (#monitoring-prod).
: 14:18 BST - Users start flagging the problem on Teams.
: 14:24 BST - Issue identified as a Rabbit authentication error.
: 14:30 BST - Typo in password rectified and TCS redeployed.
: 14:30 BST - TCS stable again.
Root Cause(s)
Incorrect password set for the Reval RabbitMQ user in the parameter store.
...
Action Items
Action Items | Owner | |
---|---|---|
Document how to investigate Flyway migration issues. | n/a | |
...
Lessons Learned
Double-check config values when entering them in the parameter store or anywhere else.
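One way to make that double-check routine is to require the value to be typed twice and compare the two entries before anything is saved. The sketch below is illustrative only: `check_secret_entry` is a hypothetical helper, not part of our tooling, and it assumes the person entering the value has a script or prompt in front of the parameter store write. It does not talk to the parameter store or to RabbitMQ.

```python
import hmac


def check_secret_entry(first: str, second: str) -> list[str]:
    """Return the problems found with a secret typed twice; an empty list means OK.

    Hypothetical helper for illustration; not part of any existing tooling.
    """
    problems = []
    # Comparing two independent entries catches a typo made in either one.
    # compare_digest avoids leaking where the two values differ via timing.
    if not hmac.compare_digest(first.encode("utf-8"), second.encode("utf-8")):
        problems.append("entries do not match")
    # Stray whitespace from copy and paste is easy to miss by eye.
    if first != first.strip():
        problems.append("leading or trailing whitespace")
    if not first:
        problems.append("empty value")
    return problems
```

A caller would only go on to save the value when the returned list is empty, for example `if not check_secret_entry(entry_1, entry_2): ...`. This would not catch a value that is entered consistently but is simply wrong, so a test connection with the new credentials before redeploying is still worth doing.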