Date | |
Authors | |
Status | Done |
Summary | TCS service down due to a RabbitMQ config error |
Impact | TCS down |
Non-technical Description
The TCS service went down because of a RabbitMQ authentication error, which was caused by an incorrect configuration value.
Trigger
A typing error made when saving the password of the Reval RabbitMQ user in our parameter store.
Detection
Notification sent to #monitoring-prod.
Resolution
Corrected the Reval RabbitMQ user’s password in the parameter store and redeployed TCS.
Timeline
: 14:16 BST - First AuthenticationFailureException thrown.
: 14:18 BST - Notification of TCS Health Check failure on Slack (#monitoring-prod).
: 14:18 BST - Users start flagging the problem on Teams.
: 14:24 BST - Issue identified as a RabbitMQ authentication error.
: 14:30 BST - Typo in password rectified and TCS redeployed.
: 14:30 BST - TCS stable again.
Root Cause(s)
Incorrect password set for the Reval RabbitMQ user in the parameter store.
Action Items
Action Items | Owner | |
---|---|---|
n/a | | |
Lessons Learned
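Double-check configuration values when entering them in the parameter store or any other configuration source, since a single mistyped character in a password is enough to take a service down.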
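One way to make that double-check routine is to enter a secret twice and compare the two entries before saving it. The following is a minimal sketch of the idea in Python, not a description of our actual tooling: the function name `check_secret_entry` and its messages are hypothetical, and it only catches typing slips. It does not prove that the password is the one RabbitMQ expects.

```python
import hmac


def check_secret_entry(first_entry: str, second_entry: str) -> list[str]:
    """Return the problems found with a secret that was typed in twice.

    An empty list means the two entries match and no obvious slip was found.
    (Hypothetical helper, for illustration only.)
    """
    problems = []
    if not first_entry:
        problems.append("value is empty")
    # Stray spaces or newlines are easy to pick up when copying and pasting.
    if first_entry != first_entry.strip():
        problems.append("value has leading or trailing whitespace")
    # compare_digest compares the two byte strings in constant time.
    if not hmac.compare_digest(
        first_entry.encode("utf-8"), second_entry.encode("utf-8")
    ):
        problems.append("the two entries do not match")
    return problems
```

For example, `check_secret_entry("s3cret", "s3cert")` reports that the two entries do not match, so the value would not be saved until both entries agree.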