Date |
|
Authors | Reuben Roberts, Joseph (Pepe) Kelly, John Simmons, Marcello Fabbri, Doris.Wong, Cai Willis |
Status | Documenting |
Summary | ElasticSearch’s memory and CPU utilisation spiked, making it unresponsive to TCS’s requests |
Impact | Users couldn’t use TIS for approximately 20 minutes. |
Table of Contents |
---|
Non-technical Description
...
: 13:51 BST - CloudWatch shows a spike in memory and CPU utilisation
: 13:57 BST - Slack notification about a FAILING Health Check on TCS Prod
: 14:00 BST - Identified that TCS’s issue was a failing connection to ElasticSearch
: 14:01 BST - Users noticed they were unable to use TIS, as the main screen kept updating
: ~14:15 BST - A security update was run as a way to restart the servers (as the clusters can’t be restarted manually)
: 14:17 BST - Slack notification about a SUCCESSFUL Health Check on TCS Prod
Root Cause(s)
...