Date |
|
Authors | |
Status | Documenting |
Summary | Prod Green server went down |
Impact | Bulk Upload unavailable |
Non-technical Description
One of the servers that the TIS application is load-balanced between became unavailable.
Trigger
The server instance was in the “stopped” state when viewed in the AWS console.
This appears to have been an Amazon Machine Image (AMI) issue, as the following error is displayed in the instance details:
EC2 can't retrieve the name because the AMI was either deleted or made private
Detection
Slack Alert at 4:02 AM on
Resolution
The server was restarted and resumed functioning normally. However, the AMI error message is concerning, so we will look at recreating the server with a new AMI.
Timeline
~04:00 - Components started shutting down
04:02 - Alerts triggered in Slack
08:50 - VM restarted in cloud console. Generic Upload available again.
Root Cause(s)
The AMI used by the instance was deleted
Could this be related to the recent Log4j vulnerability?
Action Items
Action Items | Owner |
---|---|
Recreate EC2 instance with a new AMI | |
Investigate further, as multiple EC2 instances are showing the same “AMI deleted or made private” message | |
Investigate what triggered this change to the AMI, so that it can be prevented from recurring | |
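
As a starting point for the investigation of affected instances, the check could be scripted rather than done by hand in the console. The following is a minimal sketch, not the team's tooling: it assumes `boto3` is installed and credentials are configured, the function names are hypothetical, and it assumes that filtering `describe_images` by `image-id` simply omits AMIs that have been deleted or made private.

```python
from collections import defaultdict


def instances_with_missing_ami(reservations, visible_image_ids):
    """Map each AMI ID that is no longer visible to the instances using it.

    `reservations` is the "Reservations" list from an EC2 DescribeInstances
    response; `visible_image_ids` is the set of AMI IDs that DescribeImages
    still returns for this account.
    """
    missing = defaultdict(list)
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            image_id = instance["ImageId"]
            if image_id not in visible_image_ids:
                missing[image_id].append(
                    (instance["InstanceId"], instance["State"]["Name"])
                )
    return dict(missing)


def audit_region(region_name):
    """Query AWS and report instances whose AMI can no longer be retrieved."""
    import boto3  # imported here so the helper above works without AWS access

    ec2 = boto3.client("ec2", region_name=region_name)

    # Collect every instance in the region, following pagination.
    reservations = []
    for page in ec2.get_paginator("describe_instances").paginate():
        reservations.extend(page["Reservations"])

    image_ids = sorted(
        {i["ImageId"] for r in reservations for i in r["Instances"]}
    )
    visible = set()
    if image_ids:
        # Assumption: a filter (rather than ImageIds=...) returns only the
        # AMIs that still exist and are visible, without raising for the rest.
        response = ec2.describe_images(
            Filters=[{"Name": "image-id", "Values": image_ids}]
        )
        visible = {image["ImageId"] for image in response["Images"]}

    return instances_with_missing_ami(reservations, visible)
```

Calling `audit_region(...)` with the region the TIS servers run in would return a dictionary keyed by AMI ID, each listing the instance IDs and their current state, which gives a list of servers to consider for recreation. A very large number of distinct AMI IDs might need to be split across several `describe_images` calls.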
Lessons Learned
.