Date

Authors

Liban Hirey (Unlicensed)

Status

Documenting

Summary

Prod Green server went down

Impact

Bulk Upload unavailable

Non-technical Description

  • One of the servers across which the TIS application is load-balanced became unavailable.


Trigger

  • The server instance state was shown as “stopped” when viewed in AWS

  • This appears to have been an Amazon Machine Image (AMI) issue, as the following error is displayed in the instance details:

    EC2 can't retrieve the name because the AMI was either deleted or made private
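
The instance state and the reason EC2 recorded for it can also be read programmatically, which may help when comparing several instances. The following is a minimal sketch, assuming boto3 and AWS credentials are available. The helper names `summarise_instance_state` and `fetch_instance` are hypothetical, and the sample dictionary is made-up data that only mimics the shape of a `describe_instances` entry; it is not taken from this incident.

```python
from typing import Any, Dict


def summarise_instance_state(instance: Dict[str, Any]) -> Dict[str, str]:
    """Pull the state-related fields out of one describe_instances entry."""
    return {
        "instance_id": instance.get("InstanceId", ""),
        "image_id": instance.get("ImageId", ""),
        "state": instance.get("State", {}).get("Name", ""),
        # Free-text reason EC2 records for the most recent state change.
        "transition_reason": instance.get("StateTransitionReason", ""),
        # Structured reason code; it may be absent, so default to "".
        "reason_code": instance.get("StateReason", {}).get("Code", ""),
    }


def fetch_instance(instance_id: str) -> Dict[str, Any]:
    """Fetch one instance description. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the parsing helper works without it

    ec2 = boto3.client("ec2")
    response = ec2.describe_instances(InstanceIds=[instance_id])
    return response["Reservations"][0]["Instances"][0]


# Made-up sample in the shape of a describe_instances entry (assumption).
sample = {
    "InstanceId": "i-0123456789abcdef0",
    "ImageId": "ami-0123456789abcdef0",
    "State": {"Code": 80, "Name": "stopped"},
    "StateTransitionReason": "example reason text",
}

summary = summarise_instance_state(sample)
print(summary)
```

With real credentials, `summarise_instance_state(fetch_instance("<instance id>"))` would give the same summary for a live instance.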

Detection

  • Slack Alert at 4:02 AM on


Resolution

  • The server was restarted and resumed functioning normally. However, the AMI error message is concerning, so we will look at recreating the server with a new AMI.


Timeline

  • ~04:00 - Components started shutting down

  • 04:02 - Alerts triggered on Slack

  • 08:50 - VM restarted in cloud console. Generic Upload available again.


Root Cause(s)

  • The AMI used by the instance was deleted or made private, according to the error shown in the instance details

  • Could this be due to the recent Log4j vulnerability?


Action Items

Action Items

Owner

Recreate EC2 instance with a new AMI

Investigate further, as a number of our EC2 instances are showing the same “AMI deleted or made private” message

Investigate what triggered this change to the AMI, so as to prevent it from recurring
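
To support the investigation into other affected instances, the check could be scripted rather than done by hand in the console. The following is a minimal sketch, assuming boto3 and AWS credentials with read access to EC2. The function names `instances_with_missing_ami` and `audit_region`, the batch size of 100, and the sample data are all assumptions for illustration. It treats an AMI as "missing" when the `image-id` filter on `describe_images` returns no matching image, which should cover both the deleted and the made-private case.

```python
from typing import Any, Dict, Iterable, List, Set


def instances_with_missing_ami(
    instances: Iterable[Dict[str, Any]], visible_image_ids: Set[str]
) -> List[Dict[str, str]]:
    """Return instances whose ImageId is not among the AMIs the account can see."""
    return [
        {"instance_id": inst["InstanceId"], "image_id": inst["ImageId"]}
        for inst in instances
        if inst["ImageId"] not in visible_image_ids
    ]


def audit_region() -> List[Dict[str, str]]:
    """Run the check in the default region. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the pure helper works without it

    ec2 = boto3.client("ec2")
    instances = [
        inst
        for page in ec2.get_paginator("describe_instances").paginate()
        for reservation in page["Reservations"]
        for inst in reservation["Instances"]
    ]
    image_ids = sorted({inst["ImageId"] for inst in instances})
    visible: Set[str] = set()
    # Filtering by image-id returns only AMIs that still exist and are
    # visible to this account. The batch size of 100 is an assumption.
    for start in range(0, len(image_ids), 100):
        batch = image_ids[start:start + 100]
        response = ec2.describe_images(
            Filters=[{"Name": "image-id", "Values": batch}]
        )
        visible.update(image["ImageId"] for image in response["Images"])
    return instances_with_missing_ami(instances, visible)


# Made-up sample data (assumption), showing the pure helper on its own.
sample_instances = [
    {"InstanceId": "i-aaa", "ImageId": "ami-111"},
    {"InstanceId": "i-bbb", "ImageId": "ami-222"},
    {"InstanceId": "i-ccc", "ImageId": "ami-111"},
]
affected = instances_with_missing_ami(sample_instances, {"ami-222"})
print(affected)
```

Keeping the comparison in a pure function means it can be exercised without touching AWS, while `audit_region()` is only called when credentials are in place.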


Lessons Learned

  • .
