Date

Authors

Liban Hirey (Unlicensed)

Status

Documenting

Summary

Prod Green server went down

Impact

Bulk Upload unavailable

Non-technical Description

  • The Prod Green server went down, and its instance state was “stopped” when viewed in AWS

Trigger

  • Appears to have been an AMI issue, as the following error is displayed in the instance details:

    EC2 can't retrieve the name because the AMI was either deleted or made private
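
The details the console shows for a stopped instance can also be read through the API, which may help confirm what stopped it. The following is a minimal sketch, assuming boto3 is available and that `ec2_client` is a boto3 EC2 client; the function name `summarize_instance` and the instance ID in the usage note are hypothetical.

```python
from typing import Any, Dict


def summarize_instance(ec2_client: Any, instance_id: str) -> Dict[str, str]:
    """Return the state, stop reason, and AMI ID reported for one EC2 instance."""
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    return {
        "instance_id": instance["InstanceId"],
        "state": instance["State"]["Name"],
        # Free-text reason for the most recent state change; may be empty.
        "transition_reason": instance.get("StateTransitionReason", ""),
        # StateReason is not always present, so fall back to an empty string.
        "state_reason": instance.get("StateReason", {}).get("Message", ""),
        "image_id": instance["ImageId"],
    }
```

Usage would look like `summarize_instance(boto3.client("ec2"), "i-0123456789abcdef0")`, where the instance ID is a placeholder. The transition reason is worth recording in the report, since it indicates whether the stop was initiated by a user or by the instance itself.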

Detection

  • Slack Alert at 4:02 AM on

Resolution

  • The server was restarted and resumed normal operation. However, the AMI error message is concerning, so we will look at recreating the server with a new AMI

Timeline

  • ~04:00 - Components started shutting down

  • 04:02 - Alerts triggered on Slack

  • 08:50 - VM restarted in cloud console. Generic Upload available again.
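
The durations implied by this timeline can be worked out with a few lines of Python. This is a small sketch that assumes all three events fall on the same day and uses the 04:02 Slack alert time from the Detection section; the helper name `minutes_between` is illustrative only.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Shutdown began around 04:00, the alert fired at 04:02, service returned at 08:50.
time_to_detect = minutes_between("04:00", "04:02")    # 2 minutes
time_to_restore = minutes_between("04:00", "08:50")   # 290 minutes (4 h 50 min)
```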

Root Cause(s)

  • The AMI used by the instance was deleted

Action Items

Action Items

Owner

Recreate EC2 instance with a new AMI

Investigate further, as a number of our EC2 instances are showing the same “AMI deleted or made private” message
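
One way to scope the second action item is to list every instance whose AMI is no longer visible to the account. The sketch below assumes boto3 and an EC2 client with permission to call `describe_instances` and `describe_images`; the function name `find_instances_with_missing_ami` is hypothetical. It treats an AMI as missing when `describe_images` does not return it, which would cover deleted and private AMIs alike, so the result is a starting point rather than a confirmed root cause.

```python
from typing import Any, Dict, List, Set


def find_instances_with_missing_ami(ec2_client: Any) -> Dict[str, List[str]]:
    """Map each AMI ID that is no longer visible to the instances launched from it."""
    instances_by_ami: Dict[str, List[str]] = {}
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                instances_by_ami.setdefault(instance["ImageId"], []).append(
                    instance["InstanceId"]
                )

    # Filtering on image-id (instead of passing ImageIds) returns only the AMIs
    # that are still visible, without raising an error for the ones that are not.
    ami_ids = sorted(instances_by_ami)
    visible: Set[str] = set()
    for start in range(0, len(ami_ids), 100):  # query in modest batches
        chunk = ami_ids[start:start + 100]
        response = ec2_client.describe_images(
            Filters=[{"Name": "image-id", "Values": chunk}]
        )
        visible.update(image["ImageId"] for image in response["Images"])

    return {
        ami: ids for ami, ids in instances_by_ami.items() if ami not in visible
    }
```

Calling `find_instances_with_missing_ami(boto3.client("ec2"))` in each region in use would give a list of affected instances to prioritise for recreation from a new AMI.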

Lessons Learned

  • .