Date	Jan 3, 2022
Authors	@Liban Hirey (Unlicensed)
Status	Documenting
Summary	Prod Green server went down due to a scheduled retirement of the instance
Impact	Bulk Upload unavailable

Non-technical Description

One of the servers that the TIS application is load-balanced across became unavailable.
After we contacted AWS Support, they confirmed that the server was shut down as part of a scheduled retirement of the instance, caused by underlying hardware issues.
We did not receive any emails from AWS about this scheduled retirement because the address AWS sends them to (caaa@hee.nhs.uk) is not managed by our team.

Trigger

Jan 3, 2022 ~04:00 - Components started shutting down
Jan 3, 2022 04:02 - Alerts triggered on Slack
Jan 4, 2022 08:50 - VM restarted in cloud console. Generic Upload available again.
Jan 6, 2022 10:35 - Ticket opened with AWS Support
Jan 6, 2022 10:55 - Response received from AWS Support

The AMI used by the instance was deleted
Could this be due to the recent Log4j vulnerability?
AWS notified us that the instance was stopped due to a scheduled retirement caused by an “unrecoverable issue with the underlying hardware”.

Action Items	Owner
Recreate EC2 instance with a new AMI
Investigate further, as multiple EC2 instances are showing the same "AMI deleted/made private" message
Investigate what triggered the server to go down	https://hee-tis.atlassian.net/browse/TIS21-2532
Prevent this from happening again by making sure we receive emails from AWS	https://hee-tis.atlassian.net/browse/TIS21-2533

Make sure emails from AWS reach our team instead of going to an account that our team does not manage.
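
As a possible complement to the email fix, scheduled retirements can also be checked directly through the EC2 API, so the team does not depend on a single mailbox. The following is a minimal sketch, not something the team has implemented: it assumes the boto3 library and suitable AWS credentials are available, and the function names `pending_retirements` and `fetch_pending_retirements` are hypothetical. It relies on the `describe_instance_status` call, which returns scheduled events per instance, and on AWS prefixing the description of finished events with `[Completed]`.

```python
from typing import Any, Dict, List

# Event code AWS uses for scheduled instance retirement.
RETIREMENT_CODE = "instance-retirement"


def pending_retirements(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract retirement events that are still pending from one
    describe_instance_status response (or one page of it)."""
    found = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            if event.get("Code") != RETIREMENT_CODE:
                continue
            description = event.get("Description", "")
            # Finished events stay listed for a while with this prefix.
            if description.startswith("[Completed]"):
                continue
            found.append(
                {
                    "instance_id": status["InstanceId"],
                    "not_before": event.get("NotBefore"),
                    "description": description,
                }
            )
    return found


def fetch_pending_retirements(region_name: str) -> List[Dict[str, Any]]:
    """Query EC2 for pending retirements in one region.

    Hypothetical helper: requires boto3 and AWS credentials with
    permission to call ec2:DescribeInstanceStatus.
    """
    import boto3  # imported here so the parsing logic has no AWS dependency

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_instance_status")
    pages = paginator.paginate(
        IncludeAllInstances=True,  # also report instances that are not running
        Filters=[{"Name": "event.code", "Values": [RETIREMENT_CODE]}],
    )
    found = []
    for page in pages:
        found.extend(pending_retirements(page))
    return found
```

Calling `fetch_pending_retirements` with the region the instances run in, for example from a scheduled job that posts any results to the existing Slack alerts channel, would give a second warning path alongside the AWS emails. The parsing is kept separate from the API call so it can be tested without AWS access.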