Date	06 Jan 2022
Authors	Reuben Roberts
Status	Documenting
Summary	MongoDB cluster went down: https://hee-tis.atlassian.net/browse/TIS21-2535
Impact	No end-user impact. The database that holds the information for communicating with ESR was unavailable for approximately 30 minutes, and the integration was paused during that time.

Non-technical Description

The MongoDB database that supports the dialogue with ESR failed. When the services resumed, the pending events from TIS, e.g. updates to personal details, were processed.

...

Trigger

Currently unknown: presumably, the database service became overloaded, though no out-of-memory errors were logged.

...

...

06 Jan 2022 13:13 - Alert on Slack: AWS Service 10.170.0.151:18080 is down.
06 Jan 2022 13:17:21 - Docker reports mongo2 container is unhealthy (syslog: Jan 6 13:17:21 ip-10-170-0-151 dockerd[497]: time="2022-01-06T13:13:04.991689589Z" level=warning msg="Health check for container 971e3085ffb867b27e4909c42281e79bacff535976c05463ff5674b43d97b683 error: context deadline exceeded")
06 Jan 2022 13:22 and 13:28 - Docker reports that the mongo1 and mongo3 containers, respectively, are unhealthy (same health check error as above).
06 Jan 2022 ~13:37 - Server rebooted
06 Jan 2022 13:38:01 - MongoDB instances on the server log ‘MongoDB starting’
06 Jan 2022 13:38 - Alert on Slack that connection is restored
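
When reconstructing a timeline like the one above, it can help to pull the embedded `time=` field and the container ID out of the dockerd warning. In the quoted syslog line, that embedded timestamp is in UTC (the `Z` suffix) and is separate from the syslog prefix. The following is a minimal sketch only: the function name `parse_health_check` and the regular expression are illustrative assumptions based on the single log line quoted in this report, not on a documented dockerd log format.

```python
import re
from datetime import datetime

# Syslog line copied from the timeline entry in this report.
LINE = (
    'Jan 6 13:17:21 ip-10-170-0-151 dockerd[497]: '
    'time="2022-01-06T13:13:04.991689589Z" level=warning '
    'msg="Health check for container '
    '971e3085ffb867b27e4909c42281e79bacff535976c05463ff5674b43d97b683 '
    'error: context deadline exceeded"'
)

# Assumed pattern, derived only from the one line above.
PATTERN = re.compile(
    r'time="(?P<time>[^"]+)" level=(?P<level>\w+) '
    r'msg="Health check for container (?P<container>[0-9a-f]+) '
    r'error: (?P<error>[^"]+)"'
)


def parse_health_check(line):
    """Return the fields of a dockerd health check warning, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # strptime's %f accepts at most 6 fractional digits, so trim the
    # nanosecond timestamp to microseconds before parsing.
    stamp = fields["time"].rstrip("Z")
    whole, _, fraction = stamp.partition(".")
    fraction = fraction[:6].ljust(6, "0")
    fields["time"] = datetime.strptime(
        whole + "." + fraction, "%Y-%m-%dT%H:%M:%S.%f"
    )
    return fields


if __name__ == "__main__":
    parsed = parse_health_check(LINE)
    print(parsed["time"], parsed["container"][:12], parsed["error"])
```

Running the same function over the full syslog would give one record per failed health check, which could then be sorted by the embedded timestamp.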

...

...

Action Items	Owner
Done: Use the same (T3) EC2 instances for Production as are currently used for Staging MongoDB	John Simmons (Deactivated)

...