Date	18 Nov 2021
Authors	Andy Dingley
Status	Done
Summary	https://hee-tis.atlassian.net/browse/TIS21-2349
Impact	TIS running at reduced capacity

Non-technical Description

TIS is split across two servers, "blue" and "green". Requests are balanced across these two servers for performance and resilience.

The “blue” server ran out of disk space, causing several of our services to stop functioning.

Trigger

18 Nov 2021 00:12 UTC - Notification in #monitoring-prod that TCS and Reference services were down on the blue server
18 Nov 2021 08:34 UTC - Issue identified as low disk space
18 Nov 2021 08:47 UTC - Issue resolved by deleting old, unused Docker images
18 Nov 2021 10:11 UTC - Preventative action taken on the green server to avoid a similar disk space shortage

Action Items	Owner	Status
Add monitoring for disk/storage space	https://hee-tis.atlassian.net/browse/TIS21-1383

We need better monitoring to pre-emptively warn us before disk space limitations cause downtime.
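As a rough illustration of the kind of check TIS21-1383 might add, here is a minimal sketch that reports how full a filesystem is and flags it once usage crosses a threshold. It is an assumption-laden example, not the agreed design: the 80% threshold, the monitored path and the function names (`usage_alert`, `check_disk`) are hypothetical. Posting the warning to #monitoring-prod is not shown.

```python
import shutil
from typing import NamedTuple


class DiskStatus(NamedTuple):
    path: str
    used_percent: float
    alert: bool


def usage_alert(used_bytes: int, total_bytes: int, threshold_percent: float) -> bool:
    """Return True when used space has reached the threshold percentage."""
    return used_bytes / total_bytes * 100 >= threshold_percent


def check_disk(path: str = "/", threshold_percent: float = 80.0) -> DiskStatus:
    """Check the filesystem holding `path` against a (hypothetical) threshold."""
    # shutil.disk_usage returns a named tuple of (total, used, free) in bytes
    usage = shutil.disk_usage(path)
    used_percent = usage.used / usage.total * 100
    return DiskStatus(
        path=path,
        used_percent=round(used_percent, 1),
        alert=usage_alert(usage.used, usage.total, threshold_percent),
    )


if __name__ == "__main__":
    status = check_disk("/")
    level = "WARNING" if status.alert else "OK"
    print(f"{level}: {status.path} is {status.used_percent}% full")
```

For this to give early warning, a check like this would need to run regularly on both the blue and green servers. Its threshold would need to leave enough headroom for someone to clear space, for example by removing unused Docker images, before services fail.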