Date	Nov 18, 2021
Authors	@Andy Dingley
Status	Done
Summary	https://hee-tis.atlassian.net/browse/TIS21-2349
Impact	TIS running at reduced capacity

Non-technical Description

TIS is split across two servers, blue and green. Requests are balanced across them for performance and resilience.

The “blue” server ran out of disk space, causing several of our services to stop functioning.

Trigger

Nov 18, 2021 00:12 UTC - Notification in #monitoring-prod that TCS and Reference services were down on the blue server
Nov 18, 2021 08:34 UTC - Issue identified as low disk space
Nov 18, 2021 08:47 UTC - Issue resolved by deleting old unused docker images
Nov 18, 2021 10:11 UTC - Preventative action taken on green server to reduce similar disk usage
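
The cleanup step in the timeline could be scripted roughly as follows. This is a minimal sketch, not a record of what was run during the incident: the exact command used is not stated here. It assumes the Docker CLI is on the PATH, and the helper names and the 720 hour (30 day) cutoff are hypothetical.

```python
import subprocess


def prune_command(older_than_hours=720):
    """Build a `docker image prune` command for unused images older than the cutoff.

    The 720 hour default is an illustrative assumption, not a value from the incident.
    """
    return [
        "docker", "image", "prune",
        "--all",      # include all unused images, not only dangling ones
        "--force",    # skip the interactive confirmation prompt
        "--filter", f"until={older_than_hours}h",
    ]


def prune_unused_images(older_than_hours=720):
    """Run the prune command and return the Docker CLI output as text."""
    result = subprocess.run(
        prune_command(older_than_hours),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if docker exits with an error
    )
    return result.stdout
```

Calling `prune_unused_images()` on a schedule would keep old images from accumulating, although whether a fixed age cutoff suits these servers is a judgment call.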

Action Items	Owner	Status

Add monitoring for disk/storage space	https://hee-tis.atlassian.net/browse/TIS21-1383
Review old triggers and get them working again

We need better monitoring that warns us before low disk space causes downtime.
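
As an illustration of the kind of check such monitoring might perform, here is a minimal sketch using only the Python standard library. The function names, the path, and the 80% warning threshold are assumptions for the example, not part of the existing TIS monitoring setup.

```python
import shutil
from typing import Optional


def disk_usage_percent(path="/"):
    """Return the percentage of space used on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def check_disk(path="/", warn_at=80.0) -> Optional[str]:
    """Return a warning message if usage is at or above `warn_at` percent, else None.

    The 80% default is an illustrative assumption, not an agreed threshold.
    """
    percent = disk_usage_percent(path)
    if percent >= warn_at:
        return f"WARNING: {path} is {percent:.1f}% full (threshold {warn_at:.0f}%)"
    return None
```

A scheduled job could call `check_disk("/")` on each server and post any returned message to a channel such as #monitoring-prod, so the warning arrives while there is still free space left.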