Date |
Authors |
Status | Done
Summary | The reference service became unstable on one of the production servers
Impact | TIS running at reduced capacity
...
TIS is split across two servers, blue and green. Requests are balanced across these two servers for performance and resilience reasons.
The “blue” server ran out of disk space, causing the reference service to stop functioning.
...
Detection
Slack monitoring alert.
Logging
2022-02-06 15:55:53.067 INFO 1 --- [ AsyncThread-1] c.h.t.e.e.facade.FileProcessorFacade : Processing [in/DE_EMD_RMC_20220206_00002940.DAT]
2022-02-06 15:55:53.067 INFO 1 --- [ AsyncThread-1] c.h.t.e.e.service.FileTransferService : Downloading [in/DE_EMD_RMC_20220206_00002940.DAT] from S3 bucket [esr-sftp-prod]
2022-02-06 15:55:53.185 INFO 1 --- [anager-worker-5] c.a.s.s3.transfer.DownloadCallable : Retry the download of object in/DE_EMD_RMC_20220206_00002940.DAT (bucket esr-sftp-prod)
com.amazonaws.SdkClientException: Unable to store object contents to disk: No space left on device
    at com.amazonaws.services.s3.internal.ServiceUtils.downloadToFile(ServiceUtils.java:314)
    at com.amazonaws.services.s3.transfer.DownloadCallable.retryableDownloadS3ObjectToFile(DownloadCallable.java:282)
...
Resolution
Removed unneeded stack dump files from /var/log/apps
Identified files that were received but had no confirmation of success, and POSTed a request to the data reader, as a Lambda function would have done yesterday.
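The manual cleanup in the first resolution step could be scripted along the following lines. This is only a sketch under assumptions: the function names, the `.hprof` suffix in the usage note, and the age threshold are hypothetical and not taken from the incident; only the `/var/log/apps` directory comes from the report. It defaults to a dry run so nothing is deleted until the list has been reviewed.

```python
import time
from pathlib import Path


def find_stale_files(root, max_age_days, suffixes, now=None):
    """Return files under root with a matching suffix and an mtime older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)


def remove_files(paths, dry_run=True):
    """Return the total size in bytes of the given files; delete them unless dry_run is True."""
    freed = 0
    for path in paths:
        freed += path.stat().st_size
        if not dry_run:
            path.unlink()
    return freed
```

For example, `remove_files(find_stale_files("/var/log/apps", 7, {".hprof"}))` would report how many bytes a cleanup of week-old files with that (assumed) suffix would free, without deleting anything.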
...
Timeline
17:34 - Reference service failure alert on Slack
18:34 - Reference service recovery alert on Slack
18:44 - Reference service failure alert on Slack
These failures continued periodically until 07:34 the following morning.
09:10 - Reran GMC Sync - for connections.
10:15-11:50 - Identified ESR files which may not have been processed and reran the import.
...
Root Cause(s)
Blue server ran out of disk space
We have no process to clean unneeded files (old logs, stack dumps, etc.)
We have inadequate monitoring on disk/storage usage
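The monitoring gap in the last root cause could be narrowed with a periodic disk usage check. The sketch below is an illustration, not the team's actual monitoring: `check_disk`, the 80% threshold, and the result shape are assumptions, and forwarding the result to the existing Slack alerting is left out. It relies only on `shutil.disk_usage` from the Python standard library.

```python
import shutil


def usage_percent(total, used):
    """Percentage of the filesystem in use; 0.0 if total is reported as zero."""
    return 100.0 * used / total if total else 0.0


def check_disk(path="/", warn_at=80.0, usage_fn=shutil.disk_usage):
    """Report usage for the filesystem containing path and whether it crosses warn_at.

    usage_fn is injectable so the check can be exercised without a real disk.
    """
    usage = usage_fn(path)
    percent = usage_percent(usage.total, usage.used)
    return {
        "path": path,
        "percent_used": round(percent, 1),
        "alert": percent >= warn_at,
    }
```

Run on a schedule against the volume that holds `/var/log/apps`, a check like this would flag a filling disk before downloads start failing with "No space left on device".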
...