Date |
|
Authors | |
Status | In progress |
Summary | Logging has been misconfigured in some services, causing them to generate far too many logs |
Impact | The excess logs use too much storage and therefore increase costs, specifically for CloudWatch |
Non-technical Description
Logging is a good thing: it helps us pinpoint problems occurring with any of our services and therefore fix them as quickly as possible. However, some of our services are generating too many logs, so the logs (text files) take up too much space on CloudWatch and incur unnecessary costs.
Trigger
.
Detection
Ad-hoc review of CloudWatch.
Resolution
Re-configure logging on several services
One service logs errors of the form: Cannot delete entity with id 'null'. This happens when the gmcReferenceNumber of a trainee received from RabbitMQ is 'null' but the service still tries to delete the record from Elasticsearch.
We can stop using gmcReferenceNumber as the ID, since it can be 'null'.
We should also add explicit handling for 'null' values in the code.
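A minimal sketch of the 'null' handling suggested above, assuming the affected service is written in Java (this log does not confirm that). The names TraineeDeleteHandler, handleDelete and isMissing are hypothetical, and the deleteById callback is only a stand-in for whatever call the service really uses to delete from Elasticsearch.

```java
import java.util.function.Consumer;

// Hypothetical sketch: class and method names are illustrative, not taken from the service.
final class TraineeDeleteHandler {

    // Stand-in for the real Elasticsearch delete call (assumption).
    private final Consumer<String> deleteById;

    TraineeDeleteHandler(Consumer<String> deleteById) {
        this.deleteById = deleteById;
    }

    /** Returns true if a delete was issued, false if the message was skipped. */
    boolean handleDelete(String gmcReferenceNumber) {
        if (isMissing(gmcReferenceNumber)) {
            // No usable ID: skip the delete instead of attempting it and logging
            // "Cannot delete entity with id 'null'".
            return false;
        }
        deleteById.accept(gmcReferenceNumber);
        return true;
    }

    /** Treats a null reference, an empty string, and the literal text "null" as missing. */
    static boolean isMissing(String id) {
        if (id == null) {
            return true;
        }
        String trimmed = id.trim();
        return trimmed.isEmpty() || "null".equalsIgnoreCase(trimmed);
    }
}
```

The guard checks for both a null reference and the literal string "null", because the log does not say which of the two arrives in the message. A skipped message could still be logged once at a low level if it needs to be traceable.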
Timeline
: 17:00 - Noticed the over-sized logging when doing an ad-hoc review of CloudWatch.
: 10:00 - Raised the issue in stand-up, where it was classed as a LiveDefect (no users were affected, but it is a current issue with an impact on monthly costs).
: 11:00 - Identified fixes that could be put in place within one of the services.
: 11:10 - Started implementing the fixes...
Root Cause(s)
Misconfiguration of logs on some services.
Action Items
Action Items | Owner |
---|---|
| |
| |
| |
Lessons Learned
Exercise caution when configuring logging, and test that the services log only what is intended.
Consider a team sharing session on how logging configuration relates to log sizes (over time) and costs.
Monitoring on CloudWatch would help highlight these kinds of issues before they result in high costs.