Date

Authors

John Simmons (Deactivated), Doris.Wong, Andy Dingley

Status

Done

Summary

Logging was misconfigured in some services, causing them to generate far too many logs.

Impact

The excess log volume uses too much storage and therefore increases costs, specifically for CloudWatch.

...

  • Re-configure logging on several services

  • One service repeatedly logs the error "Cannot delete entity with id 'null'". This happens when the gmcReferenceNumber of a trainee received from Rabbit is 'null', yet the service still tries to delete the corresponding record from Elasticsearch.

  • We could stop using gmcReferenceNumber as the ID, since it can be ‘null’

  • We should also add code to handle ‘null’ values

  • Disable .

  • Resolved the error caused by null values of gmcReferenceNumber being passed to Elasticsearch.

  • Disabled tis-revalidation-connections CloudWatch logging until the log output can be fully reviewed.
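The null handling described above can be sketched as a guard that runs before any delete is attempted. This is a minimal illustration rather than the service's actual code: the class and method names are hypothetical, an in-memory map stands in for the Elasticsearch index, and it assumes the missing value may arrive either as a Java null or as the literal text "null".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: skip index deletes when a trainee has no usable GMC reference number. */
final class NullSafeDeleteSketch {

  /** Stand-in for the Elasticsearch index, keyed by document ID (illustration only). */
  private final Map<String, String> index = new HashMap<>();

  void save(String id, String document) {
    index.put(id, document);
  }

  int size() {
    return index.size();
  }

  /**
   * Returns the ID to delete, or empty when gmcReferenceNumber is missing.
   * Assumption: a Java null, a blank string and the literal text "null" all mean "missing".
   */
  static Optional<String> usableId(String gmcReferenceNumber) {
    if (gmcReferenceNumber == null) {
      return Optional.empty();
    }
    String trimmed = gmcReferenceNumber.trim();
    if (trimmed.isEmpty() || trimmed.equalsIgnoreCase("null")) {
      return Optional.empty();
    }
    return Optional.of(trimmed);
  }

  /** Deletes only when a usable ID exists; returns true if a document was removed. */
  boolean deleteTrainee(String gmcReferenceNumber) {
    return usableId(gmcReferenceNumber)
        .map(id -> index.remove(id) != null)
        .orElse(false);
  }

  public static void main(String[] args) {
    NullSafeDeleteSketch sketch = new NullSafeDeleteSketch();
    sketch.save("1234567", "trainee document");
    System.out.println("delete with null id attempted: " + sketch.deleteTrainee(null));
    System.out.println("delete with real id attempted: " + sketch.deleteTrainee("1234567"));
  }
}
```

With a guard like this, a message without a gmcReferenceNumber is skipped instead of producing a failed delete and an error log entry each time.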

...

Timeline

  • : 17:00 - Noticed the over-sized logging when doing an ad-hoc review of CloudWatch.

  • : 10:00 - Raised the issue in stand-up, where it was classed as a LiveDefect (no users were affected, but it is a current issue with an impact on monthly costs).

  • : 11:00 - Identified fixes that could be put in place within one of the services.

  • : 11:10 - Started implementing the fixes...

  • : 14:12 - Fix deployed that moves Elasticsearch query logging to the DEBUG level and handles null values

  • : 15:10 - Fix deployed to remove Spring Boot debug logging

  • : 13:00 - Change deployed to disable CloudWatch logging for tis-revalidation-connections

Root Cause(s)

  • Misconfiguration of logs on some services.
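To illustrate how much a single level setting can change log volume, the sketch below counts the records emitted for the same workload under two configurations. It uses java.util.logging purely as a self-contained stand-in (the affected services may well use a different logging framework), treats FINE as the rough equivalent of DEBUG, and all class and logger names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical sketch: compare log volume for one workload at two configured levels. */
final class LogLevelSketch {

  /** Collects published records in memory so that the volume can be counted. */
  static final class CountingHandler extends Handler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      if (isLoggable(record)) {
        messages.add(record.getMessage());
      }
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  /** Runs a fake workload of {@code queries} queries and returns how many records were emitted. */
  static int recordsEmitted(Level configuredLevel, int queries) {
    Logger logger = Logger.getLogger("sketch.query." + configuredLevel.getName());
    logger.setUseParentHandlers(false); // keep the demo output off the console
    logger.setLevel(configuredLevel);
    CountingHandler handler = new CountingHandler();
    logger.addHandler(handler);
    for (int i = 0; i < queries; i++) {
      logger.fine("query body " + i); // per-query detail, logged at the DEBUG-like level
    }
    logger.info("processed " + queries + " queries"); // one summary line per batch
    logger.removeHandler(handler);
    return handler.messages.size();
  }

  public static void main(String[] args) {
    System.out.println("records at INFO: " + recordsEmitted(Level.INFO, 1000));
    System.out.println("records at FINE: " + recordsEmitted(Level.FINE, 1000));
  }
}
```

The per-query lines only reach the handler when the configured level is FINE or lower, which is why leaving a debug-level setting enabled in a deployed service multiplies the volume shipped to the log store.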

...

Action Items

Owner

  • Stop using gmcReferenceNumber as the ID, as it can be ‘null’

Doris.Wong

  • Add code to handle ‘null’ values

Doris.Wong

  • Remove “debug: true” (which enables the conditions evaluation report) from application.yml

Doris.Wong

  • Monitoring on CloudWatch to send alerts based on cost threshold (monthly average + 10%)?

John Simmons (Deactivated) https://hee-tis.atlassian.net/browse/TIS21-1384

  • Re-enable tis-revalidation-connections CloudWatch logging (recreate log groups)

Andy Dingley https://hee-tis.atlassian.net/browse/TIS21-1382

  • Dashboard to display storage quotas (S3, CloudWatch logs, EFS, EBS)

Andy Dingley https://hee-tis.atlassian.net/browse/TIS21-1383

...

Lessons Learned

  • Exercise caution when configuring logging, and verify that the generated logs contain only what is intended

  • Consider a team sharing session on how logging configuration affects log sizes (over time) and costs

  • Monitoring on CloudWatch would be useful to highlight these kinds of issues before they result in high costs
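The proposed alert rule (monthly average + 10%) comes down to simple arithmetic, sketched below. This is only an illustration of the threshold calculation under assumed inputs: the class and method names are hypothetical, the cost figures are made up, and wiring the result into an actual CloudWatch or billing alarm is not shown.

```java
import java.util.List;

/** Hypothetical sketch: alert when the current month's cost exceeds the historical average plus a margin. */
final class CostThresholdSketch {

  /** Returns average(previousMonthlyCosts) * (1 + marginFraction); a 10% margin is 0.10. */
  static double threshold(List<Double> previousMonthlyCosts, double marginFraction) {
    if (previousMonthlyCosts.isEmpty()) {
      throw new IllegalArgumentException("need at least one month of cost history");
    }
    double average = previousMonthlyCosts.stream()
        .mapToDouble(Double::doubleValue)
        .average()
        .getAsDouble(); // safe: the list is known to be non-empty
    return average * (1.0 + marginFraction);
  }

  /** True when the current month's cost is strictly above the threshold. */
  static boolean shouldAlert(
      List<Double> previousMonthlyCosts, double marginFraction, double currentMonthCost) {
    return currentMonthCost > threshold(previousMonthlyCosts, marginFraction);
  }

  public static void main(String[] args) {
    List<Double> history = List.of(100.0, 110.0, 90.0); // made-up monthly costs
    System.out.println("threshold: " + threshold(history, 0.10));
    System.out.println("alert at 120.0: " + shouldAlert(history, 0.10, 120.0));
    System.out.println("alert at 105.0: " + shouldAlert(history, 0.10, 105.0));
  }
}
```

A relative margin like this adapts as typical spend changes, but it also means a slow, steady increase raises the average and may never trigger an alert, so a fixed upper limit may be worth considering alongside it.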