Date

Authors

Cai Willis, Doris.Wong, Jayanta Saha, Joseph (Pepe) Kelly, Adewale Adekoya

Status

Resolved (previously: Patched, Root Cause Found, Solution to Root Cause in Progress)

Summary

Jira ticket: TIS21-2653

Impact

A large volume of logs was generated, which required ~5 minutes of downtime for recommendations. Before that, the application had been slowing down due to message processing.

...

  • 9:03 - Cai Willis reported the errors

  • 9:35 - Investigation started

  • 9:40 - Issue reported to users and recommendation paused for 5 minutes

  • 9:50 - Temporary fix made

  • 9:50 - Comms sent to users that recommendation was back

  • 10:40 - Preventative measure deployed to recommendation service (prevent requeuing)

  • 12:05 - Likely root cause discovered

  • 13:15 - Root cause solution deployed to production environment

...

Root Cause(s)

  • Poor handling of null values in deferral reasons

  • Default behaviour of requeuing messages when an exception is thrown
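
The two root causes compound each other: a handler that fails on a null deferral reason throws, and a consumer that requeues on every exception then redelivers the same message indefinitely, which would explain both the log volume and the slowdown. The sketch below illustrates that interaction and the two mitigations in general terms. It is not the recommendation service's code: the broker, the language, and every name here (`describe_deferral`, `consume`, the `deferral_reason` and `label` fields) are hypothetical, and an in-memory queue stands in for the real message broker.

```python
from collections import deque
from typing import Callable, Optional


def describe_deferral(reason: Optional[dict]) -> str:
    """Return a display label, tolerating a missing (None) deferral reason."""
    if reason is None:
        # Mitigation 1: handle the null case explicitly instead of dereferencing it.
        return "No deferral reason recorded"
    return reason.get("label") or "Unknown deferral reason"


def consume(queue: deque, handler: Callable, dead_letters: list) -> list:
    """Process every queued message once, never requeuing a failed message."""
    results = []
    while queue:
        message = queue.popleft()
        try:
            results.append(handler(message))
        except Exception as exc:
            # Mitigation 2: park the failing message for inspection rather than
            # putting it back on the queue, where it would fail again forever.
            dead_letters.append((message, repr(exc)))
    return results


# Hypothetical messages: one valid, one with a null reason, one malformed.
queue = deque([
    {"deferral_reason": {"label": "Sick leave"}},
    {"deferral_reason": None},
    "not-a-dict",
])
dead_letters: list = []
results = consume(queue, lambda m: describe_deferral(m["deferral_reason"]), dead_letters)
```

With these inputs the null reason is handled without an exception, and only the malformed message ends up in `dead_letters`; the queue drains instead of looping on the failing message.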

...