...

  • Our error notification system (Sentry) reported an error at 11pm the previous night, which led us to investigate the logs this morning.

    • Further investigation showed that this error had occurred multiple times since 4pm the previous day.

  • The awslogs-prod-tis-revalidation-recommendation logs had grown to 1.2GB, with many repeated "Execution of Rabbit message listener failed . . . Caused by: java.lang.NullPointerException" errors.

  • Investigation in the RabbitMQ console revealed a single message that was being requeued endlessly.

...

  • 9:03 - Cai Willis reported the errors

  • 9:35 - Investigation started

  • 9:40 - Issue reported to users and Recommendation paused for 5 minutes

  • 9:50 - Temporary fix made

  • 9:50 - Comms sent to users that Recommendation was back

  • 10:40 - Preventative measure deployed to the Recommendation service (prevents requeuing)

  • 12:05 - Likely root cause discovered

  • 13:15 - Root cause fix deployed to the production environment
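The requeue loop and the kind of preventative change made at 10:40 can be illustrated without a broker. This is a minimal simulation under stated assumptions: the queue, the Handler interface and the "bad"/"ok" messages are hypothetical, and the exact change deployed is not described in this report. In Spring AMQP, which the "Execution of Rabbit message listener failed" log message suggests the service uses, the equivalent switch is usually setDefaultRequeueRejected(false) on the listener container factory (or the spring.rabbitmq.listener.simple.default-requeue-rejected Spring Boot property), or throwing AmqpRejectAndDontRequeueException from the listener.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, broker-free simulation: no RabbitMQ or Spring AMQP APIs are used.
class RequeueSimulation {

    // Stand-in for a message listener; throwing means "processing failed".
    interface Handler {
        void handle(String message);
    }

    // Delivers messages until the queue is empty or maxDeliveries is reached,
    // and returns how many deliveries were made. On failure the message is either
    // put back on the queue (requeueOnFailure = true) or moved to `rejected`,
    // which stands in for discarding it or routing it to a dead-letter queue.
    static int run(Deque<String> queue, Handler handler, boolean requeueOnFailure,
                   List<String> rejected, int maxDeliveries) {
        int deliveries = 0;
        while (!queue.isEmpty() && deliveries < maxDeliveries) {
            String message = queue.pollFirst();
            deliveries++;
            try {
                handler.handle(message); // success: message is acknowledged and gone
            } catch (RuntimeException e) {
                if (requeueOnFailure) {
                    queue.addLast(message); // poison message comes straight back
                } else {
                    rejected.add(message);  // poison message leaves the queue
                }
            }
        }
        return deliveries;
    }

    public static void main(String[] args) {
        // Hypothetical listener: a "bad" message carries a null field and triggers an NPE.
        Handler handler = m -> {
            String reason = m.equals("bad") ? null : m;
            reason.length();
        };

        Deque<String> requeued = new ArrayDeque<>(List.of("bad", "ok"));
        int withRequeue = run(requeued, handler, true, new ArrayList<>(), 1000);
        System.out.println(withRequeue + " deliveries, still queued: " + requeued);

        Deque<String> dropped = new ArrayDeque<>(List.of("bad", "ok"));
        List<String> rejected = new ArrayList<>();
        int withoutRequeue = run(dropped, handler, false, rejected, 1000);
        System.out.println(withoutRequeue + " deliveries, rejected: " + rejected);
    }
}
```

With requeueing enabled the run only stops at the artificial 1000-delivery cap, with "bad" still queued; with requeueing disabled it finishes after 2 deliveries and "bad" is set aside. A real broker has no such cap, which is how a single message can fill the logs.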

...

Root Cause(s)

  • Unexpected data was published to the reval.queue.recommendationstatuscheck.updated.recommendation queue

  • Null values in deferral reasons were not handled, causing a NullPointerException in the listener

  • The listener's default behaviour was to requeue a message whenever an exception was thrown
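The null-handling root cause can be illustrated with a minimal null-safe mapping. The message schema is not included in this report, so DeferralReasonMapper, the DeferralReason record, its code/description fields and the UNKNOWN fallback are all hypothetical; whether a missing reason should be defaulted, skipped or rejected is a business decision this sketch does not settle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch: the real recommendation message schema is not shown in
// the report, so DeferralReason and its fields are illustrative only.
class DeferralReasonMapper {

    record DeferralReason(String code, String description) {}

    // Hypothetical fallback for a reason that arrives without a code.
    static final String UNKNOWN = "UNKNOWN";

    // Null-safe: a null list, null entries and null codes are tolerated
    // instead of throwing a NullPointerException inside the listener.
    static List<String> toCodes(List<DeferralReason> reasons) {
        if (reasons == null) {
            return List.of();
        }
        return reasons.stream()
                .filter(Objects::nonNull)
                .map(r -> Optional.ofNullable(r.code()).orElse(UNKNOWN))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arrays.asList (unlike List.of) permits null elements.
        List<DeferralReason> reasons = Arrays.asList(
                new DeferralReason("1", "example reason"),
                null,
                new DeferralReason(null, "reason without a code"));
        System.out.println(toCodes(reasons)); // prints [1, UNKNOWN]
        System.out.println(toCodes(null));    // prints []
    }
}
```

Handling nulls at the point of mapping addresses the second root cause; disabling requeue on failure addresses the third, so that any future unexpected data is set aside rather than redelivered indefinitely.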

...