...

By the time we received the notification in the #monitoring-prod Slack channel and shovelled the messages, there were 573 of them in the DLQ.

...

Resolution

  • Processed the files a second time.

...

  • 14:58 Noticed the alert “RabbitMQ PROD too many messages in the DLQ” in the #monitoring-prod channel, then shovelled the messages to another queue, “esr.dlq.2023.09.01”

  • 10:32 Had a huddle and looked into the messages in the queue “esr.dlq.2023.09.01”

  • 12:01 Processed the files from 1st Sept (moved the messages into the normal flow for processing)

Root Cause(s)

...

  • The messages in the dead letter queue showed that a variety of message types had failed. The messages that relied on enrichment with information from TCS seem to be the ones that failed.

  • TCS was busy but still functional. There were timeouts when TCS requested information about the user making the request.

  • Profile was experiencing a short spike in CPU usage.

...