Monitoring and Alerts - Are we drowning?

Actions

Session

Description

When / Links etc

Review Alerts - ETL

Go through the alerts for the ETLs

  • NDW

  • ESR

NDW Alert

Review Alerts - Infrastructure

  • AWS Notifications e.g. Storage / usage / uptime alerts

  • Any other infra alerts?

 

Review Alerts - Tests / Pipelines

  • Jenkins

  • GitHub Actions

 

Sync Jobs

  • Person Sync

  • Post Sync

 

Rabbit Messages

  • Dead letter queues

  • Errors

 

Logging Standards

Format of logs, what to log, what not to log, level of detail

CloudWatch / Docker / App Logs
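
As a starting point for the logging standards discussion, here is a minimal sketch of what a shared log format could look like: one JSON object per line, with a fixed set of fields and a redaction step for things we should not log. Everything here is an assumption for illustration, not an agreed standard: the field names, the `SENSITIVE_KEYS` list, the `context` attribute and the `build_logger` helper are all hypothetical. It only uses the Python standard library.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical list of keys that should never reach the logs.
SENSITIVE_KEYS = {"password", "token", "email"}


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        # "context" is an assumed convention: callers pass it via `extra=`.
        context = getattr(record, "context", {})
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # What not to log: mask any value whose key is on the sensitive list.
            "context": {
                key: "[REDACTED]" if key in SENSITIVE_KEYS else value
                for key, value in context.items()
            },
        }
        # default=str keeps the formatter from failing on non-JSON values.
        return json.dumps(payload, default=str)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A sync job could then call `build_logger("person-sync").info("sync finished", extra={"context": {"records": 10}})`. Because every line is valid JSON with the same fields, the same output should be searchable whether it ends up in CloudWatch, Docker logs or app log files.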

Tools

Looking at all the tools we currently have for notifications

For each tool, decide whether to keep it or drop it

Grafana

Prometheus

Sentry

UptimeRobot

Summary of Breakout Sessions

  • Review tools after the lift-and-shift move to AWS

  • Team members are not sure what many of the existing alerts mean, and the alerts could be improved

  • Logs - standards for where logs are kept and what we log need to be improved

Team Blue