Date: 2017-07-31
Authors:
Status: Complete
Summary: An ad-hoc system update restarted the Docker process on the production application server, so all services were unavailable until the restart completed.
Impact: The Revalidation service was unavailable nationally for 2 to 3 minutes.


Root Cause

...

We received alerts in the #monitoring channel from Prometheus.

Action Items

Action Item | Type | Owner | Issue
Platform updates will only be run out of hours or on non-active nodes. | mitigate | |
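The mitigation above could be enforced by a pre-flight check in whatever tooling launches platform updates. The sketch below is illustrative only: the function names `is_out_of_hours` and `may_run_platform_update` are hypothetical, and the working-hours window (Monday to Friday, 08:00 to 18:00 local time) is an assumption, because this report does not define "out of hours" or how a node is marked as active.

```python
from datetime import datetime, time

# ASSUMPTION: working hours are Monday-Friday, 08:00-18:00 local time.
# The incident report does not define "out of hours"; adjust to the real policy.
WORKING_DAYS = range(0, 5)  # datetime.weekday(): Monday=0 ... Friday=4
WORKDAY_START = time(8, 0)
WORKDAY_END = time(18, 0)


def is_out_of_hours(now: datetime) -> bool:
    """Hypothetical helper: True if `now` falls outside the assumed working hours."""
    if now.weekday() not in WORKING_DAYS:
        return True  # weekends are always out of hours
    return not (WORKDAY_START <= now.time() < WORKDAY_END)


def may_run_platform_update(now: datetime, node_is_active: bool) -> bool:
    """Hypothetical gate: allow an update out of hours, or on a node serving no live traffic."""
    return is_out_of_hours(now) or not node_is_active


if __name__ == "__main__":
    # 2017-07-31 was a Monday, so 10:30 is inside the assumed working hours.
    print(may_run_platform_update(datetime(2017, 7, 31, 10, 30), node_is_active=True))   # False
    print(may_run_platform_update(datetime(2017, 7, 31, 10, 30), node_is_active=False))  # True
    print(may_run_platform_update(datetime(2017, 7, 31, 20, 0), node_is_active=True))    # True
```

Under these assumptions, an update like the one in this incident, run during working hours on the active production node, would have been refused.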

Timeline


Supporting Information

https://monitoring.tis.nhs.uk/grafana/dashboard/db/tis-services?panelId=1&fullscreen&edit&orgId=1&tab=metrics&from=15011579646181501496454274&to=15011608798041501497850844&var-service=revalidation-health