Date | 2017-07-31 |
---|---|
Authors | |
Status | Complete |
Summary | An ad-hoc system update restarted the Docker process on the production application server, so all services were unavailable until the restart completed. |
Impact | The Revalidation service was unavailable nationally for 2-3 minutes. |
Root Cause
...
We received alerts in the #monitoring channel from Prometheus.
Action Items
Action Item | Type | Owner | Issue |
---|---|---|---|
Platform updates will only be run out of hours or on non-active nodes. | mitigate | | |
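The mitigation above could be enforced as a pre-update check in whatever tooling runs platform updates. The following is a minimal Python sketch under stated assumptions: working hours are taken to be 08:00 to 18:00, Monday to Friday, and the caller supplies a flag saying whether the node is serving live traffic. The hours and the function names (`is_out_of_hours`, `may_run_platform_update`) are hypothetical and are not part of the TIS platform.

```python
from datetime import datetime, time

# Hypothetical working hours: Monday to Friday, 08:00 to 18:00 local time.
WORK_START = time(8, 0)
WORK_END = time(18, 0)


def is_out_of_hours(now: datetime) -> bool:
    """Return True if `now` falls outside the assumed working hours."""
    if now.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return True
    return not (WORK_START <= now.time() < WORK_END)


def may_run_platform_update(now: datetime, node_is_active: bool) -> bool:
    """Allow an update only out of hours or on a node not serving live traffic."""
    return is_out_of_hours(now) or not node_is_active


# Usage: a Monday-morning update on the active production node is refused,
# while the same update on a non-active node is allowed.
monday_morning = datetime(2017, 7, 31, 10, 30)
print(may_run_platform_update(monday_morning, node_is_active=True))   # False
print(may_run_platform_update(monday_morning, node_is_active=False))  # True
```

Keeping the time-window check separate from the node-state check makes it easy to change the assumed hours without touching the rule that non-active nodes can always be updated.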
Timeline
Supporting Information
https://monitoring.tis.nhs.uk/grafana/dashboard/db/tis-services?panelId=1&fullscreen&edit&orgId=1&tab=metrics&from=15011579646181501496454274&to=15011608798041501497850844&var-service=revalidation-health