Date	09 Apr 2018
Authors	Chris Mills
Status	In progress/Complete
Summary	Failures across infrastructure causing issues with multiple services.
Impact	Development and other ETL runs may not have run.

Root Cause

To be investigated.

Trigger

To be investigated.

05:29 - Restarted the VM. Looks like it's back.

05:32 - Restarted Intrepid ETLs that failed.

05:34 - Started recovery of 10.150.0.137/8's docker.

05:41 - ETLs started earlier failed due to the docker issue.

To be discovered.

Failures on a number of services over the weekend:

intrepid-extract-clean - #56 Failure after 20 min (Open)

site-dev - #524 Failure after 22 min (Open)

devops - #2258 Failure after 4 min 6 sec (Open)

service-registry - #95429 Failure after 2.8 sec (Open)

sshd not responding on 10.140.0.136
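
A check like the sshd one above could be automated. The following is a minimal sketch, not what was run during the incident: the function name sshd_responding and the default timeout are hypothetical, and it assumes the server sends its SSH identification string ("SSH-...") as the first data on the connection, as OpenSSH normally does.

```python
import socket


def sshd_responding(host, port=22, timeout=3.0):
    """Return True if host:port accepts a TCP connection and sends an SSH banner.

    Hypothetical helper for illustration. A host whose sshd has hung may
    still accept the connection but never send the banner, which is why
    the banner is read rather than only testing that the port is open.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(256)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
    return banner.startswith(b"SSH-")
```

For example, sshd_responding("10.140.0.136") would return False while the host is in the state described above, and True once sshd answers again.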

Action Item	Type	Owner	Issue
Install docker from Docker's own apt repository rather than the Ubuntu-packaged version (docker-ce rather than docker.io)	mitigate/prevent
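
To track this action item across hosts, something like the following could report which docker package a machine has. It is a rough sketch under assumptions: the helper names (parse_installed, docker_package_report, query_dpkg) are made up for illustration, and it assumes a Debian/Ubuntu host where dpkg-query is available.

```python
import subprocess


def parse_installed(dpkg_output):
    """Return the set of package names that dpkg reports as fully installed.

    Expects lines of the form "<package>\t<status>", as produced by
    query_dpkg() below.
    """
    installed = set()
    for line in dpkg_output.splitlines():
        name, _, status = line.partition("\t")
        if status.strip() == "install ok installed":
            installed.add(name.strip())
    return installed


def docker_package_report(installed):
    """Summarise which docker package is present (hypothetical wording)."""
    if "docker.io" in installed:
        return "docker.io (Ubuntu-packaged) is installed; replace with docker-ce"
    if "docker-ce" in installed:
        return "docker-ce is installed"
    return "no docker package found"


def query_dpkg():
    """Ask dpkg-query for the status of both candidate packages.

    dpkg-query exits non-zero when a package is unknown, so the return
    code is deliberately not checked; only stdout is used.
    """
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Package}\t${Status}\n", "docker.io", "docker-ce"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

On a host, print(docker_package_report(parse_installed(query_dpkg()))) would give a one-line summary. The parsing is kept separate from the dpkg-query call so it can be exercised without a Debian system.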


We REALLY shouldn't be using the default docker from apt.

We shouldn't be running docker like this on the host.