Date	09 Apr 2018
Authors	Chris Mills
Status	In progress/Complete
Summary	Failures across infrastructure causing issues with multiple services.
Impact	Development and other ETL runs may or may not have run.

Root Cause

To be investigated.

Trigger

To be investigated.

05:29 - Restarted the VM. Looks like it's back.

05:32 - Restarted Intrepid ETLs that failed.

05:34 - Started recovery of Docker on 10.150.0.137/8.

To be discovered.

Failures on a number of services over the weekend:

intrepid-extract-clean - #56 Failure after 20 min (Open)

site-dev - #524 Failure after 22 min (Open)

devops - #2258 Failure after 4 min 6 sec (Open)

service-registry - #95429 Failure after 2.8 sec (Open)

sshd not responding on 10.140.0.136
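A check along the following lines could flag an unresponsive sshd before the dependent jobs start failing. This is a minimal sketch, not the monitoring actually in place: the function name `ssh_banner` is hypothetical, port 22 and the 5 second timeout are assumptions, and the only host taken from this report is 10.140.0.136. It opens a TCP connection and waits for the identification string that an SSH server sends first (a line starting with `SSH-`).

```python
import socket
from typing import Optional


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the SSH identification string, or None if sshd does not answer.

    Hypothetical helper: port and timeout defaults are assumptions.
    """
    try:
        # Connection refused, unreachable host, and timeouts all raise OSError.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(256).decode("ascii", errors="replace").strip()
    except OSError:
        return None
    # A healthy SSH server identifies itself with a line beginning "SSH-".
    return banner if banner.startswith("SSH-") else None


if __name__ == "__main__":
    host = "10.140.0.136"  # host named in this report
    banner = ssh_banner(host)
    print(f"{host}: {banner if banner else 'sshd not responding'}")
```

A banner check catches the case where the port still accepts connections but the daemon has hung, which a plain ping or port probe would miss.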

e.g. monitoring dashboards

We REALLY shouldn't be using the default Docker package from apt.

We shouldn't be running Docker like this on the host.