...

  • Reinitialising the data directory for the failed node and allowing a full synchronisation to take place.

  • Restarting the machine hosting the database once the synchronisation had completed.

...

Timeline

  • 16:30 BST - Backup run started

  • 16:44 BST - Disk full

  • 17:46 BST - After investigating several options, began a full re-sync of the node

  • 19:19 BST - Node transitioned to secondary but the VM was still non-responsive; EC2 status checks started failing

  • 21:44 - 22:10 BST - Restarted the machine and confirmed that the replica set reported as healthy

  • 00:04 BST - All integration services restarted and a problem message cleared from the message broker

Root Cause(s)

  • Backup written to the same device used for data & transaction logs

...

Action Items

Action Item                                                          Owner

Increase storage available to Mongo

Create a repeatable process for copying data to stage environment

...

Lessons Learned

  • Never write backup output to the same device that holds the data and transaction logs.
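
The lesson above could be enforced with a pre-flight check that runs before each backup. The following is a minimal sketch, not the process used during this incident: the function names, the paths in the usage note and the `required_bytes` threshold are all hypothetical. It compares the filesystem device IDs of the data directory and the backup target, and checks free space on the target.

```python
import os
import shutil
from typing import List


def same_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths sit on the same filesystem device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_backup_target(data_dir: str, backup_dir: str,
                        required_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the target looks safe.

    required_bytes is an assumed estimate of the backup size, supplied
    by the caller.
    """
    problems = []

    # A backup on the data device can fill the disk the database needs.
    if same_device(data_dir, backup_dir):
        problems.append(
            f"{backup_dir} is on the same device as {data_dir}"
        )

    # Refuse to start if the target cannot hold the estimated backup.
    free = shutil.disk_usage(backup_dir).free
    if free < required_bytes:
        problems.append(
            f"{backup_dir} has {free} bytes free, "
            f"{required_bytes} bytes required"
        )

    return problems
```

A backup wrapper might call, for example, `check_backup_target("/var/lib/mongodb", "/mnt/backup", estimated_size)` and abort if the returned list is non-empty (both paths are placeholders). Note that `st_dev` identifies the filesystem, not the physical disk, so two partitions or logical volumes on one disk would pass this check; treat it as a first guard only.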