2016-12-19 Elasticsearch snapshots deleted in production

Date

2016-12-19

Authors

Grante Marshall, Graham O'Regan

Status

Complete

Summary

Elasticsearch snapshots were deleted in production.

Impact

No impact on service.

Root Cause

The Curator configuration in production retained only a single snapshot. When the snapshot process failed, Curator removed the only remaining snapshot from Azure Blob Storage.

Trigger

The nightly Jenkins snapshot jobs failed because the snapshots they were creating already existed. Curator then ran and deleted the only remaining snapshot.

Resolution

The number of retained snapshots was increased to 5, one per day.
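The following Python sketch illustrates why a one-snapshot window left nothing to restore and why a five-day window tolerates a failed night. It assumes an age-based retention rule, which this page does not confirm: under this hypothetical model, a snapshot is deleted once it is `retain_days` or more days old, whether or not a newer snapshot exists. The function `snapshots_kept` and the dates are illustrative only and are not part of the Curator configuration.

```python
from datetime import date, timedelta


def snapshots_kept(snapshot_days, today, retain_days):
    """Return the snapshot dates that survive a hypothetical age-based cleanup.

    Assumed model (not taken from the real Curator configuration): a
    snapshot is deleted once it is retain_days or more days old,
    regardless of whether a newer snapshot exists.
    """
    cutoff = today - timedelta(days=retain_days)
    return [d for d in snapshot_days if d > cutoff]


# Hypothetical scenario: tonight's snapshot job failed, so there is no
# snapshot dated `today`, but the cleanup still runs afterwards.
today = date(2016, 12, 19)

# One-day retention: only yesterday's snapshot exists, and it is deleted.
print(snapshots_kept([date(2016, 12, 18)], today, retain_days=1))
# -> []

# Five-day retention: the oldest snapshot ages out and four remain.
history = [date(2016, 12, 14) + timedelta(days=i) for i in range(5)]
print(len(snapshots_kept(history, today, retain_days=5)))
# -> 4
```

Under this model, a five-day window means four consecutive nightly failures are needed before the last snapshot is removed, which leaves time for the Slack failure notifications to be acted on.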

Detection

The Jenkins jobs that run the Elasticsearch snapshots failed, and the failures were reported to the #dev channel in Slack.

Action Items

  • Increase the number of snapshots retained (type: prevent; owner: Grante Marshall; issue: TISDEV-1459)

Timeline

  • 11pm: the Jenkins snapshot job ran and failed.
  • Jenkins sent a failure notification to the #dev channel in Slack.
  • N changed the Curator settings to retain 5 days of snapshots.

Supporting Information