Date

 

Authors: Graham O'Regan (Unlicensed)
Status: Complete
Summary: The Elasticsearch index snapshot job appeared to fail because it exceeded the 30-second timeout. The snapshot had in fact been created, so the job could not complete when it was run again.
Impact: None

Root Cause

The task used the default 30-second timeout, but taking the snapshot and copying it to Blob Storage took longer than that.

Trigger

The nightly ES index snapshot job on Jenkins.

Resolution

The timeout was raised to 20 minutes, giving the snapshot more time to complete before an error is raised.
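As a rough illustration of the change, the sketch below issues a create-snapshot request and waits up to 20 minutes for the reply. It is not the actual job script: the host, repository name, and the helper names `snapshot_url` and `create_snapshot` are assumptions made for the example, and the real job may set its timeout in Jenkins or in another client.

```python
import urllib.request

# Assumed values for illustration only; the real host and repository
# names are not given in this write-up.
ES_URL = "http://localhost:9200"
REPOSITORY = "backup_repo"
SNAPSHOT_TIMEOUT_SECS = 20 * 60  # raised from the 30-second default


def snapshot_url(base_url, repository, snapshot):
    """Build the create-snapshot URL, asking Elasticsearch to block until done."""
    return f"{base_url}/_snapshot/{repository}/{snapshot}?wait_for_completion=true"


def create_snapshot(snapshot, timeout=SNAPSHOT_TIMEOUT_SECS):
    """PUT the snapshot request and wait up to `timeout` seconds for a reply."""
    request = urllib.request.Request(
        snapshot_url(ES_URL, REPOSITORY, snapshot), method="PUT"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With `wait_for_completion=true` the request only returns once the snapshot has finished, so the client-side timeout has to cover the whole snapshot and upload, not just the initial acknowledgement.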

Detection

The Jenkins job posts its failure alerts to the #dev Slack channel.

Action Items

Action Item: Increased timeout to 20 minutes
Type: prevent
Owner: Graham O'Regan (Unlicensed)
Issue:

Timeline

  • The nightly job failed and alerted #dev in Slack.
  • An attempt to run the snapshot manually with curl also failed, this time with a different exception, because the snapshot already existed.
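The second failure in the timeline suggests one possible safeguard: check whether the snapshot is already in the repository before trying to create it again. The sketch below is only an illustration of that idea, not something the job is known to do. The host, repository name, and the `snapshot_exists` helper are hypothetical, and it assumes Elasticsearch answers a lookup for a missing snapshot with HTTP 404.

```python
import urllib.error
import urllib.request

# Assumed values for illustration only.
ES_URL = "http://localhost:9200"
REPOSITORY = "backup_repo"


def snapshot_exists(snapshot, fetch=urllib.request.urlopen):
    """Return True if the snapshot is already in the repository."""
    url = f"{ES_URL}/_snapshot/{REPOSITORY}/{snapshot}"
    try:
        with fetch(url, timeout=30) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumed response for a missing snapshot
            return False
        raise  # any other error is unexpected and should surface
```

A retry could call `snapshot_exists` first and skip the create step when it returns True, treating the earlier timed-out run as having succeeded.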

Supporting Information

Link to the initial failed task:

https://build-hee.transformcloud.net/jenkins/job/elasticsearch-snapshot-prod/25/