2017-01-06 ES Snapshotting error

Date: 2017-01-06

Authors: Graham O'Regan (Unlicensed)
Status: Complete
Summary: The Elasticsearch index snapshotting job appeared to fail because it exceeded the 30-second timeout. However, the snapshot had in fact been created, so re-running the job failed because the snapshot already existed.
Impact: None

Root Cause

The task used the default 30-second timeout, but taking the snapshot and copying it to Blob Storage took longer than that.

Trigger

The nightly ES index snapshotting job on Jenkins

Resolution

The timeout was raised to 20 minutes to give the snapshot more time to complete before an error is raised.
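The report does not say where the timeout was configured, so the following is only a minimal sketch of the idea. It assumes the job calls the Elasticsearch snapshot REST endpoint synchronously (`wait_for_completion=true`) and that the limit is a client-side timeout. The base URL, repository name, and snapshot name are hypothetical.

```python
import urllib.request

# Assumed values for illustration; none of these appear in the report.
ES_URL = "http://localhost:9200"
SNAPSHOT_TIMEOUT_SECONDS = 20 * 60  # the 20-minute limit from the resolution


def build_snapshot_request(repository, snapshot, base_url=ES_URL):
    """Build a PUT request that blocks until the snapshot has finished."""
    url = f"{base_url}/_snapshot/{repository}/{snapshot}?wait_for_completion=true"
    return urllib.request.Request(url, method="PUT")


def create_snapshot(repository, snapshot, timeout=SNAPSHOT_TIMEOUT_SECONDS):
    """Send the request; raises on an HTTP error status or a socket timeout."""
    request = build_snapshot_request(repository, snapshot)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With a 30-second client timeout the same call can give up while the server is still writing the snapshot, which matches the behaviour described in this report.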

Detection

The Jenkins job posts alerts to the #dev Slack channel. Graham O'Regan (Unlicensed) confirmed that the container was running on the prod server, then ran the commands manually with curl to find the underlying problem.
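The exact curl commands are not recorded here. As a hedged illustration of this kind of check, the sketch below reads the JSON body that `GET /_snapshot/<repository>/<snapshot>` returns and extracts the snapshot's state. The snapshot name and the sample body are hand-written assumptions, not output captured during the incident.

```python
import json


def snapshot_state(response_body, snapshot):
    """Return the state of the named snapshot, or None if it is not listed."""
    payload = json.loads(response_body)
    for entry in payload.get("snapshots", []):
        if entry.get("snapshot") == snapshot:
            return entry.get("state")
    return None


# Hand-written sample body, not output captured from the incident.
sample = '{"snapshots": [{"snapshot": "nightly-2017-01-06", "state": "SUCCESS"}]}'
print(snapshot_state(sample, "nightly-2017-01-06"))
```

A state of `SUCCESS` for a job that reported a failure would point to a client-side timeout rather than a failed snapshot.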

Action Items

Action Item | Type | Owner | Issue
Increased timeout to 20 minutes | prevent | Graham O'Regan (Unlicensed) |

Timeline

  • Nightly job failed and alerted #dev in Slack
  • A manual attempt to run the snapshot with curl also failed, but with a different exception, because the snapshot already existed.
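The second failure in the timeline comes from trying to create a snapshot that already exists. One way a re-run could tell the two cases apart is to look up the snapshot first and act on the HTTP status. This is a sketch only and is not something the report says was implemented. It assumes `GET /_snapshot/<repository>/<snapshot>` answers 200 when the snapshot exists and 404 when it does not; the function name is hypothetical.

```python
def should_create_snapshot(status_code):
    """Decide whether to create the snapshot, given the lookup's HTTP status."""
    if status_code == 404:
        return True   # not found: safe to create
    if status_code == 200:
        return False  # already exists: creating it again would fail
    raise ValueError(f"unexpected status from snapshot lookup: {status_code}")
```

Any other status is raised as an error so that an unreachable or unhealthy cluster is not mistaken for a missing snapshot.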

Supporting Information

Link to initial failed task.

https://build-hee.transformcloud.net/jenkins/job/elasticsearch-snapshot-prod/25/
