2017-01-06 ES Snapshotting error
Date | 2017-01-06 |
Authors | Graham O'Regan (Unlicensed) |
Status | Complete |
Summary | The nightly Elasticsearch snapshot job appeared to fail because it exceeded the 30-second timeout. However, the snapshot had in fact been created, so the job could not complete when it was run again. |
Impact | None |
Root Cause
The timeout for the task was the default of 30 seconds, but taking the snapshot and copying it to Blob Storage took longer than that.
Trigger
The nightly ES index snapshotting job on Jenkins
Resolution
The timeout was increased to 20 minutes to give the snapshot more time to complete before an error is raised.
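The job's own implementation is not shown on this page, so the following is only a minimal sketch of the idea behind the fix: request the snapshot with `wait_for_completion=true` and give the client a 20-minute timeout instead of a short default. The base URL, repository name, and snapshot name passed in are hypothetical, and the helper names are illustrative rather than taken from the real job.

```python
import json
import urllib.request

# Assumed value: 20 minutes, matching the timeout described in the Resolution.
SNAPSHOT_TIMEOUT_SECONDS = 20 * 60


def build_snapshot_url(base_url, repository, snapshot):
    """Build the create-snapshot URL, asking Elasticsearch to block until done."""
    return "{}/_snapshot/{}/{}?wait_for_completion=true".format(
        base_url.rstrip("/"), repository, snapshot
    )


def create_snapshot(base_url, repository, snapshot,
                    timeout=SNAPSHOT_TIMEOUT_SECONDS):
    """Send the PUT request; raises on an HTTP error or if the timeout elapses."""
    request = urllib.request.Request(
        build_snapshot_url(base_url, repository, snapshot), method="PUT"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `create_snapshot("http://localhost:9200", "backups", "nightly")` would then wait up to 20 minutes for the response. Those argument values are placeholders, not the production settings.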
Detection
The Jenkins job posts alerts to the #dev Slack channel. Graham O'Regan (Unlicensed) confirmed that the container was running on the prod server, then ran the commands manually with curl to find the underlying problem.
Action Items
Action Item | Type | Owner | Issue |
---|---|---|---|
Increased timeout to 20 minutes | prevent | Graham O'Regan (Unlicensed) | |
Timeline
- Nightly job failed and alerted #dev in Slack
- An attempt to run the snapshot manually with curl also failed, this time with a different exception, because the snapshot already existed.
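The second failure in the timeline suggests that a re-run could first check whether the snapshot already exists. This was not part of the recorded resolution. It is a hedged sketch of how a job might interpret the response to `GET /_snapshot/<repository>/<snapshot>`, where the function name and the returned action labels are hypothetical.

```python
import json


def next_step(status_code, body):
    """Decide what a re-run should do, given the status code and body
    returned by GET /_snapshot/<repository>/<snapshot>."""
    if status_code == 404:
        return "create"        # snapshot does not exist yet
    if status_code != 200:
        return "investigate"   # unexpected response from the cluster
    snapshots = json.loads(body).get("snapshots", [])
    if not snapshots:
        return "create"
    state = snapshots[0].get("state")
    if state == "SUCCESS":
        return "skip"          # earlier run finished despite the client timeout
    if state == "IN_PROGRESS":
        return "wait"          # earlier run is still copying data
    return "investigate"       # FAILED, PARTIAL, or anything unrecognised
```

With a check like this, a run that timed out on the client side but completed on the server would be reported as already done rather than failing with a second exception.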
Supporting Information
Link to initial failed task.
https://build-hee.transformcloud.net/jenkins/job/elasticsearch-snapshot-prod/25/