Date

08 Jan 2024

Authors

Status

Documenting

Summary

Trusts were unable to find a number of trainees in their search results. We narrowed the cause down to a problem with the copy of the data that gets searched (the search index) and reran the job that builds it.

Jira: TIS21-5610

Impact

It wasn’t immediately obvious that some records were not showing in the person search

...

All times in GMT unless indicated

08 Jan 2024 01: - Other Jobs ran for longer than usual and ran beyond the start of the Person ?ES? Job
08 Jan 2024 01:?? - Job ran for xx ?? minutes, when it usually completes in ~15.
08 Jan 2024
08 Jan 2024 12:01 - Message on Teams about Trust users not finding their trainees in the search.
08 Jan 2024 13:37 - Started debugging and confirming the cause, and checking that there were no other data-related issues.
08 Jan 2024 14:15 - Confirmed that other regions were affected. A reindex was scheduled.
08 Jan 2024 15:45 & 16:00 - Confirmed that records were visible as expected.

As part of building the timeline, we didn’t identify an earlier occurrence of this defect so we have not sought to extensively reproduce and remedy this issue.

Root Cause(s)

N.B. We have developed a reasonable but not definitive explanation of what has happened.

Users in more than one region/Local Office couldn’t find trainees they were expecting because the search index didn’t have all the records it should have, although we believe it did have most of the trainees they expected.
The ElasticSearch job completed but ran for longer than expected, as did other jobs.
The ElasticSearch job depends on other jobs having run successfully, roughly before it starts.
The ElasticSearch job and other jobs work through pages of ids, so where jobs overlap, partial information can be used instead of complete information.
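
To illustrate the last point, here is a minimal toy model in Python. It is not the real job code: the function name `simulate`, the page size and the ids are all hypothetical, and it assumes the simplest form of overlap, where an upstream job is still writing pages of records when the index job starts reading them. The index job reaches the end of what is available, completes normally, and never sees the records written afterwards.

```python
from typing import List, Tuple


def simulate(all_ids: List[int],
             pages_written_before_index_starts: int,
             page_size: int = 2) -> Tuple[List[int], List[int]]:
    """Toy model of an index job overlapping with an upstream job.

    Returns (ids that were indexed, ids missing from the index).
    Assumes all ids are positive integers.
    """
    # The upstream job writes its records one page of ids at a time.
    pages = [all_ids[i:i + page_size] for i in range(0, len(all_ids), page_size)]

    # Only some pages have been written when the index job starts.
    source = set()
    for page in pages[:pages_written_before_index_starts]:
        source.update(page)

    # The index job also works through pages of ids and stops at the
    # first empty page, i.e. it "completes" without any error.
    indexed: List[int] = []
    last_id = 0
    while True:
        page = sorted(i for i in source if i > last_id)[:page_size]
        if not page:
            break
        indexed.extend(page)
        last_id = page[-1]

    # The upstream job finishes only after the index job has stopped.
    for page in pages[pages_written_before_index_starts:]:
        source.update(page)

    missing = sorted(source - set(indexed))
    return indexed, missing


if __name__ == "__main__":
    ids = list(range(1, 7))
    print(simulate(ids, pages_written_before_index_starts=3))  # no overlap
    print(simulate(ids, pages_written_before_index_starts=2))  # overlap
```

With no overlap every id is indexed; with overlap the index is built from partial data even though the job itself reports success, which matches the symptom of most, but not all, trainees being searchable.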

...

Action Items

Action Items

Owner

Alert when jobs (or just this job) run outside the “normal”/“expected”/“acceptable” bounds, e.g.

Longer than expected
Beyond the start / end of jobs that are dependent on each other

Joseph (Pepe) Kelly

Jira: TIS21-5628
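
As a rough sketch of what the alerting action (TIS21-5628) might check, the Python below flags a job run that takes longer than expected or that finishes after a dependent job has started. The `JobRun` type, the `check_run` function and the times used are hypothetical illustrations and are not taken from the actual scheduler or from this incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class JobRun:
    name: str
    started: datetime
    finished: datetime


def check_run(run: JobRun,
              expected_max: timedelta,
              dependent_start: Optional[datetime] = None) -> List[str]:
    """Return alert messages for a run that is outside its expected bounds."""
    alerts = []

    # Check 1: the job ran for longer than expected.
    duration = run.finished - run.started
    if duration > expected_max:
        alerts.append(
            f"{run.name} ran for {duration}, longer than the expected {expected_max}"
        )

    # Check 2: the job ran beyond the start of a job that depends on it.
    if dependent_start is not None and run.finished > dependent_start:
        alerts.append(
            f"{run.name} finished at {run.finished:%H:%M}, "
            f"after a dependent job started at {dependent_start:%H:%M}"
        )
    return alerts


if __name__ == "__main__":
    # Hypothetical times, for illustration only.
    run = JobRun("upstream-job",
                 started=datetime(2024, 1, 1, 1, 0),
                 finished=datetime(2024, 1, 1, 1, 40))
    for message in check_run(run,
                             expected_max=timedelta(minutes=15),
                             dependent_start=datetime(2024, 1, 1, 1, 30)):
        print(message)
```

The two checks correspond to the two example conditions listed in the action item; what counts as an acceptable bound for each job would still need to be agreed.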

Space jobs out more to allow more time for each to run

We could rebuild this as a batch job, but won’t right now as it would be a significant piece of work.

...

Version	Old Version 4	New Version 5
Changes made by	catherine.odukale	Joseph (Pepe) Kelly
Saved on	19 Jan 2024	23 Jan 2024
