2022-02-18 Person Search List failed to refresh


Date

Feb 18, 2022

Authors

@Joseph (Pepe) Kelly @John Simmons (Deactivated) @Yafang Deng @Reuben Roberts @Jayanta Saha @Edward Barclay

Status

Resolved

Summary

Person search sync job failed

Impact

Up to 1,000 person records (out of 290,000) were not findable on the Person search page from 01:41 to 03:24.

Non-technical Description

We run a number of sync jobs overnight. This one failed (see ‘Impact’, above) because another process was running at the same time and prevented it from completing.

We re-ran the job shortly afterwards and it completed successfully.

We investigated what tripped it up and will work to reduce the risk of a recurrence.


Trigger

  • Garbage collection activity took longer than expected and ran into the sync job's scheduled window.

Detection


Resolution

  • The job was re-run as soon as the failure was noticed.


Timeline

  • 2022-02-18|01:41: “Sync [Person sync job] failed with exception…” message posted in the Slack monitoring-prod channel.

  • 2022-02-18|03:13: Team member restarted the job when they noticed the issue.

  • 2022-02-18|03:24: Rerun job completed successfully.


Root Cause(s)


Action Items

Action Items

Owner


Refactor the sync job so that it retries on error. Raise a spike ticket to look at the options:
- a task executor, like we use elsewhere in TIS?
- REST client retry?
- a Spring component for retrying method calls (configurable)? There is an example in Reval (thanks Uzair).

@Reuben Roberts https://hee-tis.atlassian.net/browse/TIS21-2698
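As a rough illustration of what "retry on error" could look like, here is a minimal sketch in plain Java. It is not TIS code and not any of the options listed above: the class name `RetrySketch`, the method `callWithRetry`, and the attempt and delay values are all hypothetical, and the simulated task stands in for the real sync call.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only: retries a task a fixed number
// of times, doubling the wait between attempts.
public final class RetrySketch {

  static <T> T callWithRetry(Callable<T> task, int maxAttempts, long initialDelayMs)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long delayMs = initialDelayMs;
    Exception lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        lastFailure = e;
        if (attempt < maxAttempts) {
          // Wait before the next attempt, then double the delay.
          Thread.sleep(delayMs);
          delayMs *= 2;
        }
      }
    }
    // Every attempt failed: surface the last exception to the caller.
    throw lastFailure;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated job (an assumption): fails twice, then succeeds.
    String result = callWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("simulated sync failure");
      }
      return "synced";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

A backoff like this only helps if the blocking condition (here, the overrunning garbage collection activity) clears within the retry window, so the attempt count and delays would need to be chosen with the job schedule in mind.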


Lessons Learned

  • It's good to retry when you fail!

  • Even highly available systems have issues.

  • Task-based components could do with a bit more defensive development (for example around retries, and handling cases other than the ‘happy path’).

  • Our monitoring works nicely (for anyone who’s an insomniac).

Related content

2022-03-03 TV list of doctors did not sync for Under notice and All doctors
2020-05-04 PersonOwner job failed affecting local office filters
2022-02-09 Error generated by some bad data being returned by the hourly recommendation status check from GMC.
2023-07-19 TIS person search list - unable to find some doctors
2021-03-03 Person Search Sync Failed [Unable to parse response body]
2021-09-08 Person Placement Employing Body Trust sync job failed affecting Person Search