Date | |
---|---|
Authors | |
Status | Resolved |
Summary | Person Search Sync Failed |
Impact | Person search page was not showing some data between 01:35 and 09:00 |
Non-technical Description
The overnight sync procedure for TIS was unable to run. This meant only some trainees were being shown on the person search page. We increased the resources for the affected component.
Person details to be synchronized are sent in batches (pages), and one of the pages contained too much data to be processed correctly.
To resolve this, the page size used during the synchronization has been reduced from 8000 to 5000.
...
Trigger
...
...
Detection
The issue was detected when errors were noted in the ‘Sync [Person sync job] started’ Slack notification, and the ‘Sync [Person sync job] finished’ notification failed to appear.
...
...
Resolution
Created a new production Elasticsearch cluster based on the Terraform description, with a larger instance type:
instance_type = t3.medium.elasticsearch (up from instance_type = t3.small.elasticsearch)
Reduced the page size from 8000 to 5000 to bring the request size down.
Enabled the page size to be set by an environment variable on deploy.
Manually triggered the Person sync job to rebuild the Person Elasticsearch index.
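As an illustration of the configurable page size, the following is a minimal sketch, not the TIS implementation. It assumes a hypothetical environment variable named `PERSON_SYNC_PAGE_SIZE`, falls back to 5000 (the value chosen in this fix) when the variable is unset, and splits the records to sync into pages of at most that size.

```python
import os
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Default taken from this incident's fix; the variable name below is hypothetical.
DEFAULT_PAGE_SIZE = 5000


def page_size_from_env(var_name: str = "PERSON_SYNC_PAGE_SIZE",
                       default: int = DEFAULT_PAGE_SIZE) -> int:
    """Read the sync page size from the environment, falling back to a default."""
    raw = os.environ.get(var_name)
    if raw is None or raw.strip() == "":
        return default
    size = int(raw)
    if size <= 0:
        raise ValueError(f"{var_name} must be a positive integer, got {size}")
    return size


def pages(items: Sequence[T], page_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most page_size items."""
    for start in range(0, len(items), page_size):
        yield list(items[start:start + page_size])


if __name__ == "__main__":
    # Simulate the value being supplied on deploy.
    os.environ["PERSON_SYNC_PAGE_SIZE"] = "5000"
    size = page_size_from_env()
    person_ids = list(range(12_000))
    for page in pages(person_ids, size):
        print(len(page))  # prints 5000, 5000, 2000
```

Reading the value at startup means a future change (for example, lowering it again if documents grow) needs only a redeploy with a new variable value rather than a code change.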
...
01:46 - Sync job failed
08:31 - Sync job failed when re-run
08:34 - Fix submitted
08:42 - Fix deployed
08:53 - Sync job completed successfully
...
Root Cause(s)
The nightly sync job failed.
The request size with a page size of 8000 was too large: our instance size limits us to 10 MB per request.
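To make the size limit concrete, the sketch below serialises a page of documents as an Elasticsearch `_bulk` body (newline-delimited JSON: one action line plus one source line per document) and checks it against the cap. It assumes the "10 MB" limit is the 10 MiB HTTP payload cap (10,485,760 bytes) and uses hypothetical person documents of roughly 1.4 KB each and a hypothetical index name `persons`; the real document sizes in TIS are not stated here.

```python
import json
from typing import Dict, List

# Assumption: the 10 MB limit is a 10 MiB HTTP request payload cap.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def bulk_body(docs: List[Dict], index: str = "persons") -> bytes:
    """Serialise docs as a _bulk NDJSON body (index name is hypothetical)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return ("\n".join(lines) + "\n").encode("utf-8")


def fits(body: bytes, limit: int = MAX_REQUEST_BYTES) -> bool:
    """True if the request body is within the payload limit."""
    return len(body) <= limit


def sample_docs(count: int, padding: int = 1400) -> List[Dict]:
    """Hypothetical person documents of roughly 1.4 KB each."""
    return [{"id": i, "name": "x" * padding} for i in range(count)]


if __name__ == "__main__":
    for page_size in (8000, 5000):
        body = bulk_body(sample_docs(page_size))
        print(page_size, len(body), fits(body))
```

With these sample documents the 8000-document page exceeds the cap while the 5000-document page fits. More generally, under the 10 MiB assumption a page of 8000 overflows once documents average more than about 1,311 bytes (10,485,760 / 8000), whereas a page of 5000 allows an average of about 2,097 bytes (10,485,760 / 5000).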
...
Action Items
Action Items | Owner |
---|---|
n/a | |
...
Lessons Learned
Making the page size a configurable property allows easier tweaking in future.