Date

Authors

Joseph (Pepe) Kelly, Reuben Roberts

Status

Resolved

Summary

Person Search Sync Failed

Impact

The person search page was not showing some data between 01:35 and 09:00.

Non-technical Description

The overnight sync procedure for TIS was unable to run. This meant only some trainees were being shown on the person search page. We increased the resources for the affected component.

Person details to be synchronized are sent in batches (pages). One of the pages contained too much data to be processed correctly.
To resolve this, the page size used during the synchronization has been reduced from 8000 to 5000.

...

Trigger

...

...

Detection

  • The issue was detected when errors were noted in the ‘Sync [Person sync job] started’ Slack notification, and the ‘Sync [Person sync job] finished’ notification failed to appear.

...

...

Resolution

  • Created a new production Elasticsearch cluster based on the Terraform description (instance_type = t3.medium.elasticsearch, up from instance_type = t3.small.elasticsearch).

  • Reduced the page size from 8000 to 5000 to bring the request size down.

  • Enabled the page size to be set by an environment variable on deploy.

  • Manually triggered the Person sync job to rebuild the Person Elasticsearch index.
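As an illustration of the configurable page size, here is a minimal sketch in Python of reading the value from the environment with a safe fallback. It is not the TIS implementation: the variable name PERSON_SYNC_PAGE_SIZE and the function resolve_page_size are hypothetical, and only the default of 5000 comes from this report.

```python
import os

# Default taken from this incident report (reduced from 8000).
DEFAULT_PAGE_SIZE = 5000


def resolve_page_size(env=os.environ, name="PERSON_SYNC_PAGE_SIZE",
                      default=DEFAULT_PAGE_SIZE):
    """Return the sync page size from the environment, or the default.

    `name` is a hypothetical variable name used for illustration only.
    """
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        # Variable not set on deploy: fall back to the known-good value.
        return default
    try:
        size = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}")
    if size <= 0:
        raise ValueError(f"{name} must be positive, got {size}")
    return size


if __name__ == "__main__":
    print(resolve_page_size())
```

Failing fast on a malformed value means a bad deploy setting surfaces at startup rather than during the overnight run.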

...

Timeline

  • 01:46 - Sync job failed

  • 08:31 - Sync job failed when re-run

  • 08:34 - Fix submitted

  • 08:42 - Fix deployed

  • 08:53 - Sync job completed successfully

...

Root Cause(s)

  • The nightly sync job failed.

  • With a page size of 8000 the request was too large: our instance size limits us to 10 MB per request.
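The fix applied in this incident was a smaller fixed page size. Purely as an illustration of why a fixed record count can still exceed a byte limit, the sketch below groups documents by serialized size instead. The names batch_by_bytes and serialized_size are hypothetical, the limit is assumed to be 10 * 1024 * 1024 bytes, and the size estimate ignores any per-document metadata a real bulk request would add.

```python
import json

# Assumption: the "10 MB per request" limit read as 10 MiB.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def serialized_size(doc):
    """Approximate bytes a document adds to a request (JSON plus newline)."""
    return len(json.dumps(doc).encode("utf-8")) + 1


def batch_by_bytes(docs, max_bytes=MAX_REQUEST_BYTES):
    """Yield lists of documents whose combined size stays within max_bytes."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = serialized_size(doc)
        if size > max_bytes:
            raise ValueError("a single document exceeds the request limit")
        if batch and batch_bytes + size > max_bytes:
            # Adding this document would overflow the request: flush first.
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    people = [{"id": i, "name": "x" * 10} for i in range(10)]
    for page in batch_by_bytes(people, max_bytes=100):
        print(len(page), sum(serialized_size(d) for d in page))
```

A record-count page size is simpler to operate, which is presumably why it was chosen here; a byte-based grouping trades that simplicity for protection against unusually large records.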

...

Action Items

Action Items

Owner

n/a

...

Lessons Learned

  • Making the page size a configurable property allows easier tweaking in future.