
Date

Authors

Status

In progress

Summary

TIS21-4692

Impact

Non-technical Description

An issue with an overnight task meant that users were seeing only one trainee in the revalidation app.


Trigger


Detection


Resolution


Timeline

All times in BST unless indicated


  • 01:07 - 08:47 : The monitoring channel showed the task repeatedly stopping and being replaced.

  • 08:53 : A user reported on Teams that the revalidation module was showing only one person under notice.

  • 09:30 : Stopped the 2-hourly checks of submitted recommendations; shortly afterwards, temporarily stopped the service to halt unhelpful logging.

  • 09:41 : Moved sync start messages to new queues for debugging.

  • 09:43 : Found logs suggesting the incident started at 00:05, around the time the GMC sync job starts.

  • ~09:45 : Stopped the gmc-client task on prod.

  • ~10:00 : Restarted the gmc-client task on prod and observed the same debug logs (which later turned out not to be relevant); the task stopped again.

  • 10:15 : Set the gmc-client log level to debug and pushed the change to preprod.

  • 11:30 : Added JAVA_TOOL_OPTIONS to the task definition, then increased memory from 512M to 2G. While this change was being deployed, the production issue also appeared in our preprod environment.

  • ~11:30 : Triggered the GMC sync again on preprod. It failed with a memory error when making a SOAP request to GMC.

  • ~12:15 : Triggered the GMC sync again on preprod after increasing the memory allocation; this time it succeeded.

  • ~12:20 : Identified a separate issue with missing queues on preprod and reran the Jenkins build to restore them.

  • ~12:20 : Triggered the GMC sync again on prod after increasing the memory allocation; this time it succeeded.

  • ~12:40 : The GMC sync appeared healthy on prod and doctors were appearing in connections.

Root Cause(s)


Action Items

Action Items

Owner

Small tasks/tidy-up:

  • Reset cron schedules

  • Make the new log level parameters environment-specific


Lessons Learned
