Date |
Authors |
Status | In progress
Summary |
Impact |
An overnight task failed, which meant users saw only one trainee in the revalidation app.
All times are in BST unless otherwise indicated.
01:07 - 08:47 : The monitoring channel showed the task repeatedly stopping and being replaced.
08:53 : A user reported on Teams that the revalidation module was showing only one person under notice.
09:30 : Stopped the 2-hourly checks of submitted recommendations. Shortly afterwards, temporarily stopped the service to suppress unhelpful logging.
09:41 : Moved sync start messages to new queues for debugging
09:43 : Found logs suggesting the incident started at 00:05, around the time the GMC sync job starts.
~09:45 : Stopped the gmc-client task on prod.
~10:00 : Restarted the gmc-client task on prod and observed the same debug logs (later found not to be relevant). The task stopped again.
10:15 : Set the gmc-client log level to debug and pushed the change to preprod.
11:30 : Added JAVA_TOOL_OPTIONS to the task definition, then increased memory from 512M to 2G. While deploying this change, the same issue that affected production also appeared in our preprod environment.
~11:30 : Triggered the GMC sync again on preprod. It failed with a memory error while making the SOAP request to GMC.
~12:15 : Triggered the GMC sync again on preprod after increasing the memory allocation. This time it succeeded.
~12:20 : Identified a separate preprod issue with missing queues and reran the Jenkins build to restore them.
~12:20 : Triggered the GMC sync again on prod after increasing the memory allocation. This time it succeeded.
~12:40 : The GMC sync appeared healthy on prod, and doctors were appearing in connections.
Action Items | Owner | |
---|---|---|
Small tasks/tidy up: | | |