Date	22 Jun 2023
Authors
Status	In progress
Summary	TIS21-4692
Impact

Non-technical Description

An issue occurred with an overnight task which meant users were only seeing one trainee in the revalidation app.

Trigger

As part of the overnight sync job, when calling the GMC’s SOAP endpoint GetDoctorsForDB, our service gmc-client-service hit an out-of-memory error and crashed.
The message that triggers the sync job remained queued, and presumably re-triggered the error every time ECS spun up a new task.
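
The presumed crash loop described above can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not the service's actual code: the class, the `handleSync` method, the `start-sync` message name and the memory figure of 1024 MiB needed by the handler are all hypothetical, chosen only to show why a message that is acknowledged only on success keeps failing every replacement task until the memory limit is raised.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RedeliveryLoopSketch {

    /** Hypothetical stand-in for the sync handler: succeeds only if the heap is large enough. */
    static boolean handleSync(int heapMiB, int requiredMiB) {
        return heapMiB >= requiredMiB;
    }

    /**
     * Counts how many task starts occur before the queued message is consumed.
     * Returns maxTaskStarts if the message is never consumed within that limit.
     */
    static int taskStartsUntilConsumed(int heapMiB, int requiredMiB, int maxTaskStarts) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add("start-sync"); // hypothetical message name
        int starts = 0;
        while (!queue.isEmpty() && starts < maxTaskStarts) {
            starts++; // a replacement task starts and picks up the queued message
            if (handleSync(heapMiB, requiredMiB)) {
                queue.remove(); // acknowledged only after successful handling
            }
            // On failure the task dies and the message stays at the head of the queue.
        }
        return starts;
    }

    public static void main(String[] args) {
        // Illustrative figures only: 1024 MiB is an assumed requirement, not a measured one.
        System.out.println(taskStartsUntilConsumed(512, 1024, 10));
        System.out.println(taskStartsUntilConsumed(2048, 1024, 10));
    }
}
```

With the assumed figures, the first call exhausts all ten simulated task starts without consuming the message, while the second consumes it on the first start.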

Detection

Resolution

Timeline

All times in BST unless indicated

22 Jun 2023 00:05 : gmc-client-service crashed while attempting to run the overnight sync job, due to a lack of memory
22 Jun 2023 01:07 - 08:47 : The monitoring channel showed the task was stopping and being replaced
22 Jun 2023 08:53 : A user reported (on Teams) that the revalidation module was showing only one person under notice
22 Jun 2023 09:30 : Stopped the 2-hourly checks of submitted recommendations; shortly afterwards stopped the service temporarily to stop unhelpful logging
22 Jun 2023 09:41 : Moved the sync start messages to new queues for debugging
22 Jun 2023 09:43 : Found logging suggesting the incident started at 00:05, around the time the GMC sync job starts
22 Jun 2023 ~09:45 : Stopped the gmc-client task on prod
22 Jun 2023 ~10:00 : Restarted the gmc-client task on prod and observed the same debug logs (which later appeared not to be relevant); the task stopped again
22 Jun 2023 10:15 : Changed the log level for gmc-client to debug and pushed to preprod
22 Jun 2023 11:30 : Added JAVA_TOOL_OPTIONS to the task definition, then increased memory from 512M to 2G. Deploying this change caused the production issue to affect our preprod environment as well
22 Jun 2023 ~11:30 : Triggered the GMC sync again on preprod. It failed with a memory error when making the SOAP request to GMC
22 Jun 2023 ~12:15 : Triggered the GMC sync again on preprod after increasing the memory allocation; this time it worked
22 Jun 2023 ~12:20 : Identified a separate issue on preprod with missing queues; reran the Jenkins build to restore them
22 Jun 2023 ~12:20 : Triggered the GMC sync again on prod after increasing the memory allocation; this time it worked
22 Jun 2023 ~12:40 : The GMC sync appeared healthy on prod and doctors were appearing in connections
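
One way to confirm that a memory change like the one at 11:30 has taken effect is to have the JVM report the heap limit it is actually running with. The snippet below is a minimal sketch using only standard JDK calls (`Runtime.getRuntime().maxMemory()` and `System.getenv`); the class name `HeapCheck` is hypothetical, and the value that was set in JAVA_TOOL_OPTIONS is not recorded in this report, so none is assumed here.

```java
public class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Maximum heap the JVM will attempt to use. This reflects any heap-sizing
        // flags the JVM picked up at startup, including those in JAVA_TOOL_OPTIONS.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();

        // The raw environment variable, or null if it is not set for this process.
        String toolOptions = System.getenv("JAVA_TOOL_OPTIONS");

        System.out.println("JAVA_TOOL_OPTIONS=" + toolOptions);
        System.out.println("Max heap (MiB)=" + toMiB(maxHeapBytes));
    }
}
```

Running this inside the container before and after a task definition change shows whether the larger allocation is visible to the JVM, rather than only to the container.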

Root Cause(s)

Sudden inability to handle the response from GMC’s GetDoctorsForDB SOAP endpoint, apparently due to a lack of memory

Action Items

Action Items	Owner

Contact GMC to verify that nothing has changed with their endpoints (unlikely, but worth checking)

Small tasks/tidy up:

Reset cron schedules
Make the new (log level) parameters environment-specific


2023-06-22 GMC Client failed and not able to start
