...
:
:
01:07 - 08:47 : The monitoring channel showed the task was stopping and being replaced.
08:53 : User reported (on Teams) that the revalidation module was showing one person under notice
09:30 : Stopped the 2-hourly checks of submitted recommendations; shortly after, stopped the service temporarily to stop unhelpful logging
09:41 : Moved sync start messages to new queues for debugging
09:43 : Found logging suggesting the incident started at 00:05, around the time the GMC sync job starts
~ 9:45 : Stopped gmc-client task on prod
~10:00 : Restarted gmc-client task on prod; observed the same issue and debug logs (which later appeared not to be relevant); task stopped again.
10:15 : Changed log level for gmc-client (set to debug) and pushed to preprod
11:30 : Added JAVA_TOOL_OPTIONS in task definition, then updated memory from 512M to 2G. As part of deploying this change, the production issue became an issue for our preprod environment
~11:30 : Triggered GMC sync again on preprod. Failed due to memory error when making SOAP request to GMC
~12:15 : Triggered GMC sync again on preprod after increasing memory allocation, this time it worked
~12:20 : Identified separate issue with preprod regarding missing queues, reran Jenkins build to restore them
~12:20 : Triggered GMC sync again on prod after increasing memory allocation, this time it worked
~12:40 : GMC sync appeared healthy on prod and doctors were appearing in connections
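
The memory change in the 11:30 entry can be sanity-checked from inside the JVM. The following is a minimal sketch only, not the check used during the incident: the class name HeapCheck and its helper methods are hypothetical, the 2G limit is copied from the timeline, and the value actually set in JAVA_TOOL_OPTIONS is not recorded above, so the code only prints whatever it finds.

```java
public class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Returns the share of the container memory limit available to the JVM heap. */
    public static double heapFraction(long maxHeapBytes, long containerLimitBytes) {
        if (containerLimitBytes <= 0) {
            throw new IllegalArgumentException("container limit must be positive");
        }
        return (double) maxHeapBytes / containerLimitBytes;
    }

    public static void main(String[] args) {
        // Assumption: the task memory limit is the 2G mentioned in the timeline.
        long containerLimitBytes = 2L * 1024 * 1024 * 1024;

        // Maximum heap the JVM will attempt to use, after any JVM options are applied.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();

        // JAVA_TOOL_OPTIONS is read by the JVM at startup; print it for reference.
        String toolOptions = System.getenv("JAVA_TOOL_OPTIONS");

        System.out.printf("JAVA_TOOL_OPTIONS: %s%n",
                toolOptions == null ? "(not set)" : toolOptions);
        System.out.printf("JVM max heap: %d MiB (%.0f%% of assumed %d MiB limit)%n",
                toMiB(maxHeapBytes),
                100.0 * heapFraction(maxHeapBytes, containerLimitBytes),
                toMiB(containerLimitBytes));
    }
}
```

Logging these values at service start-up would make it easier to tell whether a future sync failure is caused by heap size or by something else.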
Root Cause(s)
...
Action Items
Action Items | Owner | |
---|---|---|
Small tasks/tidy up:
|
...