2020-12-13 Reval Legacy/Old GMC Sync - GMC return code 98, GMC internal error

Date	14 Dec 2020
Authors	Philip Wilsdon (Unlicensed)
Status	In Progress
Summary	`gmc-sync-prod` did not run; logs reported `GMC return code 98, GMC internal error`
Impact	User could not manage connections

Non-technical summary

Reval was showing information that had not been updated from GMC. On further investigation, it turned out that the overnight jobs that fetch the data from GMC and load it into TIS had not run successfully. Once they had been fixed and rerun, Reval showed the correct information.

Timeline

14 Dec 2020 09:23 am
09:45	Re-ran the GMC sync; it returned 0 doctors per designated body
10:00 Ish	Investigations found that `gmc-sync-prod` did not run and that the logs were showing the error `GMC return code 98, GMC internal error`:
2020-12-13 00:00:41.557 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : Get doctors for designated body 1-AIIDWA
2020-12-13 00:00:41.557 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : GMC Connect Url
2020-12-13 00:00:44.263 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcDoctorsService : 2982 Doctors has been found for 1-AIIDWA body
2020-12-13 00:00:44.756 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : Get doctors for designated body 1-AIIDVS
2020-12-13 00:00:44.756 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : GMC Connect Url
2020-12-13 00:00:48.562 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcDoctorsService : 4340 Doctors has been found for 1-AIIDVS body
2020-12-13 00:00:49.153 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : Get doctors for designated body 1-AIIDWI
2020-12-13 00:00:49.154 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : GMC Connect Url
2020-12-13 00:00:52.752 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcDoctorsService : 4022 Doctors has been found for 1-AIIDWI body
2020-12-13 00:00:52.847 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : Get doctors for designated body 1-AIIDSI
2020-12-13 00:00:52.847 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : GMC Connect Url
2020-12-13 00:00:55.954 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcDoctorsService : 3107 Doctors has been found for 1-AIIDSI body
2020-12-14 00:00:00.000 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcDoctorsService : Fetching doctors info for Designated Bodies [1-AIIDHJ, 1-AIIDMQ, 1-AIIDNQ, 1-AIIDMY, 1-AIIDQQ, 1-AIIDWT, 1-AIIDR8, 1-AIIDSA, 1-AIIDH1, 1-AIIDWA, 1-AIIDVS, 1-AIIDWI, 1-AIIDSI]
2020-12-14 00:00:00.001 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : Get doctors for designated body 1-AIIDHJ
2020-12-14 00:00:00.001 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcConnectService : GMC Connect Url
2020-12-14 00:00:00.103 ERROR 1 --- [ scheduling-1] o.s.s.s.TaskUtils$LoggingErrorHandler : Unexpected error occurred in scheduled task
uk.nhs.hee.tis.gmc.client.exception.GMCResponseException: GMC return code 98, GMC internal error
10:15 AM	Emailed GMC
12:06	More correspondence with the GMC, as they asked for clarification on environments
05:17 PM	Clarification of logs
7 PM	Logs from legacy/old production environment sent to GMC
15 Dec 2020 Reply from GMC	They patched their servers on the evening of 13 Dec 2020
10:01	Triggered the old Reval sync. New Reval: GMC sync completed. Suspect that GMC applied fixes in the early AM
10:03	`gmc-sync` ran successfully and `intrepid-reval-etl-all-prod` ran successfully
10:18	User confirmed everything looks ok
10:19	Replied to GMC to let them know that both sync jobs, on legacy and new Reval, ran successfully

Root Causes

Trigger

A user reported in the Teams Support Channel that their connections had not been working correctly.

Resolution

R

Detection

A user reported the problem in the Teams Support Channel.

Actions

For new Reval: we need to add monitoring so we know whether the sync job that gets the data from GMC, and any ETL/transformation service we build, runs successfully or fails. We could use this ticket: https://hee-tis.atlassian.net/browse/TISNEW-3264
Decision taken not to address the monitoring in the current Reval application, as the new one is close to going live (December 2020).
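
As a starting point for that monitoring, a check on the outcome of each sync run might look like the sketch below. It is illustrative only and not part of either Reval application: the function `evaluate_sync_run`, its parameters and the alert wording are all hypothetical. It flags a run that raised an error and also a run that completed but returned 0 doctors for a designated body, which is what the 09:45 re-run showed.

```python
from typing import Dict, List, Optional


def evaluate_sync_run(
    doctors_per_body: Dict[str, int],
    expected_bodies: List[str],
    error: Optional[str] = None,
) -> List[str]:
    """Return alert messages for one sync run; an empty list means healthy.

    Hypothetical helper: the real sync job may expose its results differently.
    """
    alerts = []
    if error:
        alerts.append(f"sync failed: {error}")
    for body in expected_bodies:
        count = doctors_per_body.get(body)
        if count is None:
            # The run never reached this designated body.
            alerts.append(f"no result for designated body {body}")
        elif count == 0:
            # A run can "succeed" and still return nothing useful.
            alerts.append(f"0 doctors returned for designated body {body}")
    return alerts


if __name__ == "__main__":
    # Counts taken from the 13 Dec log lines in the timeline.
    healthy = {"1-AIIDWA": 2982, "1-AIIDVS": 4340}
    print(evaluate_sync_run(healthy, ["1-AIIDWA", "1-AIIDVS"]))
    print(evaluate_sync_run({}, ["1-AIIDHJ"], error="GMC return code 98, GMC internal error"))
```

Treating a zero count as a failure is a design assumption here; a designated body that legitimately has no doctors would need to be excluded from `expected_bodies`.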

Lessons Learned (Good and Bad)

Still limited knowledge within the existing teams about how the existing module works (which is why the rebuild is taking place)
Current monitoring requires more investment to work reliably: it is set to fail on the first occurrence, and alerting would need to be written into the individual apps rather than relying on checking logs
Jobs need to be started via Jenkins
Check the Jenkins jobs for (1) what they do and (2) what the logs for that run said.
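
Checking what the logs for a run said can be partly scripted. The following is a minimal sketch that assumes the log text has already been saved from the Jenkins job or the container, with one log entry per line; the function name `summarise_sync_log` is hypothetical, and the two patterns are based only on the log lines quoted in the timeline.

```python
import re
from typing import Dict, List, Tuple

# Patterns derived from the log lines quoted in the timeline.
FOUND_RE = re.compile(r"(\d+) Doctors has been found for (\S+) body")
ERROR_RE = re.compile(r"GMC return code (\d+), (.+)$")


def summarise_sync_log(lines: List[str]) -> Tuple[Dict[str, int], List[Tuple[int, str]]]:
    """Collect doctor counts per designated body and any GMC error codes."""
    counts: Dict[str, int] = {}
    errors: List[Tuple[int, str]] = []
    for line in lines:
        found = FOUND_RE.search(line)
        if found:
            counts[found.group(2)] = int(found.group(1))
            continue
        err = ERROR_RE.search(line)
        if err:
            errors.append((int(err.group(1)), err.group(2).strip()))
    return counts, errors


if __name__ == "__main__":
    sample = [
        "2020-12-13 00:00:44.263 INFO 1 --- [ scheduling-1] u.n.h.t.g.c.service.GmcDoctorsService : 2982 Doctors has been found for 1-AIIDWA body",
        "uk.nhs.hee.tis.gmc.client.exception.GMCResponseException: GMC return code 98, GMC internal error",
    ]
    counts, errors = summarise_sync_log(sample)
    print(counts)
    print(errors)
```

A run with no counts and at least one error, as on 14 Dec, points to a failure on the GMC side rather than in the data.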