Date |
Authors |
Status | Completed
Summary | Overnight GMC sync jobs failed after GMC patched their servers, so Reval showed out-of-date information until the jobs were re-run.
Impact | Users could not manage their connections
Non-technical summary
Reval was showing information that had not been updated from GMC. On further investigation, it turned out that the overnight jobs that get the data from GMC and add it into TIS had not run successfully. Once they had been fixed and re-run, Reval showed the correct information.
Timeline
09:23 am |
09:45 | We re-ran the GMC sync; it returned 0 doctors per designated body
10:00 (ish) | Investigations found that
10:15 am | Emailed GMC
12:06 | Further correspondence with the GMC, as they asked for clarification on environments
5:17 pm | Clarification of logs
7 pm | Logs from the legacy/old production environment sent to GMC
Reply from GMC | They patched their servers in the evening
10:01 | Triggered the sync off old Reval; on new Reval, gmc-sync completed. Suspect that GMC applied fixes early in the AM
10:03 | gmc-sync ran successfully and intrepid-reval-etl-all-prod ran successfully
10:18 | User confirmed everything looks OK
10:19 | Replied to GMC to let them know that both sync jobs, on legacy and new Reval, ran successfully
10:23 |
Root Causes
GMC patched their servers on the evening before the overnight sync jobs ran, which caused the jobs to fail.
Trigger
A user reported in the Teams Support Channel that their connections were not working correctly.
Resolution
GMC restarted their API servers the following morning.
The TIS team re-ran the GMC sync jobs for legacy and new Reval, and both completed successfully.
Detection
A user reported the issue in the Teams Support Channel.
Actions
For new Reval, we need to add monitoring so that we know whether the sync job that gets the data from GMC (and any ETL/transformation service we build) has run successfully or failed. We could use this ticket: https://hee-tis.atlassian.net/browse/TISNEW-3264
A decision was taken not to address monitoring in the current Reval application, as the new one is close to going live (December 2020). GMC said they will add monitoring.
The TIS team will also add monitoring and Slack notifications for new Reval (see the sketch below).
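As a rough illustration of the monitoring and notification work described above, the sketch below wraps a sync run and posts its outcome to a Slack incoming webhook. The class and method names, the SLACK_WEBHOOK_URL environment variable, and the way the webhook is injected are assumptions for illustration only, not the actual TIS implementation.

```java
// Minimal sketch (hypothetical names): wrap the GMC sync run so that a failure
// posts a Slack notification instead of failing silently. Assumes a Slack
// incoming-webhook URL is supplied via an environment variable.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GmcSyncRunner {

    public static void main(String[] args) {
        try {
            runGmcSync();                       // the real sync/ETL step would go here
            notifySlack("gmc-sync completed successfully");
        } catch (Exception e) {
            notifySlack("gmc-sync FAILED: " + e.getMessage());
            System.exit(1);                     // non-zero exit so Jenkins marks the build failed
        }
    }

    private static void runGmcSync() {
        // Placeholder for the actual job; throwing here simulates a failed run.
        throw new IllegalStateException("0 doctors returned per designated body");
    }

    private static void notifySlack(String text) {
        String webhookUrl = System.getenv("SLACK_WEBHOOK_URL"); // assumption: injected by Jenkins
        if (webhookUrl == null) {
            System.err.println("No Slack webhook configured; message was: " + text);
            return;
        }
        try {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(webhookUrl))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString("{\"text\":\"" + text + "\"}"))
                    .build();
            HttpResponse<String> response =
                    HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Slack responded with status " + response.statusCode());
        } catch (Exception e) {
            System.err.println("Could not send Slack notification: " + e.getMessage());
        }
    }
}
```

Run from Jenkins, the non-zero exit code on failure means the build is still marked as failed even if the Slack notification itself cannot be sent.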
Lessons Learned (Good and Bad)
Still limited knowledge within the existing teams about how the existing module works (which is why the rebuild is taking place).
GMC also need educating about how the existing module works.
Current monitoring requires more investment to make it work reliably: it has problems with being set to fail on the first occurrence, and alerting would need to be written into the individual apps rather than relying on checking logs.
Jobs need to be started via Jenkins.
Check the Jenkins jobs for 1) what they do and 2) what the logs for that run said.
We retrieve the data by designated body (a simple post-run check on the per-designated-body counts is sketched below).
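Because the visible symptom of this incident was the sync returning 0 doctors per designated body, a cheap post-run sanity check on those counts could flag a repeat before a user does. The sketch below uses hypothetical names and example data; it is not the actual sync code.

```java
// Hypothetical post-run sanity check: flag any designated body for which
// zero doctors were returned, since that was the symptom of this incident.
import java.util.Map;

public class DesignatedBodyCheck {

    // Returns false (and logs a warning) if any designated body has a zero count.
    public static boolean looksHealthy(Map<String, Integer> doctorsPerDesignatedBody) {
        boolean healthy = true;
        for (Map.Entry<String, Integer> entry : doctorsPerDesignatedBody.entrySet()) {
            if (entry.getValue() == 0) {
                System.err.println("WARNING: 0 doctors returned for designated body " + entry.getKey());
                healthy = false;
            }
        }
        return healthy;
    }

    public static void main(String[] args) {
        // Example counts; in reality these would come from the sync job's results.
        Map<String, Integer> counts = Map.of("1-EXAMPLE-A", 250, "1-EXAMPLE-B", 0);
        System.out.println("Sync looks healthy: " + looksHealthy(counts));
    }
}
```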