
Date

Authors

Philip Wilsdon (Unlicensed)

Status

Completed

Summary

  • GMC Sync failed overnight

  • Reval users informed us they are not able to revalidate trainees

  • Checked logs - access forbidden to the GMC

  • Contacted GMC - they made some changes on Sunday (moving their services behind Cloudflare) which caused the issue

  • Eventually we worked out it was the user agent being sent with the request that was being blocked by GMC cloudflare

  • Doesn't affect new reval as it is up to date and therefore not being blocked by GMC Cloudflare

Impact

  • Reval data out of date

  • due to the GMC adding Cloudflare into their infrastructure setup on the

  • As part of this implementation, they blocked our requests based on rules they had set up - blocking based on the user agent.

  • Rule disabled until new reval module is live

Impact

  • Users unable to revalidate and manage connections from Monday morning to the morning of (however, they could still use GMC connect)


Non-technical summary

  • GMC added a new security platform

  • The platform allows them to block incoming requests

  • They blocked ours but then disabled it

  • They will not block requests anymore until the new reval is live

  • Will not be an issue with new reval - it is written in a more up to date language version and is not blocked by the GMC filtering criteria

Timeline

08:08


08:30

Created ticket and incident page

2021-03-01 Legacy Reval: unable to submit revalidation - User Agent/Cloudflare GMC Issue

09:15

Rerun failed with the same error… Forbidden (HTTP 403)

09:33

Pinged users to let them know we have contacted GMC to check the issue

09:34

Emailed GMC

10:43

Replied to GMC to confirm it's the Production/Live environment and not our new Reval module


11:47

GMC have made some changes over the weekend, they are looking into it

16:10

Email to chase GMC

16:31

- 16:53

Confirming with the GMC what IPs we are hitting

17:50

Email back from GMC

10:55

We found that with the new reval module, no issues, we are able to get data

However, the issue is with legacy/existing.

Need to check if the IPs/Servers that legacy reval runs on are still whitelisted

11:01

11:46

Checking if authentication error


16:10

We confused the GMC

- 11:17

Call to investigate further - more fault analysis done,

  • review errors and look at gmc-sync repo to see where the problem might be

Some Findings

  • 403 in place in Prod as it is an authentication issue on our side.

  • We are able to curl the GMC endpoints, which shows that the credentials are correct, but the Java code is not able to get them due to some restrictions (?) in our Prod.

  • We know about the 99 error in stage: the IPs are not whitelisted by GMC for our stage env.

  • We want to add trace logging to the Java code so that we can see whether the credentials are fetched correctly

- 14:00

  • Call to review draft PR created to check authentication issues - what are GMC sending us - what response

  • 1st of the month and timings - nothing has changed for 6 months so why a change now - version updates?

- 14:11

  • Tried to update Java base - didn't do anything

- 14:41

  • Updated PR waiting for review

- 15:17

  • PR reviewed

  • Ran gmc-sync-prod and it failed as expected

  • Investigating now with the extra logging

- 15:30

  • It's getting the correct username and password but still failing

  • Before it gets to the GMC return code it throws an exception

  • The app is failing in the SOAP API call

- 16:33

Email to GMC

- 16:38

So just to sum up our thought process:

  • We can CURL Prod (Authentication is fine)

  • There have been no significant recent changes to the codebase

  • The Java app appears to be building the request correctly

  • The Java app appears to be using the correct credentials

  • We are still receiving 403 from GMC endpoint

Conclusion: most likely cause is a permissions error (Authorization issue) internally on GMC side
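
One check that can help separate "rejected by the GMC application" from "rejected by something in front of it" is to look at the headers on the 403 response itself. Responses generated at the Cloudflare edge normally carry a `Server: cloudflare` header and a `CF-RAY` header. The sketch below is illustrative only and is not part of gmc-sync: the class and method names are hypothetical, and it assumes the status code and response headers of the failed call are available to the caller.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not taken from the gmc-sync repo.
class CloudflareBlockCheck {

    /**
     * Returns true when a 403 response carries headers that Cloudflare
     * normally adds (Server: cloudflare, or a CF-RAY id), which suggests the
     * request was rejected at the edge rather than by the origin application.
     */
    static boolean looksLikeCloudflareBlock(int status, Map<String, List<String>> headers) {
        if (status != 403) {
            return false;
        }
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = entry.getKey();
            if (name == null) {
                continue; // HttpURLConnection stores the status line under a null key
            }
            if (name.equalsIgnoreCase("CF-RAY")) {
                return true;
            }
            if (name.equalsIgnoreCase("Server")) {
                for (String value : entry.getValue()) {
                    if (value != null && value.toLowerCase(Locale.ROOT).contains("cloudflare")) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up headers for illustration; a real check would use the headers of the failed call.
        Map<String, List<String>> headers = new HashMap<>();
        headers.put("Server", Collections.singletonList("cloudflare"));
        System.out.println(looksLikeCloudflareBlock(403, headers));
    }
}
```

A positive result only indicates where the 403 was generated, not which rule triggered it, so it would still need confirming with the GMC.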

- 16:42

- 16:50

Updated users on teams with our conclusion

- 17:06

Clarification of IPs

- 05:47


- 22:02

Email from the GMC - they have been able to re-create the 403 error

- 22:48 to 23:19

  • Cloudflare (now being used by the GMC) - Cloudflare Browser Integrity Check seems to block Java 8 user agents by default.

  • GMC-Sync with Java 9 and Java 11 didn't work (no surprise there)

  • Changed back to Java 8 and it's working now

- 00:05

  • 00:05 cron run of the GMC-Sync ran successfully, so after the intrepid ETL runs at 01:00 Reval should be working again

- 08:02

  • Slim-buster image (presumably with a later update of Java 8) - we don’t think the GMC modified any of the rules in Cloudflare.

  • Using a recent maintained docker base image (as long as it works with the GMC config) - that would be using the slim-buster image.

- 08:08

  • User reports fixed

- 09:59

Trying to get some clarification from the GMC relating to the user agent

- 11:26

GMC disabled the security feature relating to the user agent, so that's why it worked

We need to update the user agent
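
As a rough illustration of what updating the user agent could look like: when nothing else is configured, Java's built-in `HttpURLConnection` sends `Java/<version>` as its `User-Agent`, and an explicit value can be set per request. The sketch below is an assumption-laden example, not the gmc-sync code: the SOAP client gmc-sync actually uses may expose this differently, and the class name, the `gmc-sync-example/1.0` value and the URL are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch; names and values are illustrative only.
class UserAgentSketch {

    // Placeholder value; a real identifier would need agreeing with the GMC.
    static final String EXAMPLE_USER_AGENT = "gmc-sync-example/1.0";

    /** The User-Agent HttpURLConnection sends when the http.agent property is unset. */
    static String defaultJavaUserAgent(String javaVersion) {
        return "Java/" + javaVersion;
    }

    /** Prepares (but does not open) a connection that will send an explicit User-Agent. */
    static HttpURLConnection openWithUserAgent(String url, String userAgent) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default:  " + defaultJavaUserAgent(System.getProperty("java.version")));
        // example.invalid is a placeholder host; no network call is made here.
        HttpURLConnection conn = openWithUserAgent("https://example.invalid/soap", EXAMPLE_USER_AGENT);
        System.out.println("Explicit: " + conn.getRequestProperty("User-Agent"));
    }
}
```

Setting the `http.agent` system property is an alternative, but the JDK appends ` Java/<version>` to whatever value is supplied there, so a filter matching on the Java token might still trigger.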

- 13:05

10:06

Chasing GMC - No reply regarding User agent / is there filtering back on?

14:18

Reply from GMC and some thoughts

  • What is the LTS status of those two Java 8 versions?

  • Assuming they're in LTS, are the GMC willing and able to tweak their Cloudflare config to allow them in future?

  • New reval is Java 11, so is keeping the filtering disabled until we switch over an option?


15:37

  • Java 8 has LTS until 2030 so GMC are blocking/filtering something that is supported

  • We don’t want to spend effort on the existing module

  • New module in Java 11 won’t have an issue, so requested GMC just leave the filter turned off for now


16:33

  • GMC ok to disable the Cloudflare filter until new reval developed


Root Causes

  • Job failed

  • Request to “Get to GMC Doctors From GMC API” returned 403

  • Contacted the GMC (peter.mcnair@gmc-uk.org) - they moved the API behind some additional security over the weekend - Cloudflare, a CDN and DDoS protection platform

  • Requests were being sent with a user agent blocked by Cloudflare

Trigger

  • gmc-sync-prod alert in monitoring channel

  • A user reported in Teams Support Channel and slack message on Monday AM


Resolution

  • Send a user agent with the request that is not blocked by GMC; our requests were being blocked based on filtering rules they had set up - blocking the user agent

Trigger

  • GMC introducing Cloudflare which blocked our calls to their API

Resolution

  • GMC has turned off the filtering and will leave it disabled until the new reval module is deployed

Detection

  • Slack monitoring and user report in Teams Support Channel

Actions

  • New reval will replace legacy reval; need to keep it up to date

  • Confirm with GMC if they have allowed our user agent or is this going to happen again?

  • Inform GMC when only new reval module is live so

Lessons Learned (Good and Bad)

  • Cloudflare adds additional security, e.g. blocking requests based on certain rules such as user agents

  • Try and get updates from GMC when they are doing maintenance

  • Don’t have legacy applications

  • GMC do server upgrades/maintenance etc. on Sundays/over the weekend