2021-06-29 Trainee load balancers are internet facing
Date | Jun 29, 2021 |
---|---|
Authors | @Andy Dingley |
Status | Done |
Summary | A misconfiguration in the trainee self-service infrastructure theoretically allowed unauthorized requests for data |
Impact | Trainee data was exposed to unauthorized access |
Non-technical Description
What is a load balancer?
A load balancer acts like a “traffic cop”, ensuring that connections are distributed evenly across all available resources; it helps avoid overloading a single part of the system while others are under-utilised.
An everyday example of this is reaching the front of a single queue at the Post Office and then being directed to the free Post Office employee rather than the busy one.
What does “internet facing” mean?
In the context of our load balancers there are two types of visibility:
Internal: the load balancer can only receive connections from within our infrastructure
Internet Facing: the load balancer can receive connections from anyone/anywhere
Internet-facing load balancers are assigned an address, like a URL, which can be used to connect to them directly. It is not unusual for a load balancer to be public, but in such cases other techniques would be used to authenticate/authorize requests.
What was the issue?
The normal, expected process for a request follows these steps:
The user makes a request to the API Gateway
The API Gateway checks that the user is logged in and authorized to perform that action using a product called Cognito
If authorized, the request is sent to the load balancer
The load balancer forwards the request to one of the service instances to perform the requested actions
However, because the load balancer was internet facing, it was theoretically possible to bypass the API Gateway and, as a result, the Cognito checks that the user is logged in and authorized.
The issue was spotted internally and fixed quickly (within 24 hours). It was not found during recent external penetration testing, and we have no reason to believe the vulnerability has been exploited. However, it is possible (unlikely, but not impossible) that, armed with the right knowledge (see below), someone could have made a request against TISSS to retrieve the data of any of the trainees we hold data for.
How likely is it someone was able to use this?
It’s hard to say without further analysis whether any unauthorized access actually occurred, but we can look at the combination of information an attacker would have needed:
Awareness of TISSS (public knowledge)
Understanding of how our services handle requests (public knowledge)
Knowing that we used a load balancer in this way (private knowledge)
Knowledge that our load balancer was internet facing (private knowledge)
Knowing the random address of the load balancer (private knowledge)
Knowing the ID of a target trainee (private knowledge), or using trial and error/mass requests (no knowledge needed)
While the inner workings of our services are public knowledge, we do not currently publish anything about our infrastructure. This would have made it very difficult for an attack targeting TISSS to take advantage of this issue, and would also have made it difficult to trace the load balancer back to TISSS (and how it works) had an attacker found the public load balancer directly.
Next steps?
Firstly, the identified issue has been fixed: the load balancer is now private and requests must go through the expected path.
Some work will be undertaken to try to identify whether any unauthorized access occurred.
Trigger
Detected by a member of the TIS team
Detection
Existing infrastructure reviewed by a member of the TIS team in reaction to adjacent changes
Resolution
The load balancers were redeployed with the correct configuration
Timeline
Jun 29, 2021: 15:45 BST - Issue spotted by TIS team member and discussed between the lead developers to clarify whether it was an issue as suspected
Jun 29, 2021: 15:57 BST - Ops team brought into the loop
Jun 29, 2021: 16:13 BST - Identified that the TISSS load balancer was using an outdated template
Jun 29, 2021: 16:23 BST - Fix ready
Jun 29, 2021: 16:44 BST - Notification sent on Teams of expected downtime to apply the fix
Jun 29, 2021: 16:54 BST - Attempt to apply the fix failed due to restricted ability to delete the existing resources
Jun 30, 2021: 10:49 BST - Started trying alternative techniques to apply the infrastructure changes
Jun 30, 2021: 12:11 BST - Fix deployed to pre-prod
Jun 30, 2021: 12:38 BST - Fix deployed to prod
Root Cause(s)
TISSS load balancers were public facing
The template was set to public facing at the time the infrastructure was created
The configuration the template was extracted from used public-facing Application Load Balancers
Public-facing Application Load Balancers were originally used, instead of the current API Gateway (public) > Network Load Balancer (private) infrastructure
When the load balancer type was changed, the visibility was not updated in the template (it has since been corrected); an illustrative sketch of the setting involved is shown below
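To make that last point concrete, the sketch below shows, in generic Terraform rather than the actual TISSS modules, how the visibility of an AWS load balancer is controlled by a single `internal` flag on the `aws_lb` resource. The resource name, subnet variable and values are illustrative assumptions only.

```hcl
# Illustrative sketch only - the resource and variable names here are
# hypothetical, not the real TISSS Terraform configuration.
resource "aws_lb" "trainee_service" {
  name               = "tisss-nlb"
  load_balancer_type = "network"

  # The setting at the heart of this incident: "false" (the value carried
  # over from the old public ALB template) makes the load balancer internet
  # facing, while "true" makes it reachable only from inside our own network,
  # so every request has to come via the API Gateway.
  internal = true

  subnets = var.private_subnet_ids
}
```

With `internal = true` the load balancer matches the intended API Gateway (public) > Network Load Balancer (private) design described above.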
Action Items
Action Items | Owner | Status |
---|---|---|
Investigate whether any unauthorized access occurred | | Please see ticket for details of the investigation that was conducted. No clear evidence of unauthorized access was found. |
Investigate how to restructure Terraform modules to streamline future changes | | |
Lessons Learned
Double-check all new infrastructure configuration (measure twice, cut once) and always update Terraform to reflect subsequent changes.
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213