2021-06-29 Trainee load balancers are internet facing

Date

Jun 29, 2021

Authors

@Andy Dingley

Status

Done

Summary

A misconfiguration in the trainee self-service infrastructure theoretically allowed unauthorized requests for data

Impact

Trainee data was potentially exposed to unauthorized access

Non-technical Description

What is a load balancer?

A load balancer acts like a “traffic cop”, ensuring that connections are distributed evenly between all available resources. It helps avoid overloading a single part of the system while others are under-utilised.

An everyday example of this sort of thing is reaching the front of a single queue at the Post Office and then being directed to the free Post Office employee, rather than the busy one.

What does “internet facing” mean?

In the context of our load balancers, there are two types of visibility:

Internal: the load balancer can only receive connections from within our infrastructure

Internet Facing: the load balancer can receive connections from anyone/anywhere

Internet facing load balancers are assigned an address, like a URL, which can be used to connect to them directly. It is not unusual for a load balancer to be public, but in such cases other techniques would be in place to authenticate/authorize requests.
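For the technically minded, visibility comes down to a single setting on the load balancer resource. The following is a minimal Terraform sketch only; the resource name, load balancer name and subnet IDs are hypothetical placeholders rather than our real configuration.

    # Visibility is controlled by one flag on the load balancer resource.
    resource "aws_lb" "example" {
      name               = "example-service-nlb"   # hypothetical name
      load_balancer_type = "network"

      # internal = true  -> only reachable from within our own infrastructure
      # internal = false -> internet facing, with a publicly resolvable address
      internal = true

      subnets = ["subnet-0example1", "subnet-0example2"]   # placeholder subnet IDs
    }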

What was the issue?

The normal, expected process for making a request follows these steps (a simplified sketch of this path is included after the list):

  1. The user makes a request to the API Gateway

  2. The API Gateway checks that the user is logged in and authorized to perform that action using a product called Cognito

  3. If authorized, the request is sent to the load balancer

  4. The load balancer forwards the request to one of the service instances to perform the requested actions
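As a simplified Terraform sketch of that path: the API Gateway route is protected by a Cognito authorizer and forwards authorized requests to the private load balancer over a VPC link. Every name, ID and ARN below is a hypothetical placeholder rather than our real configuration, and the real setup is more involved.

    # Entry point for trainee self-service requests (step 1).
    resource "aws_apigatewayv2_api" "example" {
      name          = "example-trainee-api"   # hypothetical name
      protocol_type = "HTTP"
    }

    # The gateway validates the caller's Cognito-issued token (step 2).
    resource "aws_apigatewayv2_authorizer" "cognito" {
      api_id           = aws_apigatewayv2_api.example.id
      authorizer_type  = "JWT"
      identity_sources = ["$request.header.Authorization"]
      name             = "example-cognito-authorizer"

      jwt_configuration {
        audience = ["example-app-client-id"]                                       # placeholder
        issuer   = "https://cognito-idp.eu-west-2.amazonaws.com/eu-west-2_example" # placeholder
      }
    }

    # Authorized requests reach the private load balancer through a VPC link,
    # so the load balancer itself never needs to be internet facing (steps 3-4).
    resource "aws_apigatewayv2_vpc_link" "example" {
      name               = "example-vpc-link"
      security_group_ids = ["sg-0example"]       # placeholder
      subnet_ids         = ["subnet-0example1"]  # placeholder
    }

    resource "aws_apigatewayv2_integration" "service" {
      api_id             = aws_apigatewayv2_api.example.id
      integration_type   = "HTTP_PROXY"
      integration_method = "ANY"
      integration_uri    = "arn:aws:elasticloadbalancing:eu-west-2:000000000000:listener/net/example/0000/0000" # placeholder listener ARN
      connection_type    = "VPC_LINK"
      connection_id      = aws_apigatewayv2_vpc_link.example.id
    }

    # Requests that fail the Cognito check never reach the integration.
    resource "aws_apigatewayv2_route" "service" {
      api_id             = aws_apigatewayv2_api.example.id
      route_key          = "ANY /{proxy+}"
      authorization_type = "JWT"
      authorizer_id      = aws_apigatewayv2_authorizer.cognito.id
      target             = "integrations/${aws_apigatewayv2_integration.service.id}"
    }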

However, because the load balancer was internet facing, it was theoretically possible to bypass the API Gateway and, as a result, bypass the Cognito checks that a user is logged in and authorized.

The issue was spotted internally and fixed quickly (within 24 hours). It was not found during recent external penetration testing, and we have no reason to believe the vulnerability has been exploited. However, it is possible (unlikely, but not impossible) that, armed with the right knowledge (see below), someone could have made a request against TISSS and retrieved the data of any trainee we hold data for.

How likely is it someone was able to use this?

It’s hard to say without further analysis whether any unauthorized access actually occurred, but we can look at the combination of information an attacker would have needed.

  1. Awareness of TISSS (public knowledge)

  2. Understanding of how our services handle requests (public knowledge)

  3. Knowing that we used a load balancer in this way (private knowledge)

  4. Knowledge that our load balancer was internet facing (private knowledge)

  5. Knowing the random address of the load balancer (private knowledge)

  6. Knowing the ID of a target trainee (private knowledge) or trial and error/mass requests (no knowledge needed)

While the inner workings of our services are public knowledge, we do not currently publish anything about our infrastructure. This would have made it very difficult to mount a targeted attack against TISSS using this issue, and it would also have made it difficult to trace the load balancer back to TISSS (and how it works) had an attacker found the public load balancer directly.

Next steps?

Firstly, the identified issue has been fixed: the load balancer is now private and requests must go through the expected path.

Some work will be undertaken to try and identify whether there was any unauthorized access.


Trigger

  • Detected by a member of the TIS team


Detection

  • Existing infrastructure reviewed by a member of the TIS team in reaction to adjacent changes


Resolution

  • The load balancers were redeployed with the correct configuration


Timeline

  • Jun 29, 2021: 15:45 BST - Issue spotted by a TIS team member and discussed between the lead developers to clarify whether it was an issue as suspected

  • Jun 29, 2021: 15:57 BST - Ops team brought into the loop

  • Jun 29, 2021: 16:13 BST - Identified that the TISSS load balancer was using an outdated template

  • Jun 29, 2021: 16:23 BST - Fix ready

  • Jun 29, 2021: 16:44 BST - Notification sent on Teams of expected downtime to apply the fix

  • Jun 29, 2021: 16:54 BST - Attempt to apply the fix failed due to restricted ability to delete the existing resources

  • Jun 30, 2021: 10:49 BST - Started trying alternative techniques to apply the infrastructure changes

  • Jun 30, 2021: 12:11 BST - Fix deployed to pre-prod

  • Jun 30, 2021: 12:38 BST - Fix deployed to prod

Root Cause(s)

  • TISSS load balancers were public facing

  • The template was set to public facing at the time the infrastructure was created

  • The configuration the template was extracted from used public facing application load balancers

  • Public facing Application Load Balancers were originally used, instead of the current API Gateway (public) > Network Load Balancer (private) infrastructure

  • When the load balancer type was changed, the visibility was not updated in the template; this has since been corrected (a hypothetical sketch of how the stale setting propagated follows this list)
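In Terraform terms, the root cause amounts to a template in which the load balancer type was updated but the visibility flag kept the value inherited from the old public-facing application load balancer configuration. The sketch below is illustrative only; the variable name, resource names and subnet IDs are placeholders, not our actual module code.

    # Hypothetical template sketch; names and values are illustrative only.
    variable "internal" {
      description = "Whether the load balancer is internal-only"
      type        = bool
      default     = false   # inherited from the public-facing ALB configuration the template was extracted from
    }

    resource "aws_lb" "service" {
      name               = "example-tisss-nlb"   # hypothetical name
      load_balancer_type = "network"             # updated when we moved from ALBs to NLBs...
      internal           = var.internal          # ...while the default above stayed public facing
      subnets            = ["subnet-0example1"]  # placeholder subnet ID
    }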

Action Items

  • Investigate whether any unauthorized access occurred
    Ticket: https://hee-tis.atlassian.net/browse/TIS21-1816
    Status: Please see ticket for details of the investigation that was conducted. No clear evidence of unauthorised access was found.

  • Investigate how to restructure terraform modules to streamline future changes
    Ticket: https://hee-tis.atlassian.net/browse/TIS21-1820


Lessons Learned

Double check all new infrastructure configuration (measure twice, cut once) and always update Terraform to reflect subsequent changes.
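One hypothetical way of baking the “measure twice” habit into the template itself, sketched here as an idea rather than something taken from our current configuration, is to make Terraform refuse a plan that would leave a service load balancer internet facing:

    # Hypothetical guardrail; names are placeholders.
    variable "internal" {
      description = "Whether the load balancer is internal-only (no default, so it must be set explicitly)"
      type        = bool
    }

    resource "aws_lb" "service" {
      name               = "example-service-nlb"   # hypothetical name
      load_balancer_type = "network"
      internal           = var.internal
      subnets            = ["subnet-0example1"]    # placeholder subnet ID

      lifecycle {
        # Requires Terraform 1.2+; fails the plan if the flag is ever set to false.
        precondition {
          condition     = var.internal
          error_message = "Service load balancers must be internal; public traffic should only arrive via the API Gateway."
        }
      }
    }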