AWS Standards

On Friday 17th April we came together to brainstorm, at a high level, the sort of standards we want for AWS in terms of networking, security, managed applications etc. This will form the foundation of the knowledge needed to build infrastructure for the migration. It also served to share what AWS is like in comparison to Azure.

Networking

The first thing we’ve drawn out is our standard structure for a VPC (Virtual Private Cloud). We did this as a first step because a VPC is the container for everything else.

Description:

  • When defining a VPC, we should define a network CIDR block such as 172.x.0.0/16, taken from the RFC 1918 private range 172.16.0.0/12 (i.e. 172.16.0.0–172.31.255.255). The 172 range was chosen as it wouldn’t conflict with any existing HEE/NHS infra if we needed to connect to it. A /16 gives us 65,536 IP addresses per VPC, more than plenty

  • Regions - there is a default limit of 5 VPCs per region, but it can be increased. We are targeting the eu-west-2 (London) region to ensure any data is kept within the confines of the UK, so that we stay within regulations

  • AZs (Availability Zones) - these are locations within the selected region, analogous to data centres. We’ve chosen to use all of the available zones (a/b/c) within the London region

  • Both public and private subnets will be defined in each AZ, with the private subnets routed through NAT gateways in the public subnets via route tables, so that they have outbound access to the internet but inbound connections from the internet are not possible

  • Public subnets are also reachable from the internet through an internet gateway, referenced via a separate route table

  • Multiple security groups and NACLs will need to be defined and chained
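
The structure described above could be sketched in Terraform roughly as follows. This is an illustrative sketch only — the resource names, tags and subnet-splitting scheme are assumptions for discussion, not agreed values:

```hcl
# Illustrative sketch - names and CIDR allocations are placeholders
resource "aws_vpc" "main" {
  cidr_block = "172.16.0.0/16" # a /16 within the RFC 1918 private range
  tags       = { Name = "tis-vpc" }
}

# One public and one private subnet per AZ (a/b/c)
resource "aws_subnet" "public" {
  count                   = 3
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone       = "eu-west-2${element(["a", "b", "c"], count.index)}"
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 100)
  availability_zone = "eu-west-2${element(["a", "b", "c"], count.index)}"
}

# Internet gateway giving the public subnets inbound/outbound access
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

# NAT gateway (one per AZ in practice) for outbound-only private traffic
resource "aws_eip" "nat" {
  vpc = true # 'domain = "vpc"' in newer AWS provider versions
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
}

# Private route table: default route out via the NAT gateway
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
}
```

The `cidrsubnet` function carves /24s out of the VPC’s /16; the exact split is one of the open questions below.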

Notes/Questions:

  • The IP ranges for the subnets will need further discussion, to ensure clarity over what is public/private and to leave room for growth

  • Load balancers will need to be defined perhaps in a lower level diagram

  • Jumpbox/bastion hosts may not be needed if we have the right tooling/monitoring in place

Security

There are many security considerations when working with AWS, or any kind of infrastructure, but the main benefit of using a cloud provider like AWS is that security is “baked in”: it provides features that make securing things easier.

An additional session was held on 22nd April to try to catalogue what security concerns/standards we would like in place for the migration.

Here are some of the high-level things we’d want in terms of security:

  • General

    • Deny by default - start with the minimum permissions across the board (users, machines, services, access)

    • There should be no need for services to send traffic externally; this should be locked down

    • NACLs provide a second layer of security, so that a mistake in a security group doesn’t leave a service wide open

    • Domain certificates to be issued per subdomain rather than as wildcards. This allows finer-grained control, e.g. revoking a single certificate if there were a breach

  • Encryption

    • In transit - this relates to network traffic

    • At rest - when data is stored on disk (either as files or in the form of a database)

    • Network traffic within the VPC must be encrypted

    • Inbound network traffic on HTTP must be redirected to use HTTPS

  • Security groups

    • These behave like firewalls which limit (reject) certain types of traffic - can be attached to many types of resources

    • Many fine-grained groups, named appropriately and not shared between VPCs

    • Chained where it makes sense

  • WAF (Web application firewall)

    • Where it makes sense, to be placed in front of any application that receives traffic sourced externally

  • Controlled networking routes

    • Resources can only be reached via certain routes

    • No public IPs

    • Applications must be deployed in private subnets with no directly accessible route from outside the VPC

  • Network traffic

    • Subnet-to-subnet traffic within a VPC is always allowed to flow

    • VPC-to-VPC traffic is heavily discouraged unless there is a valid reason and no other choice

    • VPC to on-prem/other cloud provider communication can be done via IPsec VPN & BGP. This, however, is a large undertaking and there may be better alternatives
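
Two of the points above — deny-by-default security groups and forcing HTTP onto HTTPS — could look roughly like this in Terraform. This is a sketch under assumptions: the security group name is a placeholder, and it assumes a VPC resource `aws_vpc.main` and a load balancer `aws_lb.app` are defined elsewhere:

```hcl
# Illustrative sketch - a deny-by-default security group; security groups
# reject everything that is not explicitly allowed, so the only rule we
# add is HTTPS inbound
resource "aws_security_group" "web_inbound" {
  name   = "web-inbound" # placeholder name
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Redirect any plain-HTTP traffic to HTTPS at the load balancer,
# assuming a load balancer aws_lb.app exists
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```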

Application

The applications we develop and deploy fit roughly into 3 groups:

  • Front end applications, publicly available, typically served via a web server (in Azure)

  • Back end applications that are publicly accessible via HTTP once a user is authenticated

  • Back end applications that don't require public accessibility but run to serve other applications or do data transformation

With these broad types of application, we need to find a suitable deployment strategy for each, and seeing as this migration is greenfield, everything is up for grabs.

Some investigations have already been kicked off around ECS and Lambda but, as in previous sessions, we should try to standardise before we invest in any one approach.

Like the previous sessions, we’ve had another meeting to discuss the possible solutions for these broad types of applications. Here’s what we’ve come up with so far:

  • Frontend applications

    • S3

    • ECS

    • Cloudfront

    • VMs (EC2)

    • EKS

  • Backend applications

    • ECS

    • VM

    • EKS

    • Lambda

    • Lightsail / Elastic Beanstalk

  • Backend applications - non public accessible

    • The same as the backend list above

With this list, we went through a process of elimination, removing products we thought were not suitable, or not possible given the time constraints or skills within the team. With that in mind, the following were removed for these reasons:

  • CloudFront: requires a backing origin to initially serve the content

  • VMs (EC2): the current solution; has too many moving parts and requires a lot of time and management

  • EKS: managed Kubernetes; although good, we don’t have the skills in the team, and it’s possibly “overkill” for the service we provide

  • Lightsail/Elastic Beanstalk: too simplistic for our needs; would be a good use case for basic landing-page-like products

So that leaves us with S3 + ECS for the frontend and ECS + Lambda for the backend. We don’t yet know which is the best choice for each type, so spikes will be created to investigate each option.
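
As a flavour of what the S3 spike might produce, a static frontend could be hosted straight out of a bucket. This is a sketch only — the bucket name is a placeholder, and `aws_s3_bucket_website_configuration` is the current AWS provider syntax (older provider versions used an inline `website` block instead):

```hcl
# Illustrative sketch - candidate S3 static-site hosting for the frontend spike
resource "aws_s3_bucket" "frontend" {
  bucket = "tis-frontend-example" # placeholder name, must be globally unique
}

resource "aws_s3_bucket_website_configuration" "frontend" {
  bucket = aws_s3_bucket.frontend.id

  index_document {
    suffix = "index.html"
  }
}
```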

Infrastructure as code

Like all other forms of infrastructure, AWS is prone to configuration drift (where configuration differs from environment to environment) and to flaky, highly customised, unique resources that are treated like pets (see: the history of Pets and Cattle). To avoid these issues, we already use tools such as Terraform and Ansible to build and manage infrastructure and to configure it in a uniform way.

The TIS-OPS Github repository contains the source code for managing the infrastructure in AWS and conforms to a basic format

The two folders under terraform are of interest here:

environments

Contains a number of sub-folders representing environments within AWS. Within each environment folder are subsections for the areas the TIS team manages, such as TIS, Revalidation and Trainee Self Service. Beneath these are the service and networking configurations applied to their respective areas, e.g. VPC code.

This folder also gives developers an area to create and experiment in their own space (the local folder), so that they won’t conflict with existing infra or other developers.

modules

This contains the HEE-developed Terraform modules that allow developers to spin up whole environments/services with minimal copying and pasting, while still creating resources in a standard manner.
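
A call into one of these modules might look roughly like this. The module name, path and input variables are hypothetical, purely to show the shape:

```hcl
# Hypothetical example - module name and inputs are placeholders
module "networking" {
  source = "../../modules/networking" # relative path into the modules folder

  vpc_cidr           = "172.16.0.0/16"
  availability_zones = ["eu-west-2a", "eu-west-2b", "eu-west-2c"]
}
```

Each environment folder can then call the same module with different inputs, which is what keeps the environments uniform.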

There will be times when creating infrastructure as code won’t be possible, as constraints such as time or knowledge may force a developer to experiment using the AWS web console. This is OK as long as the developer takes those changes/learnings and brings them back into code, so that others can reuse and share them.

Other

With all this now “down on paper”, we have some guidance on where to go next. We’ll need to do several spikes and create a large set of tickets to enable the migration to AWS.