AWS Standards

On Friday 17th April we came together to brainstorm, at a high level, what sort of standards we want for AWS in terms of networking, security, managed applications etc. These would then form the foundation of the knowledge needed to build infrastructure for the migration. The session also served to share knowledge of how AWS compares to Azure.

Networking

The first thing we drew out was what our standard VPC (Virtual Private Cloud) structure will look like. We did this as a first step because a VPC is the container for everything else.

Description:

When defining a VPC, we should define a network CIDR block as 172.x.0.0/16. Note that only 172.16.0.0/12 (172.16.x.x to 172.31.x.x) is private address space under RFC 1918, so x should fall between 16 and 31; 172.0.x.x itself is publicly routable. The 172 range was chosen as there would be no conflict with any existing HEE/NHS infra if we needed to connect them. A /16 gives us 65,536 IP addresses per VPC, which is more than plenty
Regions - there is a limit on the number of VPCs per region (5 by default) but it can be increased. We are to target the eu-west-2 (London) region to ensure any data is kept within the confines of the UK so that we stay within regulations
AZ (Availability Zones) - these are locations within the selected region, analogous to datacenters. We’ve chosen to use all of the available zones (a/b/c) within the London region
Both public and private subnets will be defined for each AZ. The private subnets are linked to NAT gateways in the public subnets via routing tables so that they have outbound access to the internet, but inbound connections from the internet are not possible
Public subnets are also accessible from the internet through an internet gateway attached via another routing table
Multiple security groups and NACLs (network access control lists) will need to be defined and chained
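
As a rough illustration of the layout described above, the following Python sketch carves one VPC block into a public and a private subnet per availability zone. The 172.16.0.0/16 block, the /20 subnet size and the function name plan_subnets are assumptions made for the example, not agreed values; the real subnet ranges still need to be discussed.

```python
import ipaddress


def plan_subnets(vpc_cidr, zones, new_prefix=20):
    """Split a VPC CIDR into one public and one private subnet per AZ.

    new_prefix=20 is an assumed subnet size (4,096 addresses each).
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # Generator of equal-sized, non-overlapping blocks inside the VPC range.
    blocks = vpc.subnets(new_prefix=new_prefix)
    plan = {}
    for zone in zones:
        plan[zone] = {"public": next(blocks), "private": next(blocks)}
    return plan


# Assumed example values: a private /16 and the three London zones.
plan = plan_subnets("172.16.0.0/16", ["eu-west-2a", "eu-west-2b", "eu-west-2c"])
for zone, nets in plan.items():
    print(zone, "public", nets["public"], "private", nets["private"])
```

With these assumed sizes, six of the sixteen available /20 blocks are used, which leaves ten spare for future growth.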

Notes/Questions:

The IP ranges for the subnets will need to be discussed further to ensure it is clear which ranges are public/private and that there is room to grow (elasticity)
Load balancers will need to be defined perhaps in a lower level diagram
Jumpbox/Bastion hosts may not be needed if we have the tools/monitoring

Security

There are many security considerations when working with AWS or any kind of infrastructure, but a main benefit of using a cloud provider like AWS is that security is “baked in”: it provides features that should make securing our infrastructure easier.

An additional session was held on 22nd April to try to catalogue the security concerns/standards we would like in place for the migration.

Here are some of the high-level things we’d want in terms of security

General
- Reject by default - start with the least amount of permissions across the board (users, machines, services, access)
- There should be no need for services to send traffic externally, so outbound traffic should be locked down
- NACLs to provide a second layer of security so that any mistakes with security groups don’t leave services wide open
- Domain certificates to be per subdomain rather than wildcards. This will allow finer-grained control, such as revoking a single certificate if there were a breach
Encryption
- In transit - this relates to network traffic
- At rest - when data is stored on disk (either as files or in the form of a database)
- Network traffic within the VPC must be encrypted
- Inbound network traffic on HTTP must be redirected to use HTTPS
Security groups
- These behave like firewalls: traffic is rejected unless a rule explicitly allows it. They can be attached to many types of resources
- Many finely grained groups, named appropriately and not shared between VPCs
- Chained where it makes sense
WAF (Web application firewall)
- Where it makes sense, to be placed in front of any application that receives traffic sourced externally
Controlled networking routes
- Resources can only be reached via certain routes
- No public IPs
- Applications must be hosted in private subnets with no directly accessible route from outside the VPC
Network traffic
- Subnet to subnet traffic within a VPC will always be allowed to flow
- VPC to VPC traffic would be heavily discouraged unless there was a valid reason and no other choice
- VPC to on-prem/other cloud provider communication can be done via IPsec VPN & BGP. This however is a large undertaking and there may be better alternatives
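
To make the "reject by default" and "second layer" points above concrete, here is a deliberately simplified Python model of how a NACL and a security group combine for inbound traffic. It is a toy sketch, not AWS's implementation: it matches on port only, ignores source ranges, protocols and return traffic, and all names (NaclRule, inbound_permitted and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int  # lower rule numbers are evaluated first
    port: int
    allow: bool


def nacl_allows(rules, port):
    """First matching rule in ascending rule-number order wins; no match means deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.allow
    return False


def security_group_allows(allowed_ports, port):
    """Security groups hold allow rules only; anything not listed is rejected."""
    return port in allowed_ports


def inbound_permitted(nacl_rules, sg_ports, port):
    """Traffic must pass both layers to reach the resource."""
    return nacl_allows(nacl_rules, port) and security_group_allows(sg_ports, port)


# Hypothetical setup: the NACL only allows HTTPS, while the security
# group has had port 22 opened by mistake.
nacl = [NaclRule(number=100, port=443, allow=True)]
sg_ports = {443, 22}

print(inbound_permitted(nacl, sg_ports, 443))  # True
print(inbound_permitted(nacl, sg_ports, 22))   # False: the NACL catches the mistake
```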

Application

The applications we develop and deploy fit roughly into 3 groups

Front end applications, publicly available, typically served via a web server (in Azure)
Back end applications that are publicly accessible via HTTP once a user is authenticated
Back end applications that don't require public accessibility but run to serve other applications or do data transformation

With these broad types of applications, we need to find a suitable deployment strategy for each, and seeing as this migration is greenfield, everything is up for grabs.

Some investigations have already been kicked off around ECS and Lambda, but as with previous sessions we should try to standardise before we invest in any one approach

Infrastructure as code

Like all other forms of infrastructure, AWS is prone to configuration drift (where configuration differs from environment to environment) and to flaky, highly customised, unique resources that are treated like pets (See: The history of Pets and Cattle). To avoid these issues, we already use tools such as Terraform and Ansible to build, manage and configure infrastructure in a uniform way.
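
As a small illustration of what configuration drift means in practice, the Python sketch below compares the settings of two environments and reports every value that differs. The setting names and values are made up for the example and find_drift is a hypothetical helper; in our setup this kind of comparison is what Terraform's plan step performs against the real infrastructure.

```python
def find_drift(baseline, other):
    """Return {setting: (baseline_value, other_value)} for every setting that differs.

    A setting missing from one side is reported with None for that side.
    """
    keys = baseline.keys() | other.keys()
    return {
        key: (baseline.get(key), other.get(key))
        for key in sorted(keys)
        if baseline.get(key) != other.get(key)
    }


# Hypothetical environment settings, for illustration only.
stage = {"instance_type": "t3.small", "https_only": True, "min_tasks": 2}
prod = {"instance_type": "t3.small", "https_only": False, "min_tasks": 2, "debug_port": 9000}

drift = find_drift(stage, prod)
for setting, (stage_value, prod_value) in drift.items():
    print(f"{setting}: stage={stage_value} prod={prod_value}")
```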

The TIS-OPS GitHub repository contains the source code for managing the infrastructure in AWS and conforms to a basic format

The two folders under terraform are the ones of interest here

environments

Contains a number of subfolders that represent environments within AWS. Within each environment folder are subsections for the areas the TIS team manages, such as TIS, Revalidation and Trainee self service. Beneath these are the services and networking configurations that have been applied to their respective areas, e.g. VPC code

This folder also gives developers an area to create and experiment in (the local folder), so that they won’t conflict with existing infra or other developers

modules

This will contain the HEE-developed Terraform modules that allow developers to spin up whole environments / services with minimal copying and pasting, while creating resources in a standard manner