AWS Standards
On Friday 17th April we came together to brainstorm, at a high level, the standards we want for AWS in terms of networking, security, managed applications and so on. These will form the foundation of the knowledge needed to build infrastructure for the migration. The session also served to share knowledge of how AWS compares to Azure.
Networking
The first thing we drew out was the standard structure of a VPC (Virtual Private Cloud). We did this as a first step because a VPC is the container for everything else.
Description:
When defining a VPC, we should define a network CIDR block as 172.0.x.x/16. The 172 range was chosen as there would be no conflict with any existing HEE/NHS infra if we needed to connect them. A /16 gives us 65,536 IP addresses per VPC, which is more than plenty
Regions - there is a default limit of 5 VPCs per region, but it can be increased. We are to target the eu-west-2 (London) region to ensure any data is kept within the confines of the UK so that we keep within regulations
AZs (Availability Zones) - these are locations within the selected region, analogous to datacenters. We’ve chosen to use all of the available zones (a/b/c) within the London region
Both public and private subnets will be defined for each AZ, with the private subnets linked to NAT gateways in the public subnets via routing tables, so that they have access to the internet but inbound connections from the internet are not possible
Public subnets are also accessible from the internet through an internet gateway attached via another routing table
Multiple security groups and NACLs (network access control lists) will need to be defined and chained
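As a rough illustration of the address arithmetic in this description, the sketch below uses Python's standard `ipaddress` module to count the addresses in a /16 and carve it into one public and one private subnet per availability zone. The concrete range `172.0.0.0/16`, the /20 subnet size and the zone names are placeholder assumptions, not agreed values; the real subnet ranges are still to be discussed.

```python
import ipaddress

# Placeholder VPC range following the 172.0.x.x/16 pattern (assumption).
vpc = ipaddress.ip_network("172.0.0.0/16")

# Assumed zone names for the London region and assumed subnet tiers.
zones = ["eu-west-2a", "eu-west-2b", "eu-west-2c"]
tiers = ["public", "private"]


def plan_subnets(network, zones, tiers, new_prefix=20):
    """Assign consecutive equal-sized subnets to each (zone, tier) pair."""
    blocks = network.subnets(new_prefix=new_prefix)
    return {(zone, tier): next(blocks) for zone in zones for tier in tiers}


plan = plan_subnets(vpc, zones, tiers)

print(f"{vpc} holds {vpc.num_addresses} addresses")
for (zone, tier), subnet in plan.items():
    print(f"{zone} {tier:<7} {subnet} ({subnet.num_addresses} addresses)")
```

With these assumed sizes, six /20 blocks are used and ten remain free for later growth. Note that AWS reserves five addresses in every subnet, so the usable count per subnet is slightly lower than the figure printed.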
Notes/Questions:
The IP ranges for the subnets will need to be discussed further so that it is clear which ranges are public and which are private, and so that there is room for elasticity
Load balancers will need to be defined perhaps in a lower level diagram
Jumpbox/Bastion hosts may not be needed if we have the tools/monitoring
Security
There are many security considerations when working with AWS or any kind of infrastructure, but the main benefit of using a cloud provider like AWS is that security is “baked in”: it provides features that should make securing our services easier.
An additional session was held on 22nd April to try to catalogue the security concerns/standards we would like in place for the migration.
Here are some of the high-level things we’d want in terms of security:
General
Reject by default - start with the least amount of permissions across the board (users, machines, services, access)
There should be no need for services to send traffic externally; this should be locked down
NACLs to provide a second layer of security, so that any mistakes with security groups don’t leave a service wide open
Domain certificates to be per subdomain rather than wildcards. This will allow finer-grained control, such as revoking a single certificate if there were a breach
Encryption
In transit - this relates to network traffic
At rest - when data is stored on disk (either as files or in a form of a database)
Network traffic within the VPC must be encrypted
Inbound network traffic on HTTP must be redirected to use HTTPS
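The HTTP to HTTPS rule can be sketched in a few lines of Python. This is only a minimal illustration of the redirect logic, assuming a permanent (301) redirect is wanted; the function name `https_redirect` is hypothetical, and in practice the rule would more likely be configured on a load balancer or web server than written by hand.

```python
from urllib.parse import urlsplit, urlunsplit


def https_redirect(url):
    """Return (status, location) for a plain-HTTP URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    # Drop an explicit default HTTP port so the HTTPS URL uses its own default.
    netloc = parts.netloc
    if netloc.endswith(":80"):
        netloc = netloc[: -len(":80")]
    return 301, urlunsplit(parts._replace(scheme="https", netloc=netloc))


print(https_redirect("http://example.org/path?q=1"))
print(https_redirect("https://example.org/"))
```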
Security groups
These behave like firewalls that limit (reject) certain types of traffic, and can be attached to many types of resources
Many finely grained groups, named appropriately and not shared between VPCs
Chained where it makes sense
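To make the "reject by default" and "chained" ideas concrete, here is a small Python model of how security group ingress rules behave: a group only lists what is allowed, anything unmatched is rejected, and a rule's source can be another group rather than an IP range. The class names, group names and ports are hypothetical and the model ignores protocols, egress rules and NACLs.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    port: int
    source: str  # either a CIDR block or the name of another group


@dataclass
class SecurityGroup:
    name: str
    ingress: list = field(default_factory=list)


def is_allowed(group, port, source_ip, source_groups=()):
    """Allow only if some ingress rule matches; anything else is rejected."""
    for rule in group.ingress:
        if rule.port != port:
            continue
        if rule.source in source_groups:
            return True  # chained: the sender belongs to the named group
        try:
            if ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.source):
                return True
        except ValueError:
            continue  # rule.source is a group name, not a CIDR block
    return False


# Hypothetical chain: internet -> load balancer group -> application group.
lb = SecurityGroup("lb-sg", [Rule(443, "0.0.0.0/0")])
app = SecurityGroup("app-sg", [Rule(8080, "lb-sg")])

print(is_allowed(lb, 443, "203.0.113.10"))             # internet to load balancer
print(is_allowed(app, 8080, "172.0.0.25", {"lb-sg"}))  # load balancer to application
print(is_allowed(app, 8080, "203.0.113.10"))           # internet straight to application
```

The first two checks pass and the third is rejected, because the application group only accepts traffic from members of the load balancer group.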
WAF (Web application firewall)
Where it makes sense, to be placed in front of any application that receives traffic sourced externally
Controlled networking routes
Resources can only be reached via certain routes
No public IPs
Applications must be deployed in private subnets with no directly accessible route from outside the VPC
Network traffic
Subnet-to-subnet traffic within a VPC will always be allowed to flow
VPC-to-VPC traffic is heavily discouraged unless there is a valid reason and no other choice
VPC to on-prem/other cloud provider communication can be done via IPsec VPN & BGP. This, however, is a large undertaking and there may be better alternatives
Application
The applications we develop and deploy fit roughly into 3 groups
Front end applications, publicly available, typically served via a web server (in Azure)
Back end applications that are publicly accessible via HTTP once a user is authenticated
Back end applications that don't require public accessibility but run to serve other applications or do data transformation
With these broad types of applications, we need to find a suitable deployment strategy for each and, seeing as this migration is greenfield, everything is up for grabs.
Some investigations have already been kicked off around ECS and Lambda but, as in previous sessions, we should try to standardise before we invest in any one approach
Like the previous sessions, we’ve had another meeting to discuss the possible solutions for these broad types of applications. Here’s what we’ve come up with so far:
Frontend applications
S3
ECS
CloudFront
VMs (EC2)
EKS
Backend applications
ECS
VM
EKS
Lambda
Lightsail / Elastic Beanstalk
Backend applications - not publicly accessible
Same as the backend list above
With this list, we went through a process of elimination to remove products that we thought were not suitable, or not feasible given the time constraints or skills within the team. With that in mind, the following were removed:
CloudFront: requires a backing service to initially serve the content
VMs: the current solution, has too many moving parts and requires a lot of time and management
EKS: managed Kubernetes cluster, although good we don’t have the skills in the team. It’s also possibly “overkill” for the service we provide
Lightsail/Elastic Beanstalk: too simplistic for our needs, would be a good use case for basic landing-page-like products
So that leaves us with: S3 + ECS for the frontend and ECS + Lambda for the backend. We don’t yet know which is the best choice for each type, so spikes will be created to investigate each option
Infrastructure as code
Like all other forms of infrastructure, AWS is prone to configuration drift (where configuration differs from environment to environment) and to flaky, highly customised, unique resources that are treated like pets (See: The history of Pets and Cattle). To avoid these issues, we already use tools such as Terraform and Ansible to build and manage infrastructure and to configure it in a uniform way.
The TIS-OPS GitHub repository contains the source code for managing the infrastructure in AWS and conforms to a basic format
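To show what configuration drift looks like in its simplest form, the Python sketch below compares the settings of the same service in two environments and reports every difference. The setting names and values are made up for illustration; this is not how Terraform or Ansible detect drift, only a picture of the problem they help us avoid.

```python
def find_drift(expected, actual):
    """Return {key: (expected_value, actual_value)} for every setting that differs."""
    keys = expected.keys() | actual.keys()
    return {
        key: (expected.get(key), actual.get(key))
        for key in sorted(keys)
        if expected.get(key) != actual.get(key)
    }


# Hypothetical settings for the same service in two environments.
stage = {"instance_count": 2, "tls": "1.2", "log_retention_days": 30}
prod = {"instance_count": 2, "tls": "1.0", "public_ip": True}

for key, (want, got) in find_drift(stage, prod).items():
    print(f"{key}: expected {want!r}, found {got!r}")
```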
The two folders under terraform are what is of interest here
environments
Contains a number of sub folders that represent environments within AWS. Each environment folder contains subsections for the products the TIS team manages, such as TIS, Revalidation and Trainee self service. Beneath these are the services and networking configurations that have been applied to their respective areas, e.g. VPC code
This folder also gives developers an area to create and experiment in (the local folder), so that they won’t conflict with existing infra or other developers
modules
This will contain the HEE-developed Terraform modules that allow developers to spin up whole environments / services with minimal copying and pasting, while creating resources in a standard manner
There will be times when creating infrastructure as code won’t be possible, as constraints such as time or knowledge may force a developer to experiment using the AWS web console. This is ok as long as the developer takes those changes/learnings and brings them back into code so that others can reuse and share them
Other
With all this now “down on paper”, we have some guidance on where to go next. We’ll need to do several spikes and create a large set of tickets to enable the migration to AWS
Related pages
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213