Starting the migration - technical decisions and strategy

It's been quite a while since the initial “Moon on a stick” session and we’ve learnt quite a bit since then. It’s been decided that the migration to AWS will happen as soon as possible, with the intention of switching off Azure (as far as TIS is concerned) and serving TIS from AWS. This means that a gradual move to AWS, redesigning parts of TIS along the way, won’t be possible: running in 2 environments concurrently for an extended period would increase costs. An “As Is” migration will therefore happen, keeping things as static as possible, with changes to TIS made afterwards via technical debt tickets.

Where are we now?

The following is a diagram of what we have for TIS in both a hardware and software view.

The DevOps team has done some preliminary spikes/investigations to gauge possible technical strategies. This has left us with a number of virtual machines (EC2 instances) and a large part of the TIS services already deployed in AWS.

Methodology/Strategy/Focus

TIS is a medium-sized application with many developers working on it at any particular time. It also has downstream dependencies which we don’t control but may influence. With this in mind, we should try to adhere to the following principles.

Developer Flow

Whatever we do, we must bring the rest of the team with us by keeping them “in the know”. We’ll also need to make as little change to the environments and applications as possible. This is so that, if anything happens, developers will not need to do anything “special”, or at least will know what they need to do to get things done in the new environment. TL;DR: minimise hacks and keep things as close as possible to what they are now.

Seamless

The migration should be completely seamless to users. Browsing TIS on AWS should look and feel no different to browsing it on Azure. This will mean a big bang approach, with both systems running concurrently until AWS is ready for the switch-over.

Downstream dependencies such as the NDW and GMC should also be unaffected.

To infinity and beyond

Any work done for the migration should give some thought to moving towards “The moon on a stick”, so any solution should make that move easier, not harder.

Infrastructure as code

The team, as well as the individuals in the DevOps team, use both Ansible and Terraform to manage infrastructure as code. It has been decided that, although it would be good to do the migration via these tools, it would probably take too long and would be wasteful: once we’ve completed this ‘As Is’ migration we will already be moving away from it, chipping away at it and improving things via tech debt tickets.

The work

We currently see a number of parts to this migration work. These consist of:

Data

We currently have MySQL holding the majority of the TIS data, as well as MongoDB holding data used for the ESR (new world) integration work.

Our intention is to get the data migrated into AWS “As Is”, with MySQL VMs (just like in Azure) running as EC2 instances and “DMS” (Database Migration Service) streaming updates from Azure into AWS. These DB VMs will have the same network addresses and credentials, so the currently deployed applications will not need to be updated in any way (or have any different environment variables) in order to serve data.

These new DB VMs will then have another “DMS” instance streaming data into the AWS managed service databases (RDS/Aurora). These databases will be the final destination of the data, where new systems such as Trainee UI and “Moon on a stick” TIS will be able to access it.
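As a rough illustration of the two hops described above, the sketch below builds the table-mapping JSON that a DMS replication task takes as input, selecting every table in a given schema. The schema name `tis_example` and the hop labels are hypothetical placeholders, not the real TIS names, and the migration type shown is only one plausible choice for a load-then-stream setup.

```python
import json

# One of the DMS migration types: initial full load, then ongoing change capture.
MIGRATION_TYPE = "full-load-and-cdc"

# Hypothetical labels for the two hops described in the text.
HOPS = [
    ("azure-mysql-vm", "aws-ec2-mysql"),   # Azure VMs -> like-for-like EC2 VMs
    ("aws-ec2-mysql", "aws-rds-aurora"),   # EC2 VMs -> managed RDS/Aurora
]


def build_table_mappings(schemas):
    """Return DMS table-mapping JSON that includes every table in each schema."""
    rules = []
    for index, schema in enumerate(schemas, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(index),               # must be unique per rule
            "rule-name": f"include-{schema}",    # must be unique per rule
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    # "tis_example" is an assumed schema name used purely for illustration.
    print(build_table_mappings(["tis_example"]))
```

The same mapping document could be reused for both hops, since the second hop simply replays what the first one delivers.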

This data strategy will allow us to have all services running in parallel (in both AWS and Azure), then destroy the services in Azure and have AWS serve all traffic, while keeping data for Trainee UI fresh.

Warning: with this strategy, it's very important that users who have access to the AWS version of TIS while TIS is still running in Azure are NOT allowed to modify data. Doing so would cause a discrepancy over which data is correct. Only once Azure has been decommissioned should users be able to change data in AWS.
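One way to picture this rule is as a simple read-only guard in front of the AWS copy of TIS. The sketch below is purely illustrative: the function name and the `azure_decommissioned` flag are hypothetical, and nothing here says where such a check would actually live (load balancer, gateway or application).

```python
# HTTP methods that do not modify data.
READ_ONLY_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def is_request_allowed(method: str, azure_decommissioned: bool) -> bool:
    """Allow writes on AWS only once Azure has been decommissioned.

    Until then, only read-only requests are let through, so Azure stays
    the single source of truth for the data.
    """
    if azure_decommissioned:
        return True
    return method.upper() in READ_ONLY_METHODS
```

For example, `is_request_allowed("POST", azure_decommissioned=False)` is rejected, while the same request passes after the switch-over.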

We also have data stored on attachable disks in Azure (Jenkins/Mongo/Rabbit). Some thought will need to be given to the type of data these systems hold and whether or not it's required to migrate it.

Developers and Services

During the migration, we’re not going to stop developers from doing their jobs. It won’t be feasible to tell the business that all features/bug fixes will need to stop until we’ve completed the migration. We therefore need ways to allow the building and deployment of services to continue with zero impact.

To that end, we’ve come up with initial plans to have builds/pipelines run concurrently in both Azure and AWS. This will keep both systems in sync in terms of Maven artefacts and services.

This will involve creating additional webhooks for all of the GitHub repositories, making calls to the AWS Jenkins instance on merges, branches and PRs. Additional changes to all of the pipelines will be needed to push Docker images straight to AWS’s ECR (Docker container registry). This will save us from paying for 2 lots of storage for Docker images.
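As a sketch of what the extra webhook could look like, the snippet below builds the request body for GitHub's "create a repository webhook" REST endpoint. It only constructs the payload and URL and makes no network calls. The Jenkins hostname, organisation and repository names are hypothetical, and the `/github-webhook/` path assumes Jenkins receives events through its GitHub plugin, which may not match the actual setup.

```python
def hooks_endpoint(owner: str, repo: str) -> str:
    """URL of the GitHub REST endpoint used to create a repository webhook."""
    return f"https://api.github.com/repos/{owner}/{repo}/hooks"


def jenkins_webhook_payload(jenkins_base_url: str) -> dict:
    """Body for POSTing a new webhook that notifies a Jenkins instance."""
    return {
        "name": "web",  # GitHub requires the literal value "web" here
        "active": True,
        # push covers merges, create covers new branches, pull_request covers PRs
        "events": ["push", "pull_request", "create"],
        "config": {
            # Assumed path: the Jenkins GitHub plugin listens on /github-webhook/
            "url": jenkins_base_url.rstrip("/") + "/github-webhook/",
            "content_type": "json",
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for repo in ["example-service-a", "example-service-b"]:
        print(hooks_endpoint("example-org", repo))
        print(jenkins_webhook_payload("https://jenkins.aws.example.org"))
```

Because the existing Azure webhook stays in place, each repository would end up with two hooks, one per Jenkins instance, for as long as both run side by side.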

An additional change to remove the DEV environment from the pipelines will also be required, as it’s been decided that we will only have PROD and PREPROD environments in AWS. Once this is done, we can destroy the DEV systems in Azure, saving money early on.

Environments

As stated earlier, as the DEV environment in Azure isn’t used much, we'll want to have only 2 environments in AWS (prod/preprod). This would simplify deployments as well as save HEE money (fewer virtual machine instances).

The new AWS environments will also have exactly the same subnet CIDR blocks. This will enable us to keep all of the same internal IPs in AWS, further simplifying things both for developers (they will not need to learn new IPs and machines) and for the pipeline deployments, as the Ansible configuration (inventory, vars and vault) will not need to change.
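A quick sanity check for this plan is to confirm that every internal IP currently in the Ansible inventory would still be assignable inside the matching AWS subnet. The sketch below does that with Python's standard `ipaddress` module. The CIDR block and host IPs are made-up examples, and the reserved-address rule reflects AWS's documented behaviour of reserving the first four addresses and the last address of each subnet.

```python
import ipaddress


def aws_reserved_addresses(network):
    """Addresses AWS keeps for itself in a subnet: the first four and the last."""
    base = int(network.network_address)
    reserved = {ipaddress.ip_address(base + offset) for offset in range(4)}
    reserved.add(network.broadcast_address)
    return reserved


def unusable_hosts(host_ips, subnet_cidrs):
    """Return the host IPs that cannot be reused as-is in the given subnets."""
    networks = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    for ip_text in host_ips:
        ip = ipaddress.ip_address(ip_text)
        # Find the subnet this host would live in, if any.
        home = next((net for net in networks if ip in net), None)
        if home is None or ip in aws_reserved_addresses(home):
            problems.append(ip_text)
    return problems


if __name__ == "__main__":
    # Hypothetical values; the real ones would come from the Ansible inventory.
    print(unusable_hosts(["10.0.1.10", "10.0.1.2", "10.0.2.5"], ["10.0.1.0/24"]))
```

An empty result would suggest the inventory can be carried over unchanged. Anything listed would need a new address or an extra subnet.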