Migrate Trainee Data from TIS to TIS Self Service Infra
The aim of this document to evaluate what solutions are available for event driven updates from TIS MySQL DB to TIS-Self Service Mongo DB.
Current TIS Setup
Current TIS Self Service Setup
Tested Migration:
Options when using AWS provided tools to migrate database:
There are 2 approaches we can consider to migrate MySQL hosted on Azure to Mongo DB hosted on TIS Self Service AWS VPC.
From above (approach 1) diagram AWS DMS service running on VPC1 (TIS VPC) will migrate the data from MySQL hosted on Azure to AWS RDS (can be AWS MySQL/Aurora). We can create trigger which runs stored proc when change in data happens on a table and from stored proc call Lambda to invoke function which can update Mongo DB running on different VPC through ENI.
From above (approach 2) diagram AWS DMS service running on VPC1 (TIS VPC) will migrate the data from MySQL hosted on Azure to AWS RDS (can be AWS MySQL/Aurora) and at the same time sends data to Amazon Kinesis Data Streams. There can be multiple record consumer created on Kinesis streams which can deliver same data. A Lambda function can consume the streams which further can filter the data and can update the Mongo DB. At the same time we can send to Kinesis Data Analytics to analyse the data and Graph database.
Both approaches require to write Lambda function to update MongoDB on certain conditions.
First approach will require to write triggers and SP to find change in data.
Second approach will add some more costs due to Kinesis and Lambda. Lambda costs might be more as all the changed data will be sent to it and filtering will be done within Lambda function.
Second approach will give more flexibility to add analytics and auditing.
Choosing between Amazon’s RDS MySQL and Aurora:
S.No. | Points to consider | Amazon MySQL | Amazon Aurora | Comments |
1 | Cost |
| 20% more thenAmazon SQL (from an article) | I couldn't find much as cost depends on many factors. |
2 | Performance | I/O is faster thenAurora | Responses are faster then MySQL |
|
3 | Performance - Query cache | Disabled from V5.7.20 | Better than MySQL | MySQL had performance issues with query cache so it was disabled from 5.7.20 version onwards |
4 | Performance - DDL |
| Faster |
|
5 | Performance - Connection Pooling | One thread per connection | Adaptive Thread Pool | In Aurora each thread will be able to handle multiple connections which is a new method so called multiple-connections-per-thread. Aurora support 16000 max connections and is designed for high number of concurrent connections with better performance compared to MySQL. |
6 | Replications | upto 5 replicas | upto 16 replicas |
|
7 | TIS MySQL (5.7) Support | Yes | Yes |
|
8 | TIS MySQL Storage Engine (InnoDB) support | Yes | Yes | Aurora doesn't support MyISAM |
9 | Capacity Planning | Auto grow | Auto grow | Aurora automatically increase the capacity from 10GB to 64TB in increment of 10GB. I think same applies to MySQL. |
Misc reading:
Lambdas using Db triggers: https://aws.amazon.com/blogs/database/capturing-data-changes-in-amazon-aurora-using-aws-lambda/
Guide on migrating including a bit on CDC with Amazon DMS https://aws.amazon.com/blogs/database/migrating-from-mysql-to-amazon-aurora-using-aws-sct-and-aws-dms/
Debezium for creating CDC from MySQL - https://debezium.io/
Amazon Kinesis Agent - https://github.com/awslabs/amazon-kinesis-agent
Migration playbook to Aurora - https://d1.awsstatic.com/asset-repository/RDS%20Data%20Migration/SQL%20Server%20to%20Aurora%20MySQL%20Migration%20Playbook.pdf
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213