Microservice Architecture Design
The purpose of this page is to share the initial designs we will try to adhere to on this project, so that we can promote a shared understanding.
Not everything has been thought through yet and a lot of detail is still missing, but in keeping with agile values things will change, we will adapt, and this document will be updated to reflect that.
The Data
Overall, the existing system consumes only two types of files: the RMT (C/F) files and the Confirmation-based files. The RMT file contains 7 different types of records. These records will be stripped out of the files, turned into messages and placed onto their own queues for processing. These messages will likely pass through a number of functions held within one or two services where they will be processed, and ultimately be placed onto a final queue from which they are written to a database.
The confirmation types are stored in the DCC and APC files (notification and APP confirmations respectively). These are relatively simple files whose contents can be passed straight to an audit service.
Each message, starting from the parsing of the file, will carry metadata detailing the correlation ID, the file it originated from, the queue it is in, the deanery, the created date, the record type, the deanery post number, the retry count, any error messages and the message body itself. This is to allow any auditing system to easily filter/search data flowing through the system.
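As a rough illustration of what that envelope could look like (the class and field names here are assumptions, not an agreed schema):

```java
import java.time.Instant;
import java.util.List;

// Illustrative message envelope: names and types are assumptions, not the agreed schema.
public class RecordMessage {

    private String correlationId;      // ties together every hop of a single record's journey
    private String sourceFile;         // the RMT/DCC/APC file the record was parsed from
    private String currentQueue;       // the queue the message currently sits on
    private String deanery;
    private String deaneryPostNumber;
    private String recordType;         // e.g. one of the 7 RMT record types
    private Instant createdDate;
    private int retryCount;
    private List<String> errorMessages;
    private String body;               // the record payload itself

    // getters and setters omitted for brevity
}
```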
Other data is sourced and triggered from TIS itself. Data such as notifications can be triggered based on date/time (where the day rolls over and a Placement falls within a window) or based on creation/update. In both cases, the main body of data is the Placement itself.
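As a purely hypothetical sketch of the date/time based trigger (the collaborators, the 12-week window and the method names are all assumptions):

```java
import java.time.LocalDate;
import java.util.List;
import java.util.function.Consumer;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical: the lookup, publisher and window length are illustrative assumptions.
@Component
public class NotificationWindowJob {

    static class Placement { }                            // stand-in for the TIS Placement entity

    interface PlacementLookup {                           // assumed access to TIS Placement data
        List<Placement> findStartingBetween(LocalDate from, LocalDate to);
    }

    private final PlacementLookup placements;
    private final Consumer<Placement> notificationQueue;  // assumed publisher onto the notification queue

    public NotificationWindowJob(PlacementLookup placements, Consumer<Placement> notificationQueue) {
        this.placements = placements;
        this.notificationQueue = notificationQueue;
    }

    // Runs when the day rolls over and picks up Placements falling within the window.
    @Scheduled(cron = "0 0 0 * * *")
    public void triggerDateBasedNotifications() {
        LocalDate today = LocalDate.now();
        placements.findStartingBetween(today, today.plusWeeks(12))
                  .forEach(notificationQueue);
    }
}
```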
To summarise, 7 main data journeys have been identified (so far):
- POR records
- POS records
- Personal details records
- Deletion records
- App confirmation records
- Notification records
- Notification confirmation records
The Services
The existing system was developed as a single "microservice" backed by a database, plus a single ETL that was triggered multiple times a day with different arguments to trigger different behaviour. The new design will contain a number of different services based on the following tech:
- Build tool: Gradle
- Language: Java 11
- Framework: Spring Boot v2+
- Authorisation: Keycloak
- Utility functions: Apache Commons & Guava
- Unit Tests: JUnit & Mockito
- Integration/Functional Tests: REST Assured
All services will be built using the current build tool, Jenkins, but will run their own build pipelines. If possible, to limit any conflict on Jenkins, it is suggested that we use Docker's relatively new multi-stage build feature. This will allow us to encapsulate the build tools so that there is no need to install Java 11 on Jenkins itself. This method allows us to leverage the current infrastructure and use current knowledge (which is also present in the rest of the team), while giving us the flexibility to port over to another container-based build system in the future.
There will be two services built for the reading of inbound data (to be placed on queues) and the writing of outbound data (to files). These inbound and outbound services will be relatively simple (dumb) in that they won't hold much business logic.
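As a minimal sketch of the inbound side, assuming Spring AMQP, a JSON message converter, and illustrative exchange/routing names (it reuses the RecordMessage envelope sketched earlier):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.stream.Stream;

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Service;

// Sketch only: the "tis.inbound" exchange and per-record-type routing keys are assumptions.
@Service
public class InboundReaderService {

    private final RabbitTemplate rabbitTemplate;

    public InboundReaderService(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    // Read an RMT file line by line and fan each record out to the queue for its record type.
    public void readFile(Path rmtFile) throws IOException {
        try (Stream<String> lines = Files.lines(rmtFile)) {
            lines.map(this::toMessage)
                 .forEach(msg -> rabbitTemplate.convertAndSend("tis.inbound", msg.getRecordType(), msg));
        }
    }

    private RecordMessage toMessage(String line) {
        RecordMessage msg = new RecordMessage();
        msg.setCorrelationId(UUID.randomUUID().toString());
        msg.setRecordType(line.substring(0, 3)); // assumption: the record type is encoded at the start of the line
        msg.setBody(line);                       // real parsing would split the record into its fields
        return msg;
    }
}
```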
Then there will be a service for each of the data journeys. There is a risk of creating a distributed monolith if each stage of a data journey becomes a service in itself; this is highly unwanted, as following a single journey would require multiple projects to be open and would be difficult to develop and debug. So there will be one service per journey (7 additional services) as well as an audit service/UI if required.
Another service will be required to inspect the dead letter queue so that we can investigate errors or requeue the messages.
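A rough sketch of what the requeue side of that service might look like (the queue name and the routing-by-original-queue approach are assumptions):

```java
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Service;

// Sketch only: "tis.deadletter" and routing back by the original queue name are assumptions.
@Service
public class RetryService {

    private final RabbitTemplate rabbitTemplate;

    public RetryService(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    // Pull one failed message off the dead letter queue and push it back onto its original queue.
    public boolean requeueOne() {
        Object failed = rabbitTemplate.receiveAndConvert("tis.deadletter");
        if (failed instanceof RecordMessage) {
            RecordMessage msg = (RecordMessage) failed;
            rabbitTemplate.convertAndSend(msg.getCurrentQueue(), msg); // default exchange routes by queue name
            return true;
        }
        return false;
    }
}
```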
Service Names
- InboundReaderService
- OutBoundWriterService
- PorService
- PosService
- PersonalDetailService
- DeletionService
- AppConfirmationService
- NotificationService
- NotificationConfirmationService
- AuditService
- AuditAdminUI
- RetryService
The Queue
RabbitMQ has been chosen as the message broker. We've discussed many aspects of this tool already, some of which are:
- Message processing order
  - The simplest form of RabbitMQ allows messages to be consumed in FIFO (first in, first out) order. If performance is shown to be an issue and there are no dependencies between records, then deploying replicas will allow a round robin between consumers.
- Deployment
  - The VMs should be deployed using our infrastructure-as-code tool (Terraform).
  - The broker itself should be deployed using Docker containers.
- Retries
  - Retry queues will be used and, if possible, plugins will be used to enable this feature (see the sketch after this list).
- Dead letter queues
  - We don't want to lose messages, so all messages that can't be processed must be placed somewhere to be investigated.
- Securing the system
  - It mustn't be exposed to the outside if we are to go for a batch-based system.
- Resiliency
  - The system must be resilient, so we may need a number of VMs clustered to ensure it stays up and operational.
- Auditing
  - Must be a first class citizen in this implementation.
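As an example of how the retry and dead letter queues could be wired up with Spring AMQP (the queue names and the 30-second delay are illustrative assumptions; a broker plugin such as a delayed-message plugin could replace the TTL approach):

```java
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Sketch of one journey's queue topology; names and TTLs are assumptions.
@Configuration
public class QueueConfig {

    // Work queue: failed messages are dead-lettered to the retry queue instead of being dropped.
    @Bean
    public Queue porQueue() {
        return QueueBuilder.durable("tis.por")
                .withArgument("x-dead-letter-exchange", "")
                .withArgument("x-dead-letter-routing-key", "tis.por.retry")
                .build();
    }

    // Retry queue: messages sit here for 30 seconds, then expire back onto the work queue.
    @Bean
    public Queue porRetryQueue() {
        return QueueBuilder.durable("tis.por.retry")
                .withArgument("x-message-ttl", 30_000)
                .withArgument("x-dead-letter-exchange", "")
                .withArgument("x-dead-letter-routing-key", "tis.por")
                .build();
    }

    // Final resting place: consumers publish here once the retry count is exhausted,
    // and the RetryService inspects/requeues from this queue.
    @Bean
    public Queue deadLetterQueue() {
        return QueueBuilder.durable("tis.deadletter").build();
    }
}
```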
The Database
Because of the nature of the data (data that isn't modified much and is very flat), and because we want it to be as fast as possible, we've decided to use a NoSQL database. MongoDB was our favourite due to its portability and interoperability.
There are a number of points in this new architecture where we would like to store state; the main one is at the end of the data journey, where we'll need to store the records to be written to files.
Depending on whether we choose to create our own audit-type service, we may also store the message bodies as a whole in an audit collection so we can easily track state changes.
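An illustrative Spring Data MongoDB document for that end-of-journey store (the collection and field names are assumptions, not an agreed schema):

```java
import java.time.Instant;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

// Illustrative only: collection name and fields are assumptions.
@Document(collection = "outboundRecords")
public class OutboundRecord {

    @Id
    private String id;
    private String correlationId;   // links back to the message that produced this record
    private String recordType;
    private String deanery;
    private Instant processedDate;
    private String payload;         // the record exactly as it will be written to the outbound file

    // getters and setters omitted for brevity
}
```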
The Existing System
The current and future integration work depends on some of the existing services. All the REST endpoints are secured with authorisation roles, which requires communication with the profile service to validate a logged-in user's roles.
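For illustration, one common Spring Security style for this kind of role check is shown below; the role name, path and controller are hypothetical, and the existing services may wire this up differently (e.g. via a profile service client):

```java
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical endpoint: the "audit:view" role and the path are illustrative assumptions.
@RestController
@RequestMapping("/api/audit")
public class AuditController {

    // Only callers whose roles (validated against the profile service) include audit:view may read records.
    @PreAuthorize("hasAuthority('audit:view')")
    @GetMapping("/messages/{correlationId}")
    public ResponseEntity<RecordMessage> getByCorrelationId(@PathVariable String correlationId) {
        return ResponseEntity.notFound().build(); // lookup omitted in this sketch
    }
}
```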
There is also a dependency on the TCS service, which holds, among other entities, the Post and Placement data.
Some additional work may happen to split Posts and Placements out of TCS, as the service is already large and additional strain on it may cause issues with other parts of TIS.
Diagrams
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213