Where are we now (microservices)
On 24/07/2018 we held a developers' meeting to summarise where we are with the system and which issues we know about. Below is a summary of that meeting, in the hope that we can move forward and fix these issues in the coming sprints.
Meeting objectives
1. to understand what we currently have as the TIS system
2. to understand the issues we currently have
3. to float some ideas on how to effectively fix the issues
We went through some of the ways in which the TIS system works, covering the flow of requests and the infrastructure, and came up with some rudimentary diagrams.
Notes:
- this is just the happy flow, after a valid JWT token has been obtained
- communication with KC to validate the JWT token happens transparently
- the services are based on Spring Boot with Spring Security, which (when configured correctly) performs the authorisation automatically
Notes:
- this is another basic image of how our services are deployed
- for each environment, we have two machines (blue, green) where our services are deployed
- each machine has a full copy of the stack deployed in Docker containers; the only difference is that blue also contains KC
- the ELK stack and monitoring services are on separate machines
- the external-facing load balancer decides which machine each request is forwarded to
Notes:
- a further look into a single machine where the services are deployed (green in this case)
- communication happens over the Docker network
- what happens when a service goes down?
Notes:
- what happens when profile goes down?
- what is a cascading failure?
Questions:
- What do we do when a service goes down?
- How do we know?
- How do we notify the end user - how do we react?
Problems:
- If KC goes down, the whole system is no longer available
- We only have one DB. This is not the microservices way, and it is another single point of failure
- Read replicas are not in place
- Can the LB go down or struggle with traffic?
- If the Profile service goes down, we get cascading failures
- The Docker network stops us from communicating with other instances of a service
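To make the last problem concrete: if a caller could look up every running instance of a service, it could switch to another instance when one is unreachable. The following is a minimal sketch of that idea in plain Java, not a description of anything currently deployed. The class name InstanceRegistry, the host names and the port are all hypothetical; only the "profile" service name comes from the notes above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory registry: service name -> base URLs of its instances. */
final class InstanceRegistry {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    /** Called when an instance starts (or passes a health check). */
    void register(String service, String baseUrl) {
        instances.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(baseUrl);
    }

    /** Called when an instance stops (or fails a health check). */
    void deregister(String service, String baseUrl) {
        List<String> urls = instances.get(service);
        if (urls != null) {
            urls.remove(baseUrl);
        }
    }

    /** Picks the next instance in round-robin order. */
    String next(String service) {
        List<String> registered = instances.getOrDefault(service, Collections.emptyList());
        // Work on a snapshot so a concurrent deregister cannot change the size under us.
        List<String> snapshot = new ArrayList<>(registered);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no instances registered for " + service);
        }
        int n = counters.computeIfAbsent(service, k -> new AtomicInteger()).getAndIncrement();
        return snapshot.get(Math.floorMod(n, snapshot.size()));
    }
}
```

With two instances registered, successive calls to next("profile") alternate between them. Once one is deregistered, every call returns the remaining instance.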
Possible solutions:
- Service discovery
- Load balancing in front of all services
- Kubernetes
- Hystrix for graceful failure
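To illustrate what "graceful failure" could mean in practice, here is a minimal sketch of the circuit-breaker idea that Hystrix is built around. It is written in plain Java and does not use the Hystrix API. The class name SimpleCircuitBreaker, the threshold and the timings are assumptions made for illustration only. After a number of consecutive failures the breaker "opens" and callers get a fallback value immediately, rather than repeatedly calling a service that is down and letting the failure cascade.

```java
import java.util.function.Supplier;

/** Illustrative circuit breaker; NOT Hystrix, only a sketch of the same idea. */
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to stay open before retrying
    private int consecutiveFailures = 0;
    private long openedAt = -1L;        // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** True while calls should be short-circuited to the fallback. */
    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1L;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = System.currentTimeMillis();
        }
    }

    /** Runs remoteCall unless the breaker is open; returns the fallback on failure. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // fail fast, do not touch the downstream service
        }
        try {
            T result = remoteCall.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }
}
```

A caller of the Profile service might wrap its request as breaker.call(() -> fetchProfile(id), () -> cachedOrEmptyProfile(id)), where both helper methods are hypothetical. Once the open period has elapsed, the next call is attempted again. A success closes the breaker and a failure re-opens it.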
Slack: https://hee-nhs-tis.slack.com/
Jira issues: https://hee-tis.atlassian.net/issues/?filter=14213