What makes a cluster
A RabbitMQ cluster provides the usual resilience you’d expect from a cluster against any particular node, disk or VM failure. The RabbitMQ clustering guide (https://www.rabbitmq.com/clustering.html) is great and describes the following:
Discovery/Joining a cluster
A conf file seemed a good ‘quick’ option. DNS-based discovery sounded like a good way to dynamically resize and modify a cluster, but it requires someone with the knowledge & access to create and maintain the DNS records.
Failing to connect to one or more nodes won’t stop the node coming up, but (configuration dependent) may pause message processing.
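As an illustrative sketch (the rabbit@… node names are placeholders, not our real hostnames), the conf-file discovery option looks like this in rabbitmq.conf:

```
# rabbitmq.conf sketch: classic (config file) peer discovery.
# Node names below are placeholders for the real VM hostnames.
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rabbit@rabbitmq-1
cluster_formation.classic_config.nodes.2 = rabbit@rabbitmq-2
cluster_formation.classic_config.nodes.3 = rabbit@rabbitmq-3
```

The DNS option would instead set cluster_formation.peer_discovery_backend = dns and point cluster_formation.dns.hostname at a record listing the members.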
Replication/Cluster type
The nodes act as true peers to clients and will internally handle routing to any leader/master where ‘queue mirroring’ is in use. Individual queues are defined as ‘classic’, ‘mirrored’ or ‘quorum’ queues. By default, exchanges and bindings are mirrored.
By default, queues are not mirrored or replicated between nodes; they live on one node. Failure of that node would result in the loss of data, but with no replication this is the option with the highest throughput.
Under ‘classic’ behaviour, queue mirroring is established by creating a policy regex matching the queues, with an ha-mode setting controlling where/how many replicas are created. See: https://www.rabbitmq.com/ha.html#mirroring-arguments. Quorum queues replicate to a ‘majority’ of nodes in the cluster; i.e. if you have a bunch of classic (default) queues, they need to be recreated with x-queue-type=quorum.
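For illustration (the policy name and queue pattern here are invented), a classic mirroring policy is applied with rabbitmqctl:

```
# Hypothetical policy: mirror every queue matching the pattern to all
# nodes, and synchronise new mirrors automatically.
rabbitmqctl set_policy ha-all "^tis\." \
  '{"ha-mode":"all","ha-sync-mode":"automatic"}' \
  --apply-to queues
```

Quorum queues, by contrast, are not switched on by policy; the x-queue-type=quorum argument has to be supplied when the queue is declared, hence the need to recreate them.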
Building the cluster
I started from the cluster that Liban Hirey (Unlicensed) created and applied in the stage environment. I used that along with Paul Hoang (Unlicensed)'s walkthrough on “turning a main.tf file in terraform into a module” and his work on configuring persistence for the mongodb replicaset. They may be able to help you work out where/why I did something that looks odd.
Terraform
Responsibilities here are:
Creating resources in Azure (including the security groups that ping helpful messages to IT)
Installing and copying some essential resources to allow ansible to be run by a friendly neighbourhood devops person.
Ansible
There are some ‘generic’ playbooks:
Give everyone ssh access
Install docker-compose etc.
Service specific:
The rabbitmq-cluster.yml playbook installs the cluster. Modify the monitoring config and run the monserver.yml playbook.
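Assuming the repo’s ‘hosts’ inventory file (the path is a guess), the invocation is the usual:

```
# Hypothetical invocation; adjust the inventory path to match the repo.
ansible-playbook -i hosts rabbitmq-cluster.yml
ansible-playbook -i hosts monserver.yml
```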
Done
All in https://github.com/Health-Education-England/TIS-DEVOPS/pull/869 .
2 Dev VMs created; 1 commented out in the configuration
Removed firewall rules as I’m not certain what we actually need (it seems easier to add holes later than to remove them), and we can’t currently use the rules that were there for the London & Manchester offices.
Snags
Due to hitting a limit on the number of VMs in the subscription, Dev has been created as 2 nodes (1 commented out). As per “What makes a cluster”, this is potentially worse than having a single node. 💽
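The ‘potentially worse’ claim follows from the quorum arithmetic: a majority of floor(N/2) + 1 nodes must be reachable, so a 2-node cluster tolerates no failures yet has twice as many nodes that can fail. A quick sketch:

```shell
# Quorum majority for an N-node cluster is floor(N/2) + 1;
# "tolerated" is how many nodes can fail with quorum intact.
for n in 1 2 3 5; do
  majority=$(( n / 2 + 1 ))
  echo "$n nodes: majority=$majority, tolerated=$(( n - majority ))"
done
```

With 2 nodes the majority is 2, so either node failing loses quorum; 3 nodes is the smallest cluster that survives a failure.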
Configuration seems to be ‘all by one method’. I initially tried maintaining the user & vhost environment variables, but these were ignored in favour of the config file, in which I had only defined the cluster discovery mechanism. 🐰
Erlang cookie file: I left it as an environment variable because of the extra settings required; it is a security parameter rather than ‘configuration’ as above. 🐰
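As a sketch of what that looks like (the service name and variable source here are assumptions, not the repo’s actual compose file):

```
# docker-compose fragment (sketch): the cookie is injected as an
# environment variable rather than written into the config file.
services:
  rabbitmq:
    image: rabbitmq:3-management
    environment:
      RABBITMQ_ERLANG_COOKIE: "${ERLANG_COOKIE}"
```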
Terraform seems to install docker but doesn’t have the docker group, so it fails to add the heetis user. This is rectified by the docker-upgrade.yml playbook. 🐳
Ansible ‘role’ playbooks: these seemed to be doing more than I could deduce from the playbooks, inc. copying files as directories. I resorted to using the config directory that is created irrespective of need. 📁
TODO
As new VMs are provisioned
The ansible playbooks need to be modified to remove exclusions for the IP addresses used (already configured in the ‘hosts’ inventory).
Azure VM
‘Burstable’ seems like a good model, but I only used it because that is what was configured.
Data storage
Mount point, size & required throughput (data/s & IOPS).
Iterating over a VM ‘base module’ in terraform
Generate the erlang cookie file at deploy time (using ansible?) and set permissions to 600.
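A minimal ansible sketch of that TODO (the destination path and variable name are assumptions; the on-disk path would differ if the cookie lives inside a container volume):

```
# Sketch: write the cookie at deploy time, readable only by its owner.
- name: Deploy Erlang cookie
  copy:
    content: "{{ erlang_cookie }}"
    dest: /var/lib/rabbitmq/.erlang.cookie
    owner: rabbitmq
    group: rabbitmq
    mode: "0600"
```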
Stage & Prod
It seems cleaner for Liban’s stage cluster (rabbitmq-cluster-server-[1-3] in terraform/stage/) to be destroyed. Creating new environments should be simplified by the use of the rabbitmq-cluster module, which sits in the DEVOPS repo under terraform/common/.
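Consuming the module would then look something like this (the variable names are guesses at the module’s interface, not its actual inputs):

```
# Hypothetical sketch of a new environment using the common module.
module "rabbitmq_cluster" {
  source     = "../common/rabbitmq-cluster"
  node_count = 3
}
```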