
What makes a cluster

A RabbitMQ cluster provides the usual resilience you’d expect from a cluster against any particular node, disk or VM failure. The RabbitMQ clustering guide (https://www.rabbitmq.com/clustering.html) is great and describes the following:

Discovery/Joining a cluster

A conf file seemed a good ‘quick’ option. DNS-based discovery sounded like a good way to dynamically resize and modify a cluster, but it requires someone with the knowledge & access to create and maintain the DNS records.

Failing to connect to one or more nodes won’t stop a node coming up but (configuration dependent) may pause message processing.
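
As a minimal sketch of the conf-file option, assuming the config is pushed out by Ansible (the node names and destination path here are illustrative, not copied from the repo):

    # Hypothetical task: write a rabbitmq.conf using classic (config file)
    # peer discovery, so every node starts with the same static peer list.
    - name: Deploy rabbitmq.conf with static peer discovery
      copy:
        dest: /etc/rabbitmq/rabbitmq.conf
        content: |
          cluster_formation.peer_discovery_backend = classic_config
          cluster_formation.classic_config.nodes.1 = rabbit@rabbitmq-1
          cluster_formation.classic_config.nodes.2 = rabbit@rabbitmq-2
          cluster_formation.classic_config.nodes.3 = rabbit@rabbitmq-3
        mode: "0644"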

Replication/Cluster type

The nodes act as true peers to clients and will internally handle routing to the relevant leader/master where queue mirroring is in use. Individual queues are defined as ‘classic’, ‘mirrored’ (classic with a mirroring policy) or ‘quorum’ queues. By default, exchanges and bindings are mirrored.

  • By default, queues are not mirrored or replicated between nodes; each lives on one node. Failure of that node would result in the loss of its data, but with no replication this is the option with the highest throughput.

  • Under ‘classic’ behaviour, queue mirroring is established by creating a policy whose regex matches the queues, with an ha-mode setting controlling where/how many replicas are created. See: https://www.rabbitmq.com/ha.html#mirroring-arguments.

  • Quorum queues replicate to a ‘majority’ of nodes in the cluster. Existing classic (default) queues can’t simply be switched over; they need to be recreated with x-queue-type=quorum. A sketch of both approaches follows this list.
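
As a hedged sketch of both approaches, assuming the community.rabbitmq Ansible collection and the management plugin are available (vhost, credentials, policy and queue names are all illustrative):

    # Hypothetical: mirror every matching classic queue to all nodes.
    - name: Apply an ha-mode policy to matching queues
      community.rabbitmq.rabbitmq_policy:
        name: ha-example
        vhost: /
        pattern: "^tis\\."
        apply_to: queues
        tags:
          ha-mode: all
          ha-sync-mode: automatic

    # Hypothetical: declare (recreate) a queue as a quorum queue.
    - name: Recreate a queue with x-queue-type=quorum
      community.rabbitmq.rabbitmq_queue:
        name: example.queue
        vhost: /
        durable: true
        arguments:
          x-queue-type: quorum
        login_user: guest
        login_password: guest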

Building the cluster

I started from the cluster that Liban Hirey created and applied in the stage environment. I used that along with Paul Hoang’s walkthrough on “turning a main.tf file in terraform into a module” and his work on configuring persistence for the mongodb replicaset. They may be able to help you work out where/why I did something that looks odd.

Terraform

Responsibilities here are:

  • Creating resources in Azure (including the security groups that ping helpful messages to IT).

  • Installing and copying some essential resources so that Ansible can be run by a friendly neighbourhood devops person.

Ansible

There are some ‘generic’ playbooks:

  • Give everyone ssh access

  • Install docker-compose etc.
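
A rough sketch of the sort of tasks these generic plays cover (the username and key path are made up for illustration; the real playbooks live in TIS-DEVOPS):

    # Hypothetical fragment of a 'generic' play.
    - hosts: all
      become: true
      tasks:
        - name: Give a team member ssh access
          ansible.posix.authorized_key:
            user: heetis
            key: "{{ lookup('file', 'keys/example-user.pub') }}"

        - name: Install docker-compose
          pip:
            name: docker-compose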

Service specific:

  • The rabbitmq-cluster.yml playbook installs the cluster.

  • Modify the monitoring config and run the monserver.yml playbook.
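
For orientation, a hypothetical outline of what a play like rabbitmq-cluster.yml might do on each node (the actual playbook in the repo is authoritative and will differ):

    - hosts: rabbitmq-cluster
      become: true
      tasks:
        - name: Push the broker config and compose file
          copy:
            src: "{{ item }}"
            dest: /opt/rabbitmq/
          loop:
            - rabbitmq.conf
            - docker-compose.yml

        - name: (Re)start the broker container
          command: docker-compose up -d
          args:
            chdir: /opt/rabbitmq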

Done

All in https://github.com/Health-Education-England/TIS-DEVOPS/pull/869.

2 Dev VMs created; 1 commented out in the configuration

Removed the firewall rules as I’m not certain what we actually need (it seems easier to add holes than to remove them), and we can’t currently use the rules that were there for the London & Manchester offices.

Snags

  • Due to hitting a limit on the number of VMs in the subscription, Dev has been created as 2 nodes (1 commented out). As per “What makes a cluster”, this is potentially worse than having a single node. 💽

  • Configuration seems to be ‘all by one method’. I initially tried maintaining the user & vhost environment variables, but these were ignored in preference to the config file, in which I had only defined the cluster discovery mechanism (see the compose sketch after this list). 🐰

  • Erlang cookie file: I left it as an environment variable because of the extra settings required; it is a security parameter rather than ‘configuration’ as above. 🐰

  • Terraform seems to install docker but doesn’t create the docker group, so adding the heetis user to it fails. This is rectified by the docker-upgrade.yml playbook. 🐳

  • Ansible ‘role’ playbooks: these seemed to be doing more than I could deduce from reading them, including copying files as directories. I resorted to using the config directory that is created irrespective of need. 📁
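
To illustrate the mix described by the two 🐰 snags above, a hypothetical docker-compose fragment: the Erlang cookie stays an environment variable while the rest of the configuration lives in the mounted rabbitmq.conf. All names and values are illustrative, not taken from the repo.

    version: "3"
    services:
      rabbitmq:
        image: rabbitmq:3-management
        hostname: rabbitmq-1
        environment:
          # Security parameter kept out of rabbitmq.conf, per the snag above.
          RABBITMQ_ERLANG_COOKIE: "{{ rabbitmq_erlang_cookie }}"
        volumes:
          - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro
          - rabbitmq-data:/var/lib/rabbitmq
    volumes:
      rabbitmq-data: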

TODO

As new VMs are provisioned

  • The ansible playbooks need to be modified to remove exclusions for the IP addresses used (these are already configured in the ‘hosts’ inventory).

Azure VM

  • ‘Burstable’ seems like a good model, but I only used it because that is what was already configured.

Data storage

  • Mount point, size & required throughput (data/s & IOPS).

  • Iterating over a VM ‘base module’ in terraform.

  • Generate the erlang cookie file at deploy time (using ansible?) and set its permissions to 600; a sketch follows.
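
One way this might look, as a hedged sketch: Ansible’s password lookup persists the generated value on the controller, so every node receives the same cookie. The destination path assumes the broker reads /var/lib/rabbitmq/.erlang.cookie.

    # Hypothetical: generate once, share across hosts, owner-only perms.
    - name: Install a generated Erlang cookie with mode 600
      copy:
        content: "{{ lookup('password', 'credentials/erlang.cookie chars=ascii_letters,digits length=32') }}"
        dest: /var/lib/rabbitmq/.erlang.cookie
        owner: rabbitmq
        group: rabbitmq
        mode: "0600"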

Stage & Prod

  • It seems cleaner for Liban’s stage cluster (rabbitmq-cluster-server-[1-3] in terraform/stage/) to be destroyed.

  • Creating new environments should be simplified by the use of the rabbitmq-cluster module which sits in the DEVOPS repo under terraform/common/.
