Message Queue overview

It has become apparent that as we continue to work on TIS, adding new features, fixing bugs and doing new work, there is a need for cross-system communication. We have gotten to a point where more and more changes affect other systems, and keeping track of which system needs to know what, and when, is getting difficult to manage. This way of working is known as an orchestrator pattern, where an event occurs and it is up to the event source to let other parties know of the event.

The alternative to the orchestrator approach is the choreography approach. This is where an event occurs and is published; other systems subscribed to that particular type of event then react to it. This decouples the source of the event from all of the systems that need to react to it. Message queues are one of the main ways in which this choreography approach is implemented.

The following is a very high-level overview of two messaging systems that we could potentially use. It is not intended to be used to pick a particular one, but to give a distilled sense of the two systems.

What are Message queues?

Message queues are broad and complex systems, but in essence they enable communication between systems while also decoupling them. Typical features include consistency, high availability, reliability and speed.

Which Message queues?

The sector is awash with systems, but the two most popular in the industry right now seem to be Kafka and RabbitMQ, both of which have good support in our backend framework, Spring Boot.

Both systems achieve the same result, but the way in which they do it, how they are architected and the concepts they use are vastly different. The following (I hope) will provide a high-level overview of the two systems against a number of criteria.

Queues

RabbitMQ has a concept of exchanges and queues, where exchanges receive and route messages to queues, and queues deliver the messages to consumers. Once delivered (and acknowledged), a message is deleted.
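As a rough sketch of how this topology could be declared from Spring Boot (the queue, exchange and binding names here are made up for illustration, not taken from the demo):

import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.TopicExchange;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitTopologyConfig {

  @Bean
  public Queue auditQueue() {
    return new Queue("audit_queue", true); // durable
  }

  @Bean
  public TopicExchange auditExchange() {
    return new TopicExchange("audit_exchange");
  }

  // Route any message whose routing key matches "audit.#" to the queue
  @Bean
  public Binding auditBinding(Queue auditQueue, TopicExchange auditExchange) {
    return BindingBuilder.bind(auditQueue).to(auditExchange).with("audit.#");
  }
}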

Kafka doesn't have the concept of a queue but has a notion of partitions. These partitions are like append-only queues, where consumers read messages and keep a note of where they are in the partition so that they do not re-process messages. Messages are not removed from the partition immediately after being read, but only when a retention policy is breached.

This persisted queue allows for an interesting feature: a consumer can replay messages it has already read. It does this by changing its offset (the position of the consumer in the partition).
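A minimal sketch of a replay using the plain Kafka client; the broker address, topic name, group id and partition number are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "replay-demo");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
  TopicPartition partition = new TopicPartition("audit_topic", 0);
  consumer.assign(Collections.singletonList(partition));
  // Rewind to offset 0 and re-read messages this consumer has already processed
  consumer.seek(partition, 0);
  ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
  records.forEach(r -> System.out.println("offset " + r.offset() + ": " + r.value()));
}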

Dead Letters

If there are issues with routing messages (e.g. a queue is full) or processing them, RabbitMQ has the notion of a dead letter queue. These messages can be read at a later date to be investigated and processed.
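As an illustrative sketch (all names made up), a queue can be declared with arguments telling RabbitMQ where to dead-letter rejected or expired messages; these beans would sit in a configuration class like the one above:

import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;
import org.springframework.context.annotation.Bean;

// Rejected or expired messages are republished to the "dlx" exchange
@Bean
public Queue workQueue() {
  return QueueBuilder.durable("work_queue")
      .withArgument("x-dead-letter-exchange", "dlx")
      .withArgument("x-dead-letter-routing-key", "work_queue.dead")
      .build();
}

@Bean
public Queue deadLetterQueue() {
  return QueueBuilder.durable("work_queue.dead").build();
}

@Bean
public DirectExchange deadLetterExchange() {
  return new DirectExchange("dlx");
}

@Bean
public Binding deadLetterBinding() {
  return BindingBuilder.bind(deadLetterQueue()).to(deadLetterExchange()).with("work_queue.dead");
}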

Kafka, on the other hand, can also have dead letter queues, but they are not required for routing cases, as messages are simply appended to a partition.

Ordering / Message Retrieval

Messages in RabbitMQ are ordered in a First In, First Out (FIFO) manner. However, seeing as it's possible for multiple consumers to be attached to a queue, the processing of messages is not always ordered, as consumers could be processing at different speeds. Messages by default are provided in a push manner, meaning messages are pushed by RabbitMQ to the consumer. This gives more control to RabbitMQ in terms of load balancing, but also means that it can overwhelm the consumer. Of course, messages can instead be pulled by the consumer, but only one message at a time, as sketched below.
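For illustration, a one-at-a-time pull with Spring AMQP looks something like this (reusing the queue name from the code examples further down):

import org.springframework.amqp.core.Message;

// Synchronously pull a single message; returns null if the queue is empty
Message message = amqpTemplate.receive("blahqueue");
if (message != null) {
  System.out.println(new String(message.getBody()));
}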

Kafka, like RabbitMQ, orders messages in a FIFO manner, but unlike RabbitMQ can only achieve this within a single partition. As Kafka allows only a single consumer (per consumer group) on a single partition, ordering within a partition is not much of an issue.

It should be noted that Kafka DOES NOT support global ordering across partitions. To get around this, messages can be routed to specific partitions using a partitioning algorithm, which means that certain types of messages can always go to the same partition (and therefore keep an order), as shown below. One other thing to keep in mind is that this algorithm typically involves the total number of partitions. This means that in a running system where specific messages go to one partition, a change to the number of partitions could send those messages to a different partition, which would make replay difficult.
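A sketch of routing by key with Spring Kafka; the topic name and method are hypothetical:

@Autowired
private KafkaTemplate<String, String> kafkaTemplate;

public void sendForPerson(String personId, String message) {
  // The default partitioner hashes the key modulo the partition count,
  // so all messages for one personId land in the same partition
  // (until the partition count changes)
  kafkaTemplate.send("person_events", personId, message);
}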

By default, Kafka consumers read messages in batches (pulled by the consumer), and this polling is synchronous.

Speed, Latency and Scale

Kafka is well known for its performance characteristics; it is asynchronous (by default) and as such allows messages to flow through at over 100k per second. RabbitMQ, on the other hand, isn't as fast; even with its ability to push data to consumers, it can only do so at a rate of approximately 20k per second.

Even though RabbitMQ is substantially slower than Kafka, unless we have a reason to push large amounts of data through the system at speed, either system will be fine.

Scaling is done by adding new nodes and attaching them to the cluster, as well as attaching new instances of the consumers. I don't know how easy this will be, though.

Consistency and Reliability

Both systems can be run in a cluster, and Kafka actively pushes this type of setup. RabbitMQ has the ability to mirror queues across nodes, while Kafka has replicated partitions.

Administration / dependencies

Running a RabbitMQ cluster seems a lot easier than a Kafka one. Kafka requires the standard minimum of three Kafka nodes, plus another cluster of ZooKeeper nodes to maintain the Kafka cluster.

Kafka, being a "dumb" system with the smarts in the consumers, doesn't need as much administration, while RabbitMQ, being "smart", needs a lot more upfront admin.

Conversely, with RabbitMQ being a "smart" system, developers or admin staff will need to set up the exchanges, bindings and queues. This should then make developing and running consumers easier, as they should be stateless and "dumb". The configuration of RabbitMQ can be done via a nice web interface (the management plugin) that can be installed separately; this allows a user to configure the system dynamically at run time.

Kafka, on the other hand, will need "smarter" consumers that have some knowledge of state (the offset) and the ability to manipulate it. Management is still a little difficult, as it requires the use of scripts; e.g. the creation of topics is done with the kafka-topics.sh script, with a number of arguments stating replication and partition counts.
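From Spring Boot there is an alternative to the script: topics can be declared as beans, which the auto-configured KafkaAdmin creates on the broker at startup. A sketch, with illustrative name, partition and replication values:

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class KafkaTopicConfig {

  // Picked up by KafkaAdmin and created when the application starts
  @Bean
  public NewTopic auditTopic() {
    return new NewTopic("audit_topic", 3, (short) 3);
  }
}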

Developer experience

Spring Boot, our backend framework, offers good support for a wide range of messaging systems, from JMS and AMQP to Kafka, and, as with most of Spring Boot, provides auto-configuration.

A note on flow

Some thought needs to be given to how developers are going to use these systems and how we are going to run them in our dev, stage and production environments. As we're currently on Azure as our cloud provider, some thought should also be given to its managed "Service Bus"/"Event Hub" offerings. They may turn out to be cheaper in terms of resources and people, but they would be a problem for developers working in their local environment.


Code examples

RabbitMQ

Setup for a Spring Boot app is easy; ensure that the starter is imported as a dependency with:

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-amqp</artifactId>
</dependency>


A component to send the message

@Component
public class Sender {

  public static final String queueName = "blahqueue";

  @Autowired
  private AmqpAdmin amqpAdmin;
  @Autowired
  private AmqpTemplate amqpTemplate;

  // Declare the queue on startup so sent messages aren't silently dropped
  @PostConstruct
  public void declareQueue() {
    amqpAdmin.declareQueue(new Queue(queueName));
  }

  // Sends to the default exchange, using the queue name as the routing key
  public void sendMessage(String message) {
    amqpTemplate.convertAndSend(queueName, message);
  }
}

Endpoint to trigger the sending of a message

  // In a @RestController, with the Sender autowired in as `sender`
  @PostMapping("/say-hello")
  public String sayHello() {
    LocalDateTime now = LocalDateTime.now();
    String message = "Hello... the date and time is: " + now.toString();
    sender.sendMessage(message);
    return message;
  }

Receiver

@Component
public class Receiver {

  @RabbitListener(queues = Sender.queueName)
  public void processMessage(String content) {
    System.out.println("Got message from queue: " + content);
  }

}


Kafka

Much like the RabbitMQ example, connecting, sending and receiving messages is simple.

Dependencies

<dependency>
	<groupId>org.springframework.kafka</groupId>
	<artifactId>spring-kafka</artifactId>
</dependency>

Send data

@Component
public class Sender {

  public static final String TOPIC = "blahTopic";

  @Autowired
  private KafkaTemplate<String, String> kafkaTemplate;

  public void sendMessage(String message) {
    kafkaTemplate.send(TOPIC, message);
  }
}

Receive data

@Component
public class Receiver {

  @KafkaListener(topics = Sender.TOPIC)
  public void processMessage(String content) {
    System.out.println("Got message: " + content);
  }
}


Plugins

Both systems provide plugins to extend the features of the corresponding messaging system.


Demo

A spike (TISNEW-1531) was done during the sprint to quickly demo what can be done with audit logs. The auditing library is in Shared-Modules Audit, where it currently logs at INFO level to file and console.

For the demo, we used a setup similar to production, changed to use a message queue instead, e.g.

public class TisAuditRepository implements AuditEventRepository {
  private static final String AUDIT_ROUTING_KEY = "audit_queue";
  private ObjectMapper mapper = new ObjectMapper();

  @Autowired
  private AmqpTemplate amqpTemplate;

  // Serialise the audit event as JSON and publish it to the queue
  @Override
  public void add(AuditEvent event) {
    try {
      amqpTemplate.convertAndSend(AUDIT_ROUTING_KEY, mapper.writeValueAsString(event));
    } catch (JsonProcessingException e) {
      throw new RuntimeException(e); // keep the original exception as the cause
    }
  }

  @Override
  public List<AuditEvent> find(String principal, Instant after, String type) {
    throw new UnsupportedOperationException();
  }
}

Logstash was used to collect the audit logs and feed them to Elasticsearch. Logstash has input plugins that can be enabled to allow an automated pull from Kafka or RabbitMQ. The setup for Kafka was:

input {
  kafka {
    bootstrap_servers => "10.160.31.49:9092"
    topics => ["audit_topic"]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "audit_kafka"
  }
}
and for RabbitMQ:

input {
  rabbitmq {
    hosts => "10.160.31.49"
    queue => "audit_queue"
    durable => true
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "audit_rabbitmq"
  }
}
The Spring Boot configuration used to connect to the two brokers was:

spring:
  rabbitmq:
    host: localhost
    port: 5672
    username: guest
    password: guest
  kafka:
    bootstrap-servers: localhost:9092
    consumer:
      group-id: myGroup


This quick demo was successful in that we managed to automate the flow of all audit logs, which previously went to log files, into Elasticsearch via a message queue and Logstash.

Resources

https://jack-vanlightly.com/blog/2017/12/4/rabbitmq-vs-kafka-part-1-messaging-topologies

https://content.pivotal.io/blog/messaging-patterns-for-event-driven-microservices

https://content.pivotal.io/blog/understanding-when-to-use-rabbitmq-or-apache-kafka

https://metabroadcast.com/blog/processing-dead-letter-queues

https://www.rabbitmq.com/ha.html

https://stackoverflow.com/questions/42151544/is-there-any-reason-to-use-rabbitmq-over-kafka

https://docs.spring.io/spring-boot/docs/2.1.2.RELEASE/reference/htmlsingle/#boot-features-messaging

https://kafka.apache.org/documentation/