1
© 2016 Nokia
A Closer Look at RabbitMQ
(i.e. a few notes on its internals,
performance, and clustering)
Kyumars Sheykh Esmaili (with additional inputs from Philippe Dobbelaere)
July 2016
2
• Part I: Internals (slide no. 3)
• Part II: Basic Benchmarking (slide no. 24)
• Part III: Clustering (slide no. 43)
• Part IV: Summary (slide no.69)
Outline
3
Part I:
Internals
4
• Consumption alternatives: Push (deliver) vs Pull (get)
– Acknowledgements (=> impact of prefetch)
• Finding bottlenecks in RabbitMQ
– Publisher side: Flow Control
• Related topic: Alarms
– Consumer side: Consumer Utilization
• Impact of queue length
Internals - Overview
5
• Consumption alternatives: Push (deliver) vs Pull (get)
– Acknowledgements (=> impact of prefetch)
• Finding bottlenecks in RabbitMQ
– Publisher side: Flow Control
• Related topic: Alarms
– Consumer side: Consumer Utilization
• Impact of queue length
Internals - Overview
6
• There are two ways for applications to consume messages from a queue:
– Have messages delivered to them ("push API"): i.e. ‘basic.deliver’
• It does not require a roundtrip to the broker => can achieve higher delivery rates
– Fetch messages as needed ("pull API"): e.g. ‘basic.get’
• In either case, a consumer can acknowledge messages
– Use of acknowledgments allows for stronger delivery guarantees (using ‘basic.ack’)
• It’s possible to ACK more than one message at once => allows stronger guarantees with higher performance
• If client fails to acknowledge, RabbitMQ re-enqueues the messages
– This can be turned off by setting the ‘auto-ack/no-ack’ option
• Will result in higher performance, but weaker guarantees
Message Consumption Alternatives in AMQP 0.9.1
https://www.rabbitmq.com/amqp-0-9-1-reference.html
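The two consumption modes above can be sketched with the Pika Python client (a hedged sketch, not from the deck: the queue name and the `open_channel` helper are illustrative, and the signatures follow Pika 1.x):

```python
QUEUE = 'demo.queue'  # illustrative queue name

def pull_one(channel):
    """Pull API (basic.get): one broker round-trip per message."""
    method, properties, body = channel.basic_get(queue=QUEUE, auto_ack=False)
    if method is not None:          # None means the queue was empty
        channel.basic_ack(delivery_tag=method.delivery_tag)
    return body

def consume_push(channel):
    """Push API (basic.deliver): the broker streams messages to the
    callback, so no per-message round-trip is needed."""
    def on_message(ch, method, properties, body):
        # ... process body, then acknowledge ...
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=QUEUE, on_message_callback=on_message,
                          auto_ack=False)
    channel.start_consuming()       # blocks until the consumer is cancelled

def open_channel(host='localhost'):
    """Connect and open a channel (requires the third-party Pika client)."""
    import pika  # pip install pika
    return pika.BlockingConnection(pika.ConnectionParameters(host)).channel()
```

Passing `auto_ack=True` instead would give the higher-performance, weaker-guarantee behaviour described above.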
7
• What’s Prefetch?
– while consuming messages, the client can request that messages be sent in advance so that when
the client finishes processing a message, the following message is already held locally, rather than
needing to be sent down the channel
– So, it is really a "windowed ack" implementation for push mode
• Prefetching gives performance improvement
• Prefetch is only applicable when consumer acknowledgement is enabled
– It is ignored if the no-ack option is set
Consumption Speed-up through Prefetch
https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/
https://www.rabbitmq.com/amqp-0-9-1-reference.html
8
• Limits are set through prefetch-size or prefetch-count
– Of the basic.qos method
– Default value = 0
• meaning "no specific limit"
• The server may send less data in advance than allowed by the client's specified prefetch
windows but it MUST NOT send more.
• The server MUST ignore this setting when the client is not processing any messages
• AMQP defines this per channel; optionally, RabbitMQ can apply it per individual
consumer
Prefetch Parameters
https://www.rabbitmq.com/amqp-0-9-1-reference.html
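With Pika, the prefetch window is set via basic.qos before registering the consumer (a sketch; the function and parameter names here are illustrative, and as noted above the limit only matters when acknowledgements are on):

```python
def consume_with_prefetch(channel, queue, on_message, prefetch_count=100):
    """Ask the broker to keep at most `prefetch_count` unacknowledged
    messages in flight on this channel; 0 means "no specific limit"
    (the default)."""
    channel.basic_qos(prefetch_count=prefetch_count)
    channel.basic_consume(queue=queue, on_message_callback=on_message,
                          auto_ack=False)
```

Pika also exposes a `global_qos` flag on `basic_qos`, which maps to RabbitMQ's interpretation of AMQP's `global` bit: whether the limit applies per consumer or to the channel as a whole.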
9
• Consumption alternatives: Push (deliver) vs Pull (get)
– Acknowledgements (=> impact of prefetch)
• Finding bottlenecks in RabbitMQ
– Publisher-side: Flow Control
• Related topic: Alarms
– Consumer side: Consumer Utilization
• Impact of queue length
Internals - Overview
10
• RabbitMQ provides useful information to help users spot bottlenecks
• Two groups of bottlenecks: on the publisher side (more critical) or on the consumer side
• On the publisher side, RabbitMQ has a very effective (and somewhat aggressive)
backpressure mechanism called “Flow Control”, to mitigate bottlenecks and avoid
crashes
• Resource (Memory/Disk) Alarms are also integrated into the backpressure mechanism
(serve as triggers)
• On the consumer side, Consumer Utilization can provide useful hints
Finding Bottlenecks in RabbitMQ
11
The Publisher Side of RabbitMQ: Stages & Their Responsibilities
https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
• Side note: there is no one-to-one mapping between "processes" and "architectural
components” of RabbitMQ: e.g. while queues are actual processes, exchanges are not.
Routing is therefore part of the channel process, but since most of the logic lives in
exchange/routing, this overview somewhat under-represents it.
12
Further Elaboration (from Philippe)
13
• In order to prevent any of those processes from overflowing the next one down the
chain, RabbitMQ has a credit-flow mechanism in place.
• Each process initially grants a certain amount of credit to the process that sends it
messages. Once a process has handled N of those messages, it grants more credit
back to the sender
Flow Control in RabbitMQ through “Flow Credit”
https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
reader -> channel -> queue process -> message store
reader <--[grant]-- channel <--[grant]-- queue process <--[grant]-- message store.
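The grant mechanism can be illustrated with a toy simulation (purely illustrative, not RabbitMQ code; the {200, 50} pair mirrors the default credit_flow setting described in the linked post: 200 initial credits, 50 more granted per 50 messages handled):

```python
from collections import deque

INITIAL_CREDIT = 200    # credits granted to the upstream sender at start
GRANT_AFTER = 50        # after handling this many, grant this many back

def run(n_messages):
    """Push n_messages through one downstream stage under credit flow.
    Returns (messages_delivered, times_sender_was_blocked)."""
    credit, inbox = INITIAL_CREDIT, deque()
    handled_since_grant = sent = blocked = 0
    pending = n_messages
    while pending or inbox:
        if pending and credit > 0:          # sender has credit: send one
            credit -= 1
            inbox.append(object())
            sent += 1
            pending -= 1
        else:                               # no credit left (sender "in
            if pending:                     # flow") or nothing to send:
                blocked += 1                # downstream handles a message
            inbox.popleft()
            handled_since_grant += 1
            if handled_since_grant == GRANT_AFTER:
                handled_since_grant = 0
                credit += GRANT_AFTER       # grant credit back upstream
    return sent, blocked

print(run(1000))
```

Every message still gets through; the sender simply stalls whenever its credit reaches zero, which is exactly the "flow" state shown on the next slides.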
14
The Flow Control Sign & Examples
That’s the Flow Control
sign (under the ‘state’
column)
Example for
Queue Flow
Control
Examples for
Connection
Flow Control
15
• If a connection is in flow control, but none of its channels are - This means that one or
more of the channels is the bottleneck; the server is CPU-bound on something the
channel does, probably routing logic. This is most likely to be seen when publishing
small transient messages.
• If a connection is in flow control, some of its channels are, but none of the queues it is
publishing to are - This means that one or more of the queues is the bottleneck; the
server is either CPU-bound on accepting messages into the queue or I/O-bound on
writing queue indexes to disc. This is most likely to be seen when publishing small
persistent messages.
• If a connection is in flow control, some of its channels are, and so are some of the
queues it is publishing to - This means that the message store is the bottleneck; the
server is I/O-bound on writing messages to disc. This is most likely to be seen when
publishing larger persistent messages.
VERY IMPORTANT: How to Decode Flow Control Signs
https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
16
Further Elaboration (from Philippe)
17
• Related config variables
– vm_memory_high_watermark
• It sets the upper limit of how much of the
'installed' memory on the machine
RabbitMQ can use
• Default: 0.4 (I changed it to 0.6)
– vm_memory_high_watermark_paging_ratio
• It sets a ratio on the above limit, to tell
RabbitMQ when to start moving
messages from memory to the disk
• Default: 0.5 (I changed it to 1.0)
• See also
– https://www.rabbitmq.com/production-checklist.html
Alarms: Memory/Disk Allocation & Flow Control Triggers
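In the classic rabbitmq.config format (Erlang terms, used by the 3.x versions covered here), the two settings above would look like this; the values shown are the modified ones from this slide:

```erlang
[{rabbit, [
    %% allow RabbitMQ to use up to 60% of installed memory
    {vm_memory_high_watermark, 0.6},
    %% start paging messages to disk only when memory use
    %% reaches the watermark itself (1.0 x the limit)
    {vm_memory_high_watermark_paging_ratio, 1.0}
]}].
```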
18
Resource Allocation/Utilization: Where to Check Them
Important in cluster-setup
19
• The flow control mechanism doesn't extend as far as
consumers, but we do have a new metric to help you
tell how hard your consumers are working.
• That metric is consumer utilization. The definition of
consumer utilization is the proportion of time that a
queue's consumers could take new messages. It's thus
a number from 0 to 1, or 0% to 100% (or N/A if the
queue has no consumers).
• So if a queue has a consumer utilization of 100% then
it never needs to wait for its consumers; it's always
able to push messages out to them as fast as it can.
Finding Bottlenecks: The Consumer Side
https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
20
• If its utilization is less than 100% then this implies that its consumers are sometimes not
able to take messages. Network congestion can limit the utilization you can achieve, or
low utilization can be due to the use of too low a prefetch limit, leading to the queue
needing to wait while the consumer processes messages until it can send out more.
Consumer Utilization (cont.)
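Consumer utilization is exposed per queue by the management API (as the field `consumer_utilisation`, British spelling). A stdlib-only sketch for reading it; the host, port and credentials are placeholders:

```python
import base64
import json
import urllib.request
from urllib.parse import quote

def queues_url(host, port, vhost):
    """Management-API URL listing the queues in a vhost
    (the vhost must be percent-encoded, e.g. '/' -> '%2F')."""
    return 'http://%s:%d/api/queues/%s' % (host, port, quote(vhost, safe=''))

def consumer_utilisation(host='localhost', port=15672,
                         vhost='/', user='guest', password='guest'):
    """Map queue name -> consumer utilisation (None if no consumers)."""
    req = urllib.request.Request(queues_url(host, port, vhost))
    token = base64.b64encode(('%s:%s' % (user, password)).encode()).decode()
    req.add_header('Authorization', 'Basic ' + token)
    with urllib.request.urlopen(req) as resp:
        return {q['name']: q.get('consumer_utilisation')
                for q in json.load(resp)}

print(queues_url('localhost', 15672, '/'))
```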
21
• Consumption alternatives: Push (deliver) vs Pull (get)
– Acknowledgements (=> impact of prefetch)
• Finding bottlenecks in RabbitMQ
– Publisher side: Flow Control
• Related topic: Alarms
– Consumer side: Consumer Utilization
• Impact of queue length
Internals - Overview
22
• RabbitMQ's queues are fastest when they're empty.
– When a queue is empty, and it has consumers ready to receive messages, then as soon as a
message is received by the queue, it goes straight out to the consumer.
– The main point is that very little book-keeping needs to be done, very few data structures are
modified, and very little additional memory needs allocating. Consequently, the CPU load of a
message going through an empty queue is very small.
• If the queue is not empty then a bit more work has to be done:
– the messages have to actually be queued up. Initially, this too is fast and cheap as the underlying
functional data structures are very fast.
– Nevertheless, by holding on to messages, the overall memory usage of the queue will be higher,
– and we are doing more work than before per message (each message is being both enqueued and
dequeued now, whereas before each message was just going straight out to a consumer), so the
CPU cost per message is higher.
– data structures are optimized to be fast when queues are near-empty
Queue Length: Benefits of Empty/Near-empty Queues
https://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/
23
• Additionally, if a queue receives a spike of publications, then the queue must spend time
dealing with those publications, which takes CPU time away from sending existing
messages out to consumers:
– a queue of a million messages will be able to be drained out to ready consumers at a much higher
rate if there are no publications arriving at the queue to distract it.
• Eventually, as a queue grows, it'll become so big that we have to start writing messages
out to disk, evicting them from RAM in order to free up memory.
– At this point, the CPU cost per message is much higher than had the message been dealt with by
an empty queue.
– and more importantly, the latency per message grows drastically, regardless of CPU
utilization, due to the slow path through the disk
Queue Length: Growing Overhead of Bursty/Long Queues
https://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/
These statements have been experimentally verified
(see Part II of this slide set)
24
Part II:
Basic Benchmarking
25
Basic Benchmarking - Overview
• Experimental Setup
• Throughput rates
– Publish Only
– Consume Only
– Simultaneous Publish and Consume
• Memory consumption scheme
• Impact of disk access
26
Basic Benchmarking - Overview
• Experimental Setup
• Throughput rates
– Publish Only
– Consume Only
– Simultaneous Publish and Consume
• Memory consumption scheme
• Impact of disk access
27
• Machine:
– Kernel Version: 3.13.0-91-generic
– Operating System: Ubuntu 14.04.4 LTS
– CPUs: 8
– Total Memory: 7.715 GiB
• RabbitMQ version: 3.5.3
• Client: Java client's PerfTest (next slide)
• Server and client on the same machine
• Server running in a Docker container; client running on the host
Experimental Setup
28
Performance Measurement Tool: PerfTest
https://www.rabbitmq.com/java-tools.html
29
• #consumers: 1
• #producers: 1
• #queues: 1
• Load: 1 million (10^6) messages of size 1kb each
• Exchange of type topic; routing key “pertest”
• Acknowledgement enabled (prefetch: unlimited)
Default Setup
30
Disclaimer:
no benchmarking is flawless or
complete
31
Basic Benchmarking - Overview
• Experimental setup
• Throughput rates
– Publish only
– Consume only
– Simultaneous publish and consume
• Memory consumption scheme
• Impact of disk access
32
Throughput Rates - Publish Only
NOTE: back pressure kicks in for all these cases
No bound
queue
1 bound
queue
2 bound
queues
Increase in routing overhead
Increase in level of back pressure
Increase in memory/CPU utilization
33
Throughput Rates - Consume Only (Ack vs No-Ack)
Acknowledgment has a significant
impact on throughput
34
Throughput Rates - Consume Only (Varying prefetch count)
prefetch count = 1
prefetch count = 100
prefetch count = 10
prefetch count = 0 (unlimited)
35
Simultaneous Publish and Consume (varying Numbers)
1 publisher, 1 consumer 2 publishers, 1 consumer
2 publishers, 2 consumers 5 publishers, 5 consumers
36
Basic Benchmarking - Overview
• Experimental setup
• Throughput rates
– Publish only
– Consume only
– Simultaneous publish and consume
• Memory consumption scheme
• Impact of disk access
37
Memory Consumption Scheme – A Long, Stable Queue
Notice the considerable memory
consumed by the queue process
itself (for message metadata,
indexes,…)
Injected 1M messages, each 1kb => total ~1GB
38
Memory Consumption Scheme – An Active-but-almost-empty Queue
The queue process consumes
very little memory
Empty queue
39
• In RabbitMQ, memory used by
message bodies is shared
among processes
– Under a group called “Binaries”
• This sharing also happens between queues
– if an exchange routes a message
to many queues, the message
body is only stored in memory
once.
Memory Consumption Scheme – Sharing Across Queues
1 queue
(empty)
1 queue with
1M messages
2 identical
queues, each
with 1M
messages
https://www.rabbitmq.com/blog/2014/10/30/understanding-memory-use-with-rabbitmq-3-4/
40
Basic Benchmarking - Overview
• Experimental setup
• Throughput rates
– Publish only
– Consume only
– Simultaneous publish and consume
• Memory consumption scheme
• Impact of disk access
41
Impact of Disk Access – Publish Phase
New load: 10^6 messages of 5kb each
Total size ~ 5 GB (beyond limit)
During swap-to-disk periods, back pressure is
the highest (publisher completely stopped)
Indicators of disk access
42
Impact of Disk Access – Consume Phase
At the beginning, messages are served from
memory, at a reasonable rate (15k/s)
Once it starts to hit disk, the rates drop
drastically (to less than 500/s)
43
Part III:
Clustering
44
• Concepts
• Basic benchmarking (for clustering)
• Load balancing
Clustering - Overview
45
• Concepts
• Basic benchmarking (for clustering)
• Load balancing
Clustering - Overview
46
Distribution Alternatives in RabbitMQ
https://www.rabbitmq.com/distributed.html
Note: this makes it hard to get through firewalls, which are typically open in one direction only
47
• Scale-out
– Focus of this slide deck
• High availability/Fail-over (through mirrored queues)
– not discussed here (see https://www.rabbitmq.com/ha.html)
Clustering in RabbitMQ: Benefits
48
• All data/state required for the operation of a RabbitMQ broker is replicated across all
nodes.
• An exception to this are message queues, which by default reside on one node,
though they are visible and reachable from all nodes.
• To replicate queues across nodes in a cluster, see the documentation on high availability
(note that you will need a working cluster first).
Clustering in RabbitMQ: What is Replicated?
https://www.rabbitmq.com/clustering.html
49
• Queues within a RabbitMQ cluster are located on a single node (by default, the
node on which they were first declared), called home node or queue master
– This is in contrast to exchanges and bindings, which can always be considered to be on all nodes.
• Queues can optionally be made mirrored across multiple nodes
– All queue operations go through the master first and then are replicated to mirrors.
• This is necessary to guarantee FIFO ordering of messages.
– Consumers are connected to the master regardless of which node they connect to
• Queue mirroring therefore enhances availability, but does not distribute load across nodes (all participating
nodes each do all the work).
Crucial Details about Queues in a RabbitMQ Cluster
https://www.rabbitmq.com/ha.html
50
• A cluster can be formed in a number of ways:
– Manually with rabbitmqctl
– Declaratively by listing cluster nodes in config file
– Declaratively with plugins
• Two options:
– rabbitmq-autocluster
– rabbitmq-clusterer
• The composition of a cluster can be altered dynamically. All RabbitMQ brokers start out
as running on a single node. These nodes can be joined into clusters, and subsequently
turned back into individual brokers again.
Clustering in RabbitMQ: Cluster Formation Alternatives
https://github.com/harbur/docker-rabbitmq-cluster
51
• Concepts
• Basic benchmarking (for clustering)
• Load balancing
Clustering - Overview
52
• A cluster of two nodes:
– Node 1 (N1):
• Operating System: Ubuntu 14.04.4 LTS, Kernel Version: 3.16.0-71-generic
• CPUs: 4
• Total Memory: 3.774 GiB
– Node 2 (N2):
• Operating System: Ubuntu 14.04.4 LTS , Kernel Version: 3.13.0-91-generic
• CPUs: 8
• Total Memory: 7.715 GiB
– Network connection: Ethernet cable (1 Gbit/s)
• No high availability (i.e. no mirrored queues)
• Default values inherited from Part II
– Addition: clients running on N2
Basic Benchmarking - Setup
53
• Impact of network latency
• Impact of locality
– (Both Producer/Consumer connected directly to the queue node)
– Both Producer/Consumer connected indirectly to the queue node
– Producer directly to the queue node, consumer indirectly
– Producer indirectly to the queue node, consumer directly
Basic Benchmarking - Scenarios
54
Impact of Network Latency
Q/P/C hosted-on/connected-to N2
P & C running-on N1
Q/P/C hosted-on/connected-to N2
P & C running-on N2
Remarks:
1) No backlog in either case
2) Comparable throughput
(indicating that in LAN
setup, network latency is
not a decisive factor)
55
Impact of Locality: Indirect Producer & Indirect Consumer
Q/P/C hosted-on/connected-to N2 Q hosted-on N1
P/C connected-to N2
Remarks:
1) Producer and Consumer are
connected to the queue
through a proxy node
2) Both have lower
throughputs
3) Backlog is building up
56
Impact of Locality: Direct Producer & Indirect Consumer
Q /P hosted-on/ connected-to N2
C connected-to N1
Remarks:
1) Lowest output throughput
2) Fastest queue length growth
Q/P/C hosted-on/connected-to N2
57
Impact of Locality: Indirect Producer & Direct Consumer
Q/C hosted-on /connected-to N2
P connected-to N1
Q/P/C hosted-on/connected-to N2
Remarks:
1) Moderate overall throughput
2) No backlog
58
Inter-Node Data Transfer
Q/C hosted-on /connected-to N2
P connected-to N1
Q /P hosted-on/ connected-to N2
C connected-to N1
Q hosted-on N1
P/C connected-to N2
59
• Queues in RabbitMQ have one “home” node, and all related operations go through that
node
• This highlights the importance of “locality” for performance (throughput + backlog)
– Q/P/C all co-located => highest throughput, empty queue
• Best case scenario
– Q/C co-located => moderate levels of throughput, empty queue
– Q/P co-located => relatively low throughput, increasing backlog (fastest)
– Neither Q/C nor Q/P co-located => lowest throughput, increasing backlog
Conclusions
60
• Concepts
• Basic Benchmarking (for clustering)
• Load balancing
Clustering - Overview
61
• “Many Queues” scenario
– Large number of (small) queues
– Problem: queues are by default created on the node
the client is connected to, resulting in an imbalance in
the long run (see the figure for an example)
• Mitigated in newer versions (see next slide)
– Focus of this slide set
• “Large Queues” scenario
– A few (large) queues
– Problem: how to share the load of a big ‘logical’ queue
among different brokers
– Not discussed here, for a proposal, see this post:
• https://insidethecpu.com/2014/11/17/load-balancing-a-rabbitmq-cluster/
Load Balancing in a RabbitMQ Cluster: Two Different Scenarios
62
• Good news: newer versions of RabbitMQ (3.6.0 and later) provide control over where to create
master queues
– Through “Queue Master Locator” strategies
• Proposed Solution: a service (part of WWS Deployment component) that
– a) creates queues, ensuring a balanced distribution
• With the help of the locator feature of RabbitMQ
– b) for each created queue, figures out its “home” node
• Using a REST call to the Management API of RabbitMQ
– c) points the producer(s) and consumer(s) of the queue to the right node
Load Balancing RabbitMQ Cluster : The “Many Queues” Scenario
63
• Queue masters can be distributed between nodes using several
strategies. Which strategy is used is controlled in three ways:
– using the x-queue-master-locator queue declare argument
– setting the queue-master-locator policy key
– by defining the queue_master_locator key in the configuration file.
• Here are the possible strategies:
– min-masters: pick the node hosting the minimum number of masters
– client-local: pick the node the client that declares the queue is connected to
– random: pick a random node
Queue Master Location
https://www.rabbitmq.com/ha.html
https://www.erlang-solutions.com/blog/take-control-of-your-rabbitmq-queues.html
64
• Sample code in Python (using Pika client)
Queue Location Setting: Option 1 – Through Queue Declare Arguments
import pika  # third-party client: pip install pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

queue_name = 'microservice.queue.1'
args = {"x-queue-master-locator": "min-masters"}
channel.queue_declare(queue=queue_name, durable=True, arguments=args)
65
• A sample policy, set up by REST call to Management API
– (sample) endpoint:
• http://192.168.0.108:15672/api/policies/%2ftest/min-masters
– Verb:
• PUT
– Body content =>
• Result:
Queue Location Setting: Option 2 – Through Policy
{"pattern": "^min-masters",
 "definition": {
   "queue-master-locator": "min-masters"
 },
 "apply-to": "queues"
}
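The same PUT can be issued from Python with only the standard library (a sketch; the endpoint, policy name and default guest credentials are the sample values from this slide, not something to hard-code in production):

```python
import base64
import json
import urllib.request
from urllib.parse import quote

def policy_put(host, port, vhost, name, pattern, strategy,
               user='guest', password='guest'):
    """Build the PUT request that installs a queue-master-locator policy."""
    url = 'http://%s:%d/api/policies/%s/%s' % (
        host, port, quote(vhost, safe=''), name)
    body = {'pattern': pattern,
            'definition': {'queue-master-locator': strategy},
            'apply-to': 'queues'}
    req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                 method='PUT')
    req.add_header('content-type', 'application/json')
    token = base64.b64encode(('%s:%s' % (user, password)).encode()).decode()
    req.add_header('Authorization', 'Basic ' + token)
    return req

req = policy_put('192.168.0.108', 15672, '/test', 'min-masters',
                 '^min-masters', 'min-masters')
print(req.get_method(), req.full_url)
# send it with: urllib.request.urlopen(req)
```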
66
Queue Master Locator Policy In Practice
Step 1: create 5 queues
with names that don’t
match the policy pattern
Result: all on the same
broker (that client is
connected to)
Step 2: create 9 additional
queues, with names that
match the policy pattern
Result: they are
distributed fairly across
the two brokers
67
• The corresponding entry line =>
– Note, default is “client-local”
• In practice
– result after creating 8 queues =>
Queue Location Setting: Option 3 – Through Config File
{rabbit, [
    ...
    {queue_master_locator, <<"min-masters">>},
    ...
]},
NOTE: it may make sense to make
“min-masters” our default
68
• A REST call to the Management API
– (sample) endpoint:
• http://localhost:15672/api/queues/%2ftest/min-masters.queue9
– Verb:
• GET
• Sample output =>
Retrieve Home Node of a Queue
{
"name":"min-masters.queue9",
"vhost":"/test",
"durable":false,
"auto_delete":false,
"exclusive":false,
"arguments":{},
"node":"rabbit@broker2",
...}
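Step (b) of the earlier proposal then reduces to this one GET (stdlib-only sketch; the default guest credentials are placeholders):

```python
import base64
import json
import urllib.request
from urllib.parse import quote

def queue_url(host, port, vhost, queue):
    """URL of a single queue's metadata in the management API."""
    return 'http://%s:%d/api/queues/%s/%s' % (
        host, port, quote(vhost, safe=''), quote(queue, safe=''))

def home_node(host, port, vhost, queue, user='guest', password='guest'):
    """Return the 'node' field, i.e. the queue's home (master) node."""
    req = urllib.request.Request(queue_url(host, port, vhost, queue))
    token = base64.b64encode(('%s:%s' % (user, password)).encode()).decode()
    req.add_header('Authorization', 'Basic ' + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)['node']

print(queue_url('localhost', 15672, '/test', 'min-masters.queue9'))
```

Against the sample output above, `home_node(...)` would return "rabbit@broker2", which is the node producers and consumers should be pointed at.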
69
Part IV:
Summary
70
• RabbitMQ has a very consequential back-pressure mechanism (Flow Control)
• Keep your queues empty! (memory and CPU overhead grow quickly with queue length)
• Clustering is not fully transparent (loss of locality vs metadata store)
• Management API exposes a wealth of useful information (particularly, look out for the
node stats, “flow” signs, “disk read/write rates”)
A Few Lessons Learned
71
• Use separate connections for producers and consumers
• Use more than one connection for high-load producers
• Use message batching, if possible
– Amortized overhead
– Increase in latency
• Use distinct user credentials!
– Helps with troubleshooting
A Few Lessons Learned (cont.)

Optimizing AI for immediate response in Smart CCTV
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 

A Closer Look at RabbitMQ

  • 1. © 2016 Nokia1 A Closer Look at RabbitMQ (i.e. a few notes on its internals, performance, and clustering) Kyumars Sheykh Esmaili (with additional inputs from Philippe Dobbelaere) July 2016
  • 2. 2 • Part I: Internals (slide no. 3) • Part II: Basic Benchmarking (slide no. 24) • Part III: Clustering (slide no. 43) • Part IV: Summary (slide no. 69) Outline
  • 3. 3 Part I: Internals
  • 4. 4 • Consumption alternatives: Push (deliver) vs Pull (get) – Acknowledgements (=> impact of prefetch) • Finding bottlenecks in RabbitMQ – Publisher side: Flow Control • Related topic: Alarms – Consumer side: Consumer Utilization • Impact of queue length Internals - Overview
  • 5. 5 • Consumption alternatives: Push (deliver) vs Pull (get) – Acknowledgements (=> impact of prefetch) • Finding bottlenecks in RabbitMQ – Publisher side: Flow Control • Related topic: Alarms – Consumer side: Consumer Utilization • Impact of queue length Internals - Overview
  • 6. 6 • There are two ways for applications to consume messages from a queue: – Have messages delivered to them ("push API"): i.e. ‘basic.deliver’ • It does not require a roundtrip to the broker => can achieve higher delivery rates – Fetch messages as needed ("pull API"): e.g. ‘basic.get’ • In either case, a consumer can acknowledge messages – Use of acknowledgments allows for stronger delivery guarantees (using ‘basic.ack’) • It’s possible to ACK more than one message at once => allows stronger guarantees with higher performance • If a client fails to acknowledge, RabbitMQ re-enqueues the messages – This can be turned off by setting the ‘auto-ack/no-ack’ option • Will result in higher performance, but weaker guarantees Message Consumption Alternatives in AMQP 0.9.1 https://www.rabbitmq.com/amqp-0-9-1-reference.html
  • 7. 7 • What’s Prefetch? – While consuming messages, the client can request that messages be sent in advance so that when the client finishes processing a message, the following message is already held locally, rather than needing to be sent down the channel – So, it is really a "windowed ack" implementation for push mode • Prefetching gives a performance improvement • Prefetch is only applicable when consumer acknowledgement is enabled – Prefetch limits are ignored if the no-ack option is set Consumption Speed-up through Prefetch https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/ https://www.rabbitmq.com/amqp-0-9-1-reference.html
  • 8. 8 • Limits are set through prefetch-size or prefetch-count – Of the basic.qos method – Default value = 0 • meaning "no specific limit" • The server may send less data in advance than allowed by the client's specified prefetch window but it MUST NOT send more. • The server MUST ignore this setting when the client is not processing any messages • AMQP defines this per channel; optionally RabbitMQ can apply this per individual consumer Prefetch Parameters https://www.rabbitmq.com/amqp-0-9-1-reference.html
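The windowed-ack behaviour described on the last two slides can be sketched in pure Python. This is an illustrative model, not the AMQP client API: the broker keeps pushing until `prefetch_count` messages are unacknowledged, then waits for acks before sending more.

```python
from collections import deque

def deliver_with_prefetch(messages, prefetch_count, acks_per_round=1):
    """Toy model of AMQP prefetch: at most `prefetch_count` unacked
    messages are in flight at once (0 = "no specific limit")."""
    queue = deque(messages)
    unacked, delivered, max_in_flight = deque(), [], 0
    while queue or unacked:
        # Broker pushes ahead while the prefetch window has room.
        while queue and (prefetch_count == 0 or len(unacked) < prefetch_count):
            unacked.append(queue.popleft())
        max_in_flight = max(max_in_flight, len(unacked))
        # Consumer processes and acks a batch, reopening the window.
        for _ in range(min(acks_per_round, len(unacked))):
            delivered.append(unacked.popleft())
    return delivered, max_in_flight
```

With `prefetch_count=1` every delivery waits for the previous ack (one message in flight at a time); raising the count widens the window and amortizes the broker round trips, which is where the speed-up comes from.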
  • 9. 9 • Consumption alternatives: Push (deliver) vs Pull (get) – Acknowledgements (=> impact of prefetch) • Finding bottlenecks in RabbitMQ – Publisher-side: Flow Control • Related topic: Alarms – Consumer side: Consumer Utilization • Impact of queue length Internals - Overview
  • 10. 10 • RabbitMQ provides useful information to help users spot bottlenecks • Two groups of bottlenecks: on the publisher side (more critical) or on the consumer side • On the publisher side, RabbitMQ has a very effective (and somewhat aggressive) backpressure mechanism called “Flow Control” to mitigate bottlenecks and avoid crashes • Resource (Memory/Disk) Alarms are also integrated into the backpressure mechanism (they serve as triggers) • On the consumer side, Consumer Utilization can provide useful hints Finding Bottlenecks in RabbitMQ
  • 11. 11 The Publisher Side of RabbitMQ: Stages & Their Responsibilities https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/ • Side note: there is no one-to-one mapping between "processes" and "architectural components" of RabbitMQ: e.g. while queues are actual processes, exchanges are not. Hence routing is part of the channel process, but since most of the logic is in exchange/routing, this overview somewhat under-represents it.
  • 13. 13 • In order to prevent any of those processes from overflowing the next one down the chain, we have a credit flow mechanism in place. • Each process initially grants a certain amount of credit to the process that sends it messages. Once a process has handled N of those messages, it grants more credit to the sender Flow Control in RabbitMQ through “Flow Credit” https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/ reader -> channel -> queue process -> message store reader <--[grant]-- channel <--[grant]-- queue process <--[grant]-- message store.
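The grant cycle above can be modelled with a toy simulation. The default parameters used here ({400, 200} for initial credit and grant size) are assumptions loosely based on the linked 3.5.5 blog post; the point is the blocking pattern, not the exact numbers.

```python
def simulate_credit_flow(n_messages, initial_credit=400, grant_after=200):
    """Toy model of credit flow: the sender spends one credit per
    message; when it runs out it is blocked (the 'flow' state) until
    the downstream process catches up and grants `grant_after` more.
    Returns how many times the sender was blocked."""
    credit, blocks = initial_credit, 0
    for _ in range(n_messages):
        if credit == 0:
            blocks += 1            # sender would show 'flow' here
            credit += grant_after  # receiver catches up, grants credit
        credit -= 1
    return blocks
```

A fast publisher feeding a slow stage thus alternates between bursts of `grant_after` messages and flow-controlled pauses, which is exactly the oscillating rate pattern visible in the management UI.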
  • 14. 14 The Flow Control Sign & Examples That’s the Flow Control sign (under the ‘state’ column) Example for Queue Flow Control Examples for Connection Flow Control
  • 15. 15 • If a connection is in flow control, but none of its channels are - This means that one or more of the channels is the bottleneck; the server is CPU-bound on something the channel does, probably routing logic. This is most likely to be seen when publishing small transient messages. • If a connection is in flow control, some of its channels are, but none of the queues it is publishing to are - This means that one or more of the queues is the bottleneck; the server is either CPU-bound on accepting messages into the queue or I/O-bound on writing queue indexes to disc. This is most likely to be seen when publishing small persistent messages. • If a connection is in flow control, some of its channels are, and so are some of the queues it is publishing to - This means that the message store is the bottleneck; the server is I/O-bound on writing messages to disc. This is most likely to be seen when publishing larger persistent messages. VERY IMPORTANT: How to Decode Flow Control Signs https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
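The three cases above amount to a small decision table; a sketch of the diagnosis logic:

```python
def diagnose_flow(connection_in_flow, channels_in_flow, queues_in_flow):
    """Decision table for the 'flow' signs: given which layers show
    flow control, name the likely bottleneck."""
    if not connection_in_flow:
        return "no publisher-side flow control active"
    if not channels_in_flow:
        return "channel: CPU-bound (probably routing logic)"
    if not queues_in_flow:
        return "queue process: CPU-bound or queue-index I/O"
    return "message store: I/O-bound writing messages to disc"
```

The rule of thumb: the bottleneck is the first stage downstream of the last layer showing the flow sign.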
  • 17. 17 • Related config variables – vm_memory_high_watermark • It sets the upper limit of how much of the 'installed' memory on the machine RabbitMQ can use • Default: 0.4 (I changed it to 0.6) – vm_memory_high_watermark_paging_ratio • It sets a ratio on the above limit, to tell RabbitMQ when to start moving messages from memory to the disk • Default: 0.5 (I changed it to 1.0) • See also – https://www.rabbitmq.com/production-checklist.html Alarms: Memory/Disk Allocation & Flow Control Triggers
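The two settings compose multiplicatively: publishers are blocked at `vm_memory_high_watermark` times installed memory, and paging to disk starts at that limit times the paging ratio. A quick check of the implied thresholds:

```python
def memory_thresholds(installed_bytes, high_watermark=0.4, paging_ratio=0.5):
    """Thresholds implied by the two settings above: publishers are
    blocked at the watermark; paging to disk starts earlier, at
    watermark * paging_ratio of installed memory."""
    block_at = installed_bytes * high_watermark
    page_at = block_at * paging_ratio
    return block_at, page_at
```

On an 8 GiB machine with the defaults, paging starts around 1.6 GiB of broker memory use and publishing is blocked around 3.2 GiB; setting the paging ratio to 1.0 (as done above) delays paging until the blocking threshold itself.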
  • 18. 18 Resource Allocation/Utilization: Where to Check Them Important in cluster-setup
  • 19. 19 • The flow control mechanism doesn't extend as far as consumers, but we do have a new metric to help you tell how hard your consumers are working. • That metric is consumer utilization. The definition of consumer utilization is the proportion of time that a queue's consumers could take new messages. It's thus a number from 0 to 1, or 0% to 100% (or N/A if the queue has no consumers). • So if a queue has a consumer utilization of 100% then it never needs to wait for its consumers; it's always able to push messages out to them as fast as it can. Finding Bottlenecks: The Consumer Side https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
  • 20. 20 • If its utilization is less than 100% then this implies that its consumers are sometimes not able to take messages. Network congestion can limit the utilization you can achieve, or low utilization can be due to the use of too low a prefetch limit, leading to the queue needing to wait while the consumer processes messages until it can send out more. Consumer Utilization (cont.)
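As a back-of-the-envelope sketch, consumer utilization over an observation window is the fraction of time the consumers could take new messages. The interval-based bookkeeping here is illustrative, not how the broker computes the metric internally:

```python
def consumer_utilization(blocked_intervals, window_seconds):
    """Fraction of the observation window during which the queue's
    consumers could take new messages. `blocked_intervals` lists
    (start, end) periods where they could not (prefetch window full,
    network congestion, ...)."""
    blocked = sum(end - start for start, end in blocked_intervals)
    return (window_seconds - blocked) / window_seconds
```

E.g. consumers that are unavailable for 2 s out of a 10 s window give 80% utilization, a hint to raise the prefetch limit or check the network.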
  • 21. 21 • Consumption alternatives: Push (deliver) vs Pull (get) – Acknowledgements (=> impact of prefetch) • Finding bottlenecks in RabbitMQ – Publisher side: Flow Control • Related topic: Alarms – Consumer side: Consumer Utilization • Impact of queue length Internals - Overview
  • 22. 22 • RabbitMQ's queues are fastest when they're empty. – When a queue is empty, and it has consumers ready to receive messages, then as soon as a message is received by the queue, it goes straight out to the consumer. – The main point is that very little book-keeping needs to be done, very few data structures are modified, and very little additional memory needs allocating. Consequently, the CPU load of a message going through an empty queue is very small. • If the queue is not empty then a bit more work has to be done: – the messages have to actually be queued up. Initially, this too is fast and cheap as the underlying functional data structures are very fast. – Nevertheless, by holding on to messages, the overall memory usage of the queue will be higher, – and we are doing more work than before per message (each message is being both enqueued and dequeued now, whereas before each message was just going straight out to a consumer), so the CPU cost per message is higher. – data structures are optimized to be fast when queues are near to empty Queue Length: Benefits of Empty/Near-empty Queues https://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/
  • 23. 23 • Additionally, if a queue receives a spike of publications, then the queue must spend time dealing with those publications, which takes CPU time away from sending existing messages out to consumers: – a queue of a million messages will be able to be drained out to ready consumers at a much higher rate if there are no publications arriving at the queue to distract it. • Eventually, as a queue grows, it'll become so big that we have to start writing messages out to disk and forgetting about them from RAM in order to free up RAM. – At this point, the CPU cost per message is much higher than had the message been dealt with by an empty queue. – and more importantly, the latency per message grows drastically, regardless of CPU utilization, due to the slow path over the disk Queue Length: Growing Overhead of Bursty/Long Queues https://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/ These statements have been experimentally verified (see Part II of this slide set)
  • 24. 24 Part II: Basic Benchmarking
  • 25. 25 Basic Benchmarking - Overview • Experimental Setup • Throughput rates – Publish Only – Consume Only – Simultaneous Publish and Consume • Memory consumption scheme • Impact of disk access
  • 26. 26 Basic Benchmarking - Overview • Experimental Setup • Throughput rates – Publish Only – Consume Only – Simultaneous Publish and Consume • Memory consumption scheme • Impact of disk access
  • 27. 27 • Machine: – Kernel Version: 3.13.0-91-generic – Operating System: Ubuntu 14.04.4 LTS – CPUs: 8 – Total Memory: 7.715 GiB • RabbitMQ version: 3.5.3 • Client: Java client's PerfTest (next slide) • Server and client on the same machine • Server running in a Docker container; client running on the host Experimental Setup
  • 28. 28 Performance Measurement Tool: PerfTest https://www.rabbitmq.com/java-tools.html
  • 29. 29 • #consumers: 1 • #producers: 1 • #queues: 1 • Load: 1 million (10^6) messages of size 1kb each • Exchange of type topic; routing key “pertest” • Acknowledgement enabled (prefetch: unlimited) Default Setup
  • 30. 30 Disclaimer: no benchmarking is flawless or complete
  • 31. 31 Basic Benchmarking - Overview • Experimental setup • Throughput rates – Publish only – Consume only – Simultaneous publish and consume • Memory consumption scheme • Impact of disk access
  • 32. 32 Throughput Rates - Publish Only NOTE: back pressure kicks in for all these cases No bound queue 1 bound queue 2 bound queues Increaseinroutingoverhead Increaseinlevelofbackpressure Increaseinmemory/CPUutilization
  • 33. 33 Throughput Rates - Consume Only (Ack vs No-Ack) Acknowledgment has a significant impact on throughput
  • 34. 34 Throughput Rates - Consume Only (Varying prefetch count) prefetch count = 1 prefetch count = 100 prefetch count = 10 prefetch count = 0 (unlimited)
  • 35. 35 Simultaneous Publish and Consume (varying Numbers) 1 publisher, 1 consumer 2 publishers, 1 consumer 2 publishers, 2 consumers 5 publishers, 5 consumers
  • 36. 36 Basic Benchmarking - Overview • Experimental setup • Throughput rates – Publish only – Consume only – Simultaneous publish and consume • Memory consumption scheme • Impact of disk access
  • 37. 37 Memory Consumption Scheme – A Long, Stable Queue Notice the considerable memory consumed by the queue process itself (for message metadata, indexes,…) Injected 1M messages, each 1kb => total ~ 1GB
  • 38. 38 Memory Consumption Scheme – An Active-but-almost-empty Queue The queue process consumes very little memory Empty queue
  • 39. 39 • In RabbitMQ, memory used by message bodies is shared among processes – Under a group called “Binaries” • This sharing also happens between queues too – if an exchange routes a message to many queues, the message body is only stored in memory once. Memory Consumption Scheme – Sharing Across Queues 1 queue (empty) 1 queue with 1M messages 2 identical queues, each with 1M messages https://www.rabbitmq.com/blog/2014/10/30/ understanding-memory-use-with-rabbitmq- 3-4/
  • 40. 40 Basic Benchmarking - Overview • Experimental setup • Throughput rates – Publish only – Consume only – Simultaneous publish and consume • Memory consumption scheme • Impact of disk access
  • 41. 41 Impact of Disk Access – Publish Phase New load: 10^6 messages of 5 kB each Total size ~ 5 GB (beyond limit) During swap-to-disk periods, back pressure is the highest (publisher completely stopped) Indicators of disk access
  • 42. 42 Impact of Disk Access – Consume Phase At the beginning, messages are served from memory, at a reasonable rate (15k/s) Once it starts to hit disk, the rates drop drastically (to less than 500/s)
  • 43. 43 Part III: Clustering
  • 44. 44 • Concepts • Basic benchmarking (for clustering) • Load balancing Clustering - Overview
  • 45. 45 • Concepts • Basic benchmarking (for clustering) • Load balancing Clustering - Overview
  • 46. 46 Distribution Alternatives in RabbitMQ https://www.rabbitmq.com/distributed.html meaning it is hard to get through firewalls, which are typically open in one direction only
  • 47. 47 • Scale-out – Focus of this slide deck • High availability/Fail-over (through mirrored queues) – not discussed here (see https://www.rabbitmq.com/ha.html) Clustering in RabbitMQ: Benefits
  • 48. 48 • All data/state required for the operation of a RabbitMQ broker is replicated across all nodes. • An exception to this are message queues, which by default reside on one node, though they are visible and reachable from all nodes. • To replicate queues across nodes in a cluster, see the documentation on high availability (note that you will need a working cluster first). Clustering in RabbitMQ: What is Replicated? https://www.rabbitmq.com/clustering.html
  • 49. 49 • Queues within a RabbitMQ cluster are located on a single node (by default, the node on which they were first declared), called home node or queue master – This is in contrast to exchanges and bindings, which can always be considered to be on all nodes. • Queues can optionally be made mirrored across multiple nodes – All queue operations go through the master first and then are replicated to mirrors. • This is necessary to guarantee FIFO ordering of messages. – Consumers are connected to the master regardless of which node they connect to • Queue mirroring therefore enhances availability, but does not distribute load across nodes (all participating nodes each do all the work). Crucial Details about Queues in a RabbitMQ Cluster https://www.rabbitmq.com/ha.html
  • 50. 50 • A cluster can be formed in a number of ways: – Manually with rabbitmqctl – Declaratively by listing cluster nodes in the config file – Declaratively with plugins • Two options: – rabbitmq-autocluster – rabbitmq-clusterer • The composition of a cluster can be altered dynamically. All RabbitMQ brokers start out as running on a single node. These nodes can be joined into clusters, and subsequently turned back into individual brokers again. Clustering in RabbitMQ: Cluster Formation Alternatives https://github.com/harbur/docker-rabbitmq-cluster
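For the declarative config-file route, a classic-format rabbitmq.config fragment might look like this (the node names are examples; `disc` vs `ram` controls the node type):

```erlang
%% rabbitmq.config -- declarative cluster formation (classic syntax).
%% A freshly reset node tries to join one of the listed peers on
%% boot; 'disc' makes it a disc node ('ram' for a RAM node).
[
  {rabbit, [
    {cluster_nodes, {['rabbit@node1', 'rabbit@node2'], disc}}
  ]}
].
```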
  • 51. 51 • Concepts • Basic benchmarking (for clustering) • Load balancing Clustering - Overview
  • 52. 52 • A cluster of two nodes: – Node 1 (N1): • Operating System: Ubuntu 14.04.4 LTS, Kernel Version: 3.16.0-71-generic • CPUs: 4 • Total Memory: 3.774 GiB – Node 2 (N2): • Operating System: Ubuntu 14.04.4 LTS , Kernel Version: 3.13.0-91-generic • CPUs: 8 • Total Memory: 7.715 GiB – Network connection: Ethernet cable (1GB/s) • No high availability (i.e. mirrored queues) • Default values inherited from Part II – Addition: clients running on N2 Basic Benchmarking - Setup
  • 53. 53 • Impact of network latency • Impact of locality – (Both Producer/Consumer connected directly to the queue node) – Both Producer/Consumer connected indirectly to the queue node – Producer directly to the queue node, consumer indirectly – Producer indirectly to the queue node, consumer directly Basic Benchmarking - Scenarios
  • 54. 54 Impact of Network Latency Q/P/C hosted-on/connected-to N2 P & C running-on N1 Q/P/C hosted-on/connected-to N2 P & C running-on N2 Remarks: 1) No backlog in either case 2) Comparable throughput (indicating that in a LAN setup, network latency is not a decisive factor)
  • 55. 55 Impact of Locality: Indirect Producer & Indirect Consumer Q/P/C hosted-on/connected-to N2 Q hosted-on N1 P/C connected-to N2 Remarks: 1) Producer and Consumer are connected to the queue through a proxy node 2) Both have lower throughputs 3) Backlog is building up
  • 56. 56 Impact of Locality: Direct Producer & Indirect Consumer Q /P hosted-on/ connected-to N2 C connected-to N1 Remarks: 1) Lowest output throughput 2) Fastest queue length growth Q/P/C hosted-on/connected-to N2
  • 57. 57 Impact of Locality: Indirect Producer & Direct Consumer Q/C hosted-on /connected-to N2 P connected-to N1 Q/P/C hosted-on/connected-to N2 Remarks: 1) Moderate overall throughput 2) No backlog
  • 58. 58 Inter-Node Data Transfer Q/C hosted-on /connected-to N2 P connected-to N1 Q /P hosted-on/ connected-to N2 C connected-to N1 Q hosted-on N1 P/C connected-to N2
  • 59. 59 • Queues in RabbitMQ have one “home” node and all related operations go through that node • This highlights the importance of “locality” for performance (throughput + backlog) – Q/P/C all co-located => highest throughput, empty queue • Best case scenario – Q/C co-located => moderate levels of throughput, empty queue – Q/P co-located => relatively low throughput, increasing backlog (fastest) – Neither Q/C nor Q/P co-located => lowest throughput, increasing backlog Conclusions
  • 60. 60 • Concepts • Basic Benchmarking (for clustering) • Load balancing Clustering - Overview
  • 61. 61 • “Many Queues” scenario – Large number of (small) queues – Problem: queues are by default created on the node the client is connected to, resulting in an imbalance in the long run (see the figure for an example) • Mitigated in newer versions (see next slide) – Focus of this slide set • “Large Queues” scenario – A few (large) queues – Problem: how to share the load of a big ‘logical’ queue among different brokers – Not discussed here; for a proposal, see this post: • https://insidethecpu.com/2014/11/17/load-balancing-a-rabbitmq-cluster/ Load Balancing in a RabbitMQ Cluster: Two Different Scenarios
  • 62. 62 • Good news: newer versions of RabbitMQ (> 3.6.0) provide control over where to create master queues – Through “Queue Master Locator” strategies • Proposed Solution: a service (part of WWS Deployment component) that – a) creates queues, ensuring a balanced output • With the help of the locator feature of RabbitMQ – b) for each created queue, figures out its “home” node • Using a REST call to the Management API of RabbitMQ – c) points the producer(s) and consumer(s) of the queue to the right node Load Balancing a RabbitMQ Cluster: The “Many Queues” Scenario
  • 63. 63 • Queue masters can be distributed between nodes using several strategies. Which strategy is used is controlled in three ways: – using the x-queue-master-locator queue declare argument – setting the queue-master-locator policy key – by defining the queue_master_locator key in the configuration file. • Here are the possible strategies: – min-masters: pick the node hosting the minimum number of masters – client-local: pick the node the client that declares the queue is connected to – random: pick a random node Queue Master Location https://www.rabbitmq.com/ha.html https://www.erlang-solutions.com/blog/take-control-of-your-rabbitmq-queues.html
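The three strategies map to a few lines of selection logic; a hypothetical sketch (the function name and the `masters_per_node` input are illustrative — in practice the per-node master counts would come from the Management API):

```python
import random

def pick_master_node(strategy, masters_per_node, client_node):
    """Sketch of the three queue-master-locator strategies;
    `masters_per_node` maps node name -> number of queue masters
    it currently hosts."""
    if strategy == "min-masters":
        return min(masters_per_node, key=masters_per_node.get)
    if strategy == "client-local":
        return client_node
    if strategy == "random":
        return random.choice(list(masters_per_node))
    raise ValueError("unknown strategy: " + strategy)
```

min-masters is the only strategy that actively rebalances; client-local (the default) is what produces the imbalance described two slides back.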
  • 64. 64 • Sample code in Python (using Pika client) Queue Location Setting: Option 1 – Through Queue Declare Arguments queue_name = 'microservice.queue.1' args = {"x-queue-master-locator": "min-masters"} channel.queue_declare(queue = queue_name, durable = True, arguments = args )
  • 65. 65 • A sample policy, set up by a REST call to the Management API – (sample) endpoint: • http://192.168.0.108:15672/api/policies/%2ftest/min-masters – Verb: • PUT – Body content => • Result: Queue Location Setting: Option 2 – Through Policy {"pattern":"^min-masters", "definition":{ "queue-master-locator":"min-masters", "apply-to": "queues" } }
  • 66. 66 Queue Master Locator Policy In Practice Step 1: create 5 queues with names that don’t match the policy pattern Result: all on the same broker (that client is connected to) Step 2: create 9 additional queues, with names that match the policy pattern Result: they are distributed fairly across the two brokers
  • 67. 67 • The corresponding entry line => – Note, default is “client-local” • In practice – result after creating 8 queues => Queue Location Setting: Option 3 – Through Config File {rabbit,[ . . {queue_master_locator, <<"min-masters">>}, . . ]}, NOTE: it may make sense to make “min-masters” our default
  • 68. 68 • A REST call to the Management API – (sample) endpoint: • http://localhost:15672/api/queues/%2ftest/min-masters.queue9 – Verb: • GET • Sample output => Retrieve Home Node of a Queue { "name":"min-masters.queue9", "vhost":"/test", "durable":false, "auto_delete":false, "exclusive":false, "arguments":{}, "node":"rabbit@broker2", ...}
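The `node` field can be pulled out of that response with nothing but the standard library; a minimal sketch (the sample body is abridged from the output above):

```python
import json

def home_node(queue_json):
    """Extract a queue's home node from a Management API
    /api/queues/<vhost>/<name> response body."""
    return json.loads(queue_json)["node"]

sample = '{"name":"min-masters.queue9","vhost":"/test","node":"rabbit@broker2"}'
print(home_node(sample))  # rabbit@broker2
```

This is step b) of the load-balancing proposal: once the home node is known, producers and consumers can be pointed straight at it to preserve locality.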
  • 69. 69 Part IV: Summary
  • 70. 70 • RabbitMQ has a very consequential backpressure mechanism (Flow Control) • Keep your queues empty! (memory and CPU overhead grows quickly with queue length) • Clustering is not fully transparent (loss of locality vs metadata store) • Management API exposes a wealth of useful information (in particular, look out for the node stats, “flow” signs, and “disk read/write rates”) A Few Lessons Learned
  • 71. 71 • Use separate connections for producers and consumers • Use more than one connection for high-load producers • Use message batching, if possible – Amortized overhead – Increase in latency • Use distinct user credentials! – Helps with troubleshooting A Few Lessons Learned (cont.)