
A Closer Look at RabbitMQ

A few notes on internals, performance, and clustering of RabbitMQ.


  1. © 2016 Nokia. A Closer Look at RabbitMQ (i.e. a few notes on its internals, performance, and clustering). Kyumars Sheykh Esmaili (with additional input from Philippe Dobbelaere), July 2016
  2. Outline • Part I: Internals (slide no. 3) • Part II: Basic Benchmarking (slide no. 24) • Part III: Clustering (slide no. 43) • Part IV: Summary (slide no. 69)
  3. Part I: Internals
  4. Internals - Overview • Consumption alternatives: Push (deliver) vs Pull (get) – Acknowledgements (=> impact of prefetch) • Finding bottlenecks in RabbitMQ – Publisher side: Flow Control • Related topic: Alarms – Consumer side: Consumer Utilization • Impact of queue length
  6. Message Consumption Alternatives in AMQP 0.9.1 • There are two ways for applications to consume messages from a queue: – Have messages delivered to them ("push API"), i.e. ‘basic.deliver’ • This does not require a round trip to the broker => can achieve higher delivery rates – Fetch messages as needed ("pull API"), e.g. ‘basic.get’ • In either case, a consumer can acknowledge messages (using ‘basic.ack’) – Acknowledgements allow for stronger delivery guarantees • It’s possible to ACK more than one message at once => stronger guarantees at higher performance • If the client fails to acknowledge, RabbitMQ re-enqueues the messages – Acknowledgements can be turned off by setting the ‘auto-ack/no-ack’ option • This yields higher performance, but weaker guarantees. https://www.rabbitmq.com/amqp-0-9-1-reference.html
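The ack/redelivery semantics above can be sketched with a toy in-memory model (an illustration of the behaviour only, not RabbitMQ's implementation; all names are made up):

```python
from collections import deque

class ToyQueue:
    """Toy model: delivered messages stay 'unacked' until the consumer
    acks them; messages never acked are re-enqueued when the consumer
    fails, mirroring RabbitMQ's redelivery behaviour."""

    def __init__(self):
        self.ready = deque()   # messages waiting for delivery
        self.unacked = {}      # delivery_tag -> message
        self._tag = 0

    def publish(self, msg):
        self.ready.append(msg)

    def deliver(self, no_ack=False):
        """Push one message to a consumer (cf. 'basic.deliver')."""
        if not self.ready:
            return None
        msg = self.ready.popleft()
        if no_ack:
            return (None, msg)      # broker forgets it immediately
        self._tag += 1
        self.unacked[self._tag] = msg
        return (self._tag, msg)

    def ack(self, tag, multiple=False):
        """Cf. 'basic.ack'; multiple=True acks every tag up to `tag`."""
        tags = [t for t in self.unacked if t <= tag] if multiple else [tag]
        for t in tags:
            self.unacked.pop(t, None)

    def consumer_died(self):
        """Re-enqueue everything the dead consumer never acked."""
        for t in sorted(self.unacked, reverse=True):
            self.ready.appendleft(self.unacked.pop(t))
```

The `multiple=True` path shows why batched acks can keep strong guarantees while reducing per-message overhead.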
  7. Consumption Speed-up through Prefetch • What’s prefetch? – While consuming messages, the client can request that messages be sent in advance, so that when the client finishes processing a message, the following message is already held locally rather than needing to be sent down the channel – It is really a "windowed ack" implementation for push mode • Prefetching gives a performance improvement • Prefetch applies only when consumer acknowledgement is enabled – Prefetch limits are ignored if the no-ack option is set. https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/ https://www.rabbitmq.com/amqp-0-9-1-reference.html
  8. Prefetch Parameters • Limits are set through prefetch-size or prefetch-count – Of the basic.qos method – Default value = 0, meaning "no specific limit" • The server may send less data in advance than allowed by the client's specified prefetch window, but it MUST NOT send more • The server MUST ignore this setting when the client is not processing any messages • AMQP defines this per channel; optionally, RabbitMQ can apply it per individual consumer. https://www.rabbitmq.com/amqp-0-9-1-reference.html
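The "windowed ack" idea can be sketched as follows (a toy simulation, not broker code): at most `prefetch_count` unacked messages are in flight at once, and `0` means no specific limit.

```python
from collections import deque

def drain(messages, prefetch_count):
    """Toy model of the prefetch window: the broker pushes messages in
    advance while the window has room; the consumer processes and acks
    one message per step. Returns (delivery order, peak in-flight)."""
    ready, in_flight = deque(messages), deque()
    delivered, peak = [], 0
    while ready or in_flight:
        # broker pushes while the window has room (0 = unlimited)
        while ready and (prefetch_count == 0 or
                         len(in_flight) < prefetch_count):
            in_flight.append(ready.popleft())
        peak = max(peak, len(in_flight))
        delivered.append(in_flight.popleft())  # process + ack one
    return delivered, peak
```

With `prefetch_count=3` the peak in-flight count never exceeds 3, while `prefetch_count=0` lets the broker push everything at once.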
  10. Finding Bottlenecks in RabbitMQ • RabbitMQ provides useful information to help users spot bottlenecks • Two groups of bottlenecks: on the publisher side (more critical) or on the consumer side • On the publisher side, RabbitMQ has a very effective (and somewhat aggressive) backpressure mechanism called “Flow Control” to mitigate bottlenecks and avoid crashes • Resource (memory/disk) alarms are also integrated into the backpressure mechanism (they serve as triggers) • On the consumer side, Consumer Utilization can provide useful hints
  11. The Publisher Side of RabbitMQ: Stages & Their Responsibilities • Side note: there is no one-to-one mapping between "processes" and "architectural components" of RabbitMQ: e.g. while queues are actual processes, exchanges are not. Hence routing is part of the channel process, but since most of the logic is in exchange/routing, this overview somewhat under-represents it. https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
  12. Further Elaboration (from Philippe)
  13. Flow Control in RabbitMQ through “Flow Credit” • To prevent any of these processes from overflowing the next one down the chain, a credit-flow mechanism is in place • Each process initially grants a certain amount of credit to the process that sends it messages. Once a process has handled N of those messages, it grants more credit to the sender: reader -> channel -> queue process -> message store; reader <--[grant]-- channel <--[grant]-- queue process <--[grant]-- message store. https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
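A single hop of this scheme (say, channel -> queue process) can be modelled like this (a toy sketch; the real initial-credit and grant values are configurable via `credit_flow_default_credit`):

```python
class CreditLink:
    """Toy model of one credit-flow hop: the sender spends one credit
    per message and blocks at zero; the receiver grants a fresh batch
    of credit back after handling `grant_after` messages."""

    def __init__(self, initial_credit, grant_after):
        self.credit = initial_credit
        self.grant_after = grant_after
        self._handled = 0

    def can_send(self):
        return self.credit > 0

    def send(self):
        if not self.can_send():
            raise RuntimeError("blocked: no credit (flow control)")
        self.credit -= 1

    def handled_one(self):
        """Receiver finished one message; maybe grant credit back."""
        self._handled += 1
        if self._handled == self.grant_after:
            self._handled = 0
            self.credit += self.grant_after
```

The point of granting in batches is that a slow receiver automatically throttles its sender, which in turn runs out of credit toward its own upstream, propagating backpressure all the way to the reader.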
  14. The Flow Control Sign & Examples • That’s the Flow Control sign (under the ‘state’ column) • Example of queue flow control • Examples of connection flow control
  15. VERY IMPORTANT: How to Decode Flow Control Signs • If a connection is in flow control, but none of its channels are: one or more of the channels is the bottleneck; the server is CPU-bound on something the channel does, probably routing logic. Most likely seen when publishing small transient messages. • If a connection is in flow control, some of its channels are, but none of the queues it is publishing to are: one or more of the queues is the bottleneck; the server is either CPU-bound on accepting messages into the queue or I/O-bound on writing queue indexes to disc. Most likely seen when publishing small persistent messages. • If a connection is in flow control, some of its channels are, and so are some of the queues it is publishing to: the message store is the bottleneck; the server is I/O-bound on writing messages to disc. Most likely seen when publishing larger persistent messages. https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
  16. Further Elaboration (from Philippe)
  17. Alarms: Memory/Disk Allocation & Flow Control Triggers • Related config variables: – vm_memory_high_watermark • Sets the upper limit on how much of the machine's installed memory RabbitMQ can use • Default: 0.4 (I changed it to 0.6) – vm_memory_high_watermark_paging_ratio • Sets a ratio on the above limit that tells RabbitMQ when to start moving messages from memory to disk • Default: 0.5 (I changed it to 1.0) • See also https://www.rabbitmq.com/production-checklist.html
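For reference, the two variables above would be set like this in a classic rabbitmq.config (Erlang-term format; the values shown are the ones mentioned on this slide, not the defaults):

```erlang
[
  {rabbit, [
    %% allow RabbitMQ to use up to 60% of installed memory
    {vm_memory_high_watermark, 0.6},
    %% start paging messages to disk only when usage reaches
    %% 100% of that limit (1.0 x watermark)
    {vm_memory_high_watermark_paging_ratio, 1.0}
  ]}
].
```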
  18. Resource Allocation/Utilization: Where to Check Them • Important in a cluster setup
  19. Finding Bottlenecks: The Consumer Side • The flow control mechanism doesn't extend as far as consumers, but there is a metric to help you tell how hard your consumers are working • That metric is consumer utilization: the proportion of time that a queue's consumers could take new messages. It is thus a number from 0 to 1, or 0% to 100% (or N/A if the queue has no consumers) • If a queue has a consumer utilization of 100%, it never needs to wait for its consumers; it is always able to push messages out to them as fast as it can. https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
  20. Consumer Utilization (cont.) • If utilization is less than 100%, the consumers are sometimes unable to take messages. Network congestion can limit the utilization you can achieve, or low utilization can be due to too low a prefetch limit, leaving the queue waiting while the consumer processes messages before it can send out more.
  22. Queue Length: Benefits of Empty/Near-empty Queues • RabbitMQ's queues are fastest when they're empty: – When a queue is empty and has consumers ready to receive messages, then as soon as a message is received by the queue, it goes straight out to the consumer – Very little book-keeping needs to be done, very few data structures are modified, and very little additional memory needs allocating; consequently, the CPU load of a message going through an empty queue is very small • If the queue is not empty, more work has to be done: – The messages have to actually be queued up. Initially this too is fast and cheap, as the underlying functional data structures are very fast – Nevertheless, by holding on to messages, the overall memory usage of the queue is higher – And more work is done per message (each message is now both enqueued and dequeued, whereas before each message went straight out to a consumer), so the CPU cost per message is higher – The data structures are optimized to be fast when queues are near empty. https://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/
  23. Queue Length: Growing Overhead of Bursty/Long Queues • Additionally, if a queue receives a spike of publications, it must spend time dealing with those publications, which takes CPU time away from sending existing messages out to consumers: – A queue of a million messages can be drained to ready consumers at a much higher rate if no publications are arriving at the queue to distract it • Eventually, as a queue grows, it becomes so big that messages must be written out to disk and dropped from RAM to free up memory: – At this point, the CPU cost per message is much higher than if the message had been dealt with by an empty queue – More importantly, the latency per message grows drastically, regardless of CPU utilization, due to the slow path through the disk. These statements have been experimentally verified (see Part II of this slide set). https://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/
  24. Part II: Basic Benchmarking
  25. Basic Benchmarking - Overview • Experimental setup • Throughput rates – Publish only – Consume only – Simultaneous publish and consume • Memory consumption scheme • Impact of disk access
  27. Experimental Setup • Machine: – Kernel version: 3.13.0-91-generic – Operating system: Ubuntu 14.04.4 LTS – CPUs: 8 – Total memory: 7.715 GiB • RabbitMQ version: 3.5.3 • Client: the Java client's PerfTest (next slide) • Server and client on the same machine • Server running in a Docker container; client running on the host
  28. Performance Measurement Tool: PerfTest https://www.rabbitmq.com/java-tools.html
  29. Default Setup • #consumers: 1 • #producers: 1 • #queues: 1 • Load: 1 million (10^6) messages of 1 kB each • Exchange of type topic; routing key “pertest” • Acknowledgement enabled (prefetch: unlimited)
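The default setup above roughly corresponds to a PerfTest invocation along these lines (a sketch only; the exact classpath and flag set depend on the PerfTest build shipped with your Java client, so check its --help output before relying on it):

```shell
# -x producers, -y consumers, -s message size in bytes,
# -C messages published per producer, -t exchange type, -k routing key
java -cp "rabbitmq-client.jar:rabbitmq-client-tests.jar:commons-cli.jar" \
  com.rabbitmq.examples.PerfTest \
  -x 1 -y 1 -s 1000 -C 1000000 -t topic -k pertest
```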
  30. Disclaimer: no benchmarking is flawless or complete
  32. Throughput Rates - Publish Only • NOTE: back pressure kicks in for all these cases • Panels: no bound queue; 1 bound queue; 2 bound queues • Annotations: increase in routing overhead; increase in level of back pressure; increase in memory/CPU utilization
  33. Throughput Rates - Consume Only (Ack vs No-Ack) • Acknowledgment has a significant impact on throughput
  34. Throughput Rates - Consume Only (Varying Prefetch Count) • prefetch count = 1 • prefetch count = 100 • prefetch count = 10 • prefetch count = 0 (unlimited)
  35. Simultaneous Publish and Consume (Varying Numbers) • 1 publisher, 1 consumer • 2 publishers, 1 consumer • 2 publishers, 2 consumers • 5 publishers, 5 consumers
  37. Memory Consumption Scheme – A Long, Stable Queue • Injected 1M messages of 1 kB each => total ~1 GB • Notice the considerable memory consumed by the queue process itself (for message metadata, indexes, …)
  38. Memory Consumption Scheme – An Active-but-almost-empty Queue • Empty queue • The queue process consumes very little memory
  39. Memory Consumption Scheme – Sharing Across Queues • In RabbitMQ, memory used by message bodies is shared among processes – Reported under a group called “Binaries” • This sharing also happens between queues: if an exchange routes a message to many queues, the message body is only stored in memory once • Panels: 1 queue (empty); 1 queue with 1M messages; 2 identical queues, each with 1M messages. https://www.rabbitmq.com/blog/2014/10/30/understanding-memory-use-with-rabbitmq-3-4/
  41. Impact of Disk Access – Publish Phase • New load: 10^6 messages of 5 kB each; total size ~5 GB (beyond the memory limit) • During swap-to-disk periods, back pressure is at its highest (the publisher is completely stopped) • Indicators of disk access
  42. Impact of Disk Access – Consume Phase • At the beginning, messages are served from memory at a reasonable rate (15k/s) • Once it starts to hit disk, the rate drops drastically (to less than 500/s)
  43. Part III: Clustering
  44. Clustering - Overview • Concepts • Basic benchmarking (for clustering) • Load balancing
  46. Distribution Alternatives in RabbitMQ • (Annotation on the figure: hard to get through firewalls, which are typically open in one direction only.) https://www.rabbitmq.com/distributed.html
  47. Clustering in RabbitMQ: Benefits • Scale-out – Focus of this slide deck • High availability/fail-over (through mirrored queues) – Not discussed here (see https://www.rabbitmq.com/ha.html)
  48. Clustering in RabbitMQ: What is Replicated? • All data/state required for the operation of a RabbitMQ broker is replicated across all nodes • The exception is message queues, which by default reside on one node, though they are visible and reachable from all nodes • To replicate queues across nodes in a cluster, see the documentation on high availability (note that you will need a working cluster first). https://www.rabbitmq.com/clustering.html
  49. Crucial Details about Queues in a RabbitMQ Cluster • Queues within a RabbitMQ cluster are located on a single node (by default, the node on which they were first declared), called the home node or queue master – This is in contrast to exchanges and bindings, which can always be considered to be on all nodes • Queues can optionally be mirrored across multiple nodes – All queue operations go through the master first and are then replicated to mirrors • This is necessary to guarantee FIFO ordering of messages – Consumers are connected to the master regardless of which node they connect to • Queue mirroring therefore enhances availability, but does not distribute load across nodes (all participating nodes each do all the work). https://www.rabbitmq.com/ha.html
  50. Clustering in RabbitMQ: Cluster Formation Alternatives • A cluster can be formed in a number of ways: – Manually with rabbitmqctl – Declaratively, by listing cluster nodes in the config file – Declaratively, with plugins • Two options: rabbitmq-autocluster and rabbitmq-clusterer • The composition of a cluster can be altered dynamically: all RabbitMQ brokers start out running on a single node; these nodes can be joined into clusters and subsequently turned back into individual brokers again. https://github.com/harbur/docker-rabbitmq-cluster
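The manual (rabbitmqctl) route sketched above looks like this; the node names are examples, and the commands are run on the node that joins an existing cluster:

```shell
rabbitmqctl stop_app                    # stop the broker, keep the Erlang VM up
rabbitmqctl join_cluster rabbit@node1   # add --ram to join as a RAM node
rabbitmqctl start_app
rabbitmqctl cluster_status              # verify the node list
```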
  52. Basic Benchmarking - Setup • A cluster of two nodes: – Node 1 (N1): Ubuntu 14.04.4 LTS, kernel 3.16.0-71-generic, 4 CPUs, 3.774 GiB memory – Node 2 (N2): Ubuntu 14.04.4 LTS, kernel 3.13.0-91-generic, 8 CPUs, 7.715 GiB memory – Network connection: Ethernet cable (1 Gb/s) • No high availability (i.e. no mirrored queues) • Default values inherited from Part II – Addition: clients running on N2
  53. Basic Benchmarking - Scenarios • Impact of network latency • Impact of locality – (Both producer/consumer connected directly to the queue node) – Both producer/consumer connected indirectly to the queue node – Producer connected directly to the queue node, consumer indirectly – Producer connected indirectly to the queue node, consumer directly
  54. Impact of Network Latency • Q/P/C hosted-on/connected-to N2; P & C running on N1 • Q/P/C hosted-on/connected-to N2; P & C running on N2 • Remarks: 1) No backlog in either case 2) Comparable throughput (indicating that in a LAN setup, network latency is not a decisive factor)
  55. Impact of Locality: Indirect Producer & Indirect Consumer • Q/P/C hosted-on/connected-to N2 • Q hosted on N1; P/C connected to N2 • Remarks: 1) Producer and consumer are connected to the queue through a proxy node 2) Both have lower throughput 3) Backlog is building up
  56. Impact of Locality: Direct Producer & Indirect Consumer • Q/P/C hosted-on/connected-to N2 • Q/P hosted-on/connected-to N2; C connected to N1 • Remarks: 1) Lowest output throughput 2) Fastest queue-length growth
  57. Impact of Locality: Indirect Producer & Direct Consumer • Q/P/C hosted-on/connected-to N2 • Q/C hosted-on/connected-to N2; P connected to N1 • Remarks: 1) Moderate overall throughput 2) No backlog
  58. Inter-Node Data Transfer • Q hosted on N1; P/C connected to N2 • Q/P hosted-on/connected-to N2; C connected to N1 • Q/C hosted-on/connected-to N2; P connected to N1
  59. Conclusions • Queues in RabbitMQ have one “home” node, and all related operations go through that node • This highlights the importance of “locality” for performance (throughput + backlog): – Q/P/C all co-located => highest throughput, empty queue (best-case scenario) – Q/C co-located => moderate throughput, empty queue – Q/P co-located => relatively low throughput, fastest-growing backlog – Neither Q/C nor Q/P co-located => lowest throughput, growing backlog
  61. Load Balancing in a RabbitMQ Cluster: Two Different Scenarios • “Many queues” scenario – A large number of (small) queues – Problem: queues are by default created on the node the client is connected to, resulting in an imbalance in the long run (see the figure for an example) • Mitigated in newer versions (see next slide) – Focus of this slide set • “Large queues” scenario – A few (large) queues – Problem: how to share the load of a big ‘logical’ queue among different brokers – Not discussed here; for a proposal, see https://insidethecpu.com/2014/11/17/load-balancing-a-rabbitmq-cluster/
  62. Load Balancing a RabbitMQ Cluster: The “Many Queues” Scenario • Good news: newer versions of RabbitMQ (3.6.0+) provide control over where master queues are created – Through “queue master locator” strategies • Proposed solution: a service (part of the WWS Deployment component) that – a) creates queues, ensuring a balanced distribution • With the help of RabbitMQ's locator feature – b) for each created queue, figures out its “home” node • Using a REST call to RabbitMQ's management API – c) points the producer(s) and consumer(s) of the queue to the right node
  63. Queue Master Location • Queue masters can be distributed between nodes using several strategies. Which strategy is used is controlled in three ways: – Using the x-queue-master-locator queue-declare argument – Setting the queue-master-locator policy key – Defining the queue_master_locator key in the configuration file • The possible strategies: – min-masters: pick the node hosting the minimum number of masters – client-local: pick the node the client that declares the queue is connected to – random: pick a random node. https://www.rabbitmq.com/ha.html https://www.erlang-solutions.com/blog/take-control-of-your-rabbitmq-queues.html
  64. Queue Location Setting: Option 1 – Through Queue-Declare Arguments • Sample code in Python (using the Pika client): queue_name = 'microservice.queue.1'; args = {"x-queue-master-locator": "min-masters"}; channel.queue_declare(queue=queue_name, durable=True, arguments=args)
  65. Queue Location Setting: Option 2 – Through Policy • A sample policy, set up by a REST call to the management API – (sample) endpoint: http://192.168.0.108:15672/api/policies/%2ftest/min-masters – Verb: PUT – Body: {"pattern": "^min-masters", "definition": {"queue-master-locator": "min-masters"}, "apply-to": "queues"} • Result:
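A sketch of setting such a policy from Python using only the standard library (the URL, vhost, and credentials below are examples; note that "apply-to" is a top-level field of the request body, not part of "definition"):

```python
import json
import urllib.request

def make_policy_body(pattern, locator="min-masters"):
    """Build the JSON body for PUT /api/policies/<vhost>/<name>."""
    return json.dumps({
        "pattern": pattern,
        "definition": {"queue-master-locator": locator},
        "apply-to": "queues",
    })

def put_policy(base_url, vhost, name, body, user="guest", password="guest"):
    """PUT the policy to the management API. `vhost` must be
    URL-encoded, e.g. '%2ftest' for the '/test' vhost."""
    url = f"{base_url}/api/policies/{vhost}/{name}"
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(mgr))
    req = urllib.request.Request(
        url, data=body.encode(), method="PUT",
        headers={"Content-Type": "application/json"})
    return opener.open(req)

# example (against a running broker):
#   put_policy("http://192.168.0.108:15672", "%2ftest", "min-masters",
#              make_policy_body("^min-masters"))
```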
  66. Queue Master Locator Policy in Practice • Step 1: create 5 queues with names that don’t match the policy pattern – Result: all on the same broker (the one the client is connected to) • Step 2: create 9 additional queues with names that match the policy pattern – Result: they are distributed fairly across the two brokers
  67. Queue Location Setting: Option 3 – Through the Config File • The corresponding entry: {rabbit, [ ..., {queue_master_locator, <<"min-masters">>}, ... ]} – Note: the default is “client-local” • In practice: result after creating 8 queues • NOTE: it may make sense to make “min-masters” our default
  68. Retrieve the Home Node of a Queue • A REST call to the management API – (sample) endpoint: http://localhost:15672/api/queues/%2ftest/min-masters.queue9 – Verb: GET • Sample output: { "name": "min-masters.queue9", "vhost": "/test", "durable": false, "auto_delete": false, "exclusive": false, "arguments": {}, "node": "rabbit@broker2", ... }
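The "node" field in that output is what the balancing service from slide 62 would read; a minimal helper (the sample JSON below is trimmed from the slide's output):

```python
import json

def home_node(queue_info_json):
    """Extract the home (master) node from the management API's
    GET /api/queues/<vhost>/<name> response."""
    return json.loads(queue_info_json)["node"]

# trimmed sample response from the slide
sample = ('{"name": "min-masters.queue9", "vhost": "/test",'
          ' "durable": false, "node": "rabbit@broker2"}')
```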
  69. Part IV: Summary
  70. A Few Lessons Learned • RabbitMQ has a very consequential back-pressure mechanism (Flow Control) • Keep your queues empty! (memory and CPU overhead grow quickly with queue length) • Clustering is not fully transparent (loss of locality vs replicated metadata store) • The management API exposes a wealth of useful information (in particular, look out for node stats, “flow” signs, and disk read/write rates)
  71. A Few Lessons Learned (cont.) • Use separate connections for producers and consumers • Use more than one connection for high-load producers • Use message batching, if possible – Amortized overhead – Increased latency • Use distinct user credentials! – Helps with troubleshooting
