Troubleshooting RabbitMQ and services that use it

Designing a system in terms of [micro]services is the hype du jour, but it is not without trade-offs. Debugging a distributed system can be challenging. This talk covers how to start troubleshooting a distributed, service-oriented system.

Published in: Technology

  1. Troubleshooting RabbitMQ and services that use it
  2. Who am I? • Staff Engineer, RabbitMQ @ Pivotal
  3. Who am I? • Staff Engineer, RabbitMQ @ Pivotal • @michaelklishin, github.com/michaelklishin
  4. The monolith problem
  5. Troubleshooting publishers
  6. Troubleshooting publishers • I/O exceptions (shutdown handlers)
  7. Troubleshooting publishers • I/O exceptions (shutdown handlers) • Publisher confirms
  8. When in doubt, borrow ideas from TCP
  9. Troubleshooting publishers • I/O exceptions (shutdown handlers) • Publisher confirms • Returned message handlers
  10. Troubleshooting publishers • I/O exceptions (shutdown handlers) • Publisher confirms • Returned message handlers • Invalid payload (e.g. fails to deserialize or decrypt)
  11. Troubleshooting publishers • I/O exceptions (shutdown handlers) • Publisher confirms • Returned message handlers • Invalid payload (e.g. fails to deserialize or decrypt) • Identifying publisher instances
  12. Troubleshooting publishers • Identifying blocked (throttled) publishers
  13. Client-provided connection names in RabbitMQ 3.6.3+
  14. Troubleshooting publishers • Identifying blocked (throttled) publishers • Retries
  15. Troubleshooting publishers • spring-amqp can cover all of the above
  16. Troubleshooting consumers
  17. Troubleshooting consumers • I/O exceptions
  18. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS
  19. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS • Lack of confirmations; double-confirming
  20. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS • Lack of confirmations; double-confirming
  21. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS • Lack of confirmations; double-confirming • Redelivery metrics
  22. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS • Lack of confirmations; double-confirming • Redelivery metrics • Identifying consumer instances
  23. Troubleshooting consumers • Consumer utilization (reported by HTTP API)
  24. Troubleshooting consumers • spring-amqp can help with some of the above
  25. “In God we trust, all others must bring data…” (W. Edwards Deming)
  26. “In God we trust, all others must bring data…” (W. Edwards Deming)
  27. "What do you do for a living?"
  28. "What do you do for a living?" "Tell people to read the logs."
  29. Sources of data useful for debugging
  30. Sources of data useful for debugging • Metrics
  31. Sources of data useful for debugging • Metrics • Your logs
  32. Sources of data useful for debugging • Metrics • Your logs • Someone else's logs
  33. Sources of data useful for debugging • Metrics • Your logs • Someone else's logs • Tracing data
  34. Sources of data useful for debugging • Metrics • Your logs • Someone else's logs • Tracing data • Wireshark (tcpdump, libpcap)
  35. Collecting data from RabbitMQ
  36. Collecting data from RabbitMQ • Logs
  37. Collecting data from RabbitMQ • Logs • rabbitmqctl status
  38. Collecting data from RabbitMQ • Logs • rabbitmqctl status • rabbitmqctl environment
  39. Collecting data from RabbitMQ • Logs • rabbitmqctl status • rabbitmqctl environment • rabbitmq-top (ships with RabbitMQ as of 3.6.3)
  40. Collecting data from RabbitMQ • Logs • rabbitmqctl status • rabbitmqctl environment • rabbitmq-top (ships with RabbitMQ as of 3.6.3) • HTTP API (lots of metrics)
  41. http://{hostname}:15672/api
  42. curl -u guest:guest http://127.0.0.1:15672/api/overview | python -m json.tool
      curl -u guest:guest http://127.0.0.1:15672/api/nodes/{node} | python -m json.tool
      curl -u guest:guest http://127.0.0.1:15672/api/queues | python -m json.tool
  43. Collecting data from RabbitMQ • Logs • rabbitmqctl status • rabbitmqctl environment • rabbitmq-top (ships with RabbitMQ as of 3.6.3) • HTTP API (lots of metrics) • Message tracing ("firehose")
  44. Collecting data from RabbitMQ • HTTP API (lots of metrics) • Message tracing ("firehose") • Infrastructure metrics
  45. Common theme?
  46. Common theme? • Collect logs system-wide
  47. Common theme? • Collect logs system-wide • Collect metrics system-wide
  48. Common theme? • Collect logs system-wide • Collect metrics system-wide • Collect exceptions system-wide
  49. Common theme? • Collect logs system-wide • Collect metrics system-wide • Collect exceptions system-wide • Trace requests (e.g. with Zipkin)
  50. Common theme? • Collect logs system-wide • Collect metrics system-wide • Collect exceptions system-wide • Trace requests (e.g. with Zipkin) • Analyze
  51. Common theme? • Collect logs system-wide • Collect metrics system-wide • Collect exceptions system-wide • Trace requests (e.g. with Zipkin) • Analyze • Sounds like something a structured platform can help with!
  52. Distributed system debugging is a problem far from being solved.
  53. Thank you
  54. Thank you • @michaelklishin
  55. Thank you • @michaelklishin • github.com/michaelklishin
  56. Thank you • @michaelklishin • github.com/michaelklishin • mklishin@pivotal.io
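
Slides 23 and 40 to 42 point at the management HTTP API as a source of consumer metrics. The following is a minimal Python sketch of the same idea as the curl calls on slide 42: fetch `/api/queues` and flag queues that look unhealthy. It is not from the talk. The function names, the thresholds (`min_utilisation`, `max_unacked`) and the choice of checks are illustrative assumptions, and the JSON field names (`consumers`, `messages_ready`, `messages_unacknowledged`, `consumer_utilisation`) are assumed to match what your RabbitMQ version reports, so check them against the `/api` page of your own node.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url="http://127.0.0.1:15672", user="guest", password="guest"):
    """GET /api/queues with HTTP basic auth, like the curl example on slide 42.

    Requires a reachable node with the management plugin enabled.
    """
    request = urllib.request.Request(base_url + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def suspicious_queues(queues, min_utilisation=0.5, max_unacked=1000):
    """Return (queue name, reasons) pairs for queues worth a closer look.

    The thresholds are hypothetical defaults, not RabbitMQ recommendations.
    """
    findings = []
    for queue in queues:
        reasons = []

        # Messages are waiting but nobody is subscribed to the queue.
        if queue.get("consumers", 0) == 0 and queue.get("messages_ready", 0) > 0:
            reasons.append("messages ready but no consumers")

        # Assumed field: fraction of time the queue could deliver to consumers.
        # It can be missing or null, for example when there are no consumers.
        utilisation = queue.get("consumer_utilisation")
        if isinstance(utilisation, (int, float)) and utilisation < min_utilisation:
            reasons.append(f"consumer utilisation {utilisation:.2f}")

        # Many outstanding deliveries can hint at missing acknowledgements
        # or at an overly generous prefetch (QoS) setting.
        if queue.get("messages_unacknowledged", 0) > max_unacked:
            reasons.append("many unacknowledged deliveries")

        if reasons:
            findings.append((queue.get("name", "?"), reasons))
    return findings
```

Against a running node, `suspicious_queues(fetch_queues())` returns the flagged queues. Keeping the check separate from the HTTP call means the same function can be fed JSON saved earlier with curl.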
