Kafka/SMM Crash Course

Introduction: This session covers the fundamentals of Apache Kafka and Streams Messaging Manager (SMM).

Format: The session starts with the basic concepts and entities of Apache Kafka: brokers, topics, producers, and consumers/consumer groups. It then delves into advanced topics such as the idempotent producer, the transactional API for exactly-once processing, authentication, authorization, replication, log compaction, compression, and performance. A demo of SMM follows. SMM is an open source Cloudera initiative that gives Kafka users better operational insight into their clusters through a GUI rather than complex hand-written scripts. The session also includes a demo of the Alerting/Notification framework, which triggers alerts and sends notifications based on the conditions you choose to monitor.
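
Two of the advanced topics named above, the idempotent producer and log compaction, can be illustrated with a toy model. The Python sketch below is a simplification under stated assumptions, not Kafka's implementation: `ToyPartitionLog`, `append`, and `compact` are hypothetical names, the log simply drops any record whose sequence number it has already accepted from the same producer, and compaction keeps only the latest value per key (a `None` value stands in for a tombstone).

```python
from typing import Dict, List, Optional, Tuple

# (key, value); a value of None marks a tombstone (delete marker) in this toy model
Record = Tuple[str, Optional[str]]


class ToyPartitionLog:
    """Toy stand-in for one topic partition; not Kafka's broker code."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        # Highest sequence number accepted so far, per producer id.
        self._last_seq: Dict[str, int] = {}

    def append(self, producer_id: str, seq: int, key: str, value: Optional[str]) -> bool:
        """Append a record unless it is a retry of one already accepted."""
        if seq <= self._last_seq.get(producer_id, -1):
            return False  # duplicate caused by a producer retry; do not append
        self._last_seq[producer_id] = seq
        self.records.append((key, value))
        return True


def compact(records: List[Record]) -> List[Record]:
    """Keep only the latest record per key, in log order, and drop tombstones."""
    last_index = {key: i for i, (key, _) in enumerate(records)}
    return [
        (key, value)
        for i, (key, value) in enumerate(records)
        if last_index[key] == i and value is not None
    ]


if __name__ == "__main__":
    log = ToyPartitionLog()
    log.append("p1", 0, "truck-1", "speed=60")
    log.append("p1", 0, "truck-1", "speed=60")  # retried send, ignored
    log.append("p1", 1, "truck-2", "speed=45")
    log.append("p1", 2, "truck-1", "speed=72")
    print(compact(log.records))
```

In this sketch the retried send leaves the log unchanged, and compaction keeps one record each for `truck-2` and `truck-1`. A real broker also tracks producer epochs and removes tombstones only after a retention period, which the sketch leaves out.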

Objective: The objective of this session is to learn about Apache Kafka and to illustrate how SMM helps answer questions that arise in production deployments, for example: “Do I have any offline topic partitions?”, “Which consumer group is falling behind the most?”, “Which producers are generating the most data right now?”, and “What does the data in my application topic look like?” Attendees also become familiar with the SMM GUI by exploring its views of brokers, topics, producers, and consumer groups, so that they can quickly find the information needed to monitor Kafka clusters or their applications. Finally, the session shows how to use the Alerting and Notification framework that comes with SMM to automate monitoring of Kafka clusters and the applications built around them.
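
A question such as “Which consumer group is falling behind most” comes down to consumer lag: for each partition, the log end offset minus the group's committed offset. The Python sketch below sums that per group, picks the worst group, and calls a notifier callback when lag exceeds a threshold. Every function name and offset here is made up for illustration; this is not SMM's REST API or its alerting implementation.

```python
from typing import Callable, Dict, List, Tuple

# partition id -> (log end offset, committed offset) for one consumer group
PartitionOffsets = Dict[int, Tuple[int, int]]


def group_lag(offsets: PartitionOffsets) -> int:
    """Total lag of a group: sum over partitions of (log end - committed)."""
    return sum(max(end - committed, 0) for end, committed in offsets.values())


def most_lagging_group(groups: Dict[str, PartitionOffsets]) -> str:
    """Name of the group with the largest total lag."""
    return max(groups, key=lambda name: group_lag(groups[name]))


def check_lag_alerts(
    groups: Dict[str, PartitionOffsets],
    threshold: int,
    notify: Callable[[str], None],
) -> List[str]:
    """Call notify once per group whose lag exceeds threshold; return those names."""
    triggered = []
    for name, offsets in groups.items():
        lag = group_lag(offsets)
        if lag > threshold:
            notify(f"consumer group {name}: lag {lag} exceeds {threshold}")
            triggered.append(name)
    return triggered


if __name__ == "__main__":
    # Invented offsets; the group names echo the demo, the numbers are not real data.
    sample = {
        "route-micro-service": {0: (60_000, 10_000), 1: (55_000, 8_000)},
        "truck-sensors-west": {0: (1_200, 1_150), 1: (900, 900)},
    }
    print(most_lagging_group(sample))
    check_lag_alerts(sample, threshold=10_000, notify=print)
```

Passing the notifier as a callback keeps the check separate from the delivery channel, in the same spirit as pairing an alert policy with a notifier.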




Speakers: Daniel Chaffelson

Location: University of DC/Catholic University Room

Published in: Technology

  1. © Cloudera Inc. 2011–2018. All rights reserved. Kafka, SMM Crash Course. Purnima Kuchikulla, Solution Engineer; Daniel Chaffelson, Sr. Solution Engineer
  2. Kafka Basics. Kafka has 4 core APIs: (1) Producer API, (2) Consumer API, (3) Streams API, (4) Connector API. Anatomy of a Kafka Topic. Kafka Consumers.
  3. Kafka Advanced Topics
     1. Idempotent producer
     2. Transactional API in Kafka for exactly-once processing
     3. Authentication & authorization
     4. Replication: Mirror Maker 2. Issues with the current Mirror Maker: i. no support for keeping configs in sync; ii. no support for DR; iii. no support for topic naming that avoids cycles in an active-active configuration; iv. the number of clusters grows in proportion to the number of destination endpoints; v. no graceful handling of rebalances, so every change blocks the entire pipeline; vi. no monitoring tools
     5. Log compaction
  4.
  5. © Cloudera Inc. 2011 – 2016. All Rights Reserved. Kafka is Everywhere: Critical Component of Streaming Architectures. Streaming Analytics Reference Architecture (diagram labels): Kafka Producers; Kafka Topics; Kafka Consumers & Producers; Kafka Topics; Kafka Consumers; Data Flow Apps Powered by NiFi; US West, US Central, and US East Fleet Truck Sensors (C++ Agent); Analytics Apps 1 to 5.
  6. Kafka’s Omnipresence Has Led to the Onset of “Kafka Blindness”
     • What is “Kafka Blindness”? Customers who use Kafka today struggle with monitoring, “seeing”, and troubleshooting what is happening in their clusters.
     • Who is affected? Platform operations teams; developers / DevOps teams; security / governance teams.
     • What are the symptoms? Difficulty seeing who is producing and consuming data; difficulty understanding the flow of data from producers -> topics -> consumers; difficulty troubleshooting and monitoring.
  7. Cure is Here: Cloudera Streams Messaging Manager (SMM)
     What is SMM?
     • New open source project led by Cloudera to cure “Kafka Blindness”
     • Single monitoring dashboard for all your Kafka clusters across 4 entities: Broker, Producer, Topic, Consumer
     • Designed for the enterprise: support for secure/Kerberized Kafka clusters; rich access control policies (ACLs); support for multiple HDP and/or HDF Kafka clusters
     • REST as a first-class citizen
     • Delivered as a DataPlane service
  8. SMM Addresses the Distinct Needs of 3 Personas/Teams: (1) those concerned with monitoring the overall health of the cluster and the infrastructure it runs on; (2) those concerned with monitoring the Kafka entities associated with their apps; (3) those concerned with audit, compliance, access control & chain of custody requirements.
  9. SMM Component Architecture
  10. SMM for the Platform Ops Team
  11. Demo: SMM From the Lens of Platform Operations
  12. SMM for the DevOps/AppDev Teams
  13. SMM for the Security and Governance Team
  14. Demo: SMM From the Lens of DevOps/App Dev and Security Governance
  15. Demo Environment
      Crash course material: https://tinyurl.com/dataworks-smm
      Cloudbreak: https://52.31.131.253/clusters (user: admin@example.com, pwd: supersecret1)
      DPS: https://18.203.24.241/ambari/ (user: admin, pwd: supersecret1)
      Add to the /etc/hosts file (sudo vi /etc/hosts):
      ##DPS
      18.203.24.241 ip-10-0-1-202.eu-west-1.compute.internal
      ####SMM0
      34.249.39.177 ip-10-0-1-188.eu-west-1.compute.internal
      63.35.75.183 ip-10-0-1-23.eu-west-1.compute.internal
      18.202.81.131 ip-10-0-1-26.eu-west-1.compute.internal
      34.242.214.202 ip-10-0-1-229.eu-west-1.compute.internal
      34.251.28.182 ip-10-0-1-80.eu-west-1.compute.internal
      ####SMM1
      52.209.60.127 ip-10-0-1-183.eu-west-1.compute.internal
      34.255.124.144 ip-10-0-1-61.eu-west-1.compute.internal
      52.215.38.248 ip-10-0-1-174.eu-west-1.compute.internal
      34.244.53.147 ip-10-0-1-18.eu-west-1.compute.internal
      52.212.185.176 ip-10-0-1-20.eu-west-1.compute.internal
      ####SMM2
      34.245.20.14 ip-10-0-1-53.eu-west-1.compute.internal
      99.80.118.220 ip-10-0-1-124.eu-west-1.compute.internal
      34.241.110.150 ip-10-0-1-112.eu-west-1.compute.internal
      34.242.239.55 ip-10-0-1-128.eu-west-1.compute.internal
      63.35.190.170 ip-10-0-1-89.eu-west-1.compute.internal
      ####SMM3
      52.18.248.254 ip-10-0-1-213.eu-west-1.compute.internal
      34.246.180.11 ip-10-0-1-97.eu-west-1.compute.internal
      63.35.178.38 ip-10-0-1-153.eu-west-1.compute.internal
      52.17.237.69 ip-10-0-1-56.eu-west-1.compute.internal
      34.253.187.104 ip-10-0-1-103.eu-west-1.compute.internal
      ####SMM4
      34.245.156.47 ip-10-0-1-111.eu-west-1.compute.internal
      52.30.110.86 ip-10-0-1-249.eu-west-1.compute.internal
      52.31.2.74 ip-10-0-1-138.eu-west-1.compute.internal
      54.246.247.159 ip-10-0-1-144.eu-west-1.compute.internal
      34.244.233.108 ip-10-0-1-141.eu-west-1.compute.internal
  16. Demo Setup: DevOps / App Dev Persona, Monitoring the Streaming Truck App. Diagram labels: Kafka Producers; Kafka Topics; Kafka Consumers & Producers; Kafka Topics; Kafka Consumers; Data Flow Apps Powered by NiFi; US West, US Central, and US East Fleet Truck Sensors (C++ Agent); Analytics Apps 1 to 5.
  17. Log into the demo environment via DPS, using the URL in slide 15. Click on the globe to see the services available. Choose DataPlane to see the clusters currently registered. Choose SMM to get to the main SMM dashboard.
  18. SMM Platform Operations Use Cases
  19. Main Dashboard View. The Kafka cluster called orlandostreamcluser is selected.
  20. Find the Most Active Producer in my Cluster. Click on Messages to sort on messages sent by all producers in the last 30 mins. A Kafka producer called minfi-eu-i1 is the most active producer, sending 39K messages in the last 30 mins.
  21. Find the Consumer Who Has Fallen Behind the Most. Click on LAG to sort on consumer lag across all consumers in the last 30 mins. The consumer group named route-micro-service has significantly more lag (97K) than any other consumer group in the cluster.
  22. Broker Centric View: View Details of the Brokers in My Cluster. Click on the Brokers tab to see a broker-centric view of the Dashboard.
  23. Broker Centric View: Find my Hottest Broker (Broker with Highest Throughput In). Step 1: Click on the Brokers tab to see a broker-centric view of the Dashboard. Step 2: Click on Throughput to sort on data in across all brokers. Analysis: Broker 1001 has the highest rate of data in over the last 30 mins: 80K messages totaling 17 MB.
  24. Find Partitions on a Given Broker and Understand the Flow of Data from Producer to Selected Broker Partition to Consumer. Step 1: Expand the broker panel to get more details on the broker, such as all the partitions stored on it. Step 2 (Analysis): Note that partition 0 of topic syndicate-speed has high throughput-out. Step 3: Click on the partition to see all the producers and consumers sending to or consuming from that partition. There are 1 producer and 3 consumer groups, which explains why throughput out is high relative to throughput in.
  25. Analyze Detailed Broker Metrics: Grafana Integration. Click on the Grafana icon on the broker panel and a Grafana dashboard for that broker is displayed, providing more broker metrics graphed across time.
  26. Analyze Detailed Broker Host Metrics: Ambari Integration. Click on the Ambari icon on the broker panel and the Ambari host detail view for that broker is displayed, providing host-level metrics and a view of other services running on that host.
  27. Keyword Search via Log Search. Click on the Log Search icon on the broker panel and the Log Search detail view is displayed. This enables you to search for specific keywords and to filter for specific log levels, components, and time ranges.
  28. SMM DevOps/App Dev Use Cases
  29. Topic Centric Dashboard View: Filter on Topics associated with my Topic. Use the Filter to filter on topics and select all the IOT gateway topics.
  30. Intelligent Filtering: Selecting Topics Causes Producers / Consumers to be Intelligently Filtered. User action: 4 IOT Gateway topics have been selected. Intelligent filtering (producers): SMM automatically filters the producers associated with the selected topics; 34 of the 84 producers have been identified as sending data to the 4 selected topics. Intelligent filtering (consumers): SMM automatically filters the consumers associated with the selected topics; 3 of the 26 consumers have been identified as consuming data from the 4 selected topics.
  31. Find the Hottest Topic: Topic With Highest Throughput-In. Step 1: Click on DATA IN to sort on data-in across all topics in the last 30 mins. Analysis: The Kafka topic called gateway-europe-raw-sensors has more data being sent to it than any other topic: 88K messages totaling 18 MB in the last 30 mins.
  32. How are the Partitions Laid out for the Topic? Who are the Producers and Consumers? Are there any Partition Skews? Step 1: Expand the topic panel to see more details of the topic that has high data-in rates. Step 2: Click on the topic to see all the producers sending data to it. Analysis: Note that for each partition there is no data going out (0B) and no data going to any consumer groups. This means that while the topic has lots of producers, there are no consumers, which could indicate a problem.
  33. How does Data Flow from Producers to Topics to Consumers? Step 1: Expand the details of a topic that has consumers. Step 2: Click on the topic to see all producers sending data to it and all consumers consuming from it. Analysis 1: We have 3 truck producers from the west fleet sending data to the gateway-west topic and a NiFi consumer called truck-sensors-west consuming from it. Analysis 2: Note that there is no data in 2 of the 4 partitions. This could be a partition/event key skew issue.
  34. Explore/Search Messages in the Kafka Topic. Click on the explorer icon to search for events in the Kafka topic.
  35. Explore Metadata about the Topic in Atlas. Click on the Atlas link to see the metadata of the topic gateway-west-raw-sensors in Atlas. If Atlas does not come up, use this link and then pick Kafka_Topic as the search type: https://99.80.132.89:8443/pkuc-wv-smm-m/dp-proxy/atlas/
  36. Traverse the Flow of Data across Multiple Kafka Topics using SMM and Atlas Integration. Question: The topic has one active consumer, which is a NiFi consumer. Which Kafka topic, if any, is this NiFi flow consumer publishing events to? Step 1: Click on the Atlas icon to see the lineage of the topic gateway-west-raw-sensors. Analysis: The NiFi app consumes from the gateway-west-raw-sensors topic and publishes events to a downstream Kafka topic called syndicate-geo-event-avro.
  37. Search for the syndicate-geo-event-avro Topic in SMM and Find its Consumers. Step 1: Search for the Kafka topic that the nifi-truck-sensor-west consumer was publishing events to. Analysis: We see that this topic has 4 downstream consumers. We just tracked a flow across multiple Kafka hops.
  38. Recap: What Did We Just Show? Tracking the flow of data across multiple Kafka hops with SMM & Atlas integration: powerful. Diagram labels: Data Flow Apps Powered by NiFi; US West, US Central, and US East Fleet Truck Sensors (C++ Agent); Analytics Apps 3, 2, and 1.
  39. SMM 1.2 New Features
  40. New Features
      • Topic Lifecycle Management: Create, Update, Delete
      • Alerting: Alert Notifier, Alert Policy
      • Schema Registry Integration
        • Data Governance: provide reusable schemas (centralized registry); define relationships between schemas (version management); enable generic format conversion and generic routing (schema validation)
        • Operational Efficiency: avoid attaching a schema to every piece of data (centralized registry); consumers and producers can evolve at different rates (version management); data quality (schema validation)
  41. Kafka Command Line Interface
  42. Topic Lifecycle Management: Add Topic. User-friendly UI to create new topics. Simple and advanced options are available.
  43. Topic Update. Search Topic: Search for the topic you would like to update, then click on Profile. Update Topic: Click on Config to change the Cleanup Policy, or click Advanced to modify the configuration parameters.
  44. Alerting: Create Notifier. 1st key construct of alerts: Alert Notifier. Notifier types: (1) email, (2) HTTP endpoint, (3) Kafka topic.
  45. Alerting: Create Alert. 2nd key construct of alerts: Alert Policy. (1) Defined for 5 key entities (cluster, broker, topic, producer, consumer); (2) metrics defined on entities; (3) complex alerts; (4) includes the notifier to use when triggered.
  46. Alert History. Disable Alert.
  47. Example: Alerting on Micro-Service Consumer Group with High Lag
  48. Thank you!
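
Slide 33 suggests that empty partitions may point to a partition/event key skew. A rough Python sketch of the effect follows. Kafka's default partitioner hashes the record key (murmur2) modulo the partition count; here `zlib.crc32` is swapped in as a stand-in, and the keys `truck-A` and `truck-B` and the helper names are hypothetical. With only two distinct keys, at most two of four partitions can ever receive data.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition by hashing it.

    CRC32 is a stand-in assumption; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_counts(keys: Iterable[str], num_partitions: int) -> List[int]:
    """Number of records that land on each partition, indexed by partition id."""
    counts = Counter(partition_for(key, num_partitions) for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


if __name__ == "__main__":
    # 1000 records but only two distinct keys: every record for a given key
    # goes to the same partition, so at least two of the four stay empty.
    keys = ["truck-A"] * 500 + ["truck-B"] * 500
    print(partition_counts(keys, 4))
```

Choosing a key with more distinct values (for example a per-truck or per-event identifier, if ordering requirements allow) is one way to spread load across partitions.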
