Brian S Paskin, Senior Application Architect, R&D Services, IBM Cloud Innovations Lab
Updated 22 May 2019
Kafka and IBM Event Streams Basics
What is Kafka
2
 Kafka was originally developed at LinkedIn in 2010 and open sourced in 2011
 A version with commercial extras is maintained by Confluent, founded by the original Kafka creators from LinkedIn
 A distributed publish/subscribe middleware where all records are persistent
 Used as a component in Event Driven Architectures
 Fault tolerant and scalable when running multiple brokers with multiple partitions
 Kafka itself runs on Java
 Uses Apache Zookeeper for metadata (leader and follower setup)
 Can be used with the Java Message Service (JMS) API, but does not support all JMS features
 Kafka clients are written in many languages
– C/C++, Python, Go, Erlang, .NET, Clojure, Ruby, Node.js, Proxy (HTTP REST), Perl,
stdin/stdout, PHP, Rust, Alternative Java, Storm, Scala DSL, Swift
What is Kafka
3
Brokers and Clusters
4
 A broker is a single Kafka instance, identified by an integer ID in its configuration file
 More than one broker working together forms a cluster
– Can span multiple systems
 All brokers in a cluster know about all other brokers
 All information is written to disk
 The broker a client first connects to is called the bootstrap broker
 A cluster durably persists all published records for the retention period
– The default retention period is 1 week
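The retention behaviour above can be illustrated with a small sketch. This is pure Python with no Kafka dependency; the one-week default corresponds to the broker setting `log.retention.hours=168`, and real brokers prune whole log segments rather than individual records:

```python
from dataclasses import dataclass

RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # default log.retention.hours = 168

@dataclass
class Record:
    offset: int
    timestamp_ms: int
    value: bytes

def prune_expired(log, now_ms, retention_ms=RETENTION_MS):
    """Drop records older than the retention period (simplified:
    brokers delete whole expired segments, not single records)."""
    return [r for r in log if now_ms - r.timestamp_ms <= retention_ms]

# One record is 8 days old, one is 1 hour old.
now = 1_000_000_000_000
log = [Record(0, now - 8 * 24 * 3600 * 1000, b"old"),
       Record(1, now - 3600 * 1000, b"recent")]
kept = prune_expired(log, now)  # only the recent record survives
```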
Topics and Partitions
5
 A topic is a category or feed name to which records are published
– Subtopics are not supported (e.g. sports/football, sports/football/ASRoma)
 A partition is an ordered, immutable sequence of records for a specific topic
 Each record in a partition is assigned a sequential id number called the offset
 A topic can have multiple partitions that may span brokers in the cluster
– Allows for fault tolerance and parallel consumption of messages
 Partitions can be replicated with in-sync replicas (ISRs) that are passive
 Each partition has an elected leader among its replicas
– If the leader goes down, a new leader is elected
– There cannot be more replicas than brokers
 A broker can host more than one partition, including multiple partitions of the same topic
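How records land on partitions can be sketched as below. Kafka's default partitioner hashes the key bytes with murmur2; `zlib.crc32` is used here only as a deterministic stand-in, and the round-robin counter for unkeyed records is a simplification:

```python
import zlib
from itertools import count

NUM_PARTITIONS = 3
_round_robin = count()

def choose_partition(key):
    """Default-partitioner sketch: keyed records always hash to the
    same partition (preserving per-key order); unkeyed records are
    spread across all partitions."""
    if key is None:
        return next(_round_robin) % NUM_PARTITIONS
    return zlib.crc32(key) % NUM_PARTITIONS

# The same key always lands on the same partition.
p1 = choose_partition(b"order-42")
p2 = choose_partition(b"order-42")
```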
Topics and Partitions
6
[Diagram: cluster with brokers and 3-partition scenarios]
Records
7
 Records consist of a key, a value, and a timestamp
– A key is not required
– Timestamp is added automatically
– The key and value can be objects
 Records are serialized by Producers and deserialized by Consumers
– Several serializers/deserializers are available
– Can write other serializers/deserializers
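A custom serializer/deserializer pair can be sketched as follows. This mirrors the shape of Kafka's Serializer/Deserializer interfaces (the client ships with StringSerializer, ByteArraySerializer, etc.); the JSON format here is just an illustration:

```python
import json

class JsonSerde:
    """Minimal serializer/deserializer sketch: Producers serialize a
    value to bytes on the wire, Consumers deserialize it back."""
    def serialize(self, obj) -> bytes:
        return json.dumps(obj).encode("utf-8")
    def deserialize(self, data: bytes):
        return json.loads(data.decode("utf-8"))

serde = JsonSerde()
record_value = {"sku": "A-100", "qty": 2}
wire_bytes = serde.serialize(record_value)   # what the Producer sends
round_trip = serde.deserialize(wire_bytes)   # what the Consumer sees
```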
Producers
8
 A Producer writes records to a Topic
– If the topic has more than one partition, records are written round robin across the partitions
– If a key is given, records with that key are always written to the same partition
 For guaranteed delivery there are three acknowledgment (acks) settings
– 0: no acknowledgment (fire and forget)
– 1: wait for the leader to acknowledge
– all: wait for the leader and all in-sync replicas to acknowledge
 A Producer retries if an acknowledgement is never received
– Records can arrive out of order
– Retries may cause duplicate records
 Producers can be idempotent, which prevents writing the same record twice
 Producers can use message compression
– Supported compression codecs are Snappy, GZIP and LZ4
– Consumers automatically detect that a message is compressed and decompress it
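Why idempotence matters can be shown with a toy simulation. A retry after a lost acknowledgement duplicates the record unless the broker deduplicates; the per-producer sequence tracking below is a simplified stand-in for the (producer id, sequence number) bookkeeping an idempotent Kafka producer uses:

```python
class Broker:
    """Toy broker: with dedupe enabled it tracks the highest sequence
    number seen per producer and drops replays of appended records."""
    def __init__(self, dedupe=False):
        self.log, self.dedupe, self.last_seq = [], dedupe, {}

    def append(self, producer_id, seq, value):
        if self.dedupe and self.last_seq.get(producer_id, -1) >= seq:
            return  # duplicate of an already-appended record: drop it
        self.log.append(value)
        self.last_seq[producer_id] = seq

def send_with_lost_ack(broker):
    # First write succeeds but its ack is lost, so the producer retries.
    broker.append("p1", 0, "payment-123")
    broker.append("p1", 0, "payment-123")  # retry of the same record

plain, idem = Broker(), Broker(dedupe=True)
send_with_lost_ack(plain)  # log now holds the record twice
send_with_lost_ack(idem)   # log holds it exactly once
```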
Producers
9
 Producers can send messages in batches for efficiency
– By default up to 5 requests can be in flight at a time
– Multiple messages are placed in a batch and sent at once
– Introducing a small delay in processing can lead to better performance
– A batch waits until the delay expires or the batch size is filled
– Messages larger than the batch size are not batched
 If Producers send faster than Brokers can handle, the Producers can be slowed
– Set the buffer memory for storage
– Set the blocking time (milliseconds)
– An error is thrown if the records still cannot be sent
 The Confluent Schema Registry is available to validate data
– Uses Apache Avro
– Protects against bad data or schema mismatches
– Schemas are self describing
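The batching behaviour described above can be sketched as a flush-on-full-or-expired loop. This mirrors the producer's `batch.size` / `linger.ms` settings, except that real Kafka measures batch size in bytes while this sketch counts records:

```python
import time

class BatchingProducer:
    """Batch accumulation sketch: a batch is flushed when it reaches
    batch_size records or when linger_ms elapses, whichever is first."""
    def __init__(self, batch_size=4, linger_ms=10):
        self.batch_size, self.linger_s = batch_size, linger_ms / 1000.0
        self.batch, self.sent, self.opened_at = [], [], None

    def send(self, value):
        if not self.batch:
            self.opened_at = time.monotonic()  # batch opens on first record
        self.batch.append(value)
        self._maybe_flush()

    def _maybe_flush(self):
        full = len(self.batch) >= self.batch_size
        expired = self.batch and time.monotonic() - self.opened_at >= self.linger_s
        if full or expired:
            self.sent.append(self.batch)  # one network request per batch
            self.batch = []

p = BatchingProducer(batch_size=2, linger_ms=1000)
p.send("a")  # buffered, waiting for more records
p.send("b")  # batch full: flushed as one request
```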
Consumers
10
 Consumers subscribe to 1 or more Topics
– Read from all partitions from the last committed offset; records are consumed in FIFO order within a partition
– Multiple consumers can subscribe to the same topic
– Consumers can reset the offset if records need to be processed again
 Consumers in a consumer group each read exclusively from a fixed set of partitions
– Having more consumers in a group than partitions leaves some consumers inactive
– Adding or removing Consumers automatically rebalances partition assignments across the group
 Consumers can be made idempotent in application code
 Schema Registry is available
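Consumer-group assignment can be sketched as below: every partition is owned by exactly one consumer in the group, and extra consumers sit idle. The round-robin spread is one of several real assignment strategies (range, round-robin, sticky):

```python
def assign_partitions(consumers, num_partitions):
    """Round-robin-style assignment sketch: each partition goes to
    exactly one consumer; surplus consumers receive nothing."""
    assignment = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# 3 partitions, 4 consumers in the group: one consumer is left idle.
groups = assign_partitions(["c1", "c2", "c3", "c4"], 3)
idle = [c for c, parts in groups.items() if not parts]
```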
Connectors
11
 Connectors integrate Kafka with external systems, importing from sources and exporting to sinks
– Import from sources like databases, JDBC, Blockchain, Salesforce, Twitter, etc.
– Export to sinks like AWS S3, Elasticsearch, JDBC, databases, Twitter, Splunk, etc.
– Run a Connect cluster to pull from a source and publish to Kafka
– Can be used with Streams
– Confluent Hub has many ready-made connectors
 Connectors can be managed with REST calls
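A connector is registered by POSTing a JSON body of the form `{"name": ..., "config": {...}}` to the Connect REST API (by default on port 8083). The connector name and connection values below are hypothetical; the sketch only builds the payload rather than sending it:

```python
import json

# Hypothetical JDBC source connector registration payload.
payload = {
    "name": "jdbc-orders-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db:5432/orders",  # example URL
        "topic.prefix": "orders-",
        "tasks.max": "1",
    },
}
# Would be sent with: POST http://connect-host:8083/connectors
body = json.dumps(payload)
```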
Streams
12
 Consumes from a Topic, processes data, and publishes the result to another Topic
 Several built-in functions to process or transform data
– Custom functions can be created
– branch, filter, filterNot, flatMap, flatMapValues, foreach, groupByKey, groupBy, join, leftJoin, map, mapValues, merge, outerJoin, peek, print, selectKey, through, transform, transformValues
 Exactly-once processing is supported
 Event-time windowing is supported
– Stateful operations can be performed on groups of records with the same key
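A classic Streams topology (flatMapValues → groupBy → count, i.e. word count) can be approximated eagerly over a finite list; a real Kafka Streams topology applies the same operators continuously to topic records and keeps the counts in a state store:

```python
from collections import Counter

def word_count(stream):
    """Streams-style pipeline sketch over (key, value) pairs:
    split each value into words, then count occurrences per word."""
    words = (word.lower() for _, value in stream for word in value.split())
    return Counter(words)

events = [("k1", "Kafka streams"), ("k2", "Kafka topics")]
counts = word_count(events)
```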
Streams
13
Zookeeper Quick Look
14
 Open source project from Apache
 Comes packaged with Kafka
 Centralized system for maintaining configuration information in a distributed system
 A Leader service and follower services exchange information
 Runs on Java
 An odd number of Zookeeper services should always be started
 Keeps its information in files
 The Zookeeper provided with Kafka does not have to be used
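The odd-number recommendation follows from majority quorum arithmetic, which can be checked directly: an even member adds no failure tolerance, since 3 and 4 servers both tolerate only 1 failure while 5 tolerates 2:

```python
def quorum(ensemble_size):
    """A ZooKeeper ensemble needs a strict majority to make progress."""
    return ensemble_size // 2 + 1

def failure_tolerance(ensemble_size):
    """How many servers can fail while a majority still remains."""
    return ensemble_size - quorum(ensemble_size)

# 3 -> tolerates 1, 4 -> still 1, 5 -> tolerates 2
tol = [failure_tolerance(n) for n in (3, 4, 5)]
```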
Kafka Command Line Basics
15
 Start Zookeeper as a daemon
zookeeper-server-start.sh -daemon ../config/zookeeper.properties
 Stop Zookeeper
zookeeper-server-stop.sh
 Start Kafka as a daemon
kafka-server-start.sh -daemon ../config/server.properties
 Stop Kafka
kafka-server-stop.sh
 Create a topic with number of partitions and number of replications
kafka-topics.sh --bootstrap-server host:port --topic topicName --create --partitions 3 --replication-factor 1
 List Topics
kafka-topics.sh --bootstrap-server host:port --list
Kafka Command Line Basics
16
 Retrieve information about a Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --describe
 Delete Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --delete
 Produce messages to a Topic
kafka-console-producer.sh --broker-list host:port --topic topicName
 Consume from Topic from current Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName
 Consume from Topic from Beginning Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --from-beginning
Kafka Command Line Basics
17
 Consume from Topic using Consumer Group
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --group groupName
Event Streams
18
 Event Streams is IBM’s implementation of Kafka
– Available in several versions with different levels of support
 IBM Event Streams is Kafka with enterprise features and IBM Support
 IBM Event Streams Community Edition is a free version for evaluation and demo use
 IBM Event Streams on IBM Cloud is Kafka as a service on the IBM Cloud
 Supported on Red Hat OpenShift and IBM Cloud Private
 Contains a REST Proxy interface for the Producer
 Works with external monitoring tools
 Producer Dashboard
 Health Checks for Cluster, Deployment and Topics
 Geo-replication of Topics for high availability and scalability
 Encrypted communications
Event Streams on IBM Cloud
19
 Select Event Streams from the Catalog
 Enter details and which plan is to be used
– Classic, as a Cloud Foundry service
– Standard, as a standard Kubernetes service
– Enterprise, dedicated
 Fill out topic information and other attributes
 Create credentials to use by selecting Service Credentials
 Viewing the credentials shows Broker hosts and ports, Admin URL, userid and password
 IBM Cloud has its own ES CLI to connect
 IBM MQ Connectors are available
Event Streams on IBM Cloud
20
Kafka and IBM MQ
21
 Kafka is a pub/sub engine with streams and connectors
 All topics are persistent
 All subscribers are durable
 Adding brokers requires little work (changing a configuration file)
 Topics can be spread across brokers (partitions) with a command
 Producers and Consumers are aware of changes made to the cluster
 Can have n number of replication partitions
 MQ is a queue and pub/sub engine with file transfer, MQTT, AMQP and other capabilities
 Queues and topics can be persistent or non persistent
 Subscribers can be durable or non durable
 Adding QMGRs requires some work (add the QMGRs to the cluster, add cluster channels; Queues and Topics need to be added to the cluster)
 Queues and topics can be spread across a cluster by adding them to clustered QMGRs
 All MQ clients require a CCDT file to know of changes if not using a gateway QMGR
 Can have 2 replicas (RDQM) of a QMGR, or Multi Instance QMGRs
Kafka and IBM MQ
22
 Simple load balancing
 Can reread messages
 All clients connect using a single connection method
 Streams processing built in
 Has connection security, authentication security, and ACLs (read/write to Topic)
 Load balancing can be simple or more complex using weights and affinity
 Cannot reread messages that have already been processed
 MQ has Channels which allow different clients to connect, each able to have different security requirements
 Stream processing is not built in, but available through third party libraries like MicroProfile Reactive Streams, ReactiveX, etc.
 Has connection security, channel security, authentication security, message security/encryption, ACLs for each Object, and third party plugins (Channel Exits)
Kafka and IBM MQ
23
 Built on Java, so can run on any platform that supports Java 8+
 Monitoring via statistics provided by the Kafka CLI, open source tools, and Confluent Control Center
 Latest version runs natively on AIX, IBM i, Linux, Solaris, Windows, and z/OS
 Much more can be monitored, via the PCF API, MQ Explorer, the MQ CLI (runmqsc), and Third Party Tools (Tivoli, CA APM, Help Systems, Open Source, etc.)
More information
24
 Sample code on GitHub
 Kafka documentation
 Event Streams documentation
 Event Streams on IBM Cloud
 Event Streams sample on GitHub
 IBM Cloud Event Driven Architecture (EDA) Reference
 IBM Cloud EDA Solution
Kafka and ibm event streams basics

Contenu connexe

Tendances

IBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ: Managing Workloads, Scaling and Availability with MQ ClustersIBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ: Managing Workloads, Scaling and Availability with MQ ClustersDavid Ware
 
Api gateway in microservices
Api gateway in microservicesApi gateway in microservices
Api gateway in microservicesKunal Hire
 
IBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster RecoveryIBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster RecoveryMarkTaylorIBM
 
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...Kai Wähner
 
Data power Performance Tuning
Data power Performance TuningData power Performance Tuning
Data power Performance TuningKINGSHUK MAJUMDER
 
Exploiting IAM in the google cloud platform - dani_goland_mohsan_farid
Exploiting IAM in the google cloud platform - dani_goland_mohsan_faridExploiting IAM in the google cloud platform - dani_goland_mohsan_farid
Exploiting IAM in the google cloud platform - dani_goland_mohsan_faridCloudVillage
 
Microservices Testing Strategies JUnit Cucumber Mockito Pact
Microservices Testing Strategies JUnit Cucumber Mockito PactMicroservices Testing Strategies JUnit Cucumber Mockito Pact
Microservices Testing Strategies JUnit Cucumber Mockito PactAraf Karsh Hamid
 
IBM MQ Whats new - up to 9.3.4.pdf
IBM MQ Whats new - up to 9.3.4.pdfIBM MQ Whats new - up to 9.3.4.pdf
IBM MQ Whats new - up to 9.3.4.pdfRobert Parker
 
IBM Integration Bus and REST APIs - Sanjay Nagchowdhury
IBM Integration Bus and REST APIs - Sanjay NagchowdhuryIBM Integration Bus and REST APIs - Sanjay Nagchowdhury
IBM Integration Bus and REST APIs - Sanjay NagchowdhuryKaren Broughton-Mabbitt
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Timothy Spann
 
Serverless computing
Serverless computingServerless computing
Serverless computingNitinSalvi14
 
IBM MQ Whats new - up to 9.3.4.pptx
IBM MQ Whats new - up to 9.3.4.pptxIBM MQ Whats new - up to 9.3.4.pptx
IBM MQ Whats new - up to 9.3.4.pptxMatt Leming
 
Citi Tech Talk Disaster Recovery Solutions Deep Dive
Citi Tech Talk  Disaster Recovery Solutions Deep DiveCiti Tech Talk  Disaster Recovery Solutions Deep Dive
Citi Tech Talk Disaster Recovery Solutions Deep Diveconfluent
 
IBM Think 2018: IBM MQ High Availability
IBM Think 2018: IBM MQ High AvailabilityIBM Think 2018: IBM MQ High Availability
IBM Think 2018: IBM MQ High AvailabilityJamie Squibb
 
Containers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red HatContainers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red HatAmazon Web Services
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Zero downtime deployment of micro-services with Kubernetes
Zero downtime deployment of micro-services with KubernetesZero downtime deployment of micro-services with Kubernetes
Zero downtime deployment of micro-services with KubernetesWojciech Barczyński
 
Exploring Universal API Management And Flex Gateway
Exploring Universal API Management And Flex GatewayExploring Universal API Management And Flex Gateway
Exploring Universal API Management And Flex Gatewayshyamraj55
 

Tendances (20)

Kafka 101
Kafka 101Kafka 101
Kafka 101
 
IBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ: Managing Workloads, Scaling and Availability with MQ ClustersIBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
 
Api gateway in microservices
Api gateway in microservicesApi gateway in microservices
Api gateway in microservices
 
IBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster RecoveryIBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster Recovery
 
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
 
Data power Performance Tuning
Data power Performance TuningData power Performance Tuning
Data power Performance Tuning
 
Exploiting IAM in the google cloud platform - dani_goland_mohsan_farid
Exploiting IAM in the google cloud platform - dani_goland_mohsan_faridExploiting IAM in the google cloud platform - dani_goland_mohsan_farid
Exploiting IAM in the google cloud platform - dani_goland_mohsan_farid
 
Microservices Testing Strategies JUnit Cucumber Mockito Pact
Microservices Testing Strategies JUnit Cucumber Mockito PactMicroservices Testing Strategies JUnit Cucumber Mockito Pact
Microservices Testing Strategies JUnit Cucumber Mockito Pact
 
RabbitMQ
RabbitMQ RabbitMQ
RabbitMQ
 
IBM MQ Whats new - up to 9.3.4.pdf
IBM MQ Whats new - up to 9.3.4.pdfIBM MQ Whats new - up to 9.3.4.pdf
IBM MQ Whats new - up to 9.3.4.pdf
 
IBM Integration Bus and REST APIs - Sanjay Nagchowdhury
IBM Integration Bus and REST APIs - Sanjay NagchowdhuryIBM Integration Bus and REST APIs - Sanjay Nagchowdhury
IBM Integration Bus and REST APIs - Sanjay Nagchowdhury
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
Serverless computing
Serverless computingServerless computing
Serverless computing
 
IBM MQ Whats new - up to 9.3.4.pptx
IBM MQ Whats new - up to 9.3.4.pptxIBM MQ Whats new - up to 9.3.4.pptx
IBM MQ Whats new - up to 9.3.4.pptx
 
Citi Tech Talk Disaster Recovery Solutions Deep Dive
Citi Tech Talk  Disaster Recovery Solutions Deep DiveCiti Tech Talk  Disaster Recovery Solutions Deep Dive
Citi Tech Talk Disaster Recovery Solutions Deep Dive
 
IBM Think 2018: IBM MQ High Availability
IBM Think 2018: IBM MQ High AvailabilityIBM Think 2018: IBM MQ High Availability
IBM Think 2018: IBM MQ High Availability
 
Containers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red HatContainers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red Hat
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Zero downtime deployment of micro-services with Kubernetes
Zero downtime deployment of micro-services with KubernetesZero downtime deployment of micro-services with Kubernetes
Zero downtime deployment of micro-services with Kubernetes
 
Exploring Universal API Management And Flex Gateway
Exploring Universal API Management And Flex GatewayExploring Universal API Management And Flex Gateway
Exploring Universal API Management And Flex Gateway
 

Similaire à Kafka and ibm event streams basics

Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introductionSyed Hadoop
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
Kafka 10000 feet view
Kafka 10000 feet viewKafka 10000 feet view
Kafka 10000 feet viewyounessx01
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLEdunomica
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps_Fest
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to heroAvi Levi
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka TLV
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar AppsTimothy Spann
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to KafkaDucas Francis
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-CamusDeep Shah
 
Columbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationColumbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationMuleSoft Meetup
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsRavindra kumar
 
Apache Pulsar as a Dual Stream / Batch Processor
Apache Pulsar as a Dual Stream / Batch ProcessorApache Pulsar as a Dual Stream / Batch Processor
Apache Pulsar as a Dual Stream / Batch ProcessorJoe Olson
 

Similaire à Kafka and ibm event streams basics (20)

Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Kafka 10000 feet view
Kafka 10000 feet viewKafka 10000 feet view
Kafka 10000 feet view
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
 
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to hero
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to hero
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Columbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationColumbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_Integration
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka Skills
 
Apache Pulsar as a Dual Stream / Batch Processor
Apache Pulsar as a Dual Stream / Batch ProcessorApache Pulsar as a Dual Stream / Batch Processor
Apache Pulsar as a Dual Stream / Batch Processor
 

Dernier

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 

Dernier (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Zuja dropshipping via API with DroFx.pptx


 Partitions/Replicas have a leader that is elected
– If a partition goes down, a new leader is elected
– There cannot be more replicas than brokers
 A broker can host more than one partition, including multiple partitions of the same topic
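How a record lands on a partition can be sketched in a few lines. Kafka's default partitioner hashes the record key (with murmur2) modulo the partition count, and spreads keyless records across partitions; the sketch below substitutes a trivial byte-sum hash purely for illustration, not Kafka's real hashing:

```python
def pick_partition(key, num_partitions, rr_state):
    """Illustrative stand-in for Kafka's default partitioner.

    The real partitioner hashes the key with murmur2; a byte sum is
    used here only to show keyed vs. keyless behaviour.
    """
    if key is None:
        # No key: spread records across partitions (round robin)
        rr_state[0] += 1
        return rr_state[0] % num_partitions
    # Same key -> same partition, so per-key ordering is preserved
    return sum(key) % num_partitions

state = [0]
p1 = pick_partition(b"order-42", 3, state)
p2 = pick_partition(b"order-42", 3, state)
assert p1 == p2              # keyed records stick to one partition
keyless = {pick_partition(None, 3, state) for _ in range(6)}
assert keyless == {0, 1, 2}  # keyless records rotate over all partitions
```

Because a key always maps to the same partition, ordering is guaranteed per key but not across the topic as a whole.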
Topics and Partitions
6
Cluster with Brokers and 3 Partitions scenarios (diagram)
Records
7
 Records consist of a key, a value, and a timestamp
– A key is not required
– The timestamp is added automatically
– The key and value can be Objects
 Records are serialized by Producers and deserialized by Consumers
– Several serializers/deserializers are available
– Custom serializers/deserializers can be written
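Serialization on the Producer side and deserialization on the Consumer side are symmetric. As a minimal sketch, this mimics what the built-in StringSerializer/StringDeserializer pair does (UTF-8 text to bytes and back):

```python
def serialize_string(value):
    # Like Kafka's StringSerializer: None passes through,
    # text becomes UTF-8 bytes for the wire
    return None if value is None else value.encode("utf-8")

def deserialize_string(data):
    # Like Kafka's StringDeserializer: the inverse operation
    return None if data is None else data.decode("utf-8")

wire_bytes = serialize_string("AS Roma 1-0")
assert isinstance(wire_bytes, bytes)
assert deserialize_string(wire_bytes) == "AS Roma 1-0"
```

The broker itself only ever sees bytes; agreeing on the serializer pair is the Producer's and Consumer's job.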
Producers
8
 A Producer writes a record to a Topic
– If the topic has more than one partition, records are distributed round robin across the partitions
– If a key is given, the record is always written to the same partition
 For guaranteed delivery there are three types of acknowledgments (acks)
– 0: no acknowledgment (fire and forget)
– 1: wait for the leader to acknowledge
– all: wait for the leader and the in sync replicas to acknowledge
 The Producer retries if an acknowledgement is not received
– Retried records can arrive out of order
– Retries may cause duplicate records
 Producers can be idempotent, which prevents writing a record twice
 Producers can use message compression
– Supported compression codecs are Snappy, GZIP and LZ4
– Consumers detect that a message is compressed and decompress it automatically
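Idempotence works because each idempotent producer gets an id and attaches an increasing sequence number to what it sends; the broker drops any sequence number it has already accepted for that producer. A simplified sketch of that broker-side check (an illustration, not actual broker code):

```python
class PartitionLog:
    """Simplified view of how a broker deduplicates an idempotent producer.

    The real broker tracks (producer id, sequence number) per partition;
    a retry carries the same sequence number and is silently dropped.
    """
    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> last accepted sequence number

    def append(self, producer_id, seq, value):
        if self.last_seq.get(producer_id, -1) >= seq:
            return False  # duplicate caused by a retry: ignore it
        self.last_seq[producer_id] = seq
        self.records.append(value)
        return True

log = PartitionLog()
log.append(7, 0, "payment-123")
# The ack was lost, so the producer retries with the same sequence number
log.append(7, 0, "payment-123")
assert log.records == ["payment-123"]  # stored exactly once
```

This is why idempotence removes the duplicate-record risk of retries without any change to application code.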
Producers
9
 Producers can send messages in batches for efficiency
– By default 5 messages can be in flight at a time
– Additional messages are placed in a batch and sent all at once
– Introducing a small delay in processing can lead to better performance
– A batch waits until the delay expires or the batch size is filled
– Messages larger than the batch size are not batched
 If Producers send faster than the Brokers can handle, the Producers can be slowed
– Set the buffer memory for storage
– Set the blocking time (milliseconds)
– An error is thrown if the records still cannot be sent
 The Confluent Schema Registry is available to validate data
– Uses Apache Avro
– Protects against bad data or schema mismatches
– Schemas are self describing
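The batching behaviour above (a linger delay versus a size threshold) can be modelled with a toy accumulator; the parameter names mirror the real producer settings linger.ms and batch.size, but the class itself is only an illustration:

```python
import time

class ToyBatcher:
    """Toy model of the producer's record accumulator: a batch is
    flushed when it reaches batch_size bytes or has waited linger_ms."""
    def __init__(self, batch_size, linger_ms):
        self.batch_size, self.linger_ms = batch_size, linger_ms
        self.buffer, self.buffered_bytes = [], 0
        self.first_append_at = None
        self.sent_batches = []

    def append(self, record):
        if self.first_append_at is None:
            self.first_append_at = time.monotonic()
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        waited_ms = (time.monotonic() - self.first_append_at) * 1000
        if self.buffered_bytes >= self.batch_size or waited_ms >= self.linger_ms:
            # Flush: everything buffered goes out as one batch
            self.sent_batches.append(self.buffer)
            self.buffer, self.buffered_bytes, self.first_append_at = [], 0, None

b = ToyBatcher(batch_size=10, linger_ms=1000)
b.append(b"aaaa")      # 4 bytes buffered, batch not full yet
b.append(b"bbbbbb")    # 10 bytes total: size threshold reached, flushed
assert b.sent_batches == [[b"aaaa", b"bbbbbb"]]
```

Raising linger_ms trades a little latency for larger batches and fewer network round trips, which is the tuning the slide describes.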
Consumers
10
 Consumers subscribe to 1 or more Topics
– Read from all partitions from the last offset and consume records in FIFO order
– Multiple consumers can subscribe to a topic
– Consumers can reset the offset if records need to be processed again
 Each Consumer in a consumer group reads exclusively from a fixed set of partitions
– Having more consumers in a group than partitions leaves some consumers inactive
– Adding or removing Consumers automatically rebalances the partitions across the Consumers
 Consumers can be made idempotent in application code
 The Schema Registry is available
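The consumer-group rules above (each partition goes to exactly one consumer in the group; surplus consumers sit idle) can be shown with a small assignment function. Kafka ships configurable assignors (range, round robin, sticky); this round-robin sketch only illustrates the invariant:

```python
def assign_partitions(partitions, consumers):
    """Simplified round-robin assignment of a topic's partitions to the
    consumers of one group. Whatever assignor Kafka actually uses, the
    key property holds: each partition goes to exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 4 consumers: one consumer stays inactive
a = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
assert a == {"c1": [0], "c2": [1], "c3": [2], "c4": []}
```

When a consumer joins or leaves, the group coordinator reruns this kind of assignment, which is the rebalance the slide mentions.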
Connectors
11
 Connectors allow for integration from sources to sinks and vice versa
– Import from sources like DBs, JDBC, Blockchain, Salesforce, Twitter, etc
– Export to AWS S3, Elasticsearch, JDBC, DBs, Twitter, Splunk, etc
– Run a connect cluster to pull from a source and publish to Kafka
– Can be used with Streams
– Confluent Hub has many connectors already available
 Connectors can be managed with REST calls
Streams
12
 Consumes from a Topic, processes the data, and publishes to another Topic
 Several built in functions to process or transform data
– Other functions can be created
– branch, filter, filterNot, flatMap, flatMapValues, foreach, groupByKey, groupBy, join, leftJoin, map, mapValues, merge, outerJoin, peek, print, selectKey, through, transform, transformValues
 Exactly once processing
 Event time windowing is supported
– Groups of records with the same key can be used for stateful operations
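The built-in functions listed above compose into a pipeline. Kafka Streams runs these continuously over live topics in Java; the shape of filter and mapValues can be illustrated over plain Python lists of (key, value) pairs (sensor names and readings here are made up for the example):

```python
def map_values(records, fn):
    # Analogue of Kafka Streams' mapValues: the key is kept,
    # only the value is transformed
    return [(k, fn(v)) for k, v in records]

def filter_records(records, pred):
    # Analogue of Kafka Streams' filter: keep records matching the predicate
    return [(k, v) for k, v in records if pred(k, v)]

source = [("sensor-1", 20.0), ("sensor-2", -999.0), ("sensor-1", 25.0)]
cleaned = filter_records(source, lambda k, v: v > -100)   # drop bad readings
fahrenheit = map_values(cleaned, lambda c: c * 9 / 5 + 32)  # Celsius -> Fahrenheit
assert fahrenheit == [("sensor-1", 68.0), ("sensor-1", 77.0)]
```

In real Kafka Streams the same two-step pipeline would be `stream.filter(...).mapValues(...)` publishing to an output topic.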
Zookeeper Quick Look
14
 Open source project from Apache
 Comes in the package with Kafka
 Centralized system for maintaining configuration information in a distributed system
 A Leader service and follower services exchange information
 Runs on Java
 An odd number of Zookeeper services should always be started
 Keeps its information in files
 The Zookeeper provided with Kafka does not have to be used
Kafka Command Line Basics
15
 Start Zookeeper as a daemon
zookeeper-server-start.sh -daemon ../config/zookeeper.properties
 Stop Zookeeper
zookeeper-server-stop.sh
 Start Kafka as a daemon
kafka-server-start.sh -daemon ../config/server.properties
 Stop Kafka
kafka-server-stop.sh
 Create a topic with a number of partitions and a replication factor
kafka-topics.sh --bootstrap-server host:port --topic topicName --create --partitions 3 --replication-factor 1
 List Topics
kafka-topics.sh --bootstrap-server host:port --list
Kafka Command Line Basics
16
 Retrieve information about a Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --describe
 Delete a Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --delete
 Produce messages to a Topic
kafka-console-producer.sh --broker-list host:port --topic topicName
 Consume from a Topic from the current Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName
 Consume from a Topic from the Beginning Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --from-beginning
Kafka Command Line Basics
17
 Consume from a Topic using a Consumer Group
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --group groupName
Event Streams
18
 Event Streams is IBM’s implementation of Kafka
– Several versions and support options
 IBM Event Streams is Kafka with enterprise features and IBM Support
 IBM Event Streams Community Edition is a free version for evaluation and demo use
 IBM Event Streams on IBM Cloud is Kafka as a service on the IBM Cloud
 Supported on Red Hat OpenShift and IBM Cloud Private
 Contains a REST Proxy Interface for the Producer
 Works with external monitoring tools
 Producer Dashboard
 Health Checks for Cluster, Deployment and Topics
 Geo-replication of Topics for high availability and scalability
 Encrypted communications
Event Streams on IBM Cloud
19
 Select Event Streams from the Catalog
 Enter details and which plan is to be used
– Classic, as a Cloud Foundry Service
– Standard, as a standard Kubernetes service
– Enterprise, dedicated
 Fill out topic information and other attributes
 Create credentials that can be used by selecting Service Credentials
 Viewing the credentials shows the Broker hosts and ports, Admin URL, userid and password
 IBM Cloud has its own ES CLI to connect
 IBM MQ Connectors are available
Event Streams on IBM Cloud
20
(screenshot)
Kafka and IBM MQ
21
Kafka
 Kafka is a pub/sub engine with streams and connectors
 All topics are persistent
 All subscribers are durable
 Adding brokers requires little work (changing a configuration file)
 Topics can be spread across brokers (partitions) with a command
 Producers and Consumers are aware of changes made to the cluster
 Can have n number of replication partitions
IBM MQ
 MQ is a queue and pub/sub engine with file transfer, MQTT, AMQP and other capabilities
 Queues and topics can be persistent or non persistent
 Subscribers can be durable or non durable
 Adding QMGRs requires some work (add the QMGRs to the cluster and add cluster channels; Queues and Topics need to be added to the cluster)
 Queues and topics can be spread across a cluster by adding them to clustered QMGRs
 All MQ clients require a CCDT file to know of changes if not using a gateway QMGR
 Can have 2 replicas of a QMGR (RDQM), or Multi Instance QMGRs
Kafka and IBM MQ
22
Kafka
 Simple load balancing
 Can reread messages
 All clients connect using a single connection method
 Streams processing built in
 Has connection security, authentication security, and ACLs (read/write to a Topic)
IBM MQ
 Load balancing can be simple or more complex using weights and affinity
 Cannot reread messages that have already been processed
 MQ has Channels which allow different clients to connect, each able to have different security requirements
 Stream processing is not built in, but available through third party libraries such as MicroProfile Reactive Streams, ReactiveX, etc.
 Has connection security, channel security, authentication security, message security/encryption, ACLs for each Object, and third party plugins (Channel Exits)
Kafka and IBM MQ
23
Kafka
 Built on Java, so it can run on any platform that supports Java 8+
 Monitoring by using statistics provided by the Kafka CLI, open source tools, or Confluent Control Center
IBM MQ
 Latest version runs natively on AIX, IBM i, Linux systems, Solaris, Windows and z/OS
 Much more can be monitored: monitoring using the PCF API, MQ Explorer, the MQ CLI (runmqsc), and Third Party Tools (Tivoli, CA APM, Help Systems, Open Source, etc)
More information
24
 Sample code on GitHub
 Kafka documentation
 Event Streams documentation
 Event Streams on IBM Cloud
 Event Streams sample on GitHub
 IBM Cloud Event Driven Architecture (EDA) Reference
 IBM Cloud EDA Solution

Editor’s notes

  1. Observer Pattern - https://www.tutorialspoint.com/design_pattern/observer_pattern.htm
  2. Serializers: ByteArraySerializer, ByteBufferSerializer, BytesSerializer, DoubleSerializer, ExtendedSerializer.Wrapper, FloatSerializer, IntegerSerializer, LongSerializer, SessionWindowedSerializer, ShortSerializer, StringSerializer, TimeWindowedSerializer, UUIDSerializer Deserializers: ByteArrayDeserializer, ByteBufferDeserializer, BytesDeserializer, DoubleDeserializer, ExtendedDeserializer.Wrapper, FloatDeserializer, IntegerDeserializer, LongDeserializer, SessionWindowedDeserializer, ShortDeserializer, StringDeserializer, TimeWindowedDeserializer, UUIDDeserializer
  3. When an idempotent producer is set, the property producerProps.put("enable.idempotence", "true") is added. This changes the following settings: retries = MAX_INT, acks = all
  4. To add a delay, change the property: linger.ms (default 0, e.g. 5). To change the batch size: batch.size (default 16 KB). To change the buffer memory: buffer.memory (default 32 MB). To change the blocking milliseconds: max.block.ms (default 60000)