Basics of Kafka and IBM Cloud Event Streams. Covers the major Kafka topics: brokers, clusters, topics, partitions, producers, consumers, streams, and connectors; what Event Streams offers beyond plain Kafka; and some differences between Kafka and IBM MQ.
1. Brian S Paskin, Senior Application Architect, R&D Services, IBM Cloud Innovations Lab
Updated 22 May 2019
Kafka and IBM Event Streams Basics
2. What is Kafka
Kafka was originally developed at LinkedIn in 2010 and open sourced in 2011
A commercial version and extras are maintained by Confluent, founded by the original Kafka creators from LinkedIn
A distributed publish and subscribe middleware where all records are persistent
Used as part of Event Driven Architectures
Fault tolerant and scalable when running multiple brokers with multiple partitions
Kafka runs on Java with clients in many languages
Uses Apache Zookeeper for metadata (leader and follower setup)
Can be used with the Java Message Service (JMS), but does not support all JMS features
Kafka clients are written in many languages
– C/C++, Python, Go, Erlang, .NET, Clojure, Ruby, Node.js, Proxy (HTTP REST), Perl,
stdin/stdout, PHP, Rust, Alternative Java, Storm, Scala DSL, Swift
4. Brokers and Clusters
A broker is an instance of Kafka, identified by an integer ID in its configuration file
More than one broker working together forms a cluster
– Can span multiple systems
All brokers in a cluster know about all other brokers
All information is written to disk
The broker a client first connects to is called the bootstrap broker; from it the client discovers the rest of the cluster
A cluster durably persists all published records for the retention period
– Default retention period is 1 week
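A minimal sketch of a broker configuration file (server.properties); these are standard Kafka property names, but the values here are illustrative only:
# server.properties (illustrative values)
broker.id=0                        # the integer that identifies this broker
log.dirs=/var/lib/kafka/logs       # where records are written to disk
zookeeper.connect=localhost:2181   # Zookeeper connection for metadata
log.retention.hours=168            # retention period; 168 hours is the 1 week default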
5. Topics and Partitions
A topic is a category or feed name to which records are published
– Subtopics are not supported (e.g. sports/football, sports/football/ASRoma)
A partition is an ordered, immutable sequence of records of a specific topic
The records in the partitions are each assigned a sequential id number called the offset
A topic can have multiple partitions that may span brokers in the cluster
– Allows for fault tolerance and parallel consumption of messages
Partitions can be replicated with in-sync replicas (ISRs) that passively follow the leader
Each partition has a leader that is elected
– If the leader goes down, a new leader is elected
– Cannot have more replicas than brokers
A broker can host more than one partition, including multiple partitions of the same topic
7. Records
Records consist of a key, a value, and a timestamp
– A key is not required
– Timestamp is added automatically
– The key and value can be objects
Records are serialized by Producers and deserialized by Consumers
– Several serializers/deserializers are available
– Can write other serializers/deserializers
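As a minimal sketch in Java, a Producer is configured with serializers and sends a keyed record; the host, topic, key, and value below are placeholders:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host:port");
        // serializers turn the key and value objects into bytes on the wire
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<>(props);
        // the key is optional; the timestamp is added automatically
        producer.send(new ProducerRecord<>("topicName", "myKey", "myValue"));
        producer.close();
    }
}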
8. Producers
A Producer writes a record to a Topic
– If there is more than one partition and no key, records are written round robin across the topic's partitions
– If a key is given, then all records with that key will always be written to the same partition
For guaranteed delivery there are three types of acknowledgments (acks)
– 0: no acknowledgment (fire and forget)
– 1: wait for the leader to acknowledge
– all: wait for the leader and in-sync replicas to acknowledge
The Producer retries if an acknowledgment is never received
– Retried records can be written out of order
– Retries may cause duplicate records
Producers can be idempotent, which prevents the same record from being written twice
Producers can use message compression
– Compression codecs supported are Snappy, GZIP and LZ4
– Consumers automatically detect that a message is compressed and decompress it
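Building on the producer sketch above, the settings on this slide map to these standard producer properties (values illustrative):
props.put("acks", "all");                 // wait for leader and in-sync replicas
props.put("enable.idempotence", "true");  // prevents duplicates caused by retries
props.put("compression.type", "snappy");  // snappy, gzip, or lz4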
9. Producers
Producers can send messages in batches for efficiency
– By default 5 messages can be in flight at a time
– Multiple messages are placed in a batch and sent all at once
– Introducing a small delay can lead to better performance
– A batch is sent when the delay expires or the batch size is filled
– Messages larger than the batch size will not be batched
If Producers are sending faster than the Brokers can handle, the Producers can be slowed
– Set the buffer memory for storage
– Set the blocking time (milliseconds)
– When the buffer is full and the blocking time expires, an error is thrown that the records cannot be sent
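Continuing the sketch, the batching and throttling settings map to these properties (values illustrative; defaults noted in the comments):
props.put("linger.ms", "5");             // small delay so more records join a batch (default 0)
props.put("batch.size", "16384");        // batch size in bytes (default 16 KB)
props.put("buffer.memory", "33554432");  // producer buffer in bytes (default 32 MB)
props.put("max.block.ms", "60000");      // how long send() blocks when the buffer is full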
Data can be validated using the Confluent Schema Registry
– Uses Apache Avro
– Protects against bad data or schema mismatches
– Schemas are self describing
10. Consumers
Consumers subscribe to 1 or more Topics
– Read from all partitions from the last offset and consume records in FIFO order within each partition
– Multiple Consumers can be subscribed to a topic
– Consumers can reset the offset if records need to be processed again
Multiple Consumers in a consumer group will each read exclusively from a fixed subset of the partitions
– Having more Consumers in a group than partitions will leave some Consumers inactive
– Adding or removing Consumers automatically rebalances the partitions across the Consumers
Consumers can be made idempotent in application code
Schema Registry is available
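A minimal consumer sketch in Java, subscribing as part of a consumer group and polling in a loop (host, topic, and group name are placeholders):
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host:port");
        props.put("group.id", "groupName"); // Consumers sharing a group.id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("topicName"));
        while (true) {
            // poll returns the records published since the last consumed offset
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset=%d key=%s value=%s%n", record.offset(), record.key(), record.value());
        }
    }
}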
11. Connectors
Connectors allow for integration from sources into Kafka and from Kafka out to sinks
– Import from sources like databases (JDBC), Blockchain, Salesforce, Twitter, etc.
– Export to sinks like AWS S3, Elasticsearch, JDBC databases, Twitter, Splunk, etc.
– Run a Connect cluster to pull from a source and publish it to Kafka
– Can be used with Streams
– Confluent Hub has many connectors already available
Connectors can be managed with REST calls
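As an illustration of managing connectors over REST, the Connect worker's API can be called with any HTTP client; this Java 11 sketch lists the deployed connectors, assuming a worker on the default REST port 8083:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectRestExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // GET /connectors returns the names of the deployed connectors as a JSON array
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}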
12. Streams
A Stream consumes from a Topic, processes the data, and publishes to another Topic
Several built-in functions to process or transform data
– Can create other functions
– branch, filter, filterNot, flatMap, flatMapValues, foreach, groupByKey, groupBy, join, leftJoin, map, mapValues, merge, outerJoin, peek, print, selectKey, through, transform, transformValues
Exactly-once processing is supported
Event time windowing is supported
– Groups of records with the same key can be used for stateful operations
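A minimal Kafka Streams sketch in Java: consume from one Topic, transform the values with built-in functions, and publish to another Topic (topic names and the application id are placeholders):
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "host:port");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("inputTopic");
        source.filter((key, value) -> value != null)    // built-in filter
              .mapValues(value -> value.toUpperCase())  // built-in mapValues
              .to("outputTopic");                       // publish to another Topic
        new KafkaStreams(builder.build(), props).start();
    }
}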
14. Zookeeper Quick Look
Open source project from Apache
Comes packaged with Kafka
Centralized system for maintaining configuration information in a distributed system
There is a Leader service and follower services that exchange information
Runs on Java
Should always have an odd number of Zookeeper services started
Keeps information in files
A separately installed Zookeeper can be used instead of the one provided with Kafka
15. Kafka Command Line Basics
Start Zookeeper as a daemon
zookeeper-server-start.sh -daemon ../config/zookeeper.properties
Stop Zookeeper
zookeeper-server-stop.sh
Start Kafka as a daemon
kafka-server-start.sh -daemon ../config/server.properties
Stop Kafka
kafka-server-stop.sh
Create a topic with number of partitions and number of replications
kafka-topics.sh --bootstrap-server host:port --topic topicName --create --partitions 3 --replication-factor 1
List Topics
kafka-topics.sh --bootstrap-server host:port --list
16. Kafka Command Line Basics
Retrieve information about a Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --describe
Delete Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --delete
Produce messages to a Topic
kafka-console-producer.sh --broker-list host:port --topic topicName
Consume from Topic from current Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName
Consume from Topic from Beginning Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --from-beginning
17. Kafka Command Line Basics
Consume from Topic using Consumer Group
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --group groupName
18. Event Streams
Event Streams is IBM’s implementation of Kafka
– Available in several versions with different levels of support
IBM Event Streams is Kafka with enterprise features and IBM Support
IBM Event Streams Community Edition is a free version for evaluation and demo use
IBM Event Streams on IBM Cloud is Kafka as a service on the IBM Cloud
Supported on Red Hat OpenShift and IBM Cloud Private
Contains a REST proxy interface for Producers
Can use external monitoring tools
Producer Dashboard
Health Checks for Cluster, Deployment and Topics
Geo-replication of Topics for high availability and scalability
Encrypted communications
19. Event Streams on IBM Cloud
Select Event Streams from the Catalog
Enter details and which plan is to be used
– Classic, as a Cloud Foundry service
– Standard, as a standard Kubernetes service
– Enterprise, dedicated
Fill out topic information and other attributes
Create credentials that can be used by selecting Service Credentials
Viewing the credentials shows the broker hosts and ports, the Admin URL, and the user ID and password
IBM Cloud has its own Event Streams CLI for connecting
IBM MQ Connectors are available
21. Kafka and IBM MQ
Kafka
– Kafka is a pub/sub engine with streams and connectors
– All topics are persistent
– All subscribers are durable
– Adding brokers requires little work (changing a configuration file)
– Topics can be spread across brokers (partitions) with a command
– Producers and Consumers are aware of changes made to the cluster
– Can have n number of replica partitions
IBM MQ
– MQ is a queue and pub/sub engine with file transfer, MQTT, AMQP, and other capabilities
– Queues and topics can be persistent or non-persistent
– Subscribers can be durable or non-durable
– Adding QMGRs requires some work (add the QMGRs to the cluster, add cluster channels; queues and topics need to be added to the cluster)
– Queues and topics can be spread across a cluster by adding them to clustered QMGRs
– All MQ clients require a CCDT file to know of changes, if not using a gateway QMGR
– Can have 2 replicas of a QMGR (RDQM), or Multi-Instance QMGRs
22. Kafka and IBM MQ
Kafka
– Simple load balancing
– Can reread messages
– All clients connect using a single connection method
– Streams processing built in
– Has connection security, authentication security, and ACLs (read/write to a Topic)
IBM MQ
– Load balancing can be simple or more complex using weights and affinity
– Cannot reread messages that have already been processed
– Has Channels, which allow different clients to connect, each able to have different security requirements
– Stream processing is not built in, but available through third party libraries like MicroProfile Reactive Streams, ReactiveX, etc.
– Has connection security, channel security, authentication security, message security/encryption, ACLs for each Object, and third party plugins (Channel Exits)
23. Kafka and IBM MQ
Kafka
– Built on Java, so it can run on any platform that supports Java 8+
– Monitoring using statistics provided by the Kafka CLI, open source tools, or Confluent Control Center
IBM MQ
– Latest version runs natively on AIX, IBM i, Linux systems, Solaris, Windows, and z/OS
– Much more can be monitored; monitoring using the PCF API, MQ Explorer, the MQ CLI (runmqsc), and third party tools (Tivoli, CA APM, Help Systems, open source, etc.)
24. More information
Sample code on GitHub
Kafka documentation
Event Streams documentation
Event Streams on IBM Cloud
Event Streams sample on GitHub
IBM Cloud Event Driven Architecture (EDA) Reference
IBM Cloud EDA Solution
Serializers: ByteArraySerializer, ByteBufferSerializer, BytesSerializer, DoubleSerializer, ExtendedSerializer.Wrapper, FloatSerializer, IntegerSerializer, LongSerializer, SessionWindowedSerializer, ShortSerializer, StringSerializer, TimeWindowedSerializer, UUIDSerializer
Deserializers: ByteArrayDeserializer, ByteBufferDeserializer, BytesDeserializer, DoubleDeserializer, ExtendedDeserializer.Wrapper, FloatDeserializer, IntegerDeserializer, LongDeserializer, SessionWindowedDeserializer, ShortDeserializer, StringDeserializer, TimeWindowedDeserializer, UUIDDeserializer
When an idempotent producer is enabled, the property producerProps.put("enable.idempotence", "true") is added. This changes the following settings: retries = MAX_INT and acks = all.
To add a batching delay, change the property:
linger.ms = 5 (default 0)
To change the batch size:
batch.size (default 16 KB)
To change the buffer memory:
buffer.memory (default 32 MB)
To change the blocking milliseconds:
max.block.ms (default 60000, i.e. 1 minute)