Basics of Kafka and IBM Cloud Event Streams. Covers the major Kafka topics: brokers, clusters, topics, partitions, producers, consumers, streams, and connectors; what Event Streams offers beyond plain Kafka; and some differences between Kafka and IBM MQ.
1. Brian S Paskin, Senior Application Architect, R&D Services, IBM Cloud Innovations Lab
Updated 22 May 2019
Kafka and IBM Event Streams Basics
2. What is Kafka
Kafka was originally developed at LinkedIn in 2010 and open sourced in 2011
A commercial version and extras are maintained by Confluent, founded by the original Kafka creators from LinkedIn
A distributed publish and subscribe middleware where all records are persistent
Used as part of Event Driven Architectures
Fault tolerant and scalable when running multiple brokers with multiple partitions
Kafka runs on Java with clients in many languages
Uses Apache Zookeeper for metadata (leader and follower setup)
Can be used with the Java Message Service (JMS), but does not support all JMS features
Kafka clients are written in many languages
– C/C++, Python, Go, Erlang, .NET, Clojure, Ruby, Node.js, Proxy (HTTP REST), Perl,
stdin/stdout, PHP, Rust, Alternative Java, Storm, Scala DSL, Swift
4. Brokers and Clusters
A broker is an instance of Kafka, identified by an integer ID in its configuration file
More than one broker working together forms a cluster
– Can span multiple systems
All brokers in a cluster know about all other brokers
All information is written to disk
The broker a client first connects to is called the bootstrap broker; from it the client discovers the rest of the cluster
A cluster durably persists all published records for the retention period
– Default retention period is 1 week
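A minimal sketch of a broker configuration file (server.properties); these are standard Kafka property names, but the values here are illustrative only:
# server.properties (illustrative values)
broker.id=0                        # the integer that identifies this broker
log.dirs=/var/lib/kafka/logs       # where records are written to disk
zookeeper.connect=localhost:2181   # Zookeeper connection for metadata
log.retention.hours=168            # retention period; 168 hours is the 1 week default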
5. Topics and Partitions
A topic is a category or feed name to which records are published
– Subtopics are not supported (e.g. sports/football, sports/football/ASRoma)
A partition is an ordered, immutable sequence of records of a specific topic
The records in the partitions are each assigned a sequential id number called the offset
A topic can have multiple partitions that may span brokers in the cluster
– Allows for fault tolerance and parallel consumption of messages
Partitions can be replicated with in-sync replicas (ISRs) that passively follow the leader
Each partition has a leader that is elected
– If the leader goes down, a new leader is elected
– Cannot have more replicas than brokers
A broker can host more than one partition, including multiple partitions of the same topic
7. Records
Records consist of a key, a value, and a timestamp
– A key is not required
– Timestamp is added automatically
– The key and value can be objects
Records are serialized by Producers and deserialized by Consumers
– Several serializers/deserializers are available
– Can write other serializers/deserializers
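As a minimal sketch in Java, a Producer is configured with serializers and sends a keyed record; the host, topic, key, and value below are placeholders:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host:port");
        // serializers turn the key and value objects into bytes on the wire
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<>(props);
        // the key is optional; the timestamp is added automatically
        producer.send(new ProducerRecord<>("topicName", "myKey", "myValue"));
        producer.close();
    }
}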
8. Producers
A Producer writes a record to a Topic
– If there is more than one partition and no key, records are written round robin across the topic's partitions
– If a key is given, then all records with that key will always be written to the same partition
For guaranteed delivery there are three types of acknowledgments (acks)
– 0: no acknowledgment (fire and forget)
– 1: wait for the leader to acknowledge
– all: wait for the leader and in-sync replicas to acknowledge
The Producer retries if an acknowledgment is never received
– Retried records can be written out of order
– Retries may cause duplicate records
Producers can be idempotent, which prevents the same record from being written twice
Producers can use message compression
– Compression codecs supported are Snappy, GZIP and LZ4
– Consumers automatically detect that a message is compressed and decompress it
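Building on the producer sketch above, the settings on this slide map to these standard producer properties (values illustrative):
props.put("acks", "all");                 // wait for leader and in-sync replicas
props.put("enable.idempotence", "true");  // prevents duplicates caused by retries
props.put("compression.type", "snappy");  // snappy, gzip, or lz4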
9. Producers
Producers can send messages in batches for efficiency
– By default 5 messages can be in flight at a time
– Multiple messages are placed in a batch and sent all at once
– Introducing a small delay can lead to better performance
– A batch is sent when the delay expires or the batch size is filled
– Messages larger than the batch size will not be batched
If Producers are sending faster than the Brokers can handle, the Producers can be slowed
– Set the buffer memory for storage
– Set the blocking time (milliseconds)
– When the buffer is full and the blocking time expires, an error is thrown that the records cannot be sent
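Continuing the sketch, the batching and throttling settings map to these properties (values illustrative; defaults noted in the comments):
props.put("linger.ms", "5");             // small delay so more records join a batch (default 0)
props.put("batch.size", "16384");        // batch size in bytes (default 16 KB)
props.put("buffer.memory", "33554432");  // producer buffer in bytes (default 32 MB)
props.put("max.block.ms", "60000");      // how long send() blocks when the buffer is full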
Data can be validated using the Confluent Schema Registry
– Uses Apache Avro
– Protects against bad data or schema mismatches
– Schemas are self describing
10. Consumers
Consumers subscribe to 1 or more Topics
– Read from all partitions from the last offset and consume records in FIFO order within each partition
– Multiple Consumers can be subscribed to a topic
– Consumers can reset the offset if records need to be processed again
Multiple Consumers in a consumer group will each read exclusively from a fixed subset of the partitions
– Having more Consumers in a group than partitions will leave some Consumers inactive
– Adding or removing Consumers automatically rebalances the partitions across the Consumers
Consumers can be made idempotent in application code
Schema Registry is available
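A minimal consumer sketch in Java, subscribing as part of a consumer group and polling in a loop (host, topic, and group name are placeholders):
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host:port");
        props.put("group.id", "groupName"); // Consumers sharing a group.id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("topicName"));
        while (true) {
            // poll returns the records published since the last consumed offset
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset=%d key=%s value=%s%n", record.offset(), record.key(), record.value());
        }
    }
}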
11. Connectors
Connectors allow for integration from sources into Kafka and from Kafka out to sinks
– Import from sources like databases (JDBC), Blockchain, Salesforce, Twitter, etc.
– Export to sinks like AWS S3, Elasticsearch, JDBC databases, Twitter, Splunk, etc.
– Run a Connect cluster to pull from a source and publish it to Kafka
– Can be used with Streams
– Confluent Hub has many connectors already available
Connectors can be managed with REST calls
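As an illustration of managing connectors over REST, the Connect worker's API can be called with any HTTP client; this Java 11 sketch lists the deployed connectors, assuming a worker on the default REST port 8083:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectRestExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // GET /connectors returns the names of the deployed connectors as a JSON array
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}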
12. Streams
A Stream consumes from a Topic, processes the data, and publishes to another Topic
Several built-in functions to process or transform data
– Can create other functions
– branch, filter, filterNot, flatMap, flatMapValues, foreach, groupByKey, groupBy, join, leftJoin, map, mapValues, merge, outerJoin, peek, print, selectKey, through, transform, transformValues
Exactly-once processing is supported
Event time windowing is supported
– Groups of records with the same key can be used for stateful operations
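A minimal Kafka Streams sketch in Java: consume from one Topic, transform the values with built-in functions, and publish to another Topic (topic names and the application id are placeholders):
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "host:port");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("inputTopic");
        source.filter((key, value) -> value != null)    // built-in filter
              .mapValues(value -> value.toUpperCase())  // built-in mapValues
              .to("outputTopic");                       // publish to another Topic
        new KafkaStreams(builder.build(), props).start();
    }
}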
14. Zookeeper Quick Look
Open source project from Apache
Comes packaged with Kafka
Centralized system for maintaining configuration information in a distributed system
There is a Leader service and follower services that exchange information
Runs on Java
Should always have an odd number of Zookeeper services started
Keeps information in files
A separately installed Zookeeper can be used instead of the one provided with Kafka
15. Kafka Command Line Basics
Start Zookeeper as a daemon
zookeeper-server-start.sh -daemon ../config/zookeeper.properties
Stop Zookeeper
zookeeper-server-stop.sh
Start Kafka as a daemon
kafka-server-start.sh -daemon ../config/server.properties
Stop Kafka
kafka-server-stop.sh
Create a topic with number of partitions and number of replications
kafka-topics.sh --bootstrap-server host:port --topic topicName --create --partitions 3 --replication-factor 1
List Topics
kafka-topics.sh --bootstrap-server host:port --list
16. Kafka Command Line Basics
Retrieve information about a Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --describe
Delete Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --delete
Produce messages to a Topic
kafka-console-producer.sh --broker-list host:port --topic topicName
Consume from Topic from current Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName
Consume from Topic from Beginning Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --from-beginning
17. Kafka Command Line Basics
Consume from Topic using Consumer Group
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --group groupName
18. Event Streams
Event Streams is IBM’s implementation of Kafka
– Available in several versions with different levels of support
IBM Event Streams is Kafka with enterprise features and IBM Support
IBM Event Streams Community Edition is a free version for evaluation and demo use
IBM Event Streams on IBM Cloud is Kafka as a service on the IBM Cloud
Supported on Red Hat OpenShift and IBM Cloud Private
Contains a REST proxy interface for Producers
Can use external monitoring tools
Producer Dashboard
Health Checks for Cluster, Deployment and Topics
Geo-replication of Topics for high availability and scalability
Encrypted communications
19. Event Streams on IBM Cloud
Select Event Streams from the Catalog
Enter details and which plan is to be used
– Classic, as a Cloud Foundry service
– Standard, as a standard Kubernetes service
– Enterprise, dedicated
Fill out topic information and other attributes
Create credentials that can be used by selecting Service Credentials
Viewing the credentials shows the broker hosts and ports, the Admin URL, and the user ID and password
IBM Cloud has its own Event Streams CLI for connecting
IBM MQ Connectors are available
21. Kafka and IBM MQ
Kafka
– Kafka is a pub/sub engine with streams and connectors
– All topics are persistent
– All subscribers are durable
– Adding brokers requires little work (changing a configuration file)
– Topics can be spread across brokers (partitions) with a command
– Producers and Consumers are aware of changes made to the cluster
– Can have n number of replica partitions
IBM MQ
– MQ is a queue and pub/sub engine with file transfer, MQTT, AMQP, and other capabilities
– Queues and topics can be persistent or non-persistent
– Subscribers can be durable or non-durable
– Adding QMGRs requires some work (add the QMGRs to the cluster, add cluster channels; queues and topics need to be added to the cluster)
– Queues and topics can be spread across a cluster by adding them to clustered QMGRs
– All MQ clients require a CCDT file to know of changes, if not using a gateway QMGR
– Can have 2 replicas of a QMGR (RDQM), or Multi-Instance QMGRs
22. Kafka and IBM MQ
Kafka
– Simple load balancing
– Can reread messages
– All clients connect using a single connection method
– Streams processing built in
– Has connection security, authentication security, and ACLs (read/write to a Topic)
IBM MQ
– Load balancing can be simple or more complex using weights and affinity
– Cannot reread messages that have already been processed
– Has Channels, which allow different clients to connect, each able to have different security requirements
– Stream processing is not built in, but available through third party libraries like MicroProfile Reactive Streams, ReactiveX, etc.
– Has connection security, channel security, authentication security, message security/encryption, ACLs for each Object, and third party plugins (Channel Exits)
23. Kafka and IBM MQ
Kafka
– Built on Java, so it can run on any platform that supports Java 8+
– Monitoring using statistics provided by the Kafka CLI, open source tools, or Confluent Control Center
IBM MQ
– Latest version runs natively on AIX, IBM i, Linux systems, Solaris, Windows, and z/OS
– Much more can be monitored; monitoring using the PCF API, MQ Explorer, the MQ CLI (runmqsc), and third party tools (Tivoli, CA APM, Help Systems, open source, etc.)
24. More information
Sample code on GitHub
Kafka documentation
Event Streams documentation
Event Streams on IBM Cloud
Event Streams sample on GitHub
IBM Cloud Event Driven Architecture (EDA) Reference
IBM Cloud EDA Solution
Serializers: ByteArraySerializer, ByteBufferSerializer, BytesSerializer, DoubleSerializer, ExtendedSerializer.Wrapper, FloatSerializer, IntegerSerializer, LongSerializer, SessionWindowedSerializer, ShortSerializer, StringSerializer, TimeWindowedSerializer, UUIDSerializer
Deserializers: ByteArrayDeserializer, ByteBufferDeserializer, BytesDeserializer, DoubleDeserializer, ExtendedDeserializer.Wrapper, FloatDeserializer, IntegerDeserializer, LongDeserializer, SessionWindowedDeserializer, ShortDeserializer, StringDeserializer, TimeWindowedDeserializer, UUIDDeserializer
When an idempotent producer is enabled, the property producerProps.put("enable.idempotence", "true") is added. This changes the following settings: retries = MAX_INT and acks = all.
To add a batching delay, change the property:
linger.ms = 5 (default 0)
To change the batch size:
batch.size (default 16 KB)
To change the buffer memory:
buffer.memory (default 32 MB)
To change the blocking milliseconds:
max.block.ms (default 60000, i.e. 1 minute)