2. Agenda
• What is Apache Kafka?
• What can I use event streaming for?
• Where is Apache Kafka used?
• Kafka Theory 101 Overview
And maybe a pony or two.
4. What is Apache Kafka?
• It is an 🙌 open-source 🙌 event streaming platform.
• Simply put, it is a way of moving data between systems.
• (e.g. between applications and servers)
5. 3 key capabilities for event streaming:
• To publish (write) and subscribe to (read) streams of events, including
continuous import/export of data from other systems.
• To store streams of events durably and reliably for as long as you need.
• To process streams of events as they occur or retrospectively.
7. What can I use event streaming for?
• To process payments and financial transactions in real-time
• stock exchanges
• banks
• insurance companies
• To track and monitor cars, trucks, fleets, and shipments in real-time
• logistics
• automotive industry
• To continuously capture and analyze sensor data from IoT devices/equipment
• factories
• wind parks
8. Event streaming – cont’d…
• To collect and immediately react to customer interactions and orders
• retail
• hotel and travel industry
• mobile applications
• To connect, store, and make available data produced by different divisions of
a company.
• To serve as the foundation for data platforms, event-driven architectures,
and microservices.
10. Where is Apache Kafka used?
• Netflix is using Kafka to apply
recommendations in real-time while
you're watching TV shows.
• Uber uses Kafka to gather user, taxi,
and trip data in real-time to compute
and forecast demand and pricing.
• LinkedIn uses Kafka to prevent spam on
its platform, collect user
interactions, and make better
connection recommendations.
12. T is for Topics and Twilight Sparkle
• A particular stream of data.
• A topic is identified by its name.
• Topics are split into partitions.
• Each partition is ordered.
• Each message within a partition gets an
incremental ID, called an offset.
• Central protagonist of the show
• Most intellectual member of the Mane Six
• Her cutie mark represents her talent
for magic and her love
for books and knowledge.
[Figure: one topic split into three partitions. Partition 0 holds offsets 0-4, Partition 1 holds offsets 0-3, Partition 2 holds offsets 0-6.]
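The topic, partition, and offset bullets on this slide can be sketched as a toy in-memory model. This is not Kafka's API: the `Topic` and `Partition` classes and the `trucks_gps` topic name are made up for illustration, and a real broker persists its logs to disk.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; a record's offset is its position in the list."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record


@dataclass
class Topic:
    """A named stream of data split into partitions (hypothetical model)."""
    name: str
    partitions: list

    @classmethod
    def create(cls, name, num_partitions):
        return cls(name, [Partition() for _ in range(num_partitions)])


topic = Topic.create("trucks_gps", 3)

# Offsets grow independently inside each partition.
first = topic.partitions[0].append("truck-1 at depot")
second = topic.partitions[0].append("truck-1 on highway")
other = topic.partitions[1].append("truck-2 at depot")
```

Here `first` and `second` get offsets 0 and 1 in partition 0, while `other` starts again at offset 0 in partition 1. Ordering is only defined within a partition, not across the whole topic.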
13. Topic Replication
Topic replication provides fail-over
capability for a topic.
• If one broker (which holds topics and partitions) is
down, another broker holding a replica can serve the data.
• The replication factor defines the number of
copies of each partition of a topic in a Kafka cluster (made up of
multiple Kafka brokers).
• Kafka stores messages in topics that
are partitioned and replicated across
multiple brokers in a cluster.
[Figure: two brokers (Broker 1 and Broker 2) holding partitions of Topic A and Topic B. Partition 0 of Topic A appears twice, Partition 1 of Topic B once.]
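As a rough illustration of the replication factor, the sketch below spreads partition copies over brokers and checks which broker can still serve a partition when another one is down. The round-robin placement and both function names are assumptions made for this example; Kafka's actual replica assignment is more involved.

```python
def assign_replicas(num_partitions, broker_ids, replication_factor):
    """Place each partition's copies on distinct brokers, round-robin."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for p in range(num_partitions):
        placement[p] = [
            broker_ids[(p + r) % len(broker_ids)]
            for r in range(replication_factor)
        ]
    return placement


def brokers_able_to_serve(placement, partition, down):
    """Brokers that still hold a copy of the partition when some are down."""
    return [b for b in placement[partition] if b not in down]


# Two partitions, two brokers, two copies of every partition.
placement = assign_replicas(num_partitions=2, broker_ids=[1, 2], replication_factor=2)

# Broker 1 fails: broker 2 still has a copy of partition 0.
survivors = brokers_able_to_serve(placement, partition=0, down={1})
```

With a replication factor of 1, the same failure would leave no broker able to serve the partition, which is why a factor above 1 is what gives fail-over.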
14. B is for Brokers and Babs Seed
• Holds topics and partitions.
• Each broker is identified with its ID (integer).
• Each broker contains certain topic partitions.
• After connecting to any broker, you're
connected to an entire cluster.
• Apple Bloom's cousin from Manehattan.
• Former member of the Cutie Mark
Crusaders
15. P is for Partitions and Pinkie Pie
• An ordered, immutable record sequence.
• Once data is written to a partition, it can't be
changed (immutable).
• Without a key, data is spread across partitions (round-robin or
sticky batching, depending on the client version); with a key, the key decides the partition.
• At any one time, only ONE broker can be a leader for
a given partition.
• The leader receives and serves data for the
partition.
• The other brokers are passive
replicas that synchronize the data.
• Each partition has one leader and
multiple in-sync replicas (ISRs).
• Baker at Sugarcube Corner.
• Toothless pet alligator, Gummy.
• Represents the element of laughter.
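The leader and ISR bullets on this slide can be sketched as a small state update. This is a simplified model under stated assumptions: the `PartitionState` class and `handle_broker_failure` function are hypothetical, and picking the first remaining in-sync replica as the new leader is a stand-in for the election a real cluster performs.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    leader: int  # broker ID currently serving the partition
    isr: list    # broker IDs of in-sync replicas, leader included


def handle_broker_failure(state, failed_broker):
    """Drop the failed broker from the ISR; elect a new leader if needed."""
    remaining = [b for b in state.isr if b != failed_broker]
    if not remaining:
        raise RuntimeError("no in-sync replica left to lead the partition")
    # Keep the current leader if it survived, otherwise promote a replica.
    leader = state.leader if state.leader != failed_broker else remaining[0]
    return PartitionState(leader=leader, isr=remaining)


state = PartitionState(leader=1, isr=[1, 2, 3])

# The leader (broker 1) goes down, so an in-sync replica takes over.
after = handle_broker_failure(state, failed_broker=1)
```

At every step there is exactly one leader per partition, which matches the "only ONE broker can be a leader" rule above.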
16. P is also for Producers and Pound Cake
• How do we get data in Kafka?
• Producers write data to topics (which are made up of
partitions).
• Producers automatically know which broker and
partition to write to.
• Message keys: producers can choose to
send a key with a message (e.g. a string or a number).
• If a key is sent, all messages for that key will
go to the same partition (as long as the number of partitions does not change).
• A key is sent if you need message ordering for a
specific field (e.g. truck_id).
• Parents are surprised to find out that
Pound Cake is a male Pegasus, and his twin
foal sister (Pumpkin Cake) is a female
unicorn… even though their parents are
both Earth ponies.
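The message-key bullets on this slide boil down to "hash the key, take it modulo the number of partitions". The sketch below shows that idea only. It uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses a different hash function), a plain round-robin counter for keyless messages, and a made-up `choose_partition` helper.

```python
import itertools
import zlib

NUM_PARTITIONS = 3                 # assumed partition count for the example
_round_robin = itertools.count()   # used only for messages without a key


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition: stable for a given key, rotating when key is None."""
    if key is None:
        return next(_round_robin) % num_partitions
    # Stand-in hash, NOT the one Kafka's client uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The same key always maps to the same partition, so its messages stay ordered.
a = choose_partition("truck_42")
b = choose_partition("truck_42")
```

Because the result depends on `num_partitions`, adding partitions to a topic later can move a key to a different partition.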
17. C is for Consumers and Princess Celestia
• How do we read data in Kafka?
• Consumers read data from a topic (identified by name).
• Consumers know which broker to read from.
• In case of broker failures, consumers know how to recover.
• Data is read in order within each partition.
• Consumer Groups: consumers read data in consumer
groups.
• Each consumer within a group reads from its own
exclusive set of partitions.
• If you have more consumers than partitions, some will
be inactive.
• Most magical pony.
• Responsible for raising the sun to create
light in Equestria.
• Over 1,000 years old!
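The consumer group bullets on this slide can be sketched as a partition assignment. This is a toy round-robin assignment with a hypothetical `assign_partitions` function and made-up consumer names; real consumer groups negotiate assignments through the brokers and support several strategies.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Three partitions shared by two consumers: each partition has one reader.
two = assign_partitions([0, 1, 2], ["c1", "c2"])

# Four consumers for three partitions: one consumer is left with nothing to read.
four = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
```

In the second case `c4` ends up with an empty list, which is the "more consumers than partitions" situation where a consumer sits inactive.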
18. Z is for Zookeeper and Zecora
• Keeps track of the status of the Kafka cluster nodes.
• Also keeps track of Kafka topics and partitions.
• Currently, Apache Kafka® uses Apache ZooKeeper™ to
store its metadata (e.g. the location of partitions and the
configuration of topics).
⚠️ Apache Kafka is removing the Apache ZooKeeper
Dependency. ⚠️
• In 2019, the Kafka project outlined a plan (KIP-500) to break this
dependency and bring metadata management back
into Kafka itself.
• Female zebra shaman and herbalist.
• Always speaks in rhyme.
20. What now, developer?
• To get hands-on experience, follow the Quickstart.
• To learn more about Apache Kafka, check out the developer docs.
• Books and academic papers!