Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, Streams vs. Databases


My talk at Strata Data Conference, London, May 2017. https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/57619

Abstract: Modern businesses have data at their core, but this data is changing continuously. How can you harness this torrent of information in real time? The answer: stream processing. The core platform for streaming data is Apache Kafka, and thousands of companies are using Kafka to transform and reshape their industries, including Netflix, Uber, PayPal, Airbnb, Goldman Sachs, Cisco, and Oracle.

Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: to succeed, many technologies need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we engineers would like to work and how we actually end up working in practice.

Michael Noll explains how Apache Kafka helps you radically simplify your data processing architectures by building normal applications to serve your real-time processing needs rather than building clusters or similar special-purpose infrastructure, while still benefiting from properties typically associated exclusively with cluster technologies, like high scalability, distributed computing, and fault tolerance. Michael also covers Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced interactive queries functionality.

Along the way, Michael shares common use cases that demonstrate that stream processing in practice often requires database-like functionality and how Kafka allows you to bridge the worlds of streams and databases when implementing your own core business applications (for example, in the form of event-driven, containerized microservices). As you’ll see, Kafka makes such architectures equally viable for small-, medium-, and large-scale use cases.


1
Rethinking Stream Processing
with Apache Kafka:
Applications vs. Clusters,
Streams vs. Databases
Michael G. Noll
Confluent
Strata Data Conference, London, May 2017

2
Apache Kafka: birthed as a messaging system, now a streaming platform
Release timeline (axis: 2012 to 2017):
0.7 Cluster mirroring
0.8 Intra-cluster replication
0.9 Data integration (Connect API)
0.10 Data processing (Streams API)
0.11* Exactly-once semantics

14
(Does NOT run inside
the Kafka brokers!)

15
(Does NOT run inside
the Kafka brokers!)

17
http://docs.confluent.io/current/cp-docker-images/docs/tutorials/kafka-streams-examples.html

21
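import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

// Streams DSL (Kafka 0.10.x API, where `builder` is a KStreamBuilder)
KStream<Integer, Integer> input =
    builder.stream("numbers-topic");

// Stateless computation: double every value
KStream<Integer, Integer> doubled =
    input.mapValues(v -> v * 2);

// Stateful computation: running sum of all odd values
KTable<Integer, Integer> sumOfOdds = input
    .filter((k, v) -> v % 2 != 0)
    .selectKey((k, v) -> 1)          // route every record to the same key
    .groupByKey()
    .reduce((v1, v2) -> v1 + v2, "sum-of-odds");  // "sum-of-odds" names the state store

// Processor API: a custom processor that prints each value
class PrintToConsoleProcessor<K, V>
    implements Processor<K, V> {
  @Override
  public void init(ProcessorContext context) {}

  @Override
  public void process(K key, V value) {
    System.out.println("Got value " + value);
  }

  @Override
  public void punctuate(long timestamp) {}

  @Override
  public void close() {}
}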
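
To make the difference between the stateless and the stateful computation on this slide concrete, here is a minimal plain-Java sketch that simulates both over a fixed list of numbers. It has no Kafka dependency; the class name `SumOfOddsSketch` and its methods are hypothetical and only mimic the semantics. In particular, a real Kafka Streams application may cache updates, so it will not necessarily emit every intermediate sum shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation of the slide's two computations (not Kafka's API). */
final class SumOfOddsSketch {

    /** Stateless: each output depends only on the current input value. */
    static List<Integer> doubled(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            out.add(v * 2);
        }
        return out;
    }

    /**
     * Stateful: keeps a running sum of odd values and records every update,
     * mimicking the changelog of a table that holds a single key.
     */
    static List<Integer> sumOfOddsUpdates(List<Integer> input) {
        List<Integer> updates = new ArrayList<>();
        Integer sum = null;                       // no state before the first odd value
        for (int v : input) {
            if (v % 2 == 0) {
                continue;                         // filter: drop even values
            }
            sum = (sum == null) ? v : sum + v;    // reduce: the first value initializes the sum
            updates.add(sum);
        }
        return updates;
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);
        System.out.println(doubled(numbers));           // [2, 4, 6, 8, 10]
        System.out.println(sumOfOddsUpdates(numbers));  // [1, 4, 9]
    }
}
```

The stateless method could process records in any order and on any machine, whereas the stateful one must remember `sum` between records, which is the role the state store plays in the Streams API.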

30
http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
https://kafka.apache.org/documentation/streams#streams_duality
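
The stream-table duality that these links describe can be illustrated without Kafka. The following is a minimal sketch in plain Java, under the assumption that a "stream" is an ordered list of key/value change records and a "table" is a map holding the latest value per key. The class name `StreamTableDualitySketch` and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of stream-table duality (not Kafka's API). */
final class StreamTableDualitySketch {

    /** Stream -> table: replay the changelog; later records overwrite earlier ones. */
    static Map<String, Long> toTable(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Map.Entry<String, Long> change : changelog) {
            table.put(change.getKey(), change.getValue());
        }
        return table;
    }

    /** Table -> stream: emit one change record per current table entry. */
    static List<Map.Entry<String, Long>> toChangelog(Map<String, Long> table) {
        List<Map.Entry<String, Long>> stream = new ArrayList<>();
        table.forEach((key, value) -> stream.add(Map.entry(key, value)));
        return stream;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = List.of(
            Map.entry("alice", 1L), Map.entry("bob", 1L), Map.entry("alice", 2L));
        Map<String, Long> table = toTable(changelog);
        System.out.println(table);                                      // {alice=2, bob=1}
        // Round trip: turning the table back into a stream and replaying it
        // reproduces the same table.
        System.out.println(toTable(toChangelog(table)).equals(table));  // true
    }
}
```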

47
Kafka Streams API in the wild
Timeline: 2016, first release of Kafka’s Streams API (0.10.0.0); 2017 (today), Kafka 0.10.2.1
In production at LINE Corp., Japan
220+ million active users, processing millions of msg/s
“Applying Kafka Streams for internal message delivery pipeline”
https://engineering.linecorp.com/en/blog/detail/80

53
*Available in Apache Kafka 0.11 (June 2017)

61
$ curl -sXGET http://localhost:7070/kafka-music/charts/top-five
[
  {
    "artist": "Subhumans",
    "album": "Live In A Dive",
    "name": "All Gone Dead",
    "plays": 126
  },
  {
    "artist": "Wheres The Pope?",
    "album": "PSI",
    "name": "Fear Of God",
    "plays": 115
  },
  ...
]
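
The idea behind a query like the one above is that the application itself serves its current state over HTTP, so no external database is needed. The sketch below shows that pattern with only JDK classes: an in-memory map stands in for the application's local state store, and a small embedded HTTP server exposes it. Everything here is an assumption made for illustration (the class `QueryableStateSketch`, the `/plays/` path, and the key `fear-of-god`); it is not the kafka-music example's code. In an actual Kafka Streams application, the map would be replaced by a state store handle obtained from the running `KafkaStreams` instance.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** An application that exposes its own state over HTTP (hypothetical sketch). */
final class QueryableStateSketch {

    // Stand-in for a local state store: play count per song (not Kafka's API).
    static final Map<String, Long> PLAYS = new ConcurrentHashMap<>();

    /** Starts an endpoint that returns the current play count of a song. */
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/plays/", exchange -> {
                String song = exchange.getRequestURI().getPath().substring("/plays/".length());
                byte[] body = String.valueOf(PLAYS.getOrDefault(song, 0L))
                    .getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Queries the endpoint the way an external client such as curl would. */
    static String query(int port, String song) {
        HttpRequest request = HttpRequest
            .newBuilder(URI.create("http://127.0.0.1:" + port + "/plays/" + song))
            .GET()
            .build();
        try {
            return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PLAYS.put("fear-of-god", 115L);
        HttpServer server = start(0);   // port 0: let the OS pick a free port
        try {
            System.out.println(query(server.getAddress().getPort(), "fear-of-god"));  // 115
        } finally {
            server.stop(0);
        }
    }
}
```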

66
https://kafka.apache.org/documentation/streams
https://www.confluent.io/downloads/
http://docs.confluent.io/current/streams/

67
Kafka Summit San Francisco
August 28, 2017
www.kafka-summit.org
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
Questions? We’re at booth #317 in the Exhibition Hall.




19
Before

20
Before
With Kafka’s Streams API


24
Linux Windows


