Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, Streams vs. Databases

My talk at Strata Data Conference, London, May 2017.

https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/57619

Abstract:
Modern businesses have data at their core, but this data is changing continuously. How can you harness this torrent of information in real time? The answer: stream processing.

The core platform for streaming data is Apache Kafka, and thousands of companies are using Kafka to transform and reshape their industries, including Netflix, Uber, PayPal, Airbnb, Goldman Sachs, Cisco, and Oracle. Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: to succeed, many technologies need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we engineers would like to work and how we actually end up working in practice.

Michael Noll explains how Apache Kafka helps you radically simplify your data processing architectures by building normal applications to serve your real-time processing needs rather than building clusters or similar special-purpose infrastructure—while still benefiting from properties typically associated exclusively with cluster technologies, like high scalability, distributed computing, and fault tolerance. Michael also covers Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced interactive queries functionality. Along the way, Michael shares common use cases that demonstrate that stream processing in practice often requires database-like functionality and how Kafka allows you to bridge the worlds of streams and databases when implementing your own core business applications (for example, in the form of event-driven, containerized microservices). As you’ll see, Kafka makes such architectures equally viable for small-, medium-, and large-scale use cases.
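The abstract claims that an ordinary application can still scale out and survive failures. That rests on Kafka's partitioned topics: each running instance of the application processes a subset of the input partitions, and when an instance goes away, its partitions move to the instances that remain. The plain-Java sketch below illustrates only that idea. It uses a simple round-robin split of partitions across hypothetical instance names (`app-1` and so on). This is not Kafka's actual partition assignor, which is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration only: round-robin split of partitions across
// application instances. Kafka's real assignment logic is more sophisticated.
final class PartitionAssignmentSketch {

    // Assigns partitions 0..numPartitions-1 to the given instances round-robin.
    static Map<String, List<Integer>> assign(List<String> instances, int numPartitions) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one instance");
        }
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = instances.get(p % instances.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical instance names; scaling out means starting another copy of the app.
        List<String> instances = new ArrayList<>(List.of("app-1", "app-2", "app-3"));
        System.out.println("3 instances: " + assign(instances, 6));
        // prints: 3 instances: {app-1=[0, 3], app-2=[1, 4], app-3=[2, 5]}

        // Fault tolerance: if app-2 fails, its partitions are redistributed
        // among the remaining instances on the next assignment.
        instances.remove("app-2");
        System.out.println("after app-2 fails: " + assign(instances, 6));
        // prints: after app-2 fails: {app-1=[0, 2, 4], app-3=[1, 3, 5]}
    }
}
```

Starting a fourth copy shrinks every instance's share in the same way. In this simplified picture, no separate cluster has to be provisioned for the processing itself.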

  1. Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, Streams vs. Databases. Michael G. Noll, Confluent. Strata Data Conference, London, May 2017
  2. Apache Kafka: birthed as a messaging system, now a streaming platform. Release timeline, 2012 to 2017: 0.7 cluster mirroring; 0.8 intra-cluster replication; 0.9 data integration (Connect API); 0.10 data processing (Streams API); 0.11* exactly-once semantics
  14. (Does NOT run inside the Kafka brokers!)
  15. (Does NOT run inside the Kafka brokers!)
  17. http://docs.confluent.io/current/cp-docker-images/docs/tutorials/kafka-streams-examples.html
  19. Before
  20. Before vs. with Kafka’s Streams API
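  21. // Kafka Streams API as of Apache Kafka 0.10.x, the version current at the time of this talk
      import org.apache.kafka.streams.kstream.KStream;
      import org.apache.kafka.streams.kstream.KStreamBuilder;
      import org.apache.kafka.streams.kstream.KTable;
      import org.apache.kafka.streams.processor.Processor;
      import org.apache.kafka.streams.processor.ProcessorContext;

      KStreamBuilder builder = new KStreamBuilder();
      // Assumes Integer serdes are configured as the default key/value serdes
      KStream<Integer, Integer> input = builder.stream("numbers-topic");

      // Stateless computation: each output depends only on the current record
      KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);

      // Stateful computation: running sum of odd values, kept in the "sum-of-odds" store
      KTable<Integer, Integer> sumOfOdds = input
          .filter((k, v) -> v % 2 != 0)
          .selectKey((k, v) -> 1)
          .groupByKey()
          .reduce((v1, v2) -> v1 + v2, "sum-of-odds");

      // Processor API: interface methods must be public, and K, V must be declared
      class PrintToConsoleProcessor<K, V> implements Processor<K, V> {
          @Override
          public void init(ProcessorContext context) {}

          @Override
          public void process(K key, V value) {
              System.out.println("Got value " + value);
          }

          @Override
          public void punctuate(long timestamp) {}

          @Override
          public void close() {}
      }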
  24. Linux, Windows
  30. http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
      https://kafka.apache.org/documentation/streams#streams_duality
  42. …and more…
  45. …and more…
  47. Kafka Streams API in the wild. Timeline: first release of Kafka’s Streams API (0.10.0.0) in 2016; today (2017), Kafka 0.10.2.1 in production at LINE Corp., Japan: 220+ million active users, processing millions of msg/s. “Applying Kafka Streams for internal message delivery pipeline”, https://engineering.linecorp.com/en/blog/detail/80
  50. …and more…
  53. *Available in Apache Kafka 0.11 (June 2017)
  61. $ curl -sXGET http://localhost:7070/kafka-music/charts/top-five
      [
        { "artist": "Subhumans", "album": "Live In A Dive", "name": "All Gone Dead", "plays": 126 },
        { "artist": "Wheres The Pope?", "album": "PSI", "name": "Fear Of God", "plays": 115 },
        ...
      ]
  62. …and more…
  66. https://kafka.apache.org/documentation/streams
      https://www.confluent.io/downloads/
      http://docs.confluent.io/current/streams/
  67. Kafka Summit San Francisco, August 28, 2017, www.kafka-summit.org. Discount code: kafcom17. Use the Apache Kafka community discount code to get $50 off. Questions? We’re at booth #317 in the Exhibition Hall.
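Slide 21 contrasts a stateless step, which doubles each value, with a stateful one, which keeps a running sum of the odd values. The plain Java below is a rough sketch of those semantics only and has no Kafka dependency. It assumes a hypothetical topic holding the values 1 to 5 and replays them through the same logic. A `HashMap` stands in for the state store, and the returned list plays the role of the successive updates to the `sumOfOdds` table. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the slide 21 topology; not the Streams API.
final class NumbersTopologySketch {

    // Stateless: each output depends only on the current input value.
    static List<Integer> doubled(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            out.add(v * 2);
        }
        return out;
    }

    // Stateful: running sum of odd values under the single key 1
    // (mirroring selectKey((k, v) -> 1)). Returns each successive table value.
    static List<Integer> sumOfOddsUpdates(List<Integer> input) {
        Map<Integer, Integer> table = new HashMap<>(); // stands in for the state store
        List<Integer> updates = new ArrayList<>();
        for (int v : input) {
            if (v % 2 != 0) {
                int newSum = table.merge(1, v, Integer::sum);
                updates.add(newSum);
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5); // hypothetical topic contents
        System.out.println("doubled:     " + doubled(numbers));          // [2, 4, 6, 8, 10]
        System.out.println("sum-of-odds: " + sumOfOddsUpdates(numbers)); // [1, 4, 9]
    }
}
```

The stateful step has to remember the previous sum across records. In the Streams API, that is why `reduce` is given a named store (`"sum-of-odds"`) and the result is a `KTable` rather than a `KStream`.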
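Slide 30 links to the stream-table duality section of the Kafka documentation. A stream can be read as the changelog of a table, and a table can be turned back into a stream of its updates. The sketch below models both directions in plain Java with hypothetical user-to-location records. `Change`, `toTable`, and `applyAndRecord` are illustrative names, not Kafka Streams classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of stream-table duality; not the Kafka Streams API.
final class StreamTableDualitySketch {

    // One record in a changelog stream: a key and its new value.
    static final class Change {
        final String key;
        final String value;

        Change(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stream -> table: replaying the changelog keeps only the latest value per key.
    static Map<String, String> toTable(List<Change> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Change c : changelog) {
            table.put(c.key, c.value);
        }
        return table;
    }

    // Table -> stream: applying updates to a table while recording each one
    // yields a changelog from which the same table can be rebuilt elsewhere.
    static List<Change> applyAndRecord(Map<String, String> table, List<Change> updates) {
        List<Change> changelog = new ArrayList<>();
        for (Change c : updates) {
            table.put(c.key, c.value);
            changelog.add(c);
        }
        return changelog;
    }

    public static void main(String[] args) {
        // Hypothetical user -> location updates.
        List<Change> updates = List.of(
                new Change("alice", "Berlin"),
                new Change("bob", "Lima"),
                new Change("alice", "Rome"));
        Map<String, String> live = new LinkedHashMap<>();
        List<Change> changelog = applyAndRecord(live, updates);
        System.out.println(live);                              // {alice=Rome, bob=Lima}
        System.out.println(toTable(changelog).equals(live));   // true
    }
}
```

The same principle lets a table's state be restored by replaying its changelog, which is one way to read the talk's "streams vs. databases" framing.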
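Slide 61 queries a `top-five` chart over HTTP from the kafka-music demo application. The slide does not show how the demo computes that chart. The sketch below only illustrates the kind of ranking such an endpoint returns, and it assumes play counts have already been aggregated per song. The `Song` class mirrors the JSON fields (`artist`, `album`, `name`, `plays`). The sample data is invented, and this is not the demo's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java sketch of a "top N by play count" ranking; hypothetical types and data.
final class TopFiveSketch {

    static final class Song {
        final String artist;
        final String album;
        final String name;
        final long plays;

        Song(String artist, String album, String name, long plays) {
            this.artist = artist;
            this.album = album;
            this.name = name;
            this.plays = plays;
        }
    }

    // Returns up to n songs, most-played first.
    static List<Song> topN(List<Song> songs, int n) {
        return songs.stream()
                .sorted(Comparator.comparingLong((Song s) -> s.plays).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Song> songs = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            // Invented sample data: song-i has i * 10 plays.
            songs.add(new Song("artist-" + i, "album-" + i, "song-" + i, i * 10L));
        }
        for (Song s : topN(songs, 5)) {
            System.out.println(s.name + " " + s.plays); // song-7 70 ... song-3 30
        }
    }
}
```

In the talk's setting, the per-song counts presumably come from the application's own state, which interactive queries expose to the REST layer. Here the counts are simply an in-memory list.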