[Philly JUG] Divide, Distribute and Conquer: Stream v. Batch

Data flows everywhere around us: from phones, credit cards, sensor-equipped buildings, vending machines, thermostats, trains, buses, planes, social media posts, digital pictures and video, and so on. Simple data collection is no longer enough. Most current systems process data through nightly extract, transform, and load (ETL) operations. This approach is common in enterprise environments, but it requires decision makers to wait an entire day (or night) for reports to become available.

But businesses don’t want «Big Data» anymore. They want «Fast Data». What distinguishes a streaming system from a batch system is that the event stream is unbounded, or “infinite”, from the system's perspective.

Decision makers need to analyze these streaming events as a whole to make business decisions as new information arrives. In this talk, after a short introduction to common approaches and architectures (lambda, kappa), Viktor will demonstrate how to use open-source stream processing tools (Flink, Kafka Streams, Hazelcast Jet).
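The difference between a batch query over a bounded data set and a computation over an unbounded stream can be sketched in a few lines. This is a minimal illustration in plain Python, not code from the talk or from any of the tools named above; the function names `batch_count` and `streaming_count` and the vote values are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def batch_count(votes: list[str]) -> dict[str, int]:
    """Batch: the whole, bounded data set exists before the query runs."""
    return dict(Counter(votes))


def streaming_count(votes: Iterable[str]) -> Iterator[dict[str, int]]:
    """Stream: emit an updated result after every event.

    The input may never end, so there is no point at which a single
    "final" answer could be returned; the caller decides when to stop.
    """
    counts: Counter[str] = Counter()
    for vote in votes:
        counts[vote] += 1
        yield dict(counts)  # snapshot of the running result
```

The batch function cannot accept an endless input because it must see everything before answering. The streaming function accepts one (for example `itertools.cycle(...)`) because it only keeps running state and answers incrementally.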

  1. @gamussa @confluentinc @thephillyjug Divide, Distribute and Conquer: Stream v. Batch
  2. Stream v. Batch
  3. Who am I?
  4. Solutions Architect Who am I?
  5. Solutions Architect Developer Advocate Who am I?
  6. Solutions Architect Developer Advocate @gamussa in internetz Who am I?
  7. Solutions Architect Developer Advocate @gamussa in internetz Hey you, yes, you, go follow me on Twitter © Who am I?
  8. Disclaimer:
  9. BATCH PROCESSING Data at rest
  10. Data and Queries Origin and processing
  12. Data…
  13. Data…
  14. Data… ✓ … inherently immutable ✓ … time-based
  15. CRUD -> CR
  16. Processing is a query
  17. Processing is a query Function on full data set
  18. Processing is a query Function on full data set Projection
  19. Processing is a query Function on full data set Projection Aggregations
  20. Processing is a query Function on full data set Projection Aggregations Joins
  21. SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/07/2017" GROUP BY user_vote;
  22. SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/08/2017" GROUP BY user_vote;
  23. SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/08/2017" GROUP BY user_vote;
  24. Lambda architecture origins http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
  25. Lambda Architecture
  26. TFW Trying to explain modern big data landscape
  27. Precomputed Results http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
  28. Batch Process http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
  29. STREAM PROCESSING Data in motion
  30. Streaming Platform
  31. Streaming Platform
  32. Directed Acyclic Graph
  33. DEMO
  34. DEMO
  35. Interesting cases Before You Go
  36. I FOUND YOUR LACK OF FAULT TOLERANCE DISTURBING
  37. Data is too important to store it in one computer
  38. How to process «infinite» data?
  39. Time model
  40. Time model Different use cases time semantics
  41. Time model Different use cases time semantics Majority of use cases require event-time semantics
  42. Time model Different use cases time semantics Majority of use cases require event-time semantics Other use cases may require processing-time or special variants like ingestion-time
  43. Time Model
  44. Time Model
  45. Time Model
  46. Finite Representation Of Infinite Data
  47. Windowing Windowing is an operation that groups events
  48. https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
  49. Windowing [Figure: input data plotted by event-time and processing-time; colors represent different users' events (alice, bob, dave); rectangles denote different event-time windows]
  50. Windowing Windowing is an operation that groups events Most commonly needed: time windows, session windows Examples: ✗ Real-time monitoring: 5-minute averages ✗ Reader behavior on a website: user browsing sessions
  51. Fatality
  52. Out-of-order and late data Is very common in practice, not a rare corner case ✗ Related to time model discussion
  53. Out-of-order and late data
  54. Out-of-order and late data Users with mobile phones enter airplane, lose Internet connectivity
  55. Out-of-order and late data Users with mobile phones enter airplane, lose Internet connectivity Emails are being written during the 10h flight
  56. Out-of-order and late data Users with mobile phones enter airplane, lose Internet connectivity Emails are being written during the 10h flight Internet connectivity is restored, phones will send queued emails now
  57. Stream Processing: results
  58. Stream Processing: results • Yes, it’s possible to get computation results in real time
  59. Stream Processing: results • Yes, it’s possible to get computation results in real time • Windows – finite view of infinite data • Based on temporal characteristics of the event
  60. Stream Processing: results • Yes, it’s possible to get computation results in real time • Windows – finite view of infinite data • Based on temporal characteristics of the event • Late event processing • You choose how long to wait
  61. https://github.com/confluentinc/kafka-streams-examples
  62. Thanks! questions? @gamussa viktor@confluent.io
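The windowing and late-data slides above can be made concrete with a small sketch: fixed-size (tumbling) event-time windows that stay open for a configurable grace period, so that out-of-order events still land in the right window. This is a hand-rolled illustration in plain Python, not the API of Flink, Kafka Streams, or Hazelcast Jet. The class name `TumblingWindowCounter`, its fields, and the simplistic watermark (the highest event time seen so far, with event times assumed to be non-negative integers in seconds) are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TumblingWindowCounter:
    """Counts events per user in fixed-size event-time windows."""

    size: int               # window length, in seconds of event time
    allowed_lateness: int   # extra seconds a window stays open after its end
    watermark: int = 0      # assumption: highest event time seen so far
    open_windows: dict[int, dict[str, int]] = field(default_factory=dict)
    dropped: int = 0        # events that arrived after their window closed

    def _is_closed(self, start: int) -> bool:
        # A window closes once the watermark passes its end plus the grace period.
        return start + self.size + self.allowed_lateness <= self.watermark

    def on_event(self, user: str, event_time: int) -> list[tuple[int, dict[str, int]]]:
        """Process one event; return the (window_start, counts) pairs it finalized."""
        start = event_time - event_time % self.size  # window this event belongs to
        if self._is_closed(start):
            self.dropped += 1  # too late: we chose not to wait this long
            return []
        counts = self.open_windows.setdefault(start, {})
        counts[user] = counts.get(user, 0) + 1
        self.watermark = max(self.watermark, event_time)
        # sorted() copies the keys, so popping inside the loop is safe.
        return [
            (s, self.open_windows.pop(s))
            for s in sorted(self.open_windows)
            if self._is_closed(s)
        ]
```

With `size=60` and `allowed_lateness=30`, an event stamped 50 that arrives after an event stamped 70 is still counted in the window starting at 0, while an event stamped 20 that arrives after the watermark has reached 95 is dropped. Raising `allowed_lateness` trades result latency for completeness, which is the "you choose how long to wait" decision from the results slide.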