DIVIDE, DISTRIBUTE AND CONQUER:

STREAM V. BATCH
Who am I?
Solutions Architect
Developer Advocate
@gamussa in internetz
Hey you, yes, you, go follow me on Twitter
@gamussa @confluentinc @DataSciCon
BATCH PROCESSING
Data at rest
Data and Queries
Origin and processing
Data…
✓ … inherently immutable
✓ … time-based
CRUD -> CR
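
One way to read "CRUD -> CR": if data is immutable and time-based, an update or a delete becomes a new timestamped fact that is appended, and current state is derived by reading. A minimal Python sketch of that idea, assuming a hypothetical `EventLog` class and made-up flight-status records (none of these names come from the talk):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)  # frozen: a recorded fact can never be modified
class Event:
    timestamp: int         # event time, e.g. epoch seconds
    key: str
    value: Optional[Any]   # None is a tombstone: "this key was deleted"


class EventLog:
    """Append-only log: supports Create (append) and Read only."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def read(self) -> List[Event]:
        return list(self._events)  # copy, so callers cannot mutate the log

    def current_state(self) -> Dict[str, Any]:
        """Derive the latest value per key by replaying events in time order."""
        state: Dict[str, Any] = {}
        for ev in sorted(self._events, key=lambda item: item.timestamp):
            if ev.value is None:
                state.pop(ev.key, None)
            else:
                state[ev.key] = ev.value
        return state


log = EventLog()
log.append(Event(1, "flight-42", "scheduled"))
log.append(Event(2, "flight-42", "delayed"))   # an "update" is a new fact
log.append(Event(3, "flight-7", "scheduled"))
log.append(Event(4, "flight-7", None))         # a "delete" is a new fact too
```

All four facts stay in the log; only the derived view changes.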
Processing is a query
Function on full data set
Projection
Aggregations
Joins
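
The three query shapes listed above can be sketched as plain functions that take the whole data set as input. This is an illustration only: the `flights` and `carriers` records and the function names are hypothetical, not taken from the talk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The complete, immutable data set (hypothetical flight records).
flights: List[dict] = [
    {"flight": "DL1", "carrier": "DL", "delay_min": 10},
    {"flight": "DL2", "carrier": "DL", "delay_min": 30},
    {"flight": "UA9", "carrier": "UA", "delay_min": 5},
]
carriers: Dict[str, str] = {"DL": "Delta", "UA": "United"}


def projection(rows: List[dict]) -> List[Tuple[str, int]]:
    """Keep only the fields of interest."""
    return [(r["flight"], r["delay_min"]) for r in rows]


def aggregation(rows: List[dict]) -> Dict[str, float]:
    """Average delay per carrier, computed over the full data set."""
    delays: Dict[str, List[int]] = defaultdict(list)
    for r in rows:
        delays[r["carrier"]].append(r["delay_min"])
    return {c: sum(d) / len(d) for c, d in delays.items()}


def join(rows: List[dict], names: Dict[str, str]) -> List[dict]:
    """Enrich each record with the carrier name from a second data set."""
    return [{**r, "carrier_name": names[r["carrier"]]} for r in rows]
```

Each function needs the full input before it can answer, which is what makes this batch-style processing.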
Lambda architecture origins
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
https://mapr.com/developercentral/lambda-architecture/
Lambda Architecture
TFW trying to explain the modern big data landscape
STREAM PROCESSING
Data in motion
Streaming Platform
Interesting cases
Before You Go
I FOUND YOUR LACK OF FAULT TOLERANCE DISTURBING
Data is too important to store on one computer
How to process «infinite» data?
Time model
Different use cases require different time semantics
Majority of use cases require event-time semantics
Other use cases may require processing-time or special variants like ingestion-time
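
The difference between the three notions of time can be made concrete with a small sketch. It assumes a hypothetical `Record` type that carries two timestamps and a `timestamp_of` helper; these names are illustrative and not part of any particular streaming API.

```python
from dataclasses import dataclass
from enum import Enum


class TimeSemantics(Enum):
    EVENT_TIME = "event-time"            # when the event actually happened
    INGESTION_TIME = "ingestion-time"    # when the platform stored the event
    PROCESSING_TIME = "processing-time"  # when the application handles it


@dataclass(frozen=True)
class Record:
    payload: str
    event_ts: int      # set by the producer at the source
    ingestion_ts: int  # set when the record is appended to the platform


def timestamp_of(record: Record, semantics: TimeSemantics, now: int) -> int:
    """Pick the timestamp the processor should use for this record.

    `now` is the processor's clock, passed in to keep the sketch deterministic.
    """
    if semantics is TimeSemantics.EVENT_TIME:
        return record.event_ts
    if semantics is TimeSemantics.INGESTION_TIME:
        return record.ingestion_ts
    return now


page_view = Record("page view", event_ts=100, ingestion_ts=700)
```

The same record lands in a different place on the timeline depending on which semantics the use case asks for.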
Windowing
[Figure: event-time windowing of input data for users alice, bob, and dave. Colors represent different users' events; rectangles denote different event-time windows. Labels: processing-time, event-time.]
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
Windowing
Windowing is an operation that groups events
Most commonly needed: time windows, session windows
Examples:
✗Real-time monitoring: 5-minute averages
✗Reader behavior on a website: user browsing sessions
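
The two examples above can be sketched in a few lines of Python. This is an in-memory illustration under simple assumptions (integer timestamps in seconds, all events available in a list); the function names `tumbling_averages` and `sessions` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, float]  # (event-time in seconds, value)


def tumbling_averages(events: List[Event], size: int) -> Dict[int, float]:
    """Average value per fixed, non-overlapping window of `size` seconds.

    Keys of the result are window start times.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in events:
        window_start = ts - (ts % size)
        buckets[window_start].append(value)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


def sessions(timestamps: List[int], gap: int) -> List[Tuple[int, int]]:
    """Group one user's events into sessions.

    A new session starts after more than `gap` seconds of inactivity.
    Returns (first event time, last event time) per session.
    """
    result: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][1] <= gap:
            result[-1] = (result[-1][0], ts)  # extend the current session
        else:
            result.append((ts, ts))           # start a new session
    return result


# 5-minute (300 s) averages of a monitored metric.
metrics = [(0, 1.0), (120, 3.0), (300, 10.0), (599, 20.0)]
# One reader's clicks, with a 5-minute inactivity gap between sessions.
clicks = [0, 60, 100, 2000, 2030]
```

Time windows have a fixed size; session windows grow with the user's activity and close after a period of silence.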
Out-of-order and late data
Is very common in practice, not a rare corner case
✗Related to time model discussion
Users with mobile phones enter airplane, lose Internet connectivity
Emails are being written during the 10h flight
Internet connectivity is restored, phones will send queued emails now
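
The airplane scenario can be sketched with two timestamps per email. The `Email` type, the hour-based clock, and the sample messages are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Email(NamedTuple):
    subject: str
    written_at: int   # event time, in hours since take-off
    received_at: int  # time the mail server receives it, same clock


def lateness(email: Email) -> int:
    """How long after it was written the email reached the server."""
    return email.received_at - email.written_at


def by_arrival(emails: List[Email]) -> List[str]:
    """Order in which the server sees the emails."""
    return [e.subject for e in sorted(emails, key=lambda e: e.received_at)]


def by_event_time(emails: List[Email]) -> List[str]:
    """Order in which the emails were actually written."""
    return [e.subject for e in sorted(emails, key=lambda e: e.written_at)]


inbox = [
    Email("from the office", written_at=9, received_at=9),   # sender is online
    Email("draft 1", written_at=1, received_at=10),          # queued in flight
    Email("draft 2", written_at=5, received_at=10),          # queued in flight
]
```

The server sees the office email first, although both drafts were written hours earlier; a processor that orders by arrival gets the story wrong.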
Stream Processing: results
• Yes, it’s possible to get computation results in real time
• Windows – finite view of infinite data
  • Based on temporal characteristics of the event
• Late event processing
  • You choose how long to wait
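
"You choose how long to wait" can be illustrated with a grace period on event-time windows. The sketch below is a simplified model, not the API of any specific engine: the `WindowedCounter` class, its `grace` parameter, and the rule "drop an event once the highest event time seen has passed window end plus grace" are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List


class WindowedCounter:
    """Counts events per tumbling event-time window.

    Late events are still accepted until `grace` time units after the
    end of their window; after that they are dropped.
    """

    def __init__(self, size: int, grace: int) -> None:
        self.size = size
        self.grace = grace
        self.counts: Dict[int, int] = defaultdict(int)  # window start -> count
        self.stream_time = 0          # highest event time observed so far
        self.dropped: List[int] = []  # event times that arrived too late

    def process(self, event_ts: int) -> None:
        self.stream_time = max(self.stream_time, event_ts)
        window_start = event_ts - (event_ts % self.size)
        window_end = window_start + self.size
        if self.stream_time >= window_end + self.grace:
            self.dropped.append(event_ts)   # waited long enough; too late
        else:
            self.counts[window_start] += 1  # on time, or late within grace


counter = WindowedCounter(size=10, grace=5)
for ts in [1, 4, 12, 8, 16, 3]:  # 8 and 3 arrive out of order
    counter.process(ts)
```

Here the event at time 8 is late but still counted, while the event at time 3 arrives after the grace period and is dropped. A longer grace period trades result latency for completeness.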
DEMO
Let’s analyze flights
https://www.confluent.io/blog/predicting-flight-arrivals-with-the-apache-kafka-streams-api/
Example: Training Flight Prediction Model
https://github.com/confluentinc/online-inferencing-blog-application
Thanks!
questions?
@gamussa
viktor@confluent.io

[DataSciCon] Divide, distribute and conquer stream v. batch