YeezyScore
A comparison of stream
processing software
By: Kat Chuang
@katychuang
10 mins
High-level overview
Kat Chuang @katychuang
Batch
Streaming
Microbatching
                     Storm Trident     Spark Streaming
Released             2011              2010
Delivery semantics   Exactly once      Exactly once
State management     Yes               Yes
Latency              Seconds           Seconds
Output               MapState          Resilient Distributed Dataset (RDD)
Throughput           10k/nodes/sec?    400k/nodes/sec?
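Both frameworks in the table process data in micro-batches. The following is a minimal sketch of that idea in plain Python, not Storm or Spark code: events are grouped into consecutive fixed-length time windows and each window is handed over as one small batch. The function name `micro_batches`, the `(timestamp, payload)` event shape, and the one-second interval are assumptions made for illustration.

```python
from itertools import groupby


def micro_batches(events, interval):
    """Group (timestamp, payload) events into windows of `interval` seconds.

    Assumes the events arrive already sorted by timestamp.
    Yields (window_start, [payloads]) for every non-empty window.
    """
    def window_index(event):
        # Index of the window that the event's timestamp falls into.
        return int(event[0] // interval)

    for index, group in groupby(events, key=window_index):
        yield index * interval, [payload for _, payload in group]


# Hypothetical sample stream: four events spread over three seconds.
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
batches = list(micro_batches(events, interval=1.0))
print(batches)
```

A shorter interval lowers latency but produces more, smaller batches; a longer one does the opposite, which is one reason the table lists latency in seconds for both systems.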
Test Cases
Metrics
1. Does every message pass through the pipeline?
2. How long does each message take to process?
Data
1. Timestamps
Pipelines
[Diagram: two pipelines, each labeled Timestamp1 -> (Timestamp1, Timestamp2)]
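The second metric can be read off the timestamp pairs directly. The sketch below assumes that Timestamp1 is recorded when a message enters a pipeline and Timestamp2 when it leaves, both in seconds; the slide shows only the labels, so that reading, the function name `latencies`, and the sample values are all assumptions.

```python
from statistics import mean


def latencies(pairs):
    """Return the processing time (Timestamp2 - Timestamp1) of each message."""
    return [t2 - t1 for t1, t2 in pairs]


# Hypothetical output of one pipeline: (Timestamp1, Timestamp2) in seconds.
pairs = [(100.0, 100.5), (101.0, 103.0), (102.0, 102.25)]

lat = latencies(pairs)
summary = {"min": min(lat), "mean": mean(lat), "max": max(lat)}
print(summary)
```

Plotting `lat` against Timestamp1 would give one point per message, which is one plausible way to produce a scatterplot of processing time.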
1. Does every message pass through the pipeline?
This is a scatterplot
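One way to answer the first question is to compare what was sent with what came out. The sketch below assumes each message carries a unique Timestamp1 that can serve as its identifier, which the slides do not state; `delivery_report` and the sample data are hypothetical.

```python
from collections import Counter


def delivery_report(sent, received):
    """Compare sent Timestamp1 values against (Timestamp1, Timestamp2) outputs.

    Assumes Timestamp1 is unique per message, so it can act as a message ID.
    """
    sent_ids = set(sent)
    counts = Counter(t1 for t1, _ in received)
    return {
        "sent": len(sent_ids),
        "delivered": len(sent_ids & set(counts)),
        # Sent but never seen at the output: lost messages.
        "missing": sorted(sent_ids - set(counts)),
        # Seen more than once at the output: replayed messages.
        "duplicates": sorted(t for t, n in counts.items() if n > 1),
    }


# Hypothetical run: message 3 is lost and message 2 arrives twice.
sent = [1, 2, 3, 4]
received = [(1, 1.5), (2, 2.4), (2, 2.6), (4, 4.1)]
report = delivery_report(sent, received)
print(report)
```

Under exactly-once delivery, both `missing` and `duplicates` would be expected to come back empty.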
2. How long does each message take to process?
This is a scatterplot
Storm Trident vs. Spark Streaming

Storm Trident
Stream processing framework that also does micro-batching.
Great for transforming or computing as data flows in.
Complex event processing (CEP), continuous computation.
Task-parallel computations, e.g. reading Twitter streams.

Spark Streaming
Batch processing framework that also does micro-batching.
Great for combining with historical data.
ML algorithms included. Requires an HDFS-backed data source.
Data-parallel computations, e.g. offering recommendations.

Kat Chuang
Data Engineering Fellow
#DE-2015c
hello@katychuang.com
Github: katychuang
Twitter: katychuang
IG: katychuang.nyc

Insight DE project
