SlideShare une entreprise Scribd logo
1  sur  18
Télécharger pour lire hors ligne
Milan – July 13 2016
Introduction to
Distributed Computing Engines for
Data Processing
Simone Robutti
Machine Learning Engineer at Radicalbit
@SimoneRobutti
What is a Distributed Computing System
It’s the solution to the problem where your
RAM is too small and your data are too big
and/or too CPU-intensive to be processed on
a single machine.
What is a Distributed Computing System
Solution: a huge, monolithic mainframe.
What is a Distributed Computing System
Solution: a huge, monolithic mainframe.
What is a Distributed Computing System
Solution: do your job on a cluster.
Distributed vs Parallel
Parallel: execute identical tasks (with different data
or parameters).
Parallel Distributed: do this on multiple machines.
Distributed: split a big task into smaller tasks and
execute them on multiple machines
What is a Distributed Computing System
Goal: the programmer should write its
programs easily and efficiently without caring
about distribution.
Issues: a cluster is complex and conceptually
very far from a local environment.
Hadoop
What: the first OSS distributed computing engine.
When: 2006 (work began).
Where: Google.
Why: Google had a lot of data (for that time) to process. They built
a solution. Eventually it became a series of papers and got
implemented as OSS.
How: HDFS (distributed file system), MapReduce (computational
abstraction), YARN (resource and cluster manager).
MapReduce - Example
Credits to @sergejusb
Hadoop Today
Common in many enterprise environments.
Still good enough for many batch processing use cases.
HDFS and Yarn widely used by other processing engines.
i.e.:
● Log analysis
● Clickstream analysis
● Text processing
Spark
What: a more generic distributed processing engine for batch and
streaming alike.
When: 2014 (1.0 Release).
Where: Berkeley + Databricks.
Why: They aimed for faster and more general processing with
better abstractions on top.
How: InMemory computing, DAG, RDD, polyglot functional API,
libraries out-of-the-box .
Resilient Distributed Dataset
RDDs hide the underlying distribution of data with a functional API
Directed Acyclic Graph
The graph is defined by the user. The runtime translates it to
operations on distributed data.
create filter
filter join
collect
map
Spark Today
The hot topic everyone talks about.
Just entered the phase of maturity, with an already huge and fast-
growing ecosystem of libraries, integrations and tools.
Widely used as the go-to solution for Big Data (and not-so-Big) use
cases.
I.e.:
● Recommending systems
● Fraud Detection
● Attack-Detection
● Near-Real time decision-heavy solutions
Flink
What: a streaming first (with batch on top), low latency distributed
processing engine.
When: 2016 (1.0 Release).
Where: German Research Foundation + dataArtisans.
Why: a faster and flexible computational model that could
guarantee low latency, high-throughput and fault-tolerance all at the
same time.
How: streaming-first approach, checkpointing, lazy symbolic
computation, powerful optimizations.
Flink Today
Perceived as an alternative to Spark. Gaining traction for specific
use-cases (real-time streaming) but performs well on most generic
uses cases.
Solid runtime and optimization; API and ecosystem still young.
Many big companies already adopted it for fast-data applications.
I.e.:
● Real-time precise analytics (counting)
● Real-time model evaluation
● Online Learning solutions
Alternative solutions
● Apache Storm/Heron
● Apache Samza
● Apache GearPump
● Apache Apex
Following next
"The Barclays Data Science Hackathon: Building
Retail Recommender Systems based on Customer
Shopping Behaviour" by Gianmario Spacagna, Senior Data
Scientist @ Pirelli
"Data intensive applications with Apache Flink" by
Simone Robutti, Machine Learning Engineer @ Radicalbit

Contenu connexe

Tendances

I2C Bus (Inter-Integrated Circuit)
I2C Bus (Inter-Integrated Circuit)I2C Bus (Inter-Integrated Circuit)
I2C Bus (Inter-Integrated Circuit)
Varun Mahajan
 
0021.system partitioning
0021.system partitioning0021.system partitioning
0021.system partitioning
sean chen
 

Tendances (20)

Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
I2C Bus (Inter-Integrated Circuit)
I2C Bus (Inter-Integrated Circuit)I2C Bus (Inter-Integrated Circuit)
I2C Bus (Inter-Integrated Circuit)
 
Basics of Vhdl
Basics of VhdlBasics of Vhdl
Basics of Vhdl
 
High Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyHigh Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go Assembly
 
Clock Tree Synthesis.pdf
Clock Tree Synthesis.pdfClock Tree Synthesis.pdf
Clock Tree Synthesis.pdf
 
ROS distributed architecture
ROS  distributed architectureROS  distributed architecture
ROS distributed architecture
 
SRAM
SRAMSRAM
SRAM
 
Hadoop vs Apache Spark
Hadoop vs Apache SparkHadoop vs Apache Spark
Hadoop vs Apache Spark
 
Logic Synthesis
Logic SynthesisLogic Synthesis
Logic Synthesis
 
8259 a
8259 a8259 a
8259 a
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
I2C Protocol
I2C ProtocolI2C Protocol
I2C Protocol
 
ARM - Advance RISC Machine
ARM - Advance RISC MachineARM - Advance RISC Machine
ARM - Advance RISC Machine
 
1wire protocol
1wire protocol1wire protocol
1wire protocol
 
Andes RISC-V vector extension demystified-tutorial
Andes RISC-V vector extension demystified-tutorialAndes RISC-V vector extension demystified-tutorial
Andes RISC-V vector extension demystified-tutorial
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
Endianness
EndiannessEndianness
Endianness
 
0021.system partitioning
0021.system partitioning0021.system partitioning
0021.system partitioning
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 

En vedette

The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
Data Science Milan
 
Data intensive applications with Apache Flink - Simone Robutti, Radicalbit
Data intensive applications with Apache Flink - Simone Robutti, RadicalbitData intensive applications with Apache Flink - Simone Robutti, Radicalbit
Data intensive applications with Apache Flink - Simone Robutti, Radicalbit
Data Science Milan
 
Racial Profiling and Its Effects
Racial Profiling and Its EffectsRacial Profiling and Its Effects
Racial Profiling and Its Effects
Chey Bradley
 
Britton_NoAH World Package Design 10_Page_1-10
Britton_NoAH World Package Design 10_Page_1-10Britton_NoAH World Package Design 10_Page_1-10
Britton_NoAH World Package Design 10_Page_1-10
Patti Britton
 

En vedette (20)

Project “Deep Water” (H2O integration with other deep learning libraries - Jo...
Project “Deep Water” (H2O integration with other deep learning libraries - Jo...Project “Deep Water” (H2O integration with other deep learning libraries - Jo...
Project “Deep Water” (H2O integration with other deep learning libraries - Jo...
 
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2OIntroduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
 
H2O for IoT - Jo-Fai (Joe) Chow, H2O
H2O for IoT - Jo-Fai (Joe) Chow, H2OH2O for IoT - Jo-Fai (Joe) Chow, H2O
H2O for IoT - Jo-Fai (Joe) Chow, H2O
 
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
 
Inaugural talk Data Science Milan - Gianmario Spacagna
Inaugural talk Data Science Milan - Gianmario SpacagnaInaugural talk Data Science Milan - Gianmario Spacagna
Inaugural talk Data Science Milan - Gianmario Spacagna
 
Data intensive applications with Apache Flink - Simone Robutti, Radicalbit
Data intensive applications with Apache Flink - Simone Robutti, RadicalbitData intensive applications with Apache Flink - Simone Robutti, Radicalbit
Data intensive applications with Apache Flink - Simone Robutti, Radicalbit
 
mlk-newsletter-april-2013
mlk-newsletter-april-2013mlk-newsletter-april-2013
mlk-newsletter-april-2013
 
La revolución-verde-kappa-cornforth
La revolución-verde-kappa-cornforthLa revolución-verde-kappa-cornforth
La revolución-verde-kappa-cornforth
 
Planning my blog1
Planning my blog1Planning my blog1
Planning my blog1
 
Racial Profiling and Its Effects
Racial Profiling and Its EffectsRacial Profiling and Its Effects
Racial Profiling and Its Effects
 
Equipo kappa-1
Equipo kappa-1Equipo kappa-1
Equipo kappa-1
 
Rajesh
RajeshRajesh
Rajesh
 
505LeePosterPresentation
505LeePosterPresentation505LeePosterPresentation
505LeePosterPresentation
 
Britton_NoAH World Package Design 10_Page_1-10
Britton_NoAH World Package Design 10_Page_1-10Britton_NoAH World Package Design 10_Page_1-10
Britton_NoAH World Package Design 10_Page_1-10
 
Instituciones administrativas del trabajo
Instituciones administrativas del trabajoInstituciones administrativas del trabajo
Instituciones administrativas del trabajo
 
Tecnologias de la comunicación y su infuencia
Tecnologias de la comunicación y su infuenciaTecnologias de la comunicación y su infuencia
Tecnologias de la comunicación y su infuencia
 
Proyecto de-química
Proyecto de-químicaProyecto de-química
Proyecto de-química
 
Bollards
BollardsBollards
Bollards
 
SMART Board PowerPoint
SMART Board PowerPointSMART Board PowerPoint
SMART Board PowerPoint
 
Como a evolucionado la tecnología
Como a evolucionado la tecnologíaComo a evolucionado la tecnología
Como a evolucionado la tecnología
 

Similaire à Introduction to Distributed Computing Engines for Data Processing - Simone Robutti, Radicalbit

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
Evert Lammerts
 

Similaire à Introduction to Distributed Computing Engines for Data Processing - Simone Robutti, Radicalbit (20)

The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Analyzing Big data in R and Scala using Apache Spark 17-7-19
Analyzing Big data in R and Scala using Apache Spark  17-7-19Analyzing Big data in R and Scala using Apache Spark  17-7-19
Analyzing Big data in R and Scala using Apache Spark 17-7-19
 
Big Data
Big DataBig Data
Big Data
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Intro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and MapreduceIntro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and Mapreduce
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with Spark
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 

Plus de Data Science Milan

MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
Data Science Milan
 
Time Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del PraTime Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del Pra
Data Science Milan
 
Audience projection of target consumers over multiple domains a ner and baye...
Audience projection of target consumers over multiple domains  a ner and baye...Audience projection of target consumers over multiple domains  a ner and baye...
Audience projection of target consumers over multiple domains a ner and baye...
Data Science Milan
 
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo LomonacoContinual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Data Science Milan
 
3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep Learning3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep Learning
Data Science Milan
 
Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Data Science Milan
 
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data ReplyPricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Data Science Milan
 
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig..."How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
Data Science Milan
 

Plus de Data Science Milan (20)

ML & Graph algorithms to prevent financial crime in digital payments
ML & Graph  algorithms to prevent  financial crime in  digital paymentsML & Graph  algorithms to prevent  financial crime in  digital payments
ML & Graph algorithms to prevent financial crime in digital payments
 
How to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plansHow to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plans
 
Robustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsRobustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning Methods
 
"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies
 
Question generation using Natural Language Processing by QuestGen.AI
Question generation using Natural Language Processing by QuestGen.AIQuestion generation using Natural Language Processing by QuestGen.AI
Question generation using Natural Language Processing by QuestGen.AI
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWS
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
Reinforcement Learning Overview | Marco Del Pra
Reinforcement Learning Overview | Marco Del PraReinforcement Learning Overview | Marco Del Pra
Reinforcement Learning Overview | Marco Del Pra
 
Time Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del PraTime Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del Pra
 
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AILudwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
 
Audience projection of target consumers over multiple domains a ner and baye...
Audience projection of target consumers over multiple domains  a ner and baye...Audience projection of target consumers over multiple domains  a ner and baye...
Audience projection of target consumers over multiple domains a ner and baye...
 
Weak supervised learning - Kristina Khvatova
Weak supervised learning - Kristina KhvatovaWeak supervised learning - Kristina Khvatova
Weak supervised learning - Kristina Khvatova
 
GANs beyond nice pictures: real value of data generation, Alex Honchar
GANs beyond nice pictures: real value of data generation, Alex HoncharGANs beyond nice pictures: real value of data generation, Alex Honchar
GANs beyond nice pictures: real value of data generation, Alex Honchar
 
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo LomonacoContinual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
 
3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep Learning3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep Learning
 
Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...
 
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
 
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data ReplyPricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply
 
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig..."How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
 

Dernier

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 

Dernier (20)

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 

Introduction to Distributed Computing Engines for Data Processing - Simone Robutti, Radicalbit

  • 1. Milan – July 13 2016 Introduction to Distributed Computing Engines for Data Processing Simone Robutti Machine Learning Engineer at Radicalbit @SimoneRobutti
  • 2. What is a Distributed Computing System It’s the solution to the problem where your RAM is too small and your data are too big and/or too CPU-intensive to be processed on a single machine.
  • 3. What is a Distributed Computing System Solution: a huge, monolithic mainframe.
  • 4. What is a Distributed Computing System Solution: a huge, monolithic mainframe.
  • 5. What is a Distributed Computing System Solution: do your job on a cluster.
  • 6. Distributed vs Parallel Parallel: execute identical tasks (with different data or parameters). Parallel Distributed: do this on multiple machines. Distributed: split a big task into smaller tasks and execute them on multiple machines
  • 7. What is a Distributed Computing System Goal: the programmer should write its programs easily and efficiently without caring about distribution. Issues: a cluster is complex and conceptually very far from a local environment.
  • 8. Hadoop What: the first OSS distributed computing engine. When: 2006 (work began). Where: Google. Why: Google had a lot of data (for that time) to process. They built a solution. Eventually it became a series of papers and got implemented as OSS. How: HDFS (distributed file system), MapReduce (computational abstraction), YARN (resource and cluster manager).
  • 10. Hadoop Today Common in many enterprise environments. Still good enough for many batch processing use cases. HDFS and Yarn widely used by other processing engines. i.e.: ● Log analysis ● Clickstream analysis ● Text processing
  • 11. Spark What: a more generic distributed processing engine for batch and streaming alike. When: 2014 (1.0 Release). Where: Berkeley + Databricks. Why: They aimed for faster and more general processing with better abstractions on top. How: InMemory computing, DAG, RDD, polyglot functional API, libraries out-of-the-box .
  • 12. Resilient Distributed Dataset RDDs hide the underlying distribution of data with a functional API
  • 13. Directed Acyclic Graph The graph is defined by the user. The runtime translates it to operations on distributed data. create filter filter join collect map
  • 14. Spark Today The hot topic everyone talks about. Just entered the phase of maturity, with an already huge and fast- growing ecosystem of libraries, integrations and tools. Widely used as the go-to solution for Big Data (and not-so-Big) use cases. I.e.: ● Recommending systems ● Fraud Detection ● Attack-Detection ● Near-Real time decision-heavy solutions
  • 15. Flink What: a streaming first (with batch on top), low latency distributed processing engine. When: 2016 (1.0 Release). Where: German Research Foundation + dataArtisans. Why: a faster and flexible computational model that could guarantee low latency, high-throughput and fault-tolerance all at the same time. How: streaming-first approach, checkpointing, lazy symbolic computation, powerful optimizations.
  • 16. Flink Today Perceived as an alternative to Spark. Gaining traction for specific use-cases (real-time streaming) but performs well on most generic uses cases. Solid runtime and optimization; API and ecosystem still young. Many big companies already adopted it for fast-data applications. I.e.: ● Real-time precise analytics (counting) ● Real-time model evaluation ● Online Learning solutions
  • 17. Alternative solutions ● Apache Storm/Heron ● Apache Samza ● Apache GearPump ● Apache Apex
  • 18. Following next "The Barclays Data Science Hackathon: Building Retail Recommender Systems based on Customer Shopping Behaviour" by Gianmario Spacagna, Senior Data Scientist @ Pirelli "Data intensive applications with Apache Flink" by Simone Robutti, Machine Learning Engineer @ Radicalbit