Spark Summit EU talk by Steve Loughran
Spark Summit
Spark and Object Stores —What You Need to Know
1.
Apache Spark and Object Stores: What you need to know
Steve Loughran, stevel@hortonworks.com, @steveloughran
October 2016
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
2.
Steve Loughran: Hadoop committer, PMC member, …
Chris Nauroth: Apache Hadoop committer & PMC, ASF member
Rajesh Balamohan: Tez committer, PMC member
3.
[Diagram: Elastic ETL. Labels: inbound, external, HDFS, ORC and Parquet datasets]
4.
[Diagram: Notebooks. Labels: external, datasets, library]
5.
Streaming
6.
A Filesystem: Directories, Files → Data
[Diagram: directory tree / → work → pending (part-00, part-01) and complete (part-01); each file maps to replicated data blocks 00 and 01]
    rename("/work/pending/part-01", "/work/complete")
7.
Object Store: hash(name)->blob
[Diagram: blobs 00 and 01 replicated across servers s01, s02, s03, s04]
    hash("/work/pending/part-00") -> ["s01", "s02", "s04"]
    hash("/work/pending/part-01") -> ["s02", "s03", "s04"]
    copy("/work/pending/part-01", "/work/complete/part01")
    delete("/work/pending/part-01")
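The difference between the filesystem and the object store on these two slides can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: `ToyObjectStore`, its methods, and the `bytesCopied` counter are hypothetical names, not part of any real connector. The sketch assumes, as the slide does, that the store has no rename primitive, so a rename has to be emulated as copy() followed by delete().

```scala
import scala.collection.mutable

// Toy model of an object store: a flat map from full object name to blob.
// All names here are hypothetical; this is not how S3A is implemented.
final class ToyObjectStore {
  private val blobs = mutable.Map.empty[String, Array[Byte]]

  // Running total of bytes the store had to copy, to make the cost visible.
  var bytesCopied: Long = 0L

  def put(name: String, data: Array[Byte]): Unit = blobs.update(name, data)

  def exists(name: String): Boolean = blobs.contains(name)

  // copy() duplicates the blob under a new name: cost grows with data size.
  def copy(src: String, dest: String): Unit = {
    val data = blobs(src)
    blobs.update(dest, data.clone())
    bytesCopied += data.length
  }

  def delete(name: String): Unit = {
    blobs.remove(name)
    ()
  }

  // No rename primitive: emulate it as copy + delete.
  // Between the two steps both names are visible, so this is not atomic.
  def rename(src: String, dest: String): Unit = {
    copy(src, dest)
    delete(src)
  }
}

object ToyObjectStoreDemo {
  // Renames one 8 KB object and returns the store for inspection.
  def run(): ToyObjectStore = {
    val store = new ToyObjectStore
    store.put("/work/pending/part-01", new Array[Byte](8192))
    store.rename("/work/pending/part-01", "/work/complete/part-01")
    store
  }
}
```

After `ToyObjectStoreDemo.run()`, the source name is gone, the destination exists, and `bytesCopied` equals the object size. On a real filesystem the same rename would move no data at all.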
8.
REST APIs
[Diagram: blobs 00 and 01 replicated across servers s01, s02, s03, s04]
    HEAD /work/complete/part-01
    PUT /work/complete/part01
      x-amz-copy-source: /work/pending/part-01
    DELETE /work/pending/part-01
    PUT /work/pending/part-01
      ... DATA ...
    GET /work/pending/part-01
      Content-Length: 1-8192
    GET /?prefix=/work&delimiter=/
9.
Often: Eventually Consistent
[Diagram: blobs 00 and 01 replicated across servers s01, s02, s03, s04]
    DELETE /work/pending/part-00    200
    GET /work/pending/part-00       200
    GET /work/pending/part-00       200
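The behaviour on this slide, where a GET still succeeds after a DELETE has been acknowledged, can be reproduced with a small deterministic simulation. Everything in it is an assumption made for illustration: `ToyReplicatedStore`, the explicit `propagate()` step, and the choice of which replica sees the delete first are hypothetical, and the status codes simply mirror the ones printed on the slide rather than any particular service's responses.

```scala
import scala.collection.mutable

// Toy model of an eventually consistent store: each replica holds its own
// set of keys, and deletes reach the other replicas only on propagate().
final class ToyReplicatedStore(replicaNames: Seq[String]) {
  private val replicas: Map[String, mutable.Set[String]] =
    replicaNames.map(name => name -> mutable.Set.empty[String]).toMap

  // Deletes acknowledged to the client but not yet applied: (replica, key).
  private val pendingDeletes = mutable.Queue.empty[(String, String)]

  def put(key: String): Int = {
    replicas.values.foreach(_.add(key))
    200
  }

  // The delete is applied to the first replica and queued for the rest,
  // yet the caller is told it succeeded straight away.
  def delete(key: String): Int = {
    replicas(replicaNames.head).remove(key)
    replicaNames.tail.foreach(r => pendingDeletes.enqueue((r, key)))
    200
  }

  // A read served by one specific replica.
  def get(replica: String, key: String): Int =
    if (replicas(replica).contains(key)) 200 else 404

  // Apply all queued deletes, i.e. let the store converge.
  def propagate(): Unit =
    while (pendingDeletes.nonEmpty) {
      val (replica, key) = pendingDeletes.dequeue()
      replicas(replica).remove(key)
    }
}

object ToyReplicatedStoreDemo {
  // Status codes for: DELETE, a GET from a stale replica, a GET after convergence.
  def run(): Seq[Int] = {
    val store = new ToyReplicatedStore(Seq("s01", "s02", "s04"))
    store.put("/work/pending/part-00")
    val deleted = store.delete("/work/pending/part-00")
    val staleGet = store.get("s02", "/work/pending/part-00")
    store.propagate()
    val laterGet = store.get("s02", "/work/pending/part-00")
    Seq(deleted, staleGet, laterGet)
  }
}
```

The stale GET is what breaks code that assumes a listing or an existence check reflects the most recent write or delete.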
10.
org.apache.hadoop.fs.FileSystem
Implementations: hdfs, s3a, wasb, adl, swift, gs
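One way to picture how a single FileSystem API serves many stores is a lookup keyed on the URL scheme. The sketch below is a toy registry, not Hadoop's actual resolution code: `ToySchemeRegistry`, `resolve`, and the `onClasspath` parameter are hypothetical. The three class names are the real Hadoop implementation classes for those schemes, and the error text matches the message quoted later in the talk.

```scala
import java.net.URI

// Toy lookup from URL scheme to the class that implements it.
// Hypothetical helper for illustration; Hadoop's real lookup uses its
// Configuration and service loading rather than a hard-coded map.
object ToySchemeRegistry {
  private val implementations: Map[String, String] = Map(
    "hdfs" -> "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3a"  -> "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "wasb" -> "org.apache.hadoop.fs.azure.NativeAzureFileSystem"
  )

  // onClasspath stands in for "the connector JAR for this scheme is present".
  def resolve(url: String, onClasspath: Set[String]): Either[String, String] = {
    val scheme = new URI(url).getScheme
    implementations
      .get(scheme)
      .filter(_ => onClasspath.contains(scheme))
      .toRight(s"No FileSystem for scheme: $scheme")
  }
}
```

For example, `ToySchemeRegistry.resolve("s3a://landsat-pds/scene_list.gz", Set("hdfs"))` yields the "No FileSystem for scheme: s3a" error, while adding "s3a" to the set yields the S3A class name.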
11.
History of Object Storage Support
Timeline axis: 2006, 2008, 2013, 2014, 2015, 2016. Entries in slide order:
    s3://     “inode on S3”
    s3n://    “Native” S3
    s3a://    Replaces s3n
    swift://  OpenStack
    wasb://   Azure WASB
    s3a://    Stabilize
    oss://    Aliyun
    gs://     Google Cloud
    s3a://    Speed and consistency
    adl://    Azure Data Lake
    s3://     Amazon EMR S3
12.
Cloud Storage Connectors

Azure
  WASB
    ● Strongly consistent
    ● Good performance
    ● Well-tested on applications (incl. HBase)
  ADL
    ● Strongly consistent
    ● Tuned for big data analytics workloads
Amazon Web Services
  S3A
    ● Eventually consistent; consistency work in progress by Hortonworks
    ● Performance improvements in progress
    ● Active development in Apache
  EMRFS
    ● Proprietary connector used in EMR
    ● Optional strong consistency for a cost
Google Cloud Platform
  GCS
    ● Multiple configurable consistency policies
    ● Currently Google open source
    ● Good performance
    ● Could improve test coverage
13.
Four Challenges
    1. Classpath
    2. Credentials
    3. Code
    4. Commitment
Let's look at S3 and Azure
14.
Use S3A to work with S3 (EMR: use Amazon's s3://)
15.
Classpath: fix “No FileSystem for scheme: s3a”
    hadoop-aws-2.7.x.jar
    aws-java-sdk-1.7.4.jar
    joda-time-2.9.3.jar
    (jackson-*-2.6.5.jar)
See SPARK-7481
Get Spark with Hadoop 2.7+ JARs
16.
Credentials
core-site.xml or spark-defaults.conf:
    spark.hadoop.fs.s3a.access.key MY_ACCESS_KEY
    spark.hadoop.fs.s3a.secret.key MY_SECRET_KEY
spark-submit automatically propagates environment variables:
    export AWS_ACCESS_KEY=MY_ACCESS_KEY
    export AWS_SECRET_KEY=MY_SECRET_KEY
NEVER: share, check in to SCM, paste in bug reports…
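A minimal sketch of keeping secrets out of source code, assuming Scala 2.12 or later: read the two environment variables named on the slide and turn them into the two S3A properties. `S3ACredentials`, `fromEnv`, and `describe` are hypothetical helpers introduced here; only the property names and variable names come from the slide.

```scala
// Sketch: build the S3A credential properties from environment variables,
// so secrets never appear in source code or checked-in configuration.
// The object and its methods are hypothetical helpers, not a Spark API.
object S3ACredentials {
  // Variable names as given on the slide; adjust if your environment differs.
  val AccessKeyVar = "AWS_ACCESS_KEY"
  val SecretKeyVar = "AWS_SECRET_KEY"

  // Returns the Spark properties, or a Left naming the first missing variable.
  def fromEnv(env: Map[String, String] = sys.env): Either[String, Map[String, String]] =
    for {
      access <- env.get(AccessKeyVar).toRight(s"$AccessKeyVar is not set")
      secret <- env.get(SecretKeyVar).toRight(s"$SecretKeyVar is not set")
    } yield Map(
      "spark.hadoop.fs.s3a.access.key" -> access,
      "spark.hadoop.fs.s3a.secret.key" -> secret
    )

  // Safe to log or paste into a bug report: property names only, never values.
  def describe(props: Map[String, String]): String =
    props.keys.toSeq.sorted.mkString(", ")
}
```

In an application, the resulting map could be handed to `SparkConf.setAll` before the session is created. Logging only `describe(...)` keeps the values out of logs and bug reports.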
17.
Authentication Failure: 403
com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method.
    1. Check joda-time.jar & JVM version
    2. Credentials wrong
    3. Credentials not propagating
    4. Local system clock (more likely on VMs)
18.
Code: Basic IO
    // Read in public dataset
    val lines = sc.textFile("s3a://landsat-pds/scene_list.gz")
    val lineCount = lines.count()

    // generate and write data
    val numbers = sc.parallelize(1 to 10000)
    numbers.saveAsTextFile("s3a://hwdev-stevel-demo/counts")
All you need is the URL
19.
Code: just use the URL of the object store
    val csvData = spark.read.options(Map(
        "header" -> "true",
        "inferSchema" -> "true",
        "mode" -> "FAILFAST"))
      .csv("s3a://landsat-pds/scene_list.gz")
...read time O(distance)
20.
DataFrames
    val landsat = "s3a://stevel-demo/landsat"
    csvData.write.parquet(landsat)

    val landsatOrc = "s3a://stevel-demo/landsatOrc"
    csvData.write.orc(landsatOrc)

    val df = spark.read.parquet(landsat)
    // the ORC copy is read back with the ORC reader
    val orcDf = spark.read.orc(landsatOrc)
21.
Finding dirty data with Spark SQL
    val sqlDF = spark.sql(
      "SELECT id, acquisitionDate, cloudCover" +
      s" FROM parquet.`${landsat}`")
    val negativeClouds = sqlDF.filter("cloudCover < 0")
    negativeClouds.show()
* filter columns and data early
* whether/when to cache()?
* copy popular data to HDFS
22.
spark-defaults.conf
    spark.sql.parquet.filterPushdown true
    spark.sql.parquet.mergeSchema false
    spark.hadoop.parquet.enable.summary-metadata false
    spark.sql.orc.filterPushdown true
    spark.sql.orc.splits.include.file.footer true
    spark.sql.orc.cache.stripe.details.size 10000
    spark.sql.hive.metastorePartitionPruning true
23.
Notebooks? Classpath & Credentials
24.
The Commitment Problem
⬢ rename() used for atomic commitment transaction
⬢ time to copy() + delete() proportional to data * files
⬢ S3: 6+ MB/s
⬢ Azure: a lot faster, usually
    spark.speculation false
    spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
    spark.hadoop.mapreduce.fileoutputcommitter.cleanup.skipped true
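To get a feel for what a copy-based rename costs, the arithmetic can be written down directly. This is a rough sketch, not a benchmark: the names are hypothetical, "MB" is taken to mean 2^20 bytes, the 6 MB/s figure is the slide's, and only the data-size term is modelled (per-file overhead and the delete step are ignored).

```scala
object RenameCost {
  // Bytes per "MB" as assumed by this sketch.
  val MiB: Long = 1024L * 1024L

  // Estimated seconds to commit by copying: bytes / copy throughput.
  def copySeconds(totalBytes: Long, mibPerSecond: Double): Double =
    totalBytes.toDouble / (mibPerSecond * MiB)

  def main(args: Array[String]): Unit = {
    val tenGiB = 10L * 1024L * MiB
    val seconds = copySeconds(tenGiB, 6.0)
    println(f"10 GiB at 6 MB/s: $seconds%.0f s (about ${seconds / 60}%.0f min)")
  }
}
```

Under these assumptions, committing 10 GiB of output takes roughly 1,707 seconds, close to half an hour spent in the rename alone, which is why the committer settings above matter.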
25.
What about Direct Output Committers?
26.
Recent S3A Performance (Hadoop 2.8, HDP 2.5, CDH 5.9 (?))
    // forward seek by skipping stream
    spark.hadoop.fs.s3a.readahead.range 157810688

    // faster backward seek for ORC and Parquet input
    spark.hadoop.fs.s3a.experimental.input.fadvise random

    // PUT blocks in separate threads
    spark.hadoop.fs.s3a.fast.output.enabled true
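These options can also be passed per job instead of being set globally. The sketch below uses hypothetical names; the values are copied from the slide rather than being tuned recommendations, and it assumes a Hadoop version that supports these keys. It renders the settings as `--conf key=value` arguments for spark-submit.

```scala
object S3AReadTuning {
  // Settings as listed on the slide.
  val settings: Seq[(String, String)] = Seq(
    "spark.hadoop.fs.s3a.readahead.range" -> "157810688",
    "spark.hadoop.fs.s3a.experimental.input.fadvise" -> "random",
    "spark.hadoop.fs.s3a.fast.output.enabled" -> "true"
  )

  // Turn each (key, value) pair into the two tokens "--conf" and "key=value".
  def toSubmitArgs(props: Seq[(String, String)]): Seq[String] =
    props.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }

  def main(args: Array[String]): Unit =
    println(toSubmitArgs(settings).mkString(" "))
}
```

Keeping the settings in one place makes it easier to try a job with and without them and compare read times.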
27.
Azure Storage: wasb://
A full substitute for HDFS
28.
Classpath: fix “No FileSystem for scheme: wasb”
wasb:// : Consistent, with very fast rename (hence: commits)
    hadoop-azure-2.7.x.jar
    azure-storage-2.2.0.jar
    + (jackson-core; http-components, hadoop-common)
29.
Credentials: core-site.xml / spark-defaults.conf
    <property>
      <name>fs.azure.account.key.example.blob.core.windows.net</name>
      <value>0c0d44ac83ad7f94b0997b36e6e9a25b49a1394c</value>
    </property>

    spark.hadoop.fs.azure.account.key.example.blob.core.windows.net 0c0d44ac83ad7f94b0997b36e6e9a25b49a1394c

    wasb://demo@example.blob.core.windows.net
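The property name and the URL both embed the storage account name, which is easy to get wrong by hand. A small sketch of how the two are assembled follows; the object and method names are hypothetical, and it assumes the standard blob.core.windows.net endpoint used in the example above.

```scala
object WasbNames {
  // Endpoint suffix assumed by this sketch.
  private val Suffix = "blob.core.windows.net"

  // Hadoop property holding the key for a storage account.
  def accountKeyProperty(account: String): String =
    s"fs.azure.account.key.${account}.${Suffix}"

  // The same property as it would appear in a Spark configuration.
  def sparkAccountKeyProperty(account: String): String =
    "spark.hadoop." + accountKeyProperty(account)

  // wasb:// URL for a container in an account, with an optional path.
  def url(container: String, account: String, path: String = ""): String = {
    val normalized =
      if (path.isEmpty || path.startsWith("/")) path else "/" + path
    s"wasb://${container}@${account}.${Suffix}${normalized}"
  }
}
```

With the slide's values, `WasbNames.url("demo", "example")` gives `wasb://demo@example.blob.core.windows.net`, and `WasbNames.accountKeyProperty("example")` gives the property name shown in the XML.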
30.
Example: Azure Storage and Streaming
    val streaming = new StreamingContext(sparkConf, Seconds(10))
    val azure = "wasb://demo@example.blob.core.windows.net/in"
    val lines = streaming.textFileStream(azure)
    val matches = lines.map(line => {
      println(line)
      line
    })
    matches.print()
    streaming.start()
* PUT into the streaming directory
* keep the dir clean
* size window for slow scans
31.
Not Covered
⬢ Partitioning/directory layout
⬢ Infrastructure Throttling
⬢ Optimal path names
⬢ Error handling
⬢ Metrics
32.
Summary
⬢ Object Stores look just like any other URL
⬢ …but do need classpath and configuration
⬢ Issues: performance, commitment
⬢ Use Hadoop 2.7+ JARs
⬢ Tune to reduce I/O
⬢ Keep those credentials secret!
33.
34.
Backup Slides
35.
Dependencies in Hadoop 2.8
    hadoop-aws-2.8.x.jar
    aws-java-sdk-core-1.10.6.jar
    aws-java-sdk-kms-1.10.6.jar
    aws-java-sdk-s3-1.10.6.jar
    joda-time-2.9.3.jar
    (jackson-*-2.6.5.jar)

    hadoop-azure-2.8.x.jar
    azure-storage-4.2.0.jar
36.
S3 Server-Side Encryption
⬢ Encryption of data at rest at S3
⬢ Supports the SSE-S3 option: each object encrypted by a unique key using AES-256 cipher
⬢ Now covered in S3A automated test suites
⬢ Support for additional options under development (SSE-KMS and SSE-C)
37.
Advanced authentication
    <property>
      <name>fs.s3a.aws.credentials.provider</name>
      <value>
        org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
        org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
        com.amazonaws.auth.EnvironmentVariableCredentialsProvider,
        com.amazonaws.auth.InstanceProfileCredentialsProvider,
        org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider
      </value>
    </property>
+ encrypted credentials in JCEKS files on HDFS
38.
What Next? Performance and integration
39.
Next Steps for all Object Stores
⬢ Output Committers
  – Logical commit operation decoupled from rename (non-atomic and costly in object stores)
⬢ Object Store Abstraction Layer
  – Avoid impedance mismatch with FileSystem API
  – Provide specific APIs for better integration with object stores: saving, listing, copying
⬢ Ongoing Performance Improvement
⬢ Consistency