Getting Apache Spark Customers to Production
Kostas Sakellis, Cloudera, Inc.
1.
© Cloudera, Inc. All rights reserved.
Getting Spark Customers to Production
Kostas Sakellis
2.
Me
• Software Engineer at Cloudera
• Contributor to Apache Spark
• Before that, contributed to Cloudera Manager
3.
Our customers
• Various degrees of sophistication with Spark
• In all stages of development
  • From POC to production deployments
• 95% use Spark on YARN*
• Biweekly analysis of tickets
4.
WARNING: This is biased!
5.
Building a proof of concept!
Courtesy of: http://www.nefloridadesign.com/mbimages/6.jpg
6.
“Why is my job failing?”
7.
“Why is my job slow?”
8.
Misconfiguration accounts for 20% of job failures
Courtesy of: http://blog.sdrock.com/pastors/files/2013/06/time-clock.jpg
9.
Resource Declaration
• Not easy to know what you need and how to specify it
• Compute:
  • --num-executors vs. --executor-cores
• Memory:
  • --executor-memory
  • The YARN container must also hold the JVM overhead on top of this
• Need to do the math yourself
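The speaker notes frame the compute question as "10 executors with 1 core, or 5 executors with 2 cores?". A minimal sketch of that arithmetic, assuming a hypothetical `Layout` helper (not a Spark API) and illustrative memory sizes. Both layouts below give 10 task slots, the same total heap and the same heap per core; what differs is whether tasks share one executor JVM (and its overhead) or run in separate JVMs.

```scala
// Hypothetical helper for comparing executor layouts; not part of Spark.
object ExecutorLayouts {
  final case class Layout(numExecutors: Int, executorCores: Int, executorMemoryMb: Int) {
    // Concurrent task slots across the application.
    def totalCores: Int = numExecutors * executorCores
    // Aggregate heap requested via --executor-memory (overhead not included).
    def totalHeapMb: Int = numExecutors * executorMemoryMb
    // All cores of one executor share that executor's heap.
    def heapPerCoreMb: Int = executorMemoryMb / executorCores
    // The matching spark-submit flags.
    def toSubmitFlags: Seq[String] = Seq(
      "--num-executors", numExecutors.toString,
      "--executor-cores", executorCores.toString,
      "--executor-memory", s"${executorMemoryMb}m")
  }

  def main(args: Array[String]): Unit = {
    val oneCoreEach = Layout(numExecutors = 10, executorCores = 1, executorMemoryMb = 2048)
    val twoCoresEach = Layout(numExecutors = 5, executorCores = 2, executorMemoryMb = 4096)
    for (l <- Seq(oneCoreEach, twoCoresEach))
      println(s"${l.toSubmitFlags.mkString(" ")} -> cores=${l.totalCores}, " +
        s"heap=${l.totalHeapMb}MB, heap/core=${l.heapPerCoreMb}MB")
  }
}
```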
10.
Dynamic Allocation
• Let Spark do the work for you
• Available since Spark 1.2*
• No need to specify compute a priori
• Limitation: still required to specify cores
• In future:
  • Allow specification of “task size”
  • Dynamically allocate cores
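A minimal sketch of enabling dynamic allocation from the command line, written as a Scala helper that builds `--conf` arguments for `spark-submit`. The property names are standard Spark settings; the helper and its values are illustrative assumptions. `spark.executor.cores` is still set explicitly, which is the limitation the slide mentions, and on YARN the external shuffle service must also be configured as a NodeManager auxiliary service (not shown here).

```scala
// Hypothetical helper: builds spark-submit "--conf key=value" arguments.
object DynamicAllocationConf {
  def submitArgs(executorCores: Int, maxExecutors: Int): Seq[String] = {
    val props = Seq(
      // Let Spark add and remove executors based on the task backlog.
      "spark.dynamicAllocation.enabled" -> "true",
      // Keeps shuffle files available after an idle executor is removed.
      "spark.shuffle.service.enabled" -> "true",
      // Upper bound so a single job cannot claim the whole cluster.
      "spark.dynamicAllocation.maxExecutors" -> maxExecutors.toString,
      // Still required: cores per executor are not chosen dynamically.
      "spark.executor.cores" -> executorCores.toString
    )
    props.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
  }

  def main(args: Array[String]): Unit =
    println(submitArgs(executorCores = 2, maxExecutors = 50).mkString(" "))
}
```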
11.
YARN Configuration mismatch
• Compute:
  • yarn.nodemanager.resource.cpu-vcores
  • yarn.scheduler.maximum-allocation-vcores
• Memory:
  • yarn.nodemanager.resource.memory-mb
  • yarn.scheduler.maximum-allocation-mb
12.
YARN Configuration mismatch
• Common to ask for more resources than allowed
• Future work:
  • Exposing relevant YARN configurations in Spark UI
  • Requires changes to YARN itself
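To catch this mismatch before submitting, a launcher can compare what it is about to request against the scheduler's per-container maximums. A minimal sketch, assuming the caller already knows the cluster's `yarn.scheduler.maximum-allocation-mb` and `yarn.scheduler.maximum-allocation-vcores` values and passes the full container size (executor memory plus overhead); `YarnLimits` is a hypothetical helper, not a YARN or Spark API.

```scala
// Hypothetical pre-submit check against YARN's per-container maximums.
object YarnLimits {
  final case class Limits(maxAllocationMb: Int, maxAllocationVcores: Int)
  final case class Request(containerMb: Int, vcores: Int)

  // Returns one message per violated limit; empty means the request can fit.
  def check(req: Request, lim: Limits): List[String] = {
    val memory =
      if (req.containerMb > lim.maxAllocationMb)
        List(s"container ${req.containerMb}MB > yarn.scheduler.maximum-allocation-mb=${lim.maxAllocationMb}")
      else Nil
    val cpu =
      if (req.vcores > lim.maxAllocationVcores)
        List(s"${req.vcores} vcores > yarn.scheduler.maximum-allocation-vcores=${lim.maxAllocationVcores}")
      else Nil
    memory ++ cpu
  }
}
```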
13.
Another YARN goodie…
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
[...]
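When this error appears, the useful figures are the physical usage and the physical limit. A minimal sketch that extracts them from a message in exactly the format shown above; `ContainerKillParser` is hypothetical, and the regex assumes every figure is reported in GB (other YARN versions or container sizes may print different units).

```scala
// Hypothetical parser for the "running beyond physical memory limits" message.
object ContainerKillParser {
  final case class Usage(physUsedGb: Double, physLimitGb: Double, virtUsedGb: Double, virtLimitGb: Double)

  private val UsagePattern =
    """Current usage: ([\d.]+) GB of ([\d.]+) GB physical memory used; ([\d.]+) GB of ([\d.]+) GB virtual memory used""".r

  // None when the line does not contain a usage report in this format.
  def parse(line: String): Option[Usage] =
    UsagePattern.findFirstMatchIn(line).map { m =>
      Usage(m.group(1).toDouble, m.group(2).toDouble, m.group(3).toDouble, m.group(4).toDouble)
    }

  // How far the container went past its physical memory limit.
  def overPhysicalLimitGb(u: Usage): Double = u.physUsedGb - u.physLimitGb
}
```

On the message above this reports a 2 GB physical limit exceeded by about 0.1 GB. The 4.2 GB virtual limit is 2.1 × the physical limit, which matches YARN's default `yarn.nodemanager.vmem-pmem-ratio` of 2.1; here it is the physical limit that was hit.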
14.
Memory allocation
• yarn.nodemanager.resource.memory-mb
  • Executor Container
    • spark.yarn.executor.memoryOverhead (7% of executor memory, minimum 384 MB; 10% in Spark 1.4)
    • spark.executor.memory
      • spark.shuffle.memoryFraction (0.2)
      • spark.storage.memoryFraction (0.6)
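The arithmetic behind this layout, as a minimal sketch: when `spark.yarn.executor.memoryOverhead` is not set, Spark on YARN adds max(384 MB, factor × executor memory) on top of the heap, with a factor of 0.07 before Spark 1.4 and 0.10 from 1.4. The heap split uses the legacy (pre-1.6) static fractions and ignores their safety fractions. `ExecutorMemoryMath` is a hypothetical helper.

```scala
// Hypothetical helper reproducing the container-size arithmetic.
object ExecutorMemoryMath {
  val MinOverheadMb = 384

  // Default overhead when spark.yarn.executor.memoryOverhead is unset.
  def overheadMb(executorMemoryMb: Int, factor: Double): Int =
    math.max(MinOverheadMb, (factor * executorMemoryMb).toInt)

  // What Spark asks YARN for per executor: heap plus overhead.
  def containerMb(executorMemoryMb: Int, factor: Double): Int =
    executorMemoryMb + overheadMb(executorMemoryMb, factor)

  // Legacy static heap split: (shuffle MB, storage MB); safety fractions ignored.
  def legacySplitMb(executorMemoryMb: Int,
                    shuffleFraction: Double = 0.2,
                    storageFraction: Double = 0.6): (Int, Int) =
    ((shuffleFraction * executorMemoryMb).toInt, (storageFraction * executorMemoryMb).toInt)
}
```

For example, `--executor-memory 4g` on Spark 1.4 becomes a 4505 MB container request (4096 + 409), and depending on the scheduler YARN may round that up further. That total, not the `--executor-memory` value, is what has to fit under `yarn.scheduler.maximum-allocation-mb`.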
15.
YARN Overhead
• Future work:
  • Better understanding of off-heap allocations
  • Improve memory usage visibility
16.
Run program through all our data
Courtesy of: https://conniehallscott.files.wordpress.com/2013/01/411748_538971446114753_1125606225_o.jpg
17.
Data dependent tuning
• As data rates change, re-tuning Spark is usually necessary
• Spark is sensitive to shuffle spills
• The most common knob we modify is…
18.
Partitions, Partitions, Partitions!
19.
GC Stalls
20.
Partitions
• Smaller is often better
• Parameterized partition size
  • reduceByKey(…, nPartitions)
• Parameterize application
• Future work:
  • Dynamically determine # of partitions (SPARK-4630)
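One way to parameterize the partition count is to derive it from the input size and a target bytes-per-partition passed in as an application argument, then hand the result to `reduceByKey(…, nPartitions)`. A minimal sketch; `PartitionSizing` is hypothetical, the 128 MB default target is an arbitrary illustrative value, and per the speaker notes a single partition must stay below 2 GB.

```scala
// Hypothetical helper: derive nPartitions from data size instead of hard-coding it.
object PartitionSizing {
  // ceil(inputBytes / targetPartitionBytes), but never fewer than 1 partition.
  def partitionsFor(inputBytes: Long, targetPartitionBytes: Long): Int = {
    require(targetPartitionBytes > 0, "targetPartitionBytes must be positive")
    val n = (inputBytes + targetPartitionBytes - 1) / targetPartitionBytes
    math.max(1L, n).toInt
  }

  def main(args: Array[String]): Unit = {
    // args(0): input size in bytes, args(1): target bytes per partition.
    val input = if (args.length > 0) args(0).toLong else 10L * 1024 * 1024 * 1024
    val target = if (args.length > 1) args(1).toLong else 128L * 1024 * 1024
    val n = partitionsFor(input, target)
    // In the job this would feed e.g. parsed.map(...).reduceByKey(_ + _, n)
    println(s"nPartitions = $n")
  }
}
```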
21.
But for now?
• Easy answer:
  • Keep multiplying by 1.5 and see what works
• Harder answer:
22.
Shuffle less!
23.
Shuffles
[Diagram: Narrow Dependencies vs. Wide Dependency]
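A narrow dependency computes each output partition from a single input partition; a wide dependency (a shuffle) lets every output partition draw records from every input partition. A plain-Scala sketch of the difference on in-memory "partitions" (no Spark involved; `ShuffleSketch` is hypothetical). The key-to-partition rule mirrors the non-negative `hashCode` modulo used by Spark's `HashPartitioner`.

```scala
// Hypothetical in-memory model of narrow vs. wide dependencies.
object ShuffleSketch {
  // Non-negative hashCode modulo the partition count.
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Narrow: output partition i depends only on input partition i (no data movement).
  def narrowMap[A, B](parts: Seq[Seq[A]])(f: A => B): Seq[Seq[B]] =
    parts.map(_.map(f))

  // Wide: each output partition collects its keys from every input partition.
  def shuffle[K, V](parts: Seq[Seq[(K, V)]], numOut: Int): Seq[Seq[(K, V)]] = {
    val all = parts.flatten
    (0 until numOut).map(p => all.filter { case (k, _) => partitionOf(k, numOut) == p })
  }
}
```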
24.
ReduceByKey when Possible
• reduceByKey allows a map-side combine:
    parsed
      .map { line => (line.level, 1) }
      .reduceByKey { (a, b) => a + b }  // counts merged per partition before the shuffle
      .collect()
• groupByKey transfers all the data:
    parsed
      .map { line => (line.level, 1) }
      .groupByKey()                      // every (level, 1) pair is shuffled
      .map { case (level, counts) => (level, counts.sum) }
      .collect()
25.
ReduceByKey when Possible
[Diagram: reduceByKey vs. groupByKey]
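To make the comparison concrete, this plain-Scala simulation (no Spark; `CombineSketch` is hypothetical) counts the records each strategy would send across the network for the log-level count above. With a map-side combine, each input partition emits at most one partial count per distinct key.

```scala
// Hypothetical in-memory model of map-side combine vs. shipping every pair.
object CombineSketch {
  type Partitions[K, V] = Seq[Seq[(K, V)]]

  // groupByKey-style plan: every (key, value) pair crosses the network.
  def groupByKeyShuffled[K, V](parts: Partitions[K, V]): Int =
    parts.map(_.size).sum

  // Map-side combine: merge values per key inside each input partition first.
  def combinePerPartition[K, V](parts: Partitions[K, V])(merge: (V, V) => V): Seq[Map[K, V]] =
    parts.map(_.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(merge) })

  // reduceByKey-style plan: one partial result per key per partition is shuffled.
  def reduceByKeyShuffled[K, V](parts: Partitions[K, V])(merge: (V, V) => V): Int =
    combinePerPartition(parts)(merge).map(_.size).sum

  // Reduce side: merge the partial results that arrived for each key.
  def reduceByKey[K, V](parts: Partitions[K, V])(merge: (V, V) => V): Map[K, V] =
    combinePerPartition(parts)(merge).flatten
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2).reduce(merge) }
}
```

For two partitions holding 4 and 3 `(level, 1)` pairs over INFO, WARN and ERROR, the groupByKey-style plan ships all 7 pairs while the combined plan ships 4 partial counts; the gap grows with the number of records per key.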
26.
Security, now it’s getting serious.
Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg
27.
Authentication
• Kerberos: the necessary evil
• Ubiquitous amongst other services
  • YARN, HDFS, Hive, HBase, etc.
• Spark utilizes delegation tokens
28.
Encryption
• Control plane: SPARK-6028 (replace with netty)
• File distribution: replace with netty
• Block Manager: Spark 1.4
• User UI / REST API: SPARK-2750 (SSL)
• Data-at-rest (shuffle files): SPARK-5682
29.
Authorization
• Enterprises have sensitive data
• Beyond HDFS file permissions
  • Partial access to data
  • Column-level granularity
• Apache Sentry
  • HDFS-Sentry synchronization plugin
30.
Customers often have shared infrastructure
Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
31.
Multi-tenancy
• Cluster utilization is the top metric
  • Target: 70-80% utilization
• Mixed workloads from mixed customers
• We recommend YARN
  • Built-in resource manager
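A small sketch of checking a cluster against the 70-80% target, assuming the allocated and total capacity figures for one resource (for example, memory in MB as reported by the ResourceManager) are obtained elsewhere; `Utilization` is a hypothetical helper.

```scala
// Hypothetical helper for comparing utilization against the 70-80% target band.
object Utilization {
  // Fraction of capacity currently allocated.
  def utilization(allocated: Long, capacity: Long): Double = {
    require(capacity > 0, "capacity must be positive")
    allocated.toDouble / capacity
  }

  // Classify against the target band from the slide (inclusive bounds).
  def status(u: Double, low: Double = 0.70, high: Double = 0.80): String =
    if (u < low) "underutilized"
    else if (u > high) "above target"
    else "on target"
}
```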
32.
Underutilized Clusters
Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG
33.
Dynamic Allocation
• Allows jobs to scale according to load
• Knobs to control min, max and initial size
• Future work:
  • Target: dynamic allocation enabled by default
  • Data locality & caching
  • Open question with Streaming
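Spark's documentation describes dynamic allocation as requesting executors in rounds while tasks remain backlogged, with the number added per round growing exponentially (1, 2, 4, 8, …). A simplified sketch of the resulting executor target, starting from `spark.dynamicAllocation.initialExecutors` and capped at `spark.dynamicAllocation.maxExecutors`. It deliberately ignores the backlog timeouts, the cap implied by the number of pending tasks, and scaling idle executors down towards `spark.dynamicAllocation.minExecutors`; `RampUp` is hypothetical.

```scala
// Hypothetical model of dynamic allocation scale-up under a sustained backlog.
object RampUp {
  def targets(initialExecutors: Int, maxExecutors: Int, rounds: Int): List[Int] = {
    require(initialExecutors >= 0 && initialExecutors <= maxExecutors)
    require(rounds >= 0 && rounds <= 30, "keep doubling within Int range")
    // Executors added per round: 1, 2, 4, 8, ...
    val increments = List.iterate(1, rounds)(_ * 2)
    // Executor target after each round, never above maxExecutors.
    increments
      .scanLeft(initialExecutors)((current, add) => math.min(maxExecutors, current + add))
      .tail
  }
}
```

With 2 initial executors and a maximum of 20, five backlogged rounds give targets of 3, 5, 9, 17 and then 20, so the max knob is what stops one busy job from taking over a shared cluster.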
34.
Thank you
We’re Hiring!
Speaker notes
Let's talk about the issues we have seen from our customers as they try to get Spark into production.
In scope: operational issues, not building the code itself. Experience drawn from our customer support tickets.
Spark makes building a proof of concept with a subset of data relatively easy. But then things go wrong. (Plug for my talk at Hadoop Summit.)
--num-executors vs. --executor-cores: 10 executors with 1 core each, or 5 executors with 2 cores each? Memory: this is the aggregate across all cores of an executor.
This shows up in the YARN NodeManager logs
Max partition size is 2 GB. Small partitions help deal with stragglers. Small partitions avoid overhead.
Fastest way to shuffle a lot of data: don't shuffle. Second-fastest way to shuffle a lot of data: shuffle a small amount of data.
reduceByKey: data is merged together before it's serialized and sent over the network. groupByKey: higher serialization and network transfer costs.
Control plane, file distribution, Block Manager, user UI / REST API, data-at-rest (shuffle files).
Dynamic allocation: streaming; locality (being worked on); making it even better.