SlideShare une entreprise Scribd logo
1  sur  21
Past, Present and Future
of Presto on Cloud
07/15/2018
00Copyright 2017 © Qubole
Agenda
Past
• Presto Adoption
Present
• Presto at Qubole
• Key Use Cases
Future
• Area of Focus
• OS collaborations
Past | Looking Back at Presto Adoption
3Copyright 2018 © Qubole
YoY Growth
6x growth in Compute hours
~ 2 Million compute hours per month
4Copyright 2018 © Qubole
Presto surging in Growth
255%
growth in Users
365%
growth in Commands
101%
growth in Throughput
YoY % growth (January 2017 to 2018)
Current State | Presto at Qubole
00Copyright 2017 © Qubole
QDS Big Data Activation Platform
TCO
Interfaces
Intelligence
Auto Scaling
Spot Node
Alerts
Notebook
Analyze
REST API
ODBC/JDBC
BI Tools /
Clients
Insights
Recommendations
Connectors
(Cross data sources
query)
MySQL
SQL Server
Oracle
Redshift
Kinesis
Presto on Qubole
00Copyright 2017 © Qubole
Ad hoc Analytics
BI Dashboard, Reporting
Batch Workloads
Exploratory Analytic
Expected
Response Time
Typical Data Volume
High
HighLow
Key Use Cases
Area of Focus
00Copyright 2017 © Qubole
Area of Focus - Past, Current and Future Work
• Cluster Management and TCO
• Performance
• Security
12Copyright 2018 © Qubole
Completed Work
● Self Start
● Auto Terminate
● Workload Aware Auto Scaling
● Spot Node Integration
Cluster Automation | Cloud Management and TCO
00Copyright 2017 © Qubole
Autoscaling Savings | Cloud Management and TCO
Auto Terminate
Savings, 48%
Autoscaling
Savings, 39%
Spot Node
Savings, 13%
00Copyright 2017 © Qubole
In 2017, 54% of all Amazon EC2 compute hours used were spot instances, resulting in an
estimated $230 million in savings of Amazon EC2 costs.
Spot instances per cluster for
Presto in 2017
Spot Node On-Demand Nodes
29 %
4.1X
Increase !!
Current Work
● Spot Node Loss:
Retries of queries on Spot Node Loss.
● WorkFlow Manager:
Predictive Load Managing across clusters using
Cost Model to compute resource usage.
Future Work
● Predictive AutoScaling using Cost Model
Cluster Automation | Cloud Management and TCO
00Copyright 2017 © Qubole
Memory Cost Model | Performance
Completed Work
● Memory Cost Model
Cost model is within a factor of 2
of actual usage in the worst case
of memory for non-skewed data.
Evaluation of Cost model on TPC-DS benchmark (scale 10000)
00Copyright 2017 © Qubole
Dynamic Filtering and Join Reorder | Performance
Completed Work
● Dynamic Filtering and Join Reorder Evaluation of Dynamic Filtering and Join Reordering on TPC-DS benchmark
(scale 3000)
3.2X reduction in Geomean
Up to 14X performance improvement observed
00Copyright 2017 © Qubole
Rubix | Performance
Completed Work
● Rubix - Cache Engine
Open sourced for Presto and Spark
00Copyright 2017 © Qubole
Current and Future Work | Performance
Current Work
● Fast Copy – Auto Framework for Materialized Views
● Join Distribution
Future Work
● Histograms for improving Cost Model
● CPU Efficiency
00Copyright 2017 © Qubole
Security
Completed Work
● HiPPA compliant
● Internode SSL, Dual IAM Role, VPC, Qubole ACLs
● Hive Authorization
Future Work
● Ranger support
Compliance
HIPPA
GDPR ready
Infrastructure
Dual IAM Roles
Qubole ACLs
VPC Support
Internode SSL
Physical
Data Access
S3 Authentication
Logical Data
Access
Hive Authorization
Ranger Support
00Copyright 2017 © Qubole
OS Collaborations
● Presto Lens – A tool for admins to help tune Presto
● CBO – Improve Cost Model
● Cloud Specific
● S3 Optimizations like S3 Select
● Performance benchmarks for Cloud
● Integration of Product tests with S3
● Workload Management
● Failure Recovery
Questions ?
Contact me: amoghm@qubole.com
00Copyright 2017 © Qubole
Helpful Links
- Engineering Blog
https://www.qubole.com/blog/tag/presto/
- AutoScaling
https://qubole-eng.quora.com/Industry's-First-Auto-Scaling-Presto-Clusters
- Rubix
https://github.com/qubole/rubix
- Dynamic Filtering/Join Reordering
https://www.qubole.com/blog/sql-join-optimizations-qubole-presto/
- Memory Cost-Model
https://www.qubole.com/blog/memory-cost-model-qubole-presto/

Contenu connexe

Tendances

ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak DataClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak DataAltinity Ltd
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...Altinity Ltd
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaObjectRocket
 
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Shubham Tagra
 
Presto Fast SQL on Anything
Presto Fast SQL on AnythingPresto Fast SQL on Anything
Presto Fast SQL on AnythingAlluxio, Inc.
 
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVPresentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVKevin Xu
 
Presto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectivePresto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectiveAlluxio, Inc.
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
empirical analysis modeling of power dissipation control in internet data ce...
 empirical analysis modeling of power dissipation control in internet data ce... empirical analysis modeling of power dissipation control in internet data ce...
empirical analysis modeling of power dissipation control in internet data ce...saadjamil31
 
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience SharingClickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience SharingVianney FOUCAULT
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + FluidSpeeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + FluidAlluxio, Inc.
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in productionPingCAP
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Zhenxiao Luo
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Kevin Xu
 
Google Dataflow Intro
Google Dataflow IntroGoogle Dataflow Intro
Google Dataflow IntroIvan Glushkov
 
presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15Zhenxiao Luo
 

Tendances (20)

ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak DataClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
 
Presto Fast SQL on Anything
Presto Fast SQL on AnythingPresto Fast SQL on Anything
Presto Fast SQL on Anything
 
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVPresentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
 
Presto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectivePresto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspective
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Graph Databases at Netflix
Graph Databases at NetflixGraph Databases at Netflix
Graph Databases at Netflix
 
empirical analysis modeling of power dissipation control in internet data ce...
 empirical analysis modeling of power dissipation control in internet data ce... empirical analysis modeling of power dissipation control in internet data ce...
empirical analysis modeling of power dissipation control in internet data ce...
 
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience SharingClickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + FluidSpeeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid
 
Graphite
GraphiteGraphite
Graphite
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
 
Google Dataflow Intro
Google Dataflow IntroGoogle Dataflow Intro
Google Dataflow Intro
 
presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15
 

Similaire à Presto Summit 2018 - 10 - Qubole

Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Holden Ackerman
 
Reducing Cost of Production ML: Feature Engineering Case Study
Reducing Cost of Production ML: Feature Engineering Case StudyReducing Cost of Production ML: Feature Engineering Case Study
Reducing Cost of Production ML: Feature Engineering Case StudyVenkata Pingali
 
Flipkart's Hybrid Cloud Infrastructure Strategy for Optimal Cost Efficiency a...
Flipkart's Hybrid Cloud Infrastructure Strategy for Optimal Cost Efficiency a...Flipkart's Hybrid Cloud Infrastructure Strategy for Optimal Cost Efficiency a...
Flipkart's Hybrid Cloud Infrastructure Strategy for Optimal Cost Efficiency a...discoversudhir
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainLuke Han
 
Key considerations in productionizing streaming applications
Key considerations in productionizing streaming applicationsKey considerations in productionizing streaming applications
Key considerations in productionizing streaming applicationsKafkaZone
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationAbdelkrim Hadjidj
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceSamanthaBerlant
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and QuboleAmazon Web Services
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and QuboleAmazon Web Services
 
JetSweep & CloudHealth Tech: Journey to the Cloud
JetSweep & CloudHealth Tech: Journey to the CloudJetSweep & CloudHealth Tech: Journey to the Cloud
JetSweep & CloudHealth Tech: Journey to the CloudCloudHealth by VMware
 
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020Mariano Gonzalez
 
State of enterprise data science
State of enterprise data scienceState of enterprise data science
State of enterprise data scienceYan Xu
 
[DOST] OpenStack & the Enterprise Hybrid Cloud - Tech, People, Processes
[DOST] OpenStack & the Enterprise Hybrid Cloud - Tech, People, Processes[DOST] OpenStack & the Enterprise Hybrid Cloud - Tech, People, Processes
[DOST] OpenStack & the Enterprise Hybrid Cloud - Tech, People, ProcessesGerd Prüßmann
 
Control m customers using big data
Control m customers using big dataControl m customers using big data
Control m customers using big dataJuliette Smit
 
Making Cloud Deployment A Reality For End-To-End Policy Administration
Making Cloud Deployment A Reality For End-To-End Policy AdministrationMaking Cloud Deployment A Reality For End-To-End Policy Administration
Making Cloud Deployment A Reality For End-To-End Policy AdministrationAccenture Insurance
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSILuke Han
 
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoFInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoHugo Aquino
 
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Amazon Web Services
 
A Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIA Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIRightScale
 

Similaire à Presto Summit 2018 - 10 - Qubole (20)

Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
Reducing Cost of Production ML: Feature Engineering Case Study
Reducing Cost of Production ML: Feature Engineering Case StudyReducing Cost of Production ML: Feature Engineering Case Study
Reducing Cost of Production ML: Feature Engineering Case Study
 
Spot at qubole
Spot at quboleSpot at qubole
Spot at qubole
 
Flipkart's Hybrid Cloud Infrastructure Strategy for Optimal Cost Efficiency a...
Flipkart's Hybrid Cloud Infrastructure Strategy for Optimal Cost Efficiency a...Flipkart's Hybrid Cloud Infrastructure Strategy for Optimal Cost Efficiency a...
Flipkart's Hybrid Cloud Infrastructure Strategy for Optimal Cost Efficiency a...
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
 
Key considerations in productionizing streaming applications
Key considerations in productionizing streaming applicationsKey considerations in productionizing streaming applications
Key considerations in productionizing streaming applications
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
JetSweep & CloudHealth Tech: Journey to the Cloud
JetSweep & CloudHealth Tech: Journey to the CloudJetSweep & CloudHealth Tech: Journey to the Cloud
JetSweep & CloudHealth Tech: Journey to the Cloud
 
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
 
State of enterprise data science
State of enterprise data scienceState of enterprise data science
State of enterprise data science
 
[DOST] OpenStack & the Enterprise Hybrid Cloud - Tech, People, Processes
[DOST] OpenStack & the Enterprise Hybrid Cloud - Tech, People, Processes[DOST] OpenStack & the Enterprise Hybrid Cloud - Tech, People, Processes
[DOST] OpenStack & the Enterprise Hybrid Cloud - Tech, People, Processes
 
Control m customers using big data
Control m customers using big dataControl m customers using big data
Control m customers using big data
 
Making Cloud Deployment A Reality For End-To-End Policy Administration
Making Cloud Deployment A Reality For End-To-End Policy AdministrationMaking Cloud Deployment A Reality For End-To-End Policy Administration
Making Cloud Deployment A Reality For End-To-End Policy Administration
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoFInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
 
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
 
A Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIA Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROI
 

Plus de kbajda

Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRAkbajda
 
Presto Summit 2018 - 06 - Facebook Geospatial
Presto Summit 2018 - 06 - Facebook GeospatialPresto Summit 2018 - 06 - Facebook Geospatial
Presto Summit 2018 - 06 - Facebook Geospatialkbajda
 
Presto Summit 2018 - 05 - Uber Elasticsearch
Presto Summit 2018 - 05 - Uber ElasticsearchPresto Summit 2018 - 05 - Uber Elasticsearch
Presto Summit 2018 - 05 - Uber Elasticsearchkbajda
 
Presto Summit 2018 - 02 - LinkedIn
Presto Summit 2018  - 02 - LinkedInPresto Summit 2018  - 02 - LinkedIn
Presto Summit 2018 - 02 - LinkedInkbajda
 
Presto Summit 2018 - 01 - Facebook Presto
Presto Summit 2018  - 01 - Facebook PrestoPresto Summit 2018  - 01 - Facebook Presto
Presto Summit 2018 - 01 - Facebook Prestokbajda
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CAkbajda
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016kbajda
 
Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkkbajda
 

Plus de kbajda (8)

Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRA
 
Presto Summit 2018 - 06 - Facebook Geospatial
Presto Summit 2018 - 06 - Facebook GeospatialPresto Summit 2018 - 06 - Facebook Geospatial
Presto Summit 2018 - 06 - Facebook Geospatial
 
Presto Summit 2018 - 05 - Uber Elasticsearch
Presto Summit 2018 - 05 - Uber ElasticsearchPresto Summit 2018 - 05 - Uber Elasticsearch
Presto Summit 2018 - 05 - Uber Elasticsearch
 
Presto Summit 2018 - 02 - LinkedIn
Presto Summit 2018  - 02 - LinkedInPresto Summit 2018  - 02 - LinkedIn
Presto Summit 2018 - 02 - LinkedIn
 
Presto Summit 2018 - 01 - Facebook Presto
Presto Summit 2018  - 01 - Facebook PrestoPresto Summit 2018  - 01 - Facebook Presto
Presto Summit 2018 - 01 - Facebook Presto
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talk
 

Dernier

如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制vexqp
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制vexqp
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 

Dernier (20)

如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 

Presto Summit 2018 - 10 - Qubole

  • 1. Past, Present and Future of Presto on Cloud 07/15/2018
  • 2. 00Copyright 2017 © Qubole Agenda Past • Presto Adoption Present • Presto at Qubole • Key Use Cases Future • Area of Focus • OS collaborations
  • 3. Past | Looking Back at Presto Adoption
  • 4. 3Copyright 2018 © Qubole YoY Growth 6x growth in Compute hours ~ 2 Million compute hours per month
  • 5. 4Copyright 2018 © Qubole Presto surging in Growth 255% growth in Users 365% growth in Commands 101% growth in Throughput YoY % growth (January 2017 to 2018)
  • 6. Current State | Presto at Qubole
  • 7. 00Copyright 2017 © Qubole QDS Big Data Activation Platform TCO Interfaces Intelligence Auto Scaling Spot Node Alerts Notebook Analyze REST API ODBC/JDBC BI Tools / Clients Insights Recommendations Connectors (Cross data sources query) MySQL SQL Server Oracle Redshift Kinesis Presto on Qubole
  • 8. 00Copyright 2017 © Qubole Ad hoc Analytics BI Dashboard, Reporting Batch Workloads Exploratory Analytic Expected Response Time Typical Data Volume High HighLow Key Use Cases
  • 10. 00Copyright 2017 © Qubole Area of Focus - Past, Current and Future Work • Cluster Management and TCO • Performance • Security
  • 11. 12Copyright 2018 © Qubole Completed Work ● Self Start ● Auto Terminate ● Workload Aware Auto Scaling ● Spot Node Integration Cluster Automation | Cloud Management and TCO
  • 12. 00Copyright 2017 © Qubole Autoscaling Savings | Cloud Management and TCO Auto Terminate Savings, 48% Autoscaling Savings, 39% Spot Node Savings, 13%
  • 13. 00Copyright 2017 © Qubole In 2017, 54% of all Amazon EC2 compute hours used were spot instances, resulting in an estimated $230 million in savings of Amazon EC2 costs. Spot instances per cluster for Presto in 2017 Spot Node On-Demand Nodes 29 % 4.1X Increase !! Current Work ● Spot Node Loss: Retries of queries on Spot Node Loss. ● WorkFlow Manager: Predictive Load Managing across clusters using Cost Model to compute resource usage. Future Work ● Predictive AutoScaling using Cost Model Cluster Automation | Cloud Management and TCO
  • 14. 00Copyright 2017 © Qubole Memory Cost Model | Performance Completed Work ● Memory Cost Model Cost model is within a factor of 2 of actual usage in the worst case of memory for non-skewed data. Evaluation of Cost model on TPC-DS benchmark (scale 10000)
  • 15. 00Copyright 2017 © Qubole Dynamic Filtering and Join Reorder | Performance Completed Work ● Dynamic Filtering and Join Reorder Evaluation of Dynamic Filtering and Join Reordering on TPC-DS benchmark (scale 3000) 3.2X reduction in Geomean Up to 14X performance improvement observed
  • 16. 00Copyright 2017 © Qubole Rubix | Performance Completed Work ● Rubix - Cache Engine Open sourced for Presto and Spark
  • 17. 00Copyright 2017 © Qubole Current and Future Work | Performance Current Work ● Fast Copy – Auto Framework for Materialized Views ● Join Distribution Future Work ● Histograms for improving Cost Model ● CPU Efficiency
  • 18. 00Copyright 2017 © Qubole Security Completed Work ● HiPPA compliant ● Internode SSL, Dual IAM Role, VPC, Qubole ACLs ● Hive Authorization Future Work ● Ranger support Compliance HIPPA GDPR ready Infrastructure Dual IAM Roles Qubole ACLs VPC Support Internode SSL Physical Data Access S3 Authentication Logical Data Access Hive Authorization Ranger Support
  • 19. 00Copyright 2017 © Qubole OS Collaborations ● Presto Lens – A tool for admins to help tune Presto ● CBO – Improve Cost Model ● Cloud Specific ● S3 Optimizations like S3 Select ● Performance benchmarks for Cloud ● Integration of Product tests with S3 ● Workload Management ● Failure Recovery
  • 20. Questions ? Contact me: amoghm@qubole.com
  • 21. 00Copyright 2017 © Qubole Helpful Links - Engineering Blog https://www.qubole.com/blog/tag/presto/ - AutoScaling https://qubole-eng.quora.com/Industry's-First-Auto-Scaling-Presto-Clusters - Rubix https://github.com/qubole/rubix - Dynamic Filtering/Join Reordering https://www.qubole.com/blog/sql-join-optimizations-qubole-presto/ - Memory Cost-Model https://www.qubole.com/blog/memory-cost-model-qubole-presto/