Presto SQL at Wayfair meetup presentation


Presto SQL at Wayfair, presented at a meetup in Boston: https://www.meetup.com/DataCouncil-AI-Boston-Data-Engineering-Science/events/262818999/

Abstract: Presto has become an essential tool for data scientists and analysts at Wayfair, even though it is relatively new there: it was implemented about a year ago and has been actively used for the past six months. Attend this session to understand why Wayfair decided to implement Presto, how we architected our cluster, the configuration choices we made, some common issues we faced, and where we are heading next. Breaking down our problems and our approach with Presto will help describe some of the challenges we face and add color to the decisions we've made.

Speaker Bio: Krishna Ravishankar is a DevOps Engineer working on the Big Data and Messaging Platforms at Wayfair. He joined the team about two years ago with little knowledge of how distributed systems work; he is now one of the leads for some of the platforms the team owns (Presto, Kafka). Krishna was involved in bringing Presto to Wayfair: he designed and implemented a production Presto cluster that is now used by hundreds of users running ad-hoc queries. He is currently building a new read/write cluster using the Starburst distribution.




1. Presto at Wayfair
   Krishna Ravishankar

2. Agenda
   • Problem Statement
   • Why use Presto?
   • Presto at Wayfair
   • Presto Clients
   • Presto adoption
   • Moving towards
   • Monitoring
   • Q/A

3. Problem Statement

4. Remedies
   1. Optimize Hive queries
   2. Set up queues to prioritize batch jobs
   3. Throttle users to 2 ad-hoc queries
   4. Move jobs from Hive to Spark
   5. Conduct SME training sessions for both Hive and Spark

5. Why Presto?
   ● It's VERY fast!
   ● It saves Hadoop resources: by using Presto, you leave room on the cluster for more development work, such as other teams testing their PySpark pipelines.
   ● Unlike Spark, which requires more expertise and setup, Presto is quick to set up.
   ● You can combine data sources that live in different places (SQL and Hive data in one place).
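The last bullet refers to Presto's ability to join tables from different catalogs in a single query by using fully qualified `catalog.schema.table` names. The sketch below only builds such a query as text so that its shape is visible; the helper `build_federated_query`, the catalog names `hive` and `sqlserver`, and the schemas, tables, and column are all hypothetical and do not come from the talk.

```python
def build_federated_query(hive_table: str, sql_table: str, key: str) -> str:
    """Return Presto SQL that joins two tables living in different catalogs.

    Both table arguments are fully qualified catalog.schema.table names.
    """
    return (
        f"SELECT h.{key}, count(*) AS events\n"
        f"FROM {hive_table} AS h\n"
        f"JOIN {sql_table} AS s ON h.{key} = s.{key}\n"
        f"GROUP BY h.{key}"
    )


# Hypothetical names: one table in a Hive catalog, one in a relational catalog.
query = build_federated_query(
    hive_table="hive.web.clickstream",
    sql_table="sqlserver.sales.customers",
    key="customer_id",
)
print(query)
```

The resulting statement could be pasted into the Presto CLI, provided catalogs with these names were actually configured on the cluster.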

6. Presto at Wayfair

7. Presto at Wayfair
   Presto ad-hoc (read-only cluster)
   301 VMs (8*64) with 1 coordinator and 300 workers
   Total available memory: 20 TB
   Total available CPU: 2800 vcores
   Client: Presto CLI

8. Adoption - before
   (chart labels: 140K queries, 80K queries, 40%)

9. Adoption - after

10. Query Throttling
   ● SELECT only
   ● 2 queries per user
   ● 2 queued queries per user
   ● Increased the time limit from 5 to 10 mins
   Average execution time: 51 sec
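The throttling rules on this slide can be sketched as a small admission policy. This is a minimal illustration, not Presto's implementation and not Wayfair's configuration: the names `AdHocThrottle`, `submit`, `finish`, and `exceeds_time_limit` are hypothetical, and the SELECT check is deliberately simplistic. In Presto itself, limits of this kind are usually expressed through resource groups and a maximum execution time setting; the slides do not say which mechanism was used.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_RUNNING_PER_USER = 2      # slide: 2 queries per user
MAX_QUEUED_PER_USER = 2       # slide: 2 queued queries per user
TIME_LIMIT_SECONDS = 10 * 60  # slide: time limit raised from 5 to 10 minutes


@dataclass
class UserSlots:
    running: int = 0
    queued: int = 0


class AdHocThrottle:
    """Toy per-user admission controller (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._users = defaultdict(UserSlots)

    def submit(self, user: str, sql: str) -> str:
        """Return 'RUNNING', 'QUEUED', or 'REJECTED' for a new query."""
        # Read-only cluster: accept only statements that start with SELECT.
        # (A real check would also need to handle WITH ... SELECT and comments.)
        if not sql.lstrip().upper().startswith("SELECT"):
            return "REJECTED"
        slots = self._users[user]
        if slots.running < MAX_RUNNING_PER_USER:
            slots.running += 1
            return "RUNNING"
        if slots.queued < MAX_QUEUED_PER_USER:
            slots.queued += 1
            return "QUEUED"
        return "REJECTED"

    def finish(self, user: str) -> None:
        """Mark one running query as done and promote a queued one, if any."""
        slots = self._users[user]
        if slots.running == 0:
            return
        slots.running -= 1
        if slots.queued > 0:
            slots.queued -= 1
            slots.running += 1


def exceeds_time_limit(elapsed_seconds: float) -> bool:
    """True when a query has run longer than the 10-minute limit."""
    return elapsed_seconds > TIME_LIMIT_SECONDS


throttle = AdHocThrottle()
results = [throttle.submit("alice", "SELECT 1") for _ in range(5)]
print(results)
```

With these limits, a user's first two queries run, the next two wait in the queue, and a fifth concurrent query is rejected until one of the running queries finishes.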

11. Moving towards Starburst

12. OSS Presto vs. Starburst
   (chart legend: Starburst; Presto open source)
   Note:
   1. The CBO (cost-based optimizer) was turned on manually on OSS Presto
   2. Starburst has the CBO turned on by default
   3. The CBO improved query performance by 3-10X
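As a rough idea of what turning the CBO on manually can look like, the sketch below generates `SET SESSION` statements for the two join-related session properties that open-source Presto releases from around the time of the talk used for cost-based decisions. Treat this as an assumption rather than the speaker's actual configuration: property names and accepted values vary between Presto versions, the helper `cbo_session_statements` is hypothetical, and the optimizer also depends on table statistics being available.

```python
# Session properties assumed here; check the documentation for your version.
CBO_SESSION_PROPERTIES = {
    "join_reordering_strategy": "AUTOMATIC",  # let the optimizer reorder joins
    "join_distribution_type": "AUTOMATIC",    # choose broadcast vs. partitioned
}


def cbo_session_statements(properties=CBO_SESSION_PROPERTIES):
    """Return one SET SESSION statement per property, in insertion order."""
    return [
        f"SET SESSION {name} = '{value}'"
        for name, value in properties.items()
    ]


statements = cbo_session_statements()
for statement in statements:
    print(statement)
```

Each printed statement can be run in a Presto session before the queries being compared, which makes it possible to test the same query with and without cost-based planning.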

13. Monitoring Presto

14. What's Next?
   1. Migrating jobs
   2. Upgrading our existing Presto cluster to use the Starburst distribution
   3. BigQuery vs. Presto

15. THANK YOU
   Questions?
   Krishna Ravishankar
   DevOps Engineer
   kravishankar@wayfair.com
   https://www.linkedin.com/in/ravishankarkrishnakumar/