Lucas.Waye @ TiVo.com
April 5th, 2018
About Me
What’s using Presto:
Targeted Audience Delivery
TV networks, programmers, and advertisers
What are my target viewership segments?
Set-Top box data
Purchasing Behavior
Location-based Consumer Data
Program Metadata
brought to you (in part) by
looking to the past for inspiration for the future
Similar Products at TiVo
[Diagram: ETL, Amazon S3, ETL, Amazon Redshift, MySQL (RDS), Java services on EC2]
transactional and customer-configurable data
semi-aggregated viewership data + sets of households (e.g., “18-24 years old”, “owns minivan”)
New Product, New Challenges…
Many new data marts (MySQL, MySQL, MySQL) popping up in our tech stack
More viewership data: OK, storage is cheap
storage is not cheap…
Need finer-grained data!
Can’t aggregate
as much
static,
hard to scale
Wait, what about Redshift Spectrum?
Redshift: pay per node-hour
Spectrum: pay per data access
How Does it Scale?
Experiment: join on two tables
• Small Joins: join small Redshift table with (filtered-down) large table on S3
  • Join across ~1M rows
• Large Joins: join large Redshift table with (unfiltered) large table on S3
  • Join across ~10M rows
Compare to: both tables on Redshift
Redshift Spectrum for “Simple” Queries (small joins)
[Figure: latency (sec) vs. number of concurrent requests, 1 to 15; series: 1 day, 1 day (Spectrum)]
Spectrum faster when cluster loaded and can pre-filter/pre-aggregate data
Redshift Spectrum for Complex Queries
[Figure: time (sec) vs. concurrent queries]
Spectrum slower!
Amdahl’s Law in Effect
Memory for the broadcast join is a non-parallelizable resource in the cluster
“Operations that can't be pushed to the Redshift Spectrum
layer include [JOIN], DISTINCT and ORDER BY. …
When large amounts of data are returned from Amazon S3,
the processing is limited by your cluster's resources.”
https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-performance.html
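As a rough illustration of the Amdahl's Law point, the sketch below computes the best-case speedup when only part of a query scales out. The 0.8 parallel fraction is an illustrative assumption, not a number measured in the experiment above.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup when only `parallel_fraction` of the work scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed split: 80% of the query (the S3 scan) scales out,
# 20% (the join on the cluster) does not.
for n in (1, 4, 16, 64):
    print(n, round(amdahl_speedup(0.8, n), 2))
```

Under that assumption the speedup can never reach 5x, however many workers scan S3, because the serial 20% dominates.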
Wait, what about Redshift Spectrum?
Our queries won’t work well on Spectrum.
Well, what about [X]?
Our Choice: Presto
• Storage/Compute Separation
• Easy to add and remove worker nodes
• Query many different data sources (inside our VPC) without separate load
• Good performance for analytical queries. Not so good for transactional and simple queries…
• Managed (e.g., Qubole, Starburst)
Coordinator
Worker Worker Worker
S3 / Hive
metastore
MySQL
Connector
Connector
SELECT SUM(v.seconds_viewed)
FROM hive.db.viewership v
JOIN mysql.db.audiences a ON a.hh_id = v.hh_id
WHERE audience_id = 42
mysql catalog →
hive catalog →
SELECT …
FROM db.audiences
WHERE audience_id = 42
How Presto Works
Data is streamed back to the workers
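The flow on this slide can be imitated with a toy, single-process model: the filter on `audience_id` runs at the MySQL side, only matching rows are streamed back, and the join and sum happen in the engine. This is a sketch only. The rows are made up, and nothing here models Presto's real planner, connectors, or distribution across workers.

```python
# Toy stand-ins for the two catalogs; every row is invented for illustration.
HIVE_VIEWERSHIP = [  # plays the role of hive.db.viewership
    {"hh_id": 1, "seconds_viewed": 1200},
    {"hh_id": 2, "seconds_viewed": 300},
    {"hh_id": 1, "seconds_viewed": 600},
    {"hh_id": 3, "seconds_viewed": 900},
]
MYSQL_AUDIENCES = [  # plays the role of mysql.db.audiences
    {"audience_id": 42, "hh_id": 1},
    {"audience_id": 42, "hh_id": 3},
    {"audience_id": 7, "hh_id": 2},
]


def scan_mysql_audiences(audience_id):
    """Source side: apply the WHERE clause there, stream back only matching rows."""
    return (row for row in MYSQL_AUDIENCES if row["audience_id"] == audience_id)


def scan_hive_viewership():
    """Source side: stream every viewership row (no filter applies to this table)."""
    return iter(HIVE_VIEWERSHIP)


def total_seconds_viewed(audience_id):
    """Engine side: hash join on hh_id, then SUM(seconds_viewed)."""
    # Build side: how many times each household appears in the audience.
    build = {}
    for row in scan_mysql_audiences(audience_id):
        build[row["hh_id"]] = build.get(row["hh_id"], 0) + 1
    # Probe side: each viewership row counts once per matching audience row.
    total = 0
    for row in scan_hive_viewership():
        total += row["seconds_viewed"] * build.get(row["hh_id"], 0)
    return total


print(total_seconds_viewed(42))
```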
First Challenge:
What instance types should we use?
Presto Worker Memory
System Memory: reserved-system-memory = 0.4 * JVM Max Memory
Reserved Memory: max-memory-per-node
General Memory: (the rest)
All queries start using memory from the General pool
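A small worked example of the split described on this slide, assuming the three pools partition the JVM heap exactly as drawn. The function name and the sample sizes (100 GB heap, 20 GB per-node limit) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def worker_memory_pools(jvm_max_gb, max_memory_per_node_gb, reserved_system_fraction=0.4):
    """Split a worker's JVM heap into system, reserved, and general pools (in GB)."""
    system = reserved_system_fraction * jvm_max_gb   # reserved-system-memory
    reserved = max_memory_per_node_gb                # sized by max-memory-per-node
    general = jvm_max_gb - system - reserved         # "the rest"
    if general < 0:
        raise ValueError("max-memory-per-node leaves no room for the general pool")
    return {"system": system, "reserved": reserved, "general": general}


# Assumed sizes: 100 GB heap, 20 GB per-node query limit.
print(worker_memory_pools(jvm_max_gb=100, max_memory_per_node_gb=20))
```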
Query needs more memory than is available in the General Pool → switch to the Reserved Pool
Only one query allowed (in the Reserved Pool)!
Query needs more memory than is available in the Reserved Pool → fail
But there’s available memory??
Query needs more memory than is available in the Reserved Pool → keep allocating (resource overcommit)
But now a single query can hog the entire cluster!
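The pool behaviour walked through on these slides (without overcommit) can be sketched as a toy state machine. It is a simplification built only from the slide text: the class and method names are hypothetical, "blocked" is an assumed outcome for a query that would need the Reserved pool while another query holds it, and Presto's real memory manager is more involved.

```python
class WorkerMemory:
    """Toy model of one worker's General and Reserved pools (sizes in GB)."""

    def __init__(self, general_gb, reserved_gb):
        self.general_gb = general_gb
        self.reserved_gb = reserved_gb
        self.general_usage = {}     # query id -> GB held in the General pool
        self.reserved_owner = None  # at most one query lives in the Reserved pool
        self.reserved_used = 0.0

    def allocate(self, query_id, gb):
        """Request `gb` more memory; returns 'general', 'reserved', 'blocked' or 'failed'."""
        if query_id == self.reserved_owner:
            if self.reserved_used + gb > self.reserved_gb:
                # Outgrew the Reserved pool: the query fails and frees the pool.
                self.reserved_owner, self.reserved_used = None, 0.0
                return "failed"
            self.reserved_used += gb
            return "reserved"

        held = self.general_usage.get(query_id, 0.0)
        if sum(self.general_usage.values()) + gb <= self.general_gb:
            self.general_usage[query_id] = held + gb
            return "general"

        needed = held + gb
        if needed > self.reserved_gb:
            # Would not fit in the Reserved pool either.
            self.general_usage.pop(query_id, None)
            return "failed"
        if self.reserved_owner is not None:
            return "blocked"  # only one query allowed in the Reserved pool
        # Move the whole query from General to Reserved.
        self.general_usage.pop(query_id, None)
        self.reserved_owner, self.reserved_used = query_id, needed
        return "reserved"


memory = WorkerMemory(general_gb=40, reserved_gb=20)
for query_id, gb in [("q1", 30), ("q2", 15), ("q3", 15), ("q2", 10), ("q3", 15)]:
    print(query_id, gb, memory.allocate(query_id, gb))
```

In the sample run, q2 fails even though the General pool still has 10 GB free, which is the "But there’s available memory??" situation from the slide.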
Multiple Workers
Total Memory
max-memory = max-memory-per-node * number of nodes
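The total-memory formula from this slide, as a one-line calculation. The function name and the sample numbers are illustrative assumptions; the point is that adding nodes raises the cluster-wide figure while the per-node limit stays the same.

```python
def cluster_max_memory_gb(max_memory_per_node_gb, number_of_nodes):
    """max-memory = max-memory-per-node * number of nodes (the slide's formula)."""
    if number_of_nodes < 1:
        raise ValueError("a cluster needs at least one worker node")
    return max_memory_per_node_gb * number_of_nodes


# Assumed per-node limit of 20 GB on clusters of 4 and 8 workers.
for nodes in (4, 8):
    print(nodes, cluster_max_memory_gb(20, nodes))
```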
Working With Reserved Memory Pool
Conceptually, the reserved memory pool should be the “high water mark” while most queries complete in the general pool.
How do we achieve that?
• What if memory usage varies a lot between different queries?
  Solution: multiple clusters based on workload
• Use many inexpensive instances, or a few expensive instances?
  Empirical testing found a smaller cluster size was slightly faster
• Compute optimized or memory optimized?
  Solution: cost/benefit analysis
Choosing the Right Instance Type
r4.4xlarge
r = instance class, 4 = generation, 4xlarge = multiplier for CPU and memory
Other examples: t2.2xlarge, c5.16xlarge
Over 100 to choose from!
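The naming scheme above can be pulled apart mechanically. The sketch below handles only the three-part names shown on the slide (class letters, generation digit, size); names with extra suffix letters after the generation are deliberately rejected. `parse_instance_type` and `InstanceType` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    instance_class: str  # e.g. "r" (memory optimized), "c" (compute optimized)
    generation: int      # e.g. 4
    size: str            # e.g. "4xlarge", the CPU/memory multiplier


_PATTERN = re.compile(r"^([a-z]+)(\d+)\.(\w+)$")


def parse_instance_type(name):
    """Split a name such as 'r4.4xlarge' into class, generation, and size."""
    match = _PATTERN.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    return InstanceType(match.group(1), int(match.group(2)), match.group(3))


for name in ("r4.4xlarge", "t2.2xlarge", "c5.16xlarge"):
    print(parse_instance_type(name))
```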
[Figure: instance type comparison chart. Credit: Willard Simmons (DataXu)]
Older generations are inefficient
Better for larger memory clusters
Better for smaller memory clusters
Second Challenge:
Elastic Scaling
More Concurrency? Add More Nodes
Presto Coordinator with 2 Presto Workers: when will queries complete at current rate?
1 query → 10 queries: not fast enough!
Qubole provisions more nodes up to a limit (around 3 minutes)
10 queries → 1 query on 4 workers: too fast!
Qubole decommissions nodes up to a limit
Not so fast…
1 query on 2 workers, both at 100% CPU: not fast enough!
Upscaling only works for new queries: the 2 original workers stay at 100% CPU while the 2 new workers sit idle
Maybe we should have sent this query to a more powerful cluster?
Autoscaling is for concurrency
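One way to picture the "when will queries complete at current rate?" question is the hypothetical sizing rule below. It is not Qubole's algorithm, which the slides do not describe. The work units, per-worker rate, target time, and limits are all assumed inputs, and the rule says nothing about queries already running, which, as noted above, do not benefit from new workers.

```python
import math


def desired_workers(pending_work_units, units_per_worker_second, target_seconds,
                    min_workers, max_workers):
    """Workers needed to finish the pending work within target_seconds, within limits."""
    if units_per_worker_second <= 0 or target_seconds <= 0:
        raise ValueError("rate and target must be positive")
    needed = math.ceil(pending_work_units / (units_per_worker_second * target_seconds))
    return max(min_workers, min(max_workers, needed))


# Assumed numbers: each query is 100 work units, a worker does 1 unit per second,
# and we want the backlog cleared in 100 seconds on a cluster of 2 to 8 workers.
print(desired_workers(100, 1.0, 100, 2, 8))    # 1 query
print(desired_workers(500, 1.0, 100, 2, 8))    # 5 queries
print(desired_workers(1000, 1.0, 100, 2, 8))   # 10 queries
```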
Third Challenge:
Maturity
Query History
Presto Query Auditing
Presto UI is nice for watching queries as they’re happening, but not for historical auditing
Service administration portal tracks Qubole commands (Presto queries) and links to the Qubole web site
View and download intermediate queries and results
Specific Technical Presto Issues
• Official Presto JDBC driver does not support Prepared Statements
• Worker loss not handled gracefully (if one task fails, all tasks fail; we take that risk with retry logic)
• No support for upper-case table names in MySQL (Issue 2863)
• TIMESTAMP behavior does not match SQL standard (Issue 7122)
• Naïve query optimizer (talk to Starburst!)
Moral: you may need to get creative with workarounds
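The retry logic mentioned for worker loss might look roughly like the sketch below. The slides do not show the actual implementation, so everything here is an assumption: `QueryFailed`, `run_with_retry`, the caller-supplied `run_query` function, and the exponential backoff policy are all hypothetical.

```python
import time


class QueryFailed(Exception):
    """Hypothetical error raised by the caller's client when a query dies."""


def run_with_retry(run_query, sql, attempts=3, base_delay_seconds=1.0, sleep=time.sleep):
    """Run `run_query(sql)`, retrying on QueryFailed with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return run_query(sql)
        except QueryFailed:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(base_delay_seconds * 2 ** attempt)


# Usage with a stand-in client that loses a worker twice before succeeding.
calls = []

def flaky_client(sql):
    calls.append(sql)
    if len(calls) < 3:
        raise QueryFailed("worker lost")
    return [("ok",)]

print(run_with_retry(flaky_client, "SELECT 1", sleep=lambda seconds: None))
```

Retrying the whole query is only safe when the query is idempotent, such as a read-only SELECT.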
Testing
Presto Docker container using memory connectors
Declarative syntax allows us to mock tables in the Docker container
…so we can test our generated queries in isolation using Behavior-Driven Development.
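The slides do not show the declarative syntax itself, so the following is only a guess at the general idea: a fixture (table name, typed columns, rows) is turned into SQL statements that could be run against a memory-connector catalog before the query under test. The fixture shape, the helper names, and the `memory.default` catalog and schema are assumptions.

```python
def _literal(value):
    """Render a Python value as a SQL literal (strings are single-quoted and escaped)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def mock_table_statements(table, columns, rows, catalog="memory", schema="default"):
    """Build CREATE TABLE and INSERT statements for a mocked table.

    columns: list of (name, sql_type) pairs; rows: list of tuples in column order.
    """
    qualified = f"{catalog}.{schema}.{table}"
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    statements = [f"CREATE TABLE {qualified} ({column_sql})"]
    if rows:
        values = ", ".join(
            "(" + ", ".join(_literal(value) for value in row) + ")" for row in rows
        )
        statements.append(f"INSERT INTO {qualified} VALUES {values}")
    return statements


for statement in mock_table_statements(
    "audiences",
    [("audience_id", "BIGINT"), ("hh_id", "BIGINT")],
    [(42, 1), (42, 3)],
):
    print(statement)
```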
Final Takeaways
Setting expectations: make sure everyone knows Presto is not a full-fledged database.
Biggest (Positive) Surprise
Providing one logical view of the data model across many databases is great!
For this reason, Presto became a favorite for many other workloads beyond its initial scope.
Presto’s simplicity resulted in widespread adoption.
Provocative Ending
Presto feels like an API gateway, but for data.
Behavioral Services :: Data Applications
Interface (REST, WSDL, Thrift, etc.) :: Data Definition Language (DDL)
Requests (HTTP, SOAP, etc.) :: Data Manipulation Language (DML)
Service implementation language :: Database technology
Publishing an endpoint :: Exposing a table or view
Service handler :: CREATE VIEW, CREATE TRIGGER
Service endpoint configuration :: Catalog/connector configuration
What other engineering advancements can we push through the lens from microservices (behaviors) to databases (state)?
Thanks!
Questions?

Presto at Tivo, Boston Hadoop Meetup

  • 5. Targeted Audience Delivery. TV networks, programmers, and advertisers ask: “What are my target viewership segments?” Data sources: set-top box data, purchasing behavior, location-based consumer data, program metadata. Brought to you (in part) by [logo]
  • 6. looking to the past for inspiration for the future
  • 8. Similar Products at TiVo. [Architecture diagram: ETL, Amazon S3, Java services on EC2, Amazon Redshift, MySQL (RDS).] Annotations: transactional and customer-configurable data; semi-aggregated viewership data + sets of households (e.g., “18-24 years old”, “owns minivan”)
  • 9. New Product, New Challenges… [Same architecture diagram: ETL, Amazon S3, Java services on EC2, Amazon Redshift, MySQL (RDS), plus several more MySQL instances.] Many new data marts popping up in our tech stack
  • 10. New Product, New Challenges… More viewership data. OK, storage is cheap
  • 11. New Product, New Challenges… More viewership data: storage is not cheap…
  • 12. New Product, New Challenges… Storage is not cheap… Need finer grain data!
  • 13. New Product, New Challenges… Storage is not cheap… Need finer grain data! Can’t aggregate as much
  • 14. New Product, New Challenges… Static, hard to scale
  • 15. Wait, what about Redshift Spectrum?
  • 16. Redshift Spectrum. Redshift: pay per node-hour. Spectrum: pay per data access.
  • 17. How Does it Scale?
  • 18. How Does it Scale? Experiment: join on two tables. • Small joins: join a small Redshift table with a (filtered-down) large table on S3; join across ~1M rows. • Large joins: join a large Redshift table with an (unfiltered) large table on S3; join across ~10M rows. Compare to: both tables on Redshift.
  • 20. Redshift Spectrum for “Simple” Queries (small joins). [Chart: latency (sec) vs. number of concurrent requests, 1 to 15; series: 1 day, 1 day (Spectrum).] Spectrum is faster when the cluster is loaded and the data can be pre-filtered/pre-aggregated.
  • 22. Redshift Spectrum for Complex Queries. [Chart: time (sec) vs. concurrent queries.] Spectrum slower!
  • 24. Amdahl’s Law in Effect. Memory for the broadcast join is a non-parallelizable resource in the cluster. “Operations that can't be pushed to the Redshift Spectrum layer include [JOIN], DISTINCT and ORDER BY. … When large amounts of data are returned from Amazon S3, the processing is limited by your cluster's resources.” https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-performance.html
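For reference, Amdahl's Law in its usual form is sketched below. The symbols are not from the slides: p is assumed to be the fraction of the query's work that can be spread over the Spectrum layer, and n the number of parallel workers. The slides give no measured value for p, so this only illustrates why the join on the cluster caps the speedup.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

However many Spectrum workers are added, the part that must run on the Redshift cluster (here, the join) bounds the overall speedup at 1/(1 - p).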
  • 25. Wait, what about Redshift Spectrum? Our queries won’t work well on Spectrum.
  • 27. Our Choice: • Storage/Compute Separation • Easy to add and remove worker nodes • Query many different data sources (inside our VPC) without separate load • Good performance for analytical queries. Not so good for transactional and simple queries… • Managed (e.g., Qubole, Starburst)
  • 28. How Presto Works. [Diagram: Coordinator; Workers; connectors to S3 / Hive metastore and to MySQL.] Query: SELECT SUM(v.seconds_viewed) FROM hive.db.viewership v JOIN mysql.db.audiences a ON a.hh_id = v.hh_id WHERE audience_id = 42. hive catalog → S3 / Hive metastore; mysql catalog → SELECT … FROM db.audiences WHERE audience_id = 42. Data is streamed back to the workers. DRAFT - TiVo Confidential 2018
  • 29. First Challenge: What instance types should we use?
  • 30. Presto Worker Memory. System Memory: reserved-system-memory = 0.4 * JVM Max Memory. Reserved Memory: max-memory-per-node. General Memory: the rest. All queries start using memory from the General pool.
  • 32. Presto Worker Memory. A query needs more memory than is in the General pool → switch to Reserved.
  • 34. Presto Worker Memory. Only one query is allowed in the Reserved pool!
  • 35. Presto Worker Memory. A query needs more memory than is in the Reserved pool → fail the query.
  • 36. Presto Worker Memory. But there’s available memory??
  • 37. Presto Worker Memory. A query needs more memory than is in the Reserved pool → keep allocating (resource overcommit).
  • 38. Presto Worker Memory. But now a single query can hog the entire cluster!
  • 40. Multiple Workers. [Diagram: two Presto workers, each with its own memory, running queries.]
  • 41. Multiple Workers. Total Memory: max-memory = max-memory-per-node * number of nodes.
  • 43. Multiple Workers. Total Memory: max-memory = max-memory-per-node * number of nodes. [Diagram: more queries added across both workers.]
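The memory layout on the Presto Worker Memory slides is simple arithmetic, so a minimal sketch may help when sizing instances. It assumes exactly the relationships stated on the slides (system pool = 0.4 * JVM max memory, reserved pool = max-memory-per-node, general pool = the rest, cluster total = max-memory-per-node * number of nodes). The function and field names are made up for illustration and are not Presto APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPools:
    """Per-worker pool sizes in GB (names are illustrative, not Presto's)."""
    system_gb: float
    reserved_gb: float
    general_gb: float


def worker_pools(jvm_max_gb: float, max_memory_per_node_gb: float,
                 system_fraction: float = 0.4) -> WorkerPools:
    """Split a worker's JVM heap the way the slides describe it."""
    system = system_fraction * jvm_max_gb      # reserved-system-memory
    reserved = max_memory_per_node_gb          # reserved pool
    general = jvm_max_gb - system - reserved   # "the rest"
    if general < 0:
        raise ValueError("max-memory-per-node leaves no room for a general pool")
    return WorkerPools(system, reserved, general)


def cluster_max_memory_gb(max_memory_per_node_gb: float, nodes: int) -> float:
    """Total memory as stated on the slide: per-node limit times node count."""
    return max_memory_per_node_gb * nodes


if __name__ == "__main__":
    # Hypothetical sizing: a 100 GB heap with a 20 GB per-node query limit.
    print(worker_pools(jvm_max_gb=100, max_memory_per_node_gb=20))
    print(cluster_max_memory_gb(max_memory_per_node_gb=20, nodes=10))
```

Plugging in candidate instance sizes shows quickly how much general-pool headroom is left once the system and reserved pools are carved out.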
  • 44. Working With Reserved Memory Pool. Conceptually, the reserved memory pool should be the “high water mark” while most queries complete in the general pool. How do we achieve that? • What if memory usage varies a lot between different queries? • Use many inexpensive instances, or a few expensive instances? • Compute optimized or memory optimized?
  • 45. Working With Reserved Memory Pool. • What if memory usage varies a lot between different queries? Solution: multiple clusters based on workload. • Use many inexpensive instances, or a few expensive instances? Empirical testing found a smaller cluster size was slightly faster. • Compute optimized or memory optimized? Solution: cost/benefit analysis.
  • 46. Choosing the Right Instance Type. Example: r4.4xlarge = instance class (r), generation (4), multiplier for CPU and memory (4xlarge). Other examples: t2.2xlarge, c5.16xlarge.
  • 47. Choosing the Right Instance Type. Over 100 to choose from!
  • 48. Choosing the Right Instance Type. [Chart. Credit: Willard Simmons (DataXu)]
  • 49. Choosing the Right Instance Type. [Chart. Credit: Willard Simmons (DataXu)] Older generations are inefficient.
  • 50. Choosing the Right Instance Type. [Chart. Credit: Willard Simmons (DataXu)] Better for larger memory clusters. Older generations are inefficient.
  • 51. Choosing the Right Instance Type. [Chart. Credit: Willard Simmons (DataXu)] Better for smaller memory clusters. Older generations are inefficient.
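The naming scheme from the instance-type slides (class, generation, size multiplier) can be pulled apart mechanically, which is handy when comparing many candidates. The sketch below is an illustration only: the regular expressions cover the simple names shown on the slides (r4.4xlarge, t2.2xlarge, c5.16xlarge) and are an assumption, not an official AWS grammar, so names with extra suffix letters are rejected.

```python
import re
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    instance_class: str                 # e.g. "r" (memory optimized family)
    generation: int                     # e.g. 4
    size: str                           # e.g. "4xlarge"
    xlarge_multiplier: Optional[int]    # 4 for "4xlarge", 1 for "xlarge", else None


# Assumed pattern: letters, digits, a dot, then the size token.
_NAME = re.compile(r"^(?P<cls>[a-z]+)(?P<gen>\d+)\.(?P<size>[a-z0-9]+)$")
_XLARGE = re.compile(r"^(?P<n>\d*)xlarge$")


def parse_instance_type(name: str) -> InstanceType:
    """Split a name like 'r4.4xlarge' into class, generation and size."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    size = match.group("size")
    xl = _XLARGE.match(size)
    # Sizes below xlarge (large, medium, ...) get no multiplier in this sketch.
    multiplier = int(xl.group("n") or 1) if xl else None
    return InstanceType(match.group("cls"), int(match.group("gen")), size, multiplier)


if __name__ == "__main__":
    for name in ("r4.4xlarge", "t2.2xlarge", "c5.16xlarge"):
        print(parse_instance_type(name))
```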
  • 53. More Concurrency? Add More Nodes
  • 54. More Concurrency? Add More Nodes. [Diagram: Presto coordinator, two Presto workers.] 1 query. When will queries complete at current rate?
  • 55. More Concurrency? Add More Nodes. 10 queries. When will queries complete at current rate? Not fast enough!
  • 56. More Concurrency? Add More Nodes. 10 queries. Qubole provisions more nodes up to a limit (around 3 minutes). [Diagram: two more Presto workers added.]
  • 57. More Concurrency? Add More Nodes. 1 query, four workers. When will queries complete at current rate? Too fast!
  • 58. More Concurrency? Add More Nodes. 1 query. Qubole decommissions more nodes up to a limit.
  • 59. Not so fast… 1 query, both workers at 100% CPU. When will queries complete at current rate? Not fast enough!
  • 60. Not so fast… The two original workers stay at 100% CPU while the two added workers sit idle. Upscaling only works for new queries.
  • 61. Not so fast… Maybe we should have sent this query to a more powerful cluster? Autoscaling is for concurrency.
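The question on these slides ("when will queries complete at current rate?") can be turned into a toy sizing rule. The sketch below is purely hypothetical: it is not Qubole's algorithm, and every name and unit in it (remaining_work, work_per_worker_per_sec, target_seconds) is invented for illustration. It also inherits the caveat from the slides: extra workers only help queries that have not started yet.

```python
import math


def desired_workers(remaining_work: float, work_per_worker_per_sec: float,
                    target_seconds: float, min_workers: int,
                    max_workers: int) -> int:
    """Workers needed so the queued work would finish within target_seconds.

    Hypothetical heuristic: the result is clamped to [min_workers, max_workers],
    mirroring the "up to a limit" wording on the slides.
    """
    if work_per_worker_per_sec <= 0 or target_seconds <= 0:
        raise ValueError("rate and target must be positive")
    needed = math.ceil(remaining_work / (work_per_worker_per_sec * target_seconds))
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # Ten queued queries' worth of work: scale up, but only to the cap.
    print(desired_workers(1200, 1.0, 60, min_workers=2, max_workers=10))
    # A single small query: scale back down to the floor.
    print(desired_workers(60, 1.0, 60, min_workers=2, max_workers=10))
```

A rule like this reacts to queue depth, which is why it suits concurrency and does nothing for one heavy query that is already running.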
  • 63. Query History. The Presto UI is nice for watching queries as they’re happening, but not for historical auditing.
  • 64. Presto Query Auditing. Service administration portal tracks Qubole commands (Presto queries) and links to the Qubole web site. View and download intermediate queries and results.
  • 66. Specific Technical Presto Issues. • Official Presto JDBC driver does not support Prepared Statements • Worker loss not handled gracefully (if one task fails, all tasks fail; we take that risk with retry logic) • No support for upper-case table names in MySQL (Issue 2863) • TIMESTAMP behavior does not match SQL standard (Issue 7122) • Naïve query optimizer (talk to Starburst!) Moral: you may need to get creative with workarounds.
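The slides mention covering worker loss with retry logic but do not show it. A minimal sketch of one common approach (retry the whole query with exponential backoff) follows. `QueryFailed`, `run_with_retry` and the `run_query` callable are hypothetical names, not part of any Presto client library, and a real setup would need to decide which errors are actually safe to retry.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QueryFailed(Exception):
    """Hypothetical error the caller's query function raises on failure."""


def run_with_retry(run_query: Callable[[], T], max_attempts: int = 3,
                   base_delay_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Re-run the whole query on failure, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except QueryFailed:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = []

    def flaky_query() -> str:
        # Simulates a query that loses a worker on its first two runs.
        attempts.append(1)
        if len(attempts) < 3:
            raise QueryFailed("worker lost")
        return "ok"

    print(run_with_retry(flaky_query, base_delay_s=0.01))
```

Since one failed task fails the whole query, the retry has to restart the query from scratch, so this only makes sense for idempotent reads.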
  • 69. Testing. Presto Docker container using memory connectors. Declarative syntax allows us to mock tables in the Docker container… so we can test our generated queries in isolation using Behavior-Driven Development.
  • 71. Setting expectations: Make sure everyone knows Presto is not a full-fledged database.
  • 72. Biggest (Positive) Surprise: Presto’s simplicity resulted in widespread adoption. Providing one logical view of the data model across many databases is great! For this reason it became a favorite for many other workloads beyond its initial scope.
  • 75. Provocative Ending: Presto feels like an API gateway, but for data.
      Behavioral Services :: Data Applications
      Interface (REST, WSDL, Thrift, etc.) :: Data Definition Language (DDL)
      Requests (HTTP, SOAP, etc.) :: Data Manipulation Language (DML)
      Service implementation language :: Database technology
      Publishing an endpoint :: Exposing a table or view
      Service handler :: CREATE VIEW, CREATE TRIGGER
      Service endpoint configuration :: Catalog/connector configuration
      What other engineering advancements can we push through the lens from microservices (behaviors) to databases (state)?