The Pushdown of Everything by Stephan Kessler and Santiago Mola

Spark Summit
29 Oct 2015
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
1 sur 27

Contenu connexe

Tendances

Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesDatabricks
Apache Calcite overviewApache Calcite overview
Apache Calcite overviewJulian Hyde
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDatabricks
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Databricks
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit

Tendances(20)

Similaire à The Pushdown of Everything by Stephan Kessler and Santiago Mola

Operational Tips For Deploying Apache SparkOperational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache SparkDatabricks
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & SparkMatthias Niehoff
Sydney Spark Meetup - September 2015Sydney Spark Meetup - September 2015
Sydney Spark Meetup - September 2015Andy Huang
20170126 big data processing20170126 big data processing
20170126 big data processingVienna Data Science Group
Java Developers, make the database work for you (NLJUG JFall 2010)Java Developers, make the database work for you (NLJUG JFall 2010)
Java Developers, make the database work for you (NLJUG JFall 2010)Lucas Jellema
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIDatabricks

Plus de Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit

Plus de Spark Summit(20)

Dernier

apidays London 2023 - Advanced AI-powered API Security, Ricky Moorhouse (IBM)...apidays London 2023 - Advanced AI-powered API Security, Ricky Moorhouse (IBM)...
apidays London 2023 - Advanced AI-powered API Security, Ricky Moorhouse (IBM)...apidays
HR ANALYSIS pdf.pdfHR ANALYSIS pdf.pdf
HR ANALYSIS pdf.pdfMehakSethi19
apidays London 2023 - DocOps and Automation in Fintech, Kateryna Osadchenko, ...apidays London 2023 - DocOps and Automation in Fintech, Kateryna Osadchenko, ...
apidays London 2023 - DocOps and Automation in Fintech, Kateryna Osadchenko, ...apidays
PLM 200.pdfPLM 200.pdf
PLM 200.pdfMidhun Mohan
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Timothy Spann
Using colour to elevate your data visualisation designsUsing colour to elevate your data visualisation designs
Using colour to elevate your data visualisation designsShreya Arya

The Pushdown of Everything by Stephan Kessler and Santiago Mola

Notes de l'éditeur

  1. Notes: Quick slide: about 1 minute
  2. Notes: 1 or 2 minutes about SAP HANA Vora.
  3. Notes: 30 seconds about Data Sources API intro: Data Sources API defines how Spark SQL can interact with an external source of data. The Data Source can represent a file format on HDFS, a relation database, a web service…
  4. With TableScan, everything is pulled from the data source: every row with every column. Then all further steps are performed in Spark. Clarification: Here are three columns: Logical plan. Physical plan. A SQL representation with the query that is executed in the data source and the query that is executed in Spark SQL. This is just an idealization, it does not mean that the data source actually uses SQL or that Spark SQL uses it internally.
  5. With TableScan, everything is pulled from the data source: every row with every column. Then all further steps are performed in Spark. Clarification: Here are three columns: Logical plan. Physical plan. A SQL representation with the query that is executed in the data source and the query that is executed in Spark SQL. This is just an idealization, it does not mean that the data source actually uses SQL or that Spark SQL uses it internally.
  6. With PrunedScan, we fetch all rows with a subset of columns. This can reduce I/O considerably.
  7. PrunedFilteredScan works as PrunedFilteredScan, but adding a filter on rows according to a condition. This is equivalent to adding a WHERE clause.