Introduction to Apache Drill - interactive query and analysis at scale


This document introduces Apache Drill, an open source interactive analysis engine for big data. It was inspired by Google's Dremel and supports standard SQL queries over various data sources like Hadoop and NoSQL databases. Drill provides low-latency interactive queries at scale through its distributed, schema-optional architecture and support for nested data formats. The talk outlines Drill's capabilities and status as a community-driven project under active development.

Introduction to Apache Drill –
interactive query and analysis at scale

Michael Hausenblas, MapR EMEA
2013-02-22, HUG Munich

About Michael
• Background in large-scale data integration
• Chief Data Engineer EMEA, MapR
• Apache Drill contributor

Workloads
• Batch processing (MapReduce)
• Light-weight OLTP (HBase, Cassandra)
• Stream processing (Storm, S4)
• Search (Solr, Elasticsearch)
• Interactive analysis

Interactive Query at scale

[Figure; surviving labels: Impala, low-latency]

Use Case
• Jane, a marketing analyst
• Determine target segments
• Data from different sources

Today’s Solutions
• RDBMS-focused
– ETL data from MongoDB/Hadoop
– Query with SQL
• MapReduce-focused
– ETL from RDBMS/MongoDB
– Use Hive

Requirements
• Support for different data sources
• Support for different query interfaces
• Low-latency/real-time
• Ad-hoc queries
• Scalable and fast
• Reliable

Google’s Dremel

http://research.google.com/pubs/pub36632.html

Apache Drill Overview
• Inspired by Google Dremel
• Standard SQL2003 support
• … other query languages (DSL, etc.) possible
• Pluggable data sources
• Support for nested data (JSON, etc.)
• Schema is optional
• Community-driven, open, hundreds of people involved

How does it work?
• A Drillbit runs on each node to maximize data locality
• Co-ordination, query planning, optimization, scheduling, and execution are all distributed

Source Query → Parser → Logical Plan → Optimizer → Physical Plan → Execution

• Source query: SQL 2003, DrQL, MongoQL, DSL
• Logical plan (example):
    query: [ {
      @id: "log",
      op: "sequence",
      do: [
        { op: "scan", source: "logs" },
        { op: "filter", condition: "x > 3" },
        …
      ]
    } ]
• Physical plan: topology
• Execution: Scanner API
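To make the logical-plan stage of this slide more concrete, here is a minimal Python sketch of how a plan shaped like the sequence/scan/filter example could be interpreted. It is not Drill's reference interpreter: the in-memory `sources` dict stands in for the Scanner API, the filter condition is passed as a Python callable rather than the string "x > 3" (to avoid writing an expression parser), and `run_sequence` and the sample `logs` records are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def run_sequence(plan: Dict[str, Any],
                 sources: Dict[str, List[Record]]) -> List[Record]:
    """Run the operators of a "sequence" plan in order, piping records through."""
    if plan["op"] != "sequence":
        raise ValueError(f"expected a sequence, got {plan['op']!r}")
    stream: Iterable[Record] = []
    for step in plan["do"]:
        if step["op"] == "scan":
            # Stand-in for the Scanner API: read every record of a named source.
            stream = iter(sources[step["source"]])
        elif step["op"] == "filter":
            cond: Callable[[Record], bool] = step["condition"]
            # filter() captures cond now, so chained filters stay independent.
            stream = filter(cond, stream)
        else:
            raise ValueError(f"unsupported operator {step['op']!r}")
    return list(stream)


# Hypothetical plan and data mirroring the slide's scan + "x > 3" filter.
plan = {
    "@id": "log",
    "op": "sequence",
    "do": [
        {"op": "scan", "source": "logs"},
        {"op": "filter", "condition": lambda r: r["x"] > 3},
    ],
}
logs = {"logs": [{"x": 1}, {"x": 4}, {"x": 7}]}
print(run_sequence(plan, logs))  # [{'x': 4}, {'x': 7}]
```

Keeping the plan as plain data (a list of named operators) is what lets a parser for any source language (SQL, DrQL, MongoQL, a DSL) target the same execution machinery.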

Key Features
• Full SQL
• Nested data
• Optional schema
• Extensibility points

Full SQL – ANSI SQL2003
• SQL-like is often not enough
• Integration with existing tools
– Tableau, Excel, SAP Crystal Reports
– Use standard ODBC/JDBC driver

Nested Data
• Nested data becoming prevalent
– JSON/BSON, XML, ProtoBuf, Avro
– Some data sources support it natively
(MongoDB, etc.)
– Innovation through Dremel
• Flattening nested data is error-prone
• Apache Drill supports nested data
as an extension to ANSI SQL2003
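A small Python sketch of why flattening is error-prone: exploding a repeated field copies every other field of the parent onto each output row, so an aggregate over the flattened rows silently double-counts. The profile record and the helper `flatten_repeated` are hypothetical, chosen only to illustrate the pitfall.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def flatten_repeated(record: Record, list_field: str) -> List[Record]:
    """Explode a repeated field: one row per list element, with every other
    field of the parent record copied onto each row."""
    parent = {k: v for k, v in record.items() if k != list_field}
    return [{**parent, list_field: item} for item in record[list_field]]


# Hypothetical user-profile record with a repeated "segments" field.
profile = {"user": "u1", "spend": 10.0, "segments": ["sports", "travel"]}

rows = flatten_repeated(profile, "segments")
print(len(rows))                       # 2
print(sum(r["spend"] for r in rows))   # 20.0, although the user spent 10.0
# Working on the nested record keeps one value per user:
print(profile["spend"])                # 10.0
```

Note also that a record whose list is empty disappears from the flattened output entirely, another silent change that querying the nested form avoids.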

Optional Schema
• Many data sources don’t have rigid schemas
– Schema changes rapidly
– Different schema per record (e.g. HBase)
• Apache Drill supports queries against
unknown schema
• Users can define a schema, or Drill can discover it
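One way to picture schema discovery is to walk each record and collect, per field path, the types actually observed. The Python sketch below does that; `discover_schema` and the sample records are hypothetical, and Drill's own discovery mechanism is not described on this slide and may work quite differently.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def discover_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map every dotted field path seen in any record to the set of Python
    type names observed for it; nested dicts are walked recursively."""
    schema: Dict[str, Set[str]] = defaultdict(set)

    def walk(value: Any, path: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            schema[path].add(type(value).__name__)

    for record in records:
        walk(record, "")
    return dict(schema)


# Hypothetical records whose fields differ from one record to the next.
records = [
    {"id": 1, "name": "Jane"},
    {"id": "2", "address": {"city": "Munich"}},
]
for path, types in sorted(discover_schema(records).items()):
    print(path, sorted(types))
# address.city ['str']
# id ['int', 'str']
# name ['str']
```

Lists are recorded as type `list` without being descended into; a fuller sketch would also track element types and how often each path is present.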

Extensibility Points
• Query language (parser) - UDFs
• Data sources/formats (scanner)
• Optimizer
• Custom operators (logical plan)

Source Query → Parser → Logical Plan → Optimizer → Physical Plan → Execution
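The custom-operator extensibility point can be pictured as a registry that maps operator names used in the logical plan (like "filter") to implementations, so new operators plug in without changing the engine. This Python sketch assumes a hypothetical `register` decorator, `OPERATORS` table, and `limit` operator; it is not Drill's actual extension API.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Operator = Callable[[Iterable[Record], Dict[str, Any]], Iterable[Record]]

# Hypothetical registry: operator name in the logical plan -> implementation.
OPERATORS: Dict[str, Operator] = {}


def register(name: str) -> Callable[[Operator], Operator]:
    """Decorator that plugs a new operator into the registry under `name`."""
    def wrap(fn: Operator) -> Operator:
        if name in OPERATORS:
            raise ValueError(f"operator {name!r} already registered")
        OPERATORS[name] = fn
        return fn
    return wrap


@register("filter")
def filter_op(rows: Iterable[Record], args: Dict[str, Any]) -> Iterable[Record]:
    """Built-in style operator: keep rows matching args["condition"]."""
    return (r for r in rows if args["condition"](r))


@register("limit")
def limit_op(rows: Iterable[Record], args: Dict[str, Any]) -> Iterable[Record]:
    """A custom operator added from outside the engine: first args["n"] rows."""
    return islice(rows, args["n"])


def run(ops: List[Dict[str, Any]], rows: Iterable[Record]) -> List[Record]:
    """Apply each operator in turn, looking it up by its "op" name."""
    for op in ops:
        rows = OPERATORS[op["op"]](rows, op)
    return list(rows)


data = [{"x": 1}, {"x": 4}, {"x": 7}, {"x": 9}]
print(run([{"op": "filter", "condition": lambda r: r["x"] > 3},
           {"op": "limit", "n": 2}], data))  # [{'x': 4}, {'x': 7}]
```

Rejecting duplicate names is a design choice for the sketch: it makes an accidental override of a built-in operator fail loudly instead of silently changing query results.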

Demo
• Data source: donuts.json
    { "id": "0001", "type": "donut", "name": "Cake",
      "batters": {
        "batter": [
          { "id": "1001", "type": "Regular" },
          { "id": "1002", "type": "Chocolate" },
          …
• Logical plan: simple_plan.json
    query: [ {
      op: "sequence",
      do: [
        { op: "scan", ref: "donuts", source: "local-logs", selection: { data: "activity" } },
        { op: "filter", expr: "donuts.ppu < 2.00" },
        …
• Result: out.json
    { "sales": 700.0, "typeCount": 1, "quantity": 700, "ppu": 1.0 }
    { "sales": 109.71, "typeCount": 2, "quantity": 159, "ppu": 0.69 }
    { "sales": 184.25, "typeCount": 2, "quantity": 335, "ppu": 0.55 }
• How-to: https://cwiki.apache.org/confluence/display/DRILL/Demo+HowTo
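The filter step in the demo plan refers to a field through the scan's `ref` name (`donuts.ppu`). A minimal Python sketch of compiling such an expression into a predicate follows. It supports only the form `<ref>.<path> <comparison> <number>`, and `compile_filter` is a hypothetical helper, not part of Drill.

```python
import operator
import re
from typing import Any, Callable, Dict

_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}
# "<ref>.<path> <op> <number>", e.g. "donuts.ppu < 2.00"
_EXPR = re.compile(r"^\s*([\w.]+)\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def compile_filter(expr: str, ref: str) -> Callable[[Dict[str, Any]], bool]:
    """Compile a comparison into a predicate over records bound to `ref`."""
    m = _EXPR.match(expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    path, op, literal = m.groups()
    head, *fields = path.split(".")
    if head != ref or not fields:
        raise ValueError(f"expected a field of {ref!r}, got {path!r}")
    cmp, value = _OPS[op], float(literal)

    def predicate(record: Dict[str, Any]) -> bool:
        current: Any = record
        for name in fields:   # walk nested fields one level at a time
            current = current[name]
        return cmp(current, value)

    return predicate


keep = compile_filter("donuts.ppu < 2.00", ref="donuts")
# Sanity check: every record shown in the demo's out.json satisfies the filter.
out = [
    {"sales": 700.0, "typeCount": 1, "quantity": 700, "ppu": 1.0},
    {"sales": 109.71, "typeCount": 2, "quantity": 159, "ppu": 0.69},
    {"sales": 184.25, "typeCount": 2, "quantity": 335, "ppu": 0.55},
]
print(all(keep(r) for r in out))  # True
```

Compiling the expression once and reusing the returned predicate per record keeps the parsing cost out of the per-record path.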

Status
• Heavy development by multiple orgs
• Logical plan, reference interpreter available
• SQL interpreter and storage engine
implementations (Accumulo, Cassandra,
HBase, etc.) are work in progress
• Schedule:
– Prototype Q1
– Alpha Q2

Engage!
• Follow @ApacheDrill on Twitter

• Sign up for the mailing lists (user|dev)
http://incubator.apache.org/drill/mailing-lists.html

• Keep an eye on http://drill-user.org/

• Ping me: mhausenblas@maprtech.com

Speaker notes

Hive: compiles to MR; Aster: external tables in MPP; Oracle/MySQL: export MR results to the RDBMS. Drill, Impala, CitusDB: real-time.
Suppose a marketing analyst is trying to experiment with ways to target user segments for the next campaign. She needs access to web logs stored in Hadoop, to user profiles stored in MongoDB, and to transaction data stored in a conventional database.
Re ad-hoc: you might not know ahead of time which queries you will want to run. You may need to react to changing circumstances.
Two innovations: handling nested data column-style (column-striped representation) and query push-down.
The source query is parsed and transformed to produce the logical plan. Typically, the logical plan lives in memory in the form of Java objects, but it also has a textual form. The logical plan is then transformed and optimized into the physical plan. The physical plan represents the actual structure of the computation as it is carried out by the system. One of the most important things the optimizer does is introducing parallel computation (another is using columnar data to improve processing speed).
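The notes above say the optimizer's key job is introducing parallel computation, and the architecture slide stresses data locality. A minimal Python sketch of one such step, assigning input splits to Drillbits so that local reads are preferred, is below. `assign_splits`, the split names and the host names are hypothetical, and the round-robin fallback for splits stored on hosts without a Drillbit is an assumption, not Drill's documented policy.

```python
from typing import Dict, List, Tuple


def assign_splits(splits: List[Tuple[str, str]],
                  drillbits: List[str]) -> Dict[str, List[str]]:
    """Map each Drillbit host to the input splits it should scan.

    `splits` holds (split_name, host_storing_it) pairs. A split stored on a
    host that runs a Drillbit is read locally; any other split is handed out
    round-robin, which is an assumption made for this sketch.
    """
    if not drillbits:
        raise ValueError("need at least one Drillbit")
    plan: Dict[str, List[str]] = {host: [] for host in drillbits}
    order = list(plan)          # de-duplicated hosts, in first-seen order
    remote = 0
    for name, host in splits:
        if host in plan:
            plan[host].append(name)    # data-local read, no network hop
        else:
            plan[order[remote % len(order)]].append(name)
            remote += 1
    return plan


# Hypothetical cluster: two Drillbits, one split stored on a third node.
splits = [("part-0", "node1"), ("part-1", "node2"),
          ("part-2", "node1"), ("part-3", "node9")]
print(assign_splits(splits, ["node1", "node2"]))
# {'node1': ['part-0', 'part-2', 'part-3'], 'node2': ['part-1']}
```

Each Drillbit's list would then become one parallel scan fragment of the physical plan; a real planner would also weigh load balance across nodes, which this sketch ignores.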