SlideShare une entreprise Scribd logo
How Impala Works 
Yue Chen 
http://linkedin.com/in/yuechen2 
http://dataera.wordpress.com
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
What’s Impala?
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
This is Impala…
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Goal of Impala 
A general SQL engine for distributed systems, supporting both OLTP and OLAP. 
Interactive (real-time) queries. 
Built on top of HDFS and HBase. 
Engine is written in C++, fast. 
The database execution engine is like that of massively parallel processing (MPP) databases, not using MapReduce.
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
What’s Impala? 
In-memory, distributed SQL query engine (no MapReduce) 
Native backend code (C++) 
Distributed on HDFS data nodes
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
What’s Impala?
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Why Impala?
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
FAST!
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Why HDFS? 
Low cost 
Reliability 
Easy to scale out
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Architecture
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Architecture Overview 
impalad daemon runs on HDFS nodes 
statestored for cluster metadata 
(Hive) metastore for database metadata 
Queries run on relevant nodes 
Data streamed to clients
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Architecture Overview 
Submit queries via Hue/Beeswax, Thrift API, CLI, ODBC, JDBC 
No fault tolerance (query fails if any query on any node fails) 
Intermediate data never hits disk
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
statestored 
Acts as a cluster monitor 
Not a single point of failure
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Metadata 
Uses Hive metastore 
Daemons cache metadata 
Can create tables in Hive or Impala
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Impala Architecture Summary 
impalad 
Runs on every node 
Handles client requests 
Handles query planning & execution 
statestored 
Provides name service 
Metadata distribution 
Used for finding data 
catalogd 
Relays metadata changes to all impalad’s
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Impala Architecture: Query Execution Phases
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Planning: Overview
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Impala Partition 
Example: 
create table census (name string, census_year int) partitioned by (year int); 
insert into census partition (year=2010) values ('Smith',2010),('Jones',2010); 
Each partition has its own HDFS directory, and all the data for that partition is stored in a data file in that directory 
To manually define how to partition the table (e.g., year mod 5 == 0), we have to create a new column to store the calculation result and then do the partition
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Frontend
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Flow of a SQL query
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Compilation
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Parsing
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Semantic Analysis
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Semantic Analysis
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Planning: Goals 
Generates executable plan (“tree” of operators) 
Maximize scan locality using DataNode block metadata 
Minimize data movement 
Full distribution of operators 
Query operators 
Scan, HashJoin, HashAggregation, Union, TopN, Exchange
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Planning: Overview
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Planning: Single-Node Plan 
Four major functions: 
1. Parse Tree -> Plan Tree 
2. Assigns predicates to lowest plan node 
Optimizes join order 
Prunes irrelevant columns
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Parse Tree → Single-Node Plan Tree
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Predicate Assignment & Inference
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Join-Order Optimization 
Impala only considers left-deep join trees 
(Right join input is a table, not another join) 
Find cheapest valid join order 
Relies heavily on table and column statistics
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Invalid Join Orders
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Join-Order Optimization
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Join-Order Optimization 
Impala’s Implementation: 
1. Heuristic 
Order tables descending by size 
Best plan typically has largest table on the left (if valid) 
2. Plan enumeration & costing 
Generates all possible join orders starting from a given left-most table (starting with largest one) 
Ignore invalid join orders 
Estimates intermediate result sizes (key!) 
Chooses plan that minimizes intermediate result sizes
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Planning: Overview
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Planning: Distributed Plans
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Planning: Distributed Plans
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Two Types of Hash Joins 
Default is BROADCAST (aka Replicated) 
Each node ends up with a copy of the right table(s) 
Left side, read locally and streamed through local hash join 
Good for one large table and multiple small tables 
Alternative hash join type is SHUFFLE (aka partitioned) 
Right side hashed and shuffled; each node gets 1/N of the data 
Left side hashed and shuffled, then streamed through join 
Best choice for large_table JOIN large_table
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Join Hint 
select … 
from large_table 
join [broadcast] small_table 
select … 
from large_table 
join [shuffle] large_table
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Determine Join Type from EXPLAIN
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Query Planning: Distributed Plans
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
HDFS Improvement Motivated by Impala 
Exposes HDFS block replica disk location information 
Allows for explicitly co-located block replicas across files 
In-memory caching of hot tables/files 
Reduces copies during reading, short-circuit reads
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Disk Location of Block Replica 
Problem: 
DataNode knows which DataNode blocks are on, not which disks 
Only the DNs are aware of block replica->disk mapping 
Impala wants to make sure that separate plan fragments operate on data on separate disks 
Maximizes aggregate available disk throughput
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Disk Location of Block Replica 
Solution: 
Adds new RPC call to DataNode to expose which volumes (disks) replicas are stored on 
During query planning phase, impalad… 
Determines all DNs data for query is stored on 
Queries these DNs to get volume information 
During query execution phase, impalad… 
Queues disk reads so that only 1 or 2 reads ever happen to a given disk at a given time 
With this additional information, Impala is able to ensure disk reads are large, minimizing seeks
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Co-located Block Replicas 
Problem: 
When performing a join, a single impalad may have to read from both a local file and a remote file on another DN 
Ideally all reads should be done on local disks (assuming that local read is faster than remote read)
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Co-located Block Replicas 
Solution: 
Adds features to HDFS to specify that a set of files should have their replicas placed on the same set of nodes 
Gives Impala more control of data 
Can ensure that tables/files which are joined frequently have their data co-located 
Additionally, more fine-grained block placement control allows for potential improvement in columnar storage format like Parquet
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
In-memory Caching 
Problem: 
Impala queries are IO-bound 
Memory is fast and getting cheaper
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
In-memory Caching 
Solution: 
Adds facility to HDFS to explicitly read specific HDFS files into memory 
Allows Impala to read data at full memory bandwidth speed 
Gives cluster operator control over which files/tables are queried frequently and thus could be kept in memory
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Short-circuit Reads 
Problem: 
A typical read in HDFS must be read from disk by DataNode, copied into DN memory, sent over network, copied into client buffers.
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Short-circuit Reads 
Solution: 
Reads are performed directly on local files, using direct buffers 
In HDFS, allow for reads to completely bypass DataNode when client is co-located with block replica files, avoiding overhead of HDFS API 
Reads data directly from disk to client buffers
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Code Generation
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Why Code Generation?
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Why Code Generation? 
SPEED!
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Why Code Generation? 
Code generation (codegen) lets us use query- specific information to do less work 
Remove conditionals 
Propagate constant offsets, pointers, etc. 
Inline virtual function calls
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
User-Defined Functions (UDFs) 
Allows users to extend Impala’s functionality by writing their own functions 
e.g. select my_func(c1) from table; 
Defined as C++ functions 
UDFs can be compiled to LLVM IR with Clang ⇒ inline UDFs 
IR can be just-in-time compiled (JIT’d) and replace the interpreted functions
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Performance (Jan 2014) 
3TB (TPC-DS scale factor 3,000) across five typical Hadoop DataNodes (dual- socket, 8-core, 16-thread CPU; 96GB memory; 1Gbps Ethernet; 12 x 2TB disk drives).
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Performance (Jan 2014) 
30TB set of TPC-DS data (scale factor 30,000), 20 nodes with 96GB memory per node
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
Weaknesses and Limitations 
Data is immutable, no updating 
Response time is not microsecond 
Do not support some operations, like update and delete 
No beyond SQL and advanced data structures (buckets, samples, transforms, arrays, structs, maps, xpath, json) 
When broadcast join, smaller table has to fit in aggregate memory of all executing nodes 
No custom storage format 
LIMIT required when using ORDER BY 
High memory usage
http://dataera.wordpress.com 
http://linkedin.com/in/yuechen2 
References 
Cloudera Impala official documentation and slides

Contenu connexe

Tendances

Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaManish Maheshwari
 
Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in ImpalaCloudera, Inc.
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataMike Percy
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceDataWorks Summit
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterDataWorks Summit
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Spark SQL Bucketing at Facebook
 Spark SQL Bucketing at Facebook Spark SQL Bucketing at Facebook
Spark SQL Bucketing at FacebookDatabricks
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaCloudera, Inc.
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
 

Tendances (20)

Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 
Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in Impala
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Spark SQL Bucketing at Facebook
 Spark SQL Bucketing at Facebook Spark SQL Bucketing at Facebook
Spark SQL Bucketing at Facebook
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 

Similaire à How Impala Works

Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesappaji intelhunt
 
Inside HDFS Append
Inside HDFS AppendInside HDFS Append
Inside HDFS AppendYue Chen
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comEdward D. Kim
 
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)Amazon Web Services
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 
JethroData technical white paper
JethroData technical white paperJethroData technical white paper
JethroData technical white paperJethroData
 
Establishing Environment Best Practices T12 Brendan Law
Establishing Environment Best Practices T12 Brendan LawEstablishing Environment Best Practices T12 Brendan Law
Establishing Environment Best Practices T12 Brendan LawFlamer
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Casesnzhang
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryIJRESJOURNAL
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLAdam Muise
 

Similaire à How Impala Works (20)

Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
Inside HDFS Append
Inside HDFS AppendInside HDFS Append
Inside HDFS Append
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Unit 1
Unit 1Unit 1
Unit 1
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
 
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
 
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Gg steps
Gg stepsGg steps
Gg steps
 
JethroData technical white paper
JethroData technical white paperJethroData technical white paper
JethroData technical white paper
 
Establishing Environment Best Practices T12 Brendan Law
Establishing Environment Best Practices T12 Brendan LawEstablishing Environment Best Practices T12 Brendan Law
Establishing Environment Best Practices T12 Brendan Law
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
td1.ppt
td1.ppttd1.ppt
td1.ppt
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on Raspberry
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETL
 

Plus de Yue Chen

KARMA: Adaptive Android Kernel Live Patching
KARMA: Adaptive Android Kernel Live PatchingKARMA: Adaptive Android Kernel Live Patching
KARMA: Adaptive Android Kernel Live PatchingYue Chen
 
EncExec: Secure In-Cache Execution
EncExec: Secure In-Cache ExecutionEncExec: Secure In-Cache Execution
EncExec: Secure In-Cache ExecutionYue Chen
 
Ravel: Pinpointing Vulnerabilities
Ravel: Pinpointing VulnerabilitiesRavel: Pinpointing Vulnerabilities
Ravel: Pinpointing VulnerabilitiesYue Chen
 
Pinpointing Vulnerabilities (Ravel)
Pinpointing Vulnerabilities (Ravel)Pinpointing Vulnerabilities (Ravel)
Pinpointing Vulnerabilities (Ravel)Yue Chen
 
SecPod: A Framework for Virtualization-based Security Systems
SecPod: A Framework for Virtualization-based Security SystemsSecPod: A Framework for Virtualization-based Security Systems
SecPod: A Framework for Virtualization-based Security SystemsYue Chen
 
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)Yue Chen
 
Impala SQL Support
Impala SQL SupportImpala SQL Support
Impala SQL SupportYue Chen
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisYue Chen
 
Inside Parquet Format
Inside Parquet FormatInside Parquet Format
Inside Parquet FormatYue Chen
 

Plus de Yue Chen (9)

KARMA: Adaptive Android Kernel Live Patching
KARMA: Adaptive Android Kernel Live PatchingKARMA: Adaptive Android Kernel Live Patching
KARMA: Adaptive Android Kernel Live Patching
 
EncExec: Secure In-Cache Execution
EncExec: Secure In-Cache ExecutionEncExec: Secure In-Cache Execution
EncExec: Secure In-Cache Execution
 
Ravel: Pinpointing Vulnerabilities
Ravel: Pinpointing VulnerabilitiesRavel: Pinpointing Vulnerabilities
Ravel: Pinpointing Vulnerabilities
 
Pinpointing Vulnerabilities (Ravel)
Pinpointing Vulnerabilities (Ravel)Pinpointing Vulnerabilities (Ravel)
Pinpointing Vulnerabilities (Ravel)
 
SecPod: A Framework for Virtualization-based Security Systems
SecPod: A Framework for Virtualization-based Security SystemsSecPod: A Framework for Virtualization-based Security Systems
SecPod: A Framework for Virtualization-based Security Systems
 
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)
Remix: On-demand Live Randomization (Fine-grained live ASLR during runtime)
 
Impala SQL Support
Impala SQL SupportImpala SQL Support
Impala SQL Support
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and Analysis
 
Inside Parquet Format
Inside Parquet FormatInside Parquet Format
Inside Parquet Format
 

Dernier

A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfkalichargn70th171
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageGlobus
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
 
Breaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfBreaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfMeon Technology
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of ProgrammingMatt Welsh
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion Clinic
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownloadvrstrong314
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfAMB-Review
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Krakówbim.edu.pl
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILNatan Silnitsky
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024Ortus Solutions, Corp
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
 

Dernier (20)

A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
Breaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfBreaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdf
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 

How Impala Works