SlideShare une entreprise Scribd logo
1  sur  95
Hive User Group Meeting August 2009
Hive Overview
Why Another Data Warehousing System? ,[object Object],[object Object],[object Object]
What is HIVE? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why SQL on Hadoop? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],*  Databases  – supported in metastore, but not yet in query language
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Types ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* More details about user defined types in extensibility section
Hive Architecture
Hive Query Language ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Query Language ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example Application ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* sqoop - http://www.cloudera.com/hadoop-sqoop
Example Query (Filter) ,[object Object],[object Object]
Example Query (Aggregation) ,[object Object],[object Object]
Facebook
[object Object],Facebook
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Example Query (multi-group-by)
 
Hive Metastore
Single User Mode (Default)
Multi User Mode
Remote Server
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Meta Data Model
Hive Optimizations
Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Column Pruning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Predicate Pushdown ,[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object]
Partition Pruning ,[object Object],[object Object]
Partition Pruning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Join in Map Reduce page_view user pv_users Map Reduce key value 111 < 1, 1> 111 < 1, 2> 222 < 1, 1> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male key value 111 < 2, 25> 222 < 2, 32> key value 111 < 1, 1> 111 < 1, 2> 111 < 2, 25> key value 222 < 1, 1> 222 < 2, 32> Shuffle Sort Pageid age 1 25 2 25 pageid age 1 32
Hive QL – Join ,[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object],[object Object]
Hive QL – Join ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Join Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Map Join page_view user Hash table pv_users key value 111 <1,2> 222 <2> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male Pageid age 1 25 2 25 1 32
Map Join ,[object Object],[object Object],[object Object],[object Object]
Parameters ,[object Object],[object Object],[object Object]
Future ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Group By ,[object Object],[object Object],[object Object]
Hive QL – Group By in Map Reduce pv_users Map Reduce pageid age 1 25 1 25 pageid age count 1 25 3 pageid age 2 32 1 25 key value <1,25> 2 key value <1,25> 1 <2,32> 1 key value <1,25> 2 <1,25> 1 key value <2,32> 1 Shuffle Sort pageid age count 2 32 1
Group by Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object]
Multi GroupBy ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Group By in Map Reduce pv_users Key: userid Value: gender, age gender age userid M 25 1 M 25 2 M 25 1 M 24 1 F 24 2 F 24 1 gender dist count M 2 4 F 2 2 gender dist count M 1 3 F 1 1 age dist 24 1 25 1 gender dist count M 1 1 F 1 1 age dist 25 1 24 1 age dist 25 1 24 1
[object Object],[object Object],[object Object],[object Object]
Merging of small files ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Extensibility Features
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
Hive is an open system ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
File Format Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Existing File Formats Facebook *  Splitable: Capable of splitting the file so that a single huge file can be processed by multiple mappers in parallel. TEXTFILE SEQUENCEFILE RCFILE Data type text only  text/binary text/binary Internal Storage order Row-based Row-based Column-based Compression File-based Block-based Block-based Splitable* YES YES YES Splitable* after compression NO YES YES
When to add a new File Format ,[object Object],[object Object],Facebook
How to add a new File Format ,[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
SerDe Examples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
SerDe ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Existing SerDes Facebook * LazyObjects: deserialize the columns only when accessed. * Binary Sortable: binary format preserving the sort order. LazySimpleSerDe LazyBinarySerDe (HIVE-640) BinarySortable SerDe serialized format delimited proprietary binary proprietary binary sortable* deserialized format LazyObjects* LazyBinaryObjects* Writable ThriftSerDe (HIVE-706) RegexSerDe ColumnarSerDe serialized format Depends on the Thrift Protocol Regex formatted proprietary column-based deserialized format User-defined Classes, Java Primitive Objects ArrayList<String> LazyObjects*
When to add a new SerDe ,[object Object],[object Object],Facebook
How to add a new SerDe for text data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
How to add a new SerDe for binary data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
Map/Reduce Scripts Examples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Map/Reduce Scripts ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
UDF Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
More efficient UDF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Overloaded UDF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Implicit Type Conversions for UDF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Variable-length Arguments (HIVE-699) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Summary for UDFs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
[object Object],Facebook
UDAF Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Overloaded UDAF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
UDAF with Complex Intermediate Result ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
UDAF without Partial Aggregations ,[object Object],[object Object],[object Object],[object Object],Facebook
Summary for UDAFs ,[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Comparison of UDF/UDAF v.s. M/R scripts Facebook * UDTF: User-defined Table Generation Function HIVE-655 UDF/UDAF M/R scripts language Java any language data format in-memory objects serialized streams 1/1 input/output supported via UDF supported n/1 input/output supported via UDAF supported 1/n input/output not supported yet (UDTF) supported Speed faster Slower
[object Object],Facebook
How to contribute the work ,[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Q & A ,[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Hive – Roadmap
Branches and Releases ,[object Object],[object Object],[object Object],Facebook
Features available in 0.4 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
Future stuff ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Facebook
More Future stuff ,[object Object],[object Object],[object Object],[object Object],Facebook

Contenu connexe

Tendances

High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoAlluxio, Inc.
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
Inside Parquet Format
Inside Parquet FormatInside Parquet Format
Inside Parquet FormatYue Chen
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 
Manage Add-On Services with Apache Ambari
Manage Add-On Services with Apache AmbariManage Add-On Services with Apache Ambari
Manage Add-On Services with Apache AmbariDataWorks Summit
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive HookMinwoo Kim
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowJulien Le Dem
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxData
 
Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixRajeshbabu Chintaguntla
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Alex Levenson
 
Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleHive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleDataWorks Summit
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariDataWorks Summit
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High PerformanceInderaj (Raj) Bains
 

Tendances (20)

What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Inside Parquet Format
Inside Parquet FormatInside Parquet Format
Inside Parquet Format
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
Manage Add-On Services with Apache Ambari
Manage Add-On Services with Apache AmbariManage Add-On Services with Apache Ambari
Manage Add-On Services with Apache Ambari
 
Hive Does ACID
Hive Does ACIDHive Does ACID
Hive Does ACID
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
 
Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache Phoenix
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleHive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! Scale
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 

En vedette

Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveTapan Avasthi
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Hortonworks
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start TutorialCarl Steinbach
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 

En vedette (6)

Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start Tutorial
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 

Similaire à Hive User Meeting August 2009 Facebook

Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010ragho
 
Hive Apachecon 2008
Hive Apachecon 2008Hive Apachecon 2008
Hive Apachecon 2008athusoo
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 HiveZheng Shao
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 HiveNamit Jain
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiJoydeep Sen Sarma
 
Hadoop and Hive
Hadoop and HiveHadoop and Hive
Hadoop and HiveZheng Shao
 
Hive Percona 2009
Hive Percona 2009Hive Percona 2009
Hive Percona 2009prasadc
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamZheng Shao
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Casesnzhang
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoopChirag Ahuja
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007paulguerin
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Serban Tanasa
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...InfluxData
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabadKelly Technologies
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainYahoo Developer Network
 
What's new in PostgreSQL 11 ?
What's new in PostgreSQL 11 ?What's new in PostgreSQL 11 ?
What's new in PostgreSQL 11 ?José Lin
 

Similaire à Hive User Meeting August 2009 Facebook (20)

Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010
 
Hive Apachecon 2008
Hive Apachecon 2008Hive Apachecon 2008
Hive Apachecon 2008
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 Hive
 
Hadoop Summit 2009 Hive
Hadoop Summit 2009 HiveHadoop Summit 2009 Hive
Hadoop Summit 2009 Hive
 
Hive
HiveHive
Hive
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
 
2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao
 
Hadoop and Hive
Hadoop and HiveHadoop and Hive
Hadoop and Hive
 
Hive Percona 2009
Hive Percona 2009Hive Percona 2009
Hive Percona 2009
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabad
 
20081030linkedin
20081030linkedin20081030linkedin
20081030linkedin
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
Sql
SqlSql
Sql
 
What's new in PostgreSQL 11 ?
What's new in PostgreSQL 11 ?What's new in PostgreSQL 11 ?
What's new in PostgreSQL 11 ?
 

Dernier

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Dernier (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Hive User Meeting August 2009 Facebook

  • 1. Hive User Group Meeting August 2009
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 18.
  • 19.
  • 20.  
  • 22. Single User Mode (Default)
  • 25.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. Hive QL – Join in Map Reduce page_view user pv_users Map Reduce key value 111 < 1, 1> 111 < 1, 2> 222 < 1, 1> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male key value 111 < 2, 25> 222 < 2, 32> key value 111 < 1, 1> 111 < 1, 2> 111 < 2, 25> key value 222 < 1, 1> 222 < 2, 32> Shuffle Sort Pageid age 1 25 2 25 pageid age 1 32
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43. Hive QL – Map Join page_view user Hash table pv_users key value 111 <1,2> 222 <2> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male Pageid age 1 25 2 25 1 32
  • 44.
  • 45.
  • 46.
  • 47.
  • 48. Hive QL – Group By in Map Reduce pv_users Map Reduce pageid age 1 25 1 25 pageid age count 1 25 3 pageid age 2 32 1 25 key value <1,25> 2 key value <1,25> 1 <2,32> 1 key value <1,25> 2 <1,25> 1 key value <2,32> 1 Shuffle Sort pageid age count 2 32 1
  • 49.
  • 50.
  • 51.
  • 52. Hive QL – Group By in Map Reduce pv_users Key: userid Value: gender, age gender age userid M 25 1 M 25 2 M 25 1 M 24 1 F 24 2 F 24 1 gender dist count M 2 4 F 2 2 gender dist count M 1 3 F 1 1 age dist 24 1 25 1 gender dist count M 1 1 F 1 1 age dist 25 1 24 1 age dist 25 1 24 1
  • 53.
  • 54.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61. Existing File Formats Facebook * Splitable: Capable of splitting the file so that a single huge file can be processed by multiple mappers in parallel. TEXTFILE SEQUENCEFILE RCFILE Data type text only text/binary text/binary Internal Storage order Row-based Row-based Column-based Compression File-based Block-based Block-based Splitable* YES YES YES Splitable* after compression NO YES YES
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
  • 67. Existing SerDes Facebook * LazyObjects: deserialize the columns only when accessed. * Binary Sortable: binary format preserving the sort order. LazySimpleSerDe LazyBinarySerDe (HIVE-640) BinarySortable SerDe serialized format delimited proprietary binary proprietary binary sortable* deserialized format LazyObjects* LazyBinaryObjects* Writable ThriftSerDe (HIVE-706) RegexSerDe ColumnarSerDe serialized format Depends on the Thrift Protocol Regex formatted proprietary column-based deserialized format User-defined Classes, Java Primitive Objects ArrayList<String> LazyObjects*
  • 68.
  • 69.
  • 70.
  • 71.
  • 72.
  • 73.
  • 74.
  • 75.
  • 76.
  • 77.
  • 78.
  • 79.
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87. Comparison of UDF/UDAF v.s. M/R scripts Facebook * UDTF: User-defined Table Generation Function HIVE-655 UDF/UDAF M/R scripts language Java any language data format in-memory objects serialized streams 1/1 input/output supported via UDF supported n/1 input/output supported via UDAF supported 1/n input/output not supported yet (UDTF) supported Speed faster Slower
  • 88.
  • 89.
  • 90.
  • 92.
  • 93.
  • 94.
  • 95.