Analyzing HBase Data with
Apache Hive
Swarnim Kulkarni, Cerner Corporation
Nick Dimiduk, Hortonworks
Brock Noland, StreamSets
May 7th, 2015
Who are we?
● Nick Dimiduk
o Apache HBase Committer and PMC member
o Co-author of HBase in Action
● Brock Noland
o Apache Hive Committer and PMC member
● Swarnim Kulkarni
o Lead Architect at Cerner Corporation
o Contributor to Apache Hive
Agenda
● Apache Hive Basics
● Hive + HBase - Architecture
● Hive + HBase - Features and Improvements
● Future Work
● Q & A
Apache Hive
● De facto standard for ad-hoc analysis of data in Hadoop
● SQL-like language called HiveQL for querying data
● Scalable
o SQL queries translate to M/R jobs
● Extensible
o Plug in custom mappers/reducers
o Custom UDFs/UDAFs
o Custom FileFormats/SerDes
Hive/HBase Integration
● Brings the best of both worlds together
● Familiar analytical tooling of Hive to cover
online data stored in HBase
● No need for analysts to write M/R jobs to
analyze the data in HBase
● Uses StorageHandler to access data stored
and managed by HBase
Improvements and New features
Query HBase Snapshots (HIVE-6584)
● Queries over HBase snapshots on HDFS
instead of online Region Servers
● Specify hive.hbase.snapshot.name instead
of hbase.table.name to query the snapshot
● Under the hood:
o Map tasks embed a mini RegionServer (mini-RS) and open the snapshot regions
o Snapshot restored to a unique directory under /tmp
o Location override: hive.hbase.snapshot.restoredir
Query without snapshots
hive> CREATE EXTERNAL TABLE store_sales(...) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' ...;
hive> SELECT * FROM store_sales WHERE ss_item_sk > 60010
and ss_ticket_number < 60030;
Query with snapshots
hbase(main)> snapshot 'store_sales', 'store_sales_snap0'
hive> SET hive.hbase.snapshot.name=store_sales_snap0;
hive> SELECT * FROM store_sales WHERE ss_item_sk > 60010
and ss_ticket_number < 60030;
HFile support for bulk HBase uploads (HIVE-6473)
● Create HFiles with HBaseStorageHandler
● Set the following properties:
o set hive.hbase.generatehfiles=true;
o set hfile.family.path=/tmp/columnfamily_name;
● hfile.family.path can also be set as a table property
hive> CREATE EXTERNAL TABLE store_sales(...) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' ...;
hive> SET hive.hbase.generatehfiles=true;
hive> SET hfile.family.path=/tmp/new_store_sales_records/cf;
hive> INSERT OVERWRITE TABLE store_sales SELECT DISTINCT key,
value FROM some_table CLUSTER BY key;
Query HBase composite keys (HIVE-2599)
● Supports simple and complex implementations
● Delimiters for delimited composite keys are provided as part of the DDL
● Complex implementations require a custom implementation of HBaseCompositeKey or HBaseKeyFactory
hive> CREATE EXTERNAL TABLE hbase_table_1(key struct<a:string,b:string,c:string>, value string)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,test-family:test-qual")
TBLPROPERTIES ("hbase.table.name" = "SIMPLE_TABLE");
hive> SELECT key.a, key.b, key.c FROM hbase_table_1;
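To make the delimited case concrete, here is a minimal sketch in plain Java of how a row key such as a~b~c maps onto the struct fields a, b and c. It only illustrates the idea and is not Hive's actual code path; the class name DelimitedKeySketch, the helper splitKey and the sample row key are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a delimited row key into its parts. Not Hive code. */
class DelimitedKeySketch {

    /** Splits rowKey on every occurrence of delimiter, keeping empty parts. */
    static List<String> splitKey(String rowKey, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < rowKey.length(); i++) {
            if (rowKey.charAt(i) == delimiter) {
                parts.add(rowKey.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(rowKey.substring(start)); // the last part has no trailing delimiter
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical row key for a struct<a:string,b:string,c:string> key column.
        List<String> parts = splitKey("store1~2015~0507", '~');
        System.out.println("key.a=" + parts.get(0)
                + " key.b=" + parts.get(1)
                + " key.c=" + parts.get(2));
    }
}
```

With the '~' delimiter from the DDL, key.a, key.b and key.c correspond to the first, second and third part of the row key.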
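import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.hbase.HBaseCompositeKey;
import org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector;

public class MyCompositeKey extends HBaseCompositeKey {
  /** This constructor signature is required. */
  public MyCompositeKey(LazySimpleStructObjectInspector oi, Properties tbl, Configuration conf) {
    super(oi);
    // …
  }

  @Override
  public Object getField(int n) {
    // Override this to return the field at index "n" in the key.
    throw new UnsupportedOperationException("placeholder: parse the row key here");
  }
}
-- Provide this class in the DDL
CREATE EXTERNAL TABLE MyTable(......) TBLPROPERTIES(.., "hbase.composite.key.class" = "MyCompositeKey");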
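As a rough illustration of what a getField(int n) override might do, the sketch below parses a fixed-width row key in plain Java. It deliberately does not extend HBaseCompositeKey, so that it runs without Hive on the classpath. The class name FixedWidthKeySketch, the fixed-width layout and the sample key are assumptions made for this example only.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: fixed-width key parsing, independent of Hive classes. */
class FixedWidthKeySketch {
    private final byte[] rowKey;
    private final int fieldWidth;

    FixedWidthKeySketch(byte[] rowKey, int fieldWidth) {
        this.rowKey = rowKey;
        this.fieldWidth = fieldWidth;
    }

    /** Returns field n (0-based), similar in spirit to getField(int n). */
    String getField(int n) {
        int from = n * fieldWidth;
        if (n < 0 || from + fieldWidth > rowKey.length) {
            throw new IndexOutOfBoundsException("no field at index " + n);
        }
        return new String(rowKey, from, fieldWidth, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical 9-byte key made of three 3-byte fields.
        byte[] key = "AAABBBCCC".getBytes(StandardCharsets.UTF_8);
        FixedWidthKeySketch sketch = new FixedWidthKeySketch(key, 3);
        System.out.println(sketch.getField(0) + " " + sketch.getField(1) + " " + sketch.getField(2));
    }
}
```

A real implementation would put the same slicing logic inside the getField override of the HBaseCompositeKey subclass and return the type that Hive's object inspector expects.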
public interface HBaseKeyFactory extends HiveStoragePredicateHandler {
  /** Initialize factory with properties */
  void init(HBaseSerDeParameters hbaseParam, Properties properties) throws SerDeException;

  /** Create custom object inspector for hbase key */
  ObjectInspector createKeyObjectInspector(TypeInfo type) throws SerDeException;

  /** Create custom object for hbase key */
  LazyObjectBase createKey(ObjectInspector inspector) throws SerDeException;

  /** Serialize hive object in internal format of custom key */
  byte[] serializeKey(Object object, StructField field) throws IOException;
}
-- Provide the implementation in the DDL
CREATE EXTERNAL TABLE MyTable(......) TBLPROPERTIES(.., "hbase.composite.key.factory" = "MyCompositeKeyFactory");
Query HBase timestamps (HIVE-2828)
● First-class support for querying HBase timestamps
● Use the special :timestamp mapping to retrieve the timestamps
● Specified as part of hbase.columns.mapping
hive> CREATE TABLE hbase_table (key string, value
string, time timestamp)
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,cf:string,:timestamp");
hive> SELECT key, value, cast(time as timestamp)
FROM hbase_table WHERE key > 100 AND key < 400 AND
time < 200000000000;
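HBase stores each cell timestamp as a long, by default in milliseconds since the Unix epoch. The sketch below shows the conversion between such a value and a readable instant, assuming the threshold in the query above is read as epoch milliseconds. The class and method names are hypothetical, and how a given Hive version interprets the literal is not something this sketch asserts.

```java
import java.time.Instant;

/** Illustrative only: converts between HBase-style long timestamps and Instants. */
class HBaseTimestampSketch {

    /** Interprets a cell timestamp as milliseconds since the Unix epoch. */
    static Instant toInstant(long cellTimestampMillis) {
        return Instant.ofEpochMilli(cellTimestampMillis);
    }

    /** Converts an Instant back into an epoch-millisecond cell timestamp. */
    static long toCellTimestamp(Instant instant) {
        return instant.toEpochMilli();
    }

    public static void main(String[] args) {
        long threshold = 200000000000L; // the literal used in the WHERE clause
        System.out.println(toInstant(threshold)); // prints 1976-05-03T19:33:20Z
        System.out.println(toCellTimestamp(Instant.parse("2015-05-07T00:00:00Z")));
    }
}
```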
Additional Improvements
● Support for querying Avro structs stored in HBase (HIVE-6147); no serialization capability yet (HIVE-8020)
● Support for pulling HBase columns with
wildcards (HIVE-3725)
● Multiple bug fixes and performance
enhancements
Coming to a Hive Release Near You!
● Query HBase Snapshots, 0.14.0
● HFile support for bulk HBase uploads, 0.14.0
● Query HBase composite keys, 0.13.0
● Query HBase timestamps, 1.1.0
● Support for pulling HBase columns with
wildcards, 0.12.0
Future Work
● Tighter integration with Phoenix
● Stronger support for salted HBase keys
(HIVE-7128)
● Support for HBase DataType API (HIVE-6150)
● Improved HBase bulk load facility (HIVE-4765)

HBaseCon 2015: Analyzing HBase Data with Apache Hive