Confidential, Copyright © Quanticate
Introduction to
Apache Hive
Muralidharan Deenathayalan
Technical Lead
Muralidharan.deenathayalan@quanticate.com
Apache and Apache Hive project logo are trademarks of The Apache Software Foundation.
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Agenda
• Who am I?
• What is Apache Hive?
• Apache Hive key features
• Apache Hive architecture
• How does Apache Hive work in the Apache Hadoop ecosystem?
• Where is Apache Hive useful?
• Where is Apache Hive not useful?
• Who uses Apache Hive?
• What is HQL?
• HQL demo
Who am I?
• 7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
• 2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
• Primary author of Apache Cassandra Cookbook (in writing)
• Csharpcorner MVP
• Frequent blogger
What is Apache Hive?
• Apache Hive: SQL on top of Hadoop
• A data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis.
Apache Hive key features
• Similar to SQL
• SQL has a huge user base
• SQL is easy to code
• Rich data types (structs, arrays and maps)
• Supports SQL filters, joins, GROUP BY and ORDER BY clauses
• Extensibility – custom types, custom functions, etc.
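The rich data types listed above can be sketched in HQL as follows. This is only an illustration: the table user_profiles, its columns and the delimiters are hypothetical and do not come from the talk.

```sql
-- Hypothetical table showing Hive's complex types (all names are illustrative).
CREATE TABLE user_profiles (
  user_name STRING,
  address   STRUCT<city:STRING, zip:STRING>,  -- struct: named fields
  tags      ARRAY<STRING>,                    -- array: ordered list of values
  settings  MAP<STRING, STRING>               -- map: key/value pairs
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Struct fields use dot notation, arrays use an index, maps use a key.
SELECT user_name, address.city, tags[0], settings['theme']
FROM user_profiles
WHERE size(tags) > 0;
```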
Apache Hive architecture
Courtesy & ©: http://www.cubrid.org/blog/dev-platform/platforms-for-big-data/
How Apache Hive works in the Apache Hadoop ecosystem
Courtesy & ©: http://yourstory.com/2012/04/introduction-to-big-data-hadoop-ecosystem-part-1/
Where is Apache Hive useful?
It is well suited for batch processing:
• Log processing
• Text mining
• Document indexing
• Customer-facing business intelligence
• Predictive modeling, etc.
Where is Apache Hive not useful?
Hive is not designed for:
• Online transaction processing
• Real-time queries
Who uses Apache Hive?
Apache Hive is used by:
• Bizo - Uses Hive for reporting and ad hoc queries.
• Chitika - Uses Hive for data mining and analysis on its 435M monthly global users.
• CNET - Uses Hive for data mining, internal log analysis and ad hoc queries.
• Digg - Uses Hive for data mining, internal log analysis, R&D, and reporting/analytics.
• HubSpot - Uses Hive as part of a larger Hadoop pipeline to serve near-real-time web analytics.
• Scribd - Uses Hive for machine learning, data mining, ad hoc querying, and both internal and user-facing analytics.
Courtesy & ©: https://cwiki.apache.org/confluence/display/Hive/PoweredBy
What is HQL?
HQL: Hive Query Language
• Does not conform to any ANSI SQL standard
• Very close to the MySQL dialect, but with some differences
• SQL to HQL cheat sheet: http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
• HQL does not support transactions, so do not compare it with an RDBMS
HQL – Create table
Syntax:
CREATE TABLE <table_name> (<column_definitions>)
[ROW FORMAT <row_format>]
[STORED AS <file_format>]
Example:
CREATE TABLE posts (user STRING, post STRING, time BIGINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
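As a variation on the example above, the same DDL reference also describes external tables, which point Hive at files that already sit in HDFS. A minimal sketch, assuming a hypothetical table name posts_raw and a hypothetical directory /user/hue/posts_raw; the column names are back-quoted in case they clash with reserved words in your Hive version.

```sql
-- Hypothetical external table over comma-separated files already in HDFS.
-- IF NOT EXISTS makes the statement safe to re-run.
CREATE EXTERNAL TABLE IF NOT EXISTS posts_raw (
  `user` STRING,
  post   STRING,
  `time` BIGINT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hue/posts_raw';  -- assumed path, not from the talk
```

Dropping an external table removes only the table definition; the files under LOCATION are left in place.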
HQL – Create table demo
HQL – Describe table
Syntax:
describe <table_name>;
Example:
describe posts;
HQL – Describe table demo
HQL – Show all tables
Syntax:
show tables;
show tables [<filter>];
Example:
show tables;
show tables 'table*';
HQL – Show all tables demo
HQL – Alter table
Syntax:
ALTER TABLE <table_name> RENAME TO <new_table_name>;
ALTER TABLE <table_name> CHANGE <old_column_name> <new_column_name> <new_data_type>;
Example:
-- Rename the table
Alter table posts rename to myposts;
-- Rename a column and change its data type
Alter table posts change time time1 string;
HQL – Alter table demo
HQL – How to get records into Apache Hive tables?
There are two ways to load data into Apache Hive tables:
• Using the insert statement: loads data from another table using a select statement
• Using the load statement: loads data from a file
HQL – Insert records
Syntax:
Insert into table <tablename>
select <columns> from <another_table>;
Example:
Insert into table posts
select 'user1', 'Demo', '123' from table1;
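The insert-from-select form above writes one row to posts for every row that the select returns from table1. For a single literal row, a sketch assuming Hive 0.14 or later (where INSERT ... VALUES is available) and the posts table from the Create table slide:

```sql
-- Assumes Hive 0.14+; inserts exactly one literal row into posts.
INSERT INTO TABLE posts VALUES ('user1', 'Demo', 123);

-- Quick check of what was written.
SELECT * FROM posts LIMIT 10;
```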
HQL – Insert records demo
HQL – Load data
Syntax:
Load data [local] inpath '<filepath>' [overwrite] into table <tablename>;
Example:
Load data inpath '/user/hue/posts.csv' into table posts;
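The two variants of the load statement can be sketched as follows. The HDFS path is the one from the example above; the local path /tmp/posts.csv is a hypothetical stand-in.

```sql
-- Without LOCAL the path refers to HDFS, and the file is moved
-- (not copied) into the table's storage directory.
LOAD DATA INPATH '/user/hue/posts.csv' INTO TABLE posts;

-- With LOCAL the file is copied from the client's local filesystem.
-- OVERWRITE replaces whatever the table held before.
-- '/tmp/posts.csv' is an assumed path, not from the talk.
LOAD DATA LOCAL INPATH '/tmp/posts.csv' OVERWRITE INTO TABLE posts;

-- Sanity checks after loading.
SELECT * FROM posts LIMIT 10;
SELECT COUNT(*) FROM posts;
```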
HQL – Load data demo
HQL – Update records
Syntax:
There is no specific syntax for update, but you can use an insert statement with the overwrite option. Overwrite replaces all existing rows in the table with the result of the select.
Example:
Insert overwrite table posts
select 'user1', 'Demo', '123' from table1 where id = '123';
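Because overwrite replaces the whole table, changing some rows while keeping the rest usually means rewriting the table from itself. A sketch of that pattern, assuming the posts table from the Create table slide; the replacement text 'Demo (edited)' is made up, and the column names are back-quoted in case they clash with reserved words in your Hive version.

```sql
-- Emulated update: rewrite every row, changing only the rows that match.
INSERT OVERWRITE TABLE posts
SELECT
  `user`,
  CASE WHEN `user` = 'user1' THEN 'Demo (edited)' ELSE post END AS post,
  `time`
FROM posts;
```

Every row is read and written again, so this is a batch operation rather than a row-level update.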
HQL – Update records demo
HQL – Delete records
You cannot delete records from Apache Hive tables!
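On tables without row-level delete support, a common workaround is to overwrite the table with only the rows that should remain. A sketch under that assumption, using the posts table from the Create table slide; the filter value 'user1' is illustrative.

```sql
-- Emulated delete: keep every row except those belonging to 'user1'.
-- The IS NULL branch keeps rows whose user is NULL, which a bare
-- <> comparison would silently drop.
INSERT OVERWRITE TABLE posts
SELECT *
FROM posts
WHERE `user` IS NULL OR `user` <> 'user1';
```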
HQL – Delete records demo
HQL – Drop table
Syntax:
drop table [if exists] <table_name>;
Example:
drop table posts;
HQL – Drop table demo
Summary
• What is Apache Hive?
• Apache Hive key features
• Apache Hive architecture
• How does Apache Hive work in the Apache Hadoop ecosystem?
• Where is Apache Hive useful?
• Where is Apache Hive not useful?
• Who uses Apache Hive?
• Getting started with HQL
Q & A
For the next session!
• Partitioning
• Bucketing
• Union
• Subqueries
• Joins
• Group By
• Order By
• Aggregations
References
https://hive.apache.org/
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
https://cwiki.apache.org/confluence/display/Hive/Home
https://cwiki.apache.org/confluence/display/Hive/PoweredBy
http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
Coding-Freaks.Net
www.codingfreaks.net
Quanticate OPDev Twitter
https://twitter.com/quanticateopdev
Twitter
www.Twitter.com/muralidharand
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Apache Hive - Introduction

  • 1. Confidential, Copyright © Quanticate Introduction to Apache Hive Muralidharan Deenathayalan Technical Lead Muralidharan.deenathayalan@quanticate.com Apache and Apache Hive project logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
  • 2. Agenda
      - Who am I?
      - What is Apache Hive?
      - Apache Hive key features
      - Apache Hive architecture
      - How does Apache Hive work in the Apache Hadoop ecosystem?
      - Where is Apache Hive useful?
      - Where is Apache Hive not useful?
      - Who uses Apache Hive?
      - What is HQL?
      - HQL demo
  • 3. Who am I?
      - 7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
      - 2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
      - Primary author of Apache Cassandra Cookbook (in progress)
      - C# Corner MVP
      - Frequent blogger
  • 4. What is Apache Hive?
      - Apache Hive: SQL on top of Hadoop
      - A data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis.
  • 5. Apache Hive key features
      - Similar to SQL
      - SQL has a huge user base
      - SQL is easy to code
      - Rich data types (structs, lists and maps)
      - Supports SQL filters, joins, GROUP BY and ORDER BY clauses
      - Extensibility: custom types, custom functions, etc.
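The rich data types listed above can be sketched in HQL as follows. This is a minimal illustration and is not taken from the deck: the table name `posts_rich`, its columns, and the delimiter choices are all hypothetical. Hive's name for the list type is `ARRAY`.

```sql
-- Hypothetical table; names and delimiters are illustrative assumptions.
-- A matching text line would look like: alice|IN,hive|sql,lang:en|level:intro
CREATE TABLE posts_rich (
  author STRUCT<name:STRING, country:STRING>,  -- struct: named fields
  tags   ARRAY<STRING>,                        -- list of values
  attrs  MAP<STRING, STRING>                   -- key/value pairs
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Read a struct field, the first array element, and a map value by key.
SELECT author.name, tags[0], attrs['lang']
FROM posts_rich;
```

Struct members and collection elements share the `COLLECTION ITEMS` delimiter, while `MAP KEYS` separates each key from its value.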
  • 6. Apache Hive architecture (diagram)
      Courtesy & ©: http://www.cubrid.org/blog/dev-platform/platforms-for-big-data/
  • 7. How Apache Hive works in the Apache Hadoop ecosystem (diagram)
      Courtesy & ©: http://yourstory.com/2012/04/introduction-to-big-data-hadoop-ecosystem-part-1/
  • 8. Where is Apache Hive useful?
      It is well suited for batch processing:
      - Log processing
      - Text mining
      - Document indexing
      - Customer-facing business intelligence
      - Predictive modeling, etc.
  • 9. Where is Apache Hive not useful?
      Hive is not designed for:
      - Online transaction processing
      - Real-time queries
  • 10. Who uses Apache Hive?
      Apache Hive is used by:
      - Bizo: uses Hive for reporting and ad hoc queries.
      - Chitika: uses Hive for data mining and analysis on its 435M monthly global users.
      - CNET: uses Hive for data mining, internal log analysis and ad hoc queries.
      - Digg: uses Hive for data mining, internal log analysis, R&D, and reporting/analytics.
      - HubSpot: uses Hive as part of a larger Hadoop pipeline to serve near-realtime web analytics.
      - Scribd: uses Hive for machine learning, data mining, ad hoc querying, and both internal and user-facing analytics.
      Courtesy & ©: https://cwiki.apache.org/confluence/display/Hive/PoweredBy
  • 11. What is HQL?
      HQL: Hive Query Language
      - Does not conform to any ANSI standard
      - Very close to the MySQL dialect, but with some differences
      - SQL to HQL cheat sheet: http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
      - HQL does not support transactions, so do not compare it with an RDBMS
  • 12. HQL – Create table
      Syntax:
        CREATE TABLE <table_name> (<column_definitions>)
          [ROW FORMAT <row_format>]
          [STORED AS <file_format>]
      Example:
        CREATE TABLE posts (user STRING, post STRING, time BIGINT)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          STORED AS TEXTFILE;
      Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
  • 13. HQL – Create table demo
  • 14. HQL – Describe table
      Syntax:
        DESCRIBE <table_name>;
      Example:
        DESCRIBE posts;
  • 15. HQL – Describe table demo
  • 16. HQL – Show all tables
      Syntax:
        SHOW TABLES;
        SHOW TABLES [<filter>];
      Example:
        SHOW TABLES;
        SHOW TABLES 'table*';
  • 17. HQL – Show all tables demo
  • 18. HQL – Alter table
      Syntax:
        ALTER TABLE <table_name> RENAME TO <new_table_name>;
        ALTER TABLE <table_name> CHANGE <old_column_name> <new_column_name> <new_data_type>;
      Example:
        -- Rename the table
        ALTER TABLE posts RENAME TO myposts;
        -- Rename a column and change its data type
        ALTER TABLE posts CHANGE time time1 STRING;
  • 19. HQL – Alter table demo
  • 20. HQL – How to get records into Apache Hive tables?
      There are two ways to load data into Apache Hive tables:
      - Using the INSERT statement: loads data from another table through a SELECT statement
      - Using the LOAD statement: loads data from a file
  • 21. HQL – Insert records
      Syntax:
        INSERT INTO TABLE <table_name> SELECT <columns> FROM <another_table>;
      Example:
        INSERT INTO TABLE posts SELECT 'user1', 'Demo', '123' FROM table1;
  • 22. HQL – Insert records demo
  • 23. HQL – Load data
      Syntax:
        LOAD DATA INPATH '<filepath>' [OVERWRITE] INTO TABLE <table_name>;
      Example:
        LOAD DATA INPATH '/user/hue/posts.csv' INTO TABLE posts;
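The variants of the LOAD statement can be sketched as below. This is an illustration under stated assumptions rather than part of the deck: the local path `/tmp/posts.csv` is hypothetical, and the `posts` table is the one created earlier. LOAD does not parse or validate the file; Hive only places it in the table's storage location and applies the schema when the data is read.

```sql
-- File already in HDFS: Hive moves it into the table's storage directory,
-- so it no longer exists at the original path afterwards.
LOAD DATA INPATH '/user/hue/posts.csv' INTO TABLE posts;

-- File on the local file system: Hive copies it instead of moving it.
-- The path /tmp/posts.csv is a hypothetical example.
LOAD DATA LOCAL INPATH '/tmp/posts.csv' INTO TABLE posts;

-- OVERWRITE replaces the existing contents of the table instead of appending.
LOAD DATA LOCAL INPATH '/tmp/posts.csv' OVERWRITE INTO TABLE posts;
```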
  • 24. HQL – Load data demo
  • 25. HQL – Update records
      Syntax:
        There is no specific syntax for update, but you can use an INSERT statement with the OVERWRITE option. Note that INSERT OVERWRITE replaces the entire contents of the table (or partition) with the query result.
      Example:
        INSERT OVERWRITE TABLE posts SELECT 'user1', 'Demo', '123' FROM table1 WHERE id = '123';
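Because INSERT OVERWRITE replaces everything in the target table, emulating an update of a few rows means rewriting all rows and changing only the ones that match. A minimal sketch, assuming the `posts` table from the Create table slide and a made-up condition and replacement value; it is not the author's demo:

```sql
-- Rewrite the whole table, changing only the rows that match the condition.
-- Backticks quote `user` and `time`, which are reserved words in some Hive versions.
INSERT OVERWRITE TABLE posts
SELECT
  `user`,
  IF(`user` = 'user1', 'Demo', post) AS post,  -- new value for matching rows only
  `time`
FROM posts;
```

Rows that do not match keep their original `post` value, so the table ends up with the same number of rows as before.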
  • 26. HQL – Update records demo
  • 27. HQL – Delete records
      You cannot delete individual records from Apache Hive tables!
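A common workaround, sketched here as an assumption rather than as the author's method, is to overwrite the table with only the rows that should remain. The `posts` table comes from the Create table slide; the filter value is made up. Later Hive releases add DELETE and UPDATE for transactional tables, whereas this sketch assumes a plain, non-transactional table as used in the deck.

```sql
-- Keep every row except those belonging to 'user1'.
-- The IS NULL check preserves rows whose user column is NULL,
-- which a bare <> comparison would silently drop.
INSERT OVERWRITE TABLE posts
SELECT `user`, post, `time`
FROM posts
WHERE `user` IS NULL OR `user` <> 'user1';
```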
  • 28. HQL – Delete records demo
  • 29. HQL – Drop table
      Syntax:
        DROP TABLE <table_name>;
      Example:
        DROP TABLE posts;
  • 30. HQL – Drop table demo
  • 31. Summary
      - What is Apache Hive?
      - Apache Hive key features
      - Apache Hive architecture
      - How does Apache Hive work in the Apache Hadoop ecosystem?
      - Where is Apache Hive useful?
      - Where is Apache Hive not useful?
      - Who uses Apache Hive?
      - Getting started with HQL
  • 32. Q & A
  • 33. For the next session
      - Partitioning
      - Bucketing
      - Union
      - Subqueries
      - Joins
      - Group By
      - Order By
      - Aggregations
  • 34. References
      - https://hive.apache.org/
      - https://cwiki.apache.org/confluence/display/Hive/GettingStarted
      - https://cwiki.apache.org/confluence/display/Hive/Home
      - https://cwiki.apache.org/confluence/display/Hive/PoweredBy
      - http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
  • 35. Contact
      - Coding-Freaks.Net: www.codingfreaks.net
      - Quanticate OPDev Twitter: https://twitter.com/quanticateopdev
      - Twitter: www.Twitter.com/muralidharand