Prepared by
Vetri.V
What is Hive?
 Hive is a data warehouse system for Hadoop that facilitates easy data
summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop
compatible file systems.
 Hive provides a mechanism to project structure onto this data and query the data
using a SQL-like language called HiveQL.
 At the same time this language also allows traditional map/reduce programmers to
plug in their custom mappers and reducers when it is inconvenient or inefficient to
express this logic in HiveQL.
 Hive is an abstraction on top of MapReduce: it lets users query data in the
Hadoop cluster without knowing Java or MapReduce. It uses the HiveQL language,
which is very similar to SQL.
Some points about Hive:
 Hive was originally developed at Facebook
 Provides a very SQL-like language
 Can be used by people who know SQL
 Enabling Hive requires almost no extra work by the system administrator
 Hive ‘layers’ table definitions on top of data in HDFS
 Hive tables are stored in Hive’s ‘warehouse’ directory in HDFS (by default,
/user/hive/warehouse)
 Tables are stored in subdirectories of the warehouse directory
 Actual data is stored in flat files: control-character-delimited text or SequenceFiles
Hive is a data warehousing tool on top of Hadoop, and its query language closely
mirrors SQL (a short sketch of these statements follows this list):
 SQL-like queries
 SHOW TABLES, DESCRIBE, DROP TABLE
 CREATE TABLE, ALTER TABLE
 SELECT, INSERT
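A minimal round trip through these statements might look as follows; the employees
table and its staging_employees source are hypothetical, used purely for illustration:
hive> CREATE TABLE employees (id INT, name STRING);
hive> SHOW TABLES;
hive> DESCRIBE employees;
hive> ALTER TABLE employees ADD COLUMNS (dept STRING);
hive> INSERT OVERWRITE TABLE employees SELECT id, name, dept FROM staging_employees;
hive> SELECT * FROM employees LIMIT 5;
hive> DROP TABLE employees;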
Hive Limitations:
 Not all ‘standard’ SQL is supported
 No support for UPDATE or DELETE
 No support for INSERTing single rows
 Relatively limited number of built-in functions
 No datatypes for date or time - use the STRING datatype instead (newer Hive
versions add native date/timestamp types; a workaround is sketched below)
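As a workaround, dates can be stored as STRINGs in a sortable format such as
'YYYY-MM-DD' and compared lexicographically, or converted with the built-in
unix_timestamp() function. A minimal sketch against a hypothetical logs table:
hive> CREATE TABLE logs (msg STRING, event_date STRING);
hive> SELECT * FROM logs WHERE event_date >= '2013-01-01';
hive> SELECT msg, unix_timestamp(event_date, 'yyyy-MM-dd') FROM logs;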
Hive Architecture
Metastore:
 Stores system catalog.
Driver:
 Manages the life cycle of a HiveQL query as it moves through Hive; also manages
the session handle and session statistics
Query compiler:
 Compiles HiveQL into a directed acyclic graph of map/reduce tasks (the plan can
be inspected with EXPLAIN; see the sketch after this list)
Execution engine:
 Executes the tasks in proper dependency order; interacts with Hadoop
Hive Server:
 Provides Thrift interface and JDBC/ODBC for integrating other applications.
Client components:
CLI, web interface, and JDBC/ODBC interface. Extensibility interfaces include SerDe,
User Defined Functions, and User Defined Aggregate Functions.
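To see the compiler's plan for a query, prefix it with EXPLAIN; Hive prints the DAG
of stages (map/reduce jobs and their dependencies) that the execution engine will run.
A sketch, using the example table created later in this document:
hive> EXPLAIN SELECT name, COUNT(*) FROM example GROUP BY name;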
Hive Installation and Metastore with MySQL:
 Store Hive Metadata into RDBMS.
 The Metastore stores all the information about the tables, their partitions, the
schemas, the columns and their types, the table locations etc.
 This information can be queried or modified using a thrift interface and as a result
it can be called from clients in different programming languages.
 As this information needs to be served fast to the compiler, the Hive developers
chose to store it in a traditional RDBMS.
 By default, Hive uses an embedded Derby database to store metadata. The
metastore is the "glue" between Hive and HDFS.
 It tells Hive where your data files live in HDFS, what type of data they contain, what
tables they belong to, etc.
 The Metastore is an application that runs on an RDBMS and uses an open-source
ORM layer called DataNucleus to convert object representations into a relational
schema and vice versa.
 They chose this approach, rather than storing the information in HDFS, because
the Metastore must be very low latency. The DataNucleus layer also allows plugging
in many different RDBMS technologies.
 Facebook uses MySQL to store metadata information.
Store Hive metadata into MySQL:
 add the following to /etc/profile:
export JAVA_HOME=/usr/java/jdk1.7.0_17
export PATH=$PATH:/usr/java/jdk1.7.0_17/bin
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:/opt/hadoop/bin
export HIVE_HOME=/opt/hive
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
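After editing /etc/profile, reload it and confirm the variables took effect (the
paths above are examples; adjust them to your installation):
$ source /etc/profile
$ echo $HIVE_HOME # should print /opt/hive
$ hadoop version # confirms Hadoop is on the PATH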
 Go to the Hadoop home directory and create the HDFS directories Hive needs; the
commands are as follows:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
We need to perform the following steps to set up the metastore in a MySQL server.
1. Install mysql-server.
sudo apt-get install mysql-server
(or)
yum install mysql-server # on CentOS
2. Use the su command. At the password prompt, type your root user's
password.
3. If the password is correct, you'll see a prompt like root@yourdesktopname#.
4. Now start the MySQL server:
/etc/init.d/mysql start
5. Once the MySQL server has started, type mysql -u root mysql.
6. Next, create a new MySQL user for hadoop/hive.
mysql> CREATE USER 'hadoop'@'localhost' IDENTIFIED BY 'hadoop';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'localhost' WITH
GRANT OPTION;
mysql> exit;
7. su - userName // In my case userName is hadoop, so switch to that user.
8. mysql -h localhost -u userName -p
9. Now we need to change the Hive configuration so it can use MySQL.
10. Go to the hive/conf directory, open hive-default.xml, and make the following
changes.
<!-- In local metastore setup, each Hive Client will open a connection to
the datastore and make SQL queries against it.-->
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>
jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>Username</value><!-- In my case UserName is hadoop-->
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>UserPassword</value><!-- In my case password is hadoop-->
</property>
Alternatively, create hive-site.xml and paste the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- WARNING!!! This file is provided for documentation purposes ONLY! -->
<!-- WARNING!!! Any changes you make to this file will be ignored by Hive. -->
<!-- WARNING!!! You must make your changes in hive-site.xml instead. -->
<!-- Hive Execution Parameters -->
<!-- In local metastore setup, each Hive Client will open a connection to
the datastore and make SQL queries against it.-->
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/eattributes?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value><!-- In my case UserName is hadoop-->
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value><!-- In my case password is hadoop-->
</property>
</configuration>
11. Hive needs the MySQL JDBC driver, so download it and copy it to the hive/lib
folder.
12. If all settings are done correctly, we can do this:
impetus@ubuntu:~$ cd $HIVE_HOME
impetus@ubuntu:~/hive$ bin/hive
Hive history file=/tmp/ankit/hive_job_log_ankit_201102211937_456962737.txt
hive>
hive>create table temp(info INT);
OK
Time taken: 4.745 seconds
hive> show tables;
OK
temp
Time taken: 0.137 seconds
hive>
13. Connect to the MySQL server again.
vetri@ubuntu:~$mysql -h localhost -u hadoop -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 84
Server version: 5.1.41-3ubuntu12.8 (Ubuntu)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
14. Use the following command to view all existing databases.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
+--------------------+
3 rows in set (0.05 sec)
mysql> use hive; //Select database.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql>
15. Use the following command to view all existing tables.
mysql> show tables;
+--------------------+
| Tables_in_hive |
+--------------------+
| BUCKETING_COLS |
| COLUMNS |
| DBS |
| NUCLEUS_TABLES |
| PARTITIONS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SORT_COLS |
| TABLE_PARAMS |
| TBLS |
+--------------------+
16 rows in set (0.00 sec)
mysql>
16. Use the following command to view the metadata for Hive table
columns.
mysql> select * from COLUMNS;
+-------+---------+-------------+-----------+-------------+
| SD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
| 1     | NULL    | info        | int       | 0           |
+-------+---------+-------------+-----------+-------------+
1 row in set (0.00 sec)
mysql>
//Where info is the column of the temp table created earlier.
17. Similarly, you can view the information on partitions, buckets, tables, etc.,
for example by joining the metastore tables (a sketch follows).
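As an illustration, the following query lists each Hive table together with its
columns by joining TBLS and COLUMNS on SD_ID, matching the old metastore schema
shown above; the metastore schema varies across Hive versions, so treat this as a
sketch:
mysql> SELECT t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
    -> FROM TBLS t JOIN COLUMNS c ON t.SD_ID = c.SD_ID;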
Hive Applications, Components, Model and Layout
Hive Applications:
 Log processing
 Text mining
 Document indexing
 Customer-facing business intelligence (e.g., Google Analytics)
 Predictive modeling, hypothesis testing
Hive Components:
 Shell: allows interactive queries, like a MySQL shell connected to a database –
also supports web and JDBC clients
 Driver: session handles, fetch, execute
 Compiler: parse, plan, optimize
 Execution engine: DAG of stages (M/R, HDFS, or metadata)
 Metastore: schema, location in HDFS, SerDe
Data Model:
 Tables – typed columns (int, float, string, date, boolean) – also list and map
types (for JSON-like data)
 Partitions – e.g., to range-partition tables by date
 Buckets – hash partitions within ranges (useful for sampling and join
optimization; see the DDL sketch below)
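Putting partitions and buckets together, a table can be partitioned by a date string
and hash-bucketed on a column. A minimal DDL sketch with a hypothetical page_views
table:
hive> CREATE TABLE page_views (userid INT, url STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (userid) INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';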
Metastore:
 Database: namespace containing a set of tables
 Holds table definitions (column types, physical layout)
 Partition data
 Uses the JPOX ORM (now DataNucleus) for its implementation; can be stored in
Derby, MySQL, and many other relational databases
Physical Layout:
 Warehouse directory in HDFS – e.g., /user/hive/warehouse
 Tables stored in subdirectories of warehouse – Partitions, buckets form
subdirectories of tables
 Actual data stored in flat files – control-character-delimited text or SequenceFiles
– with a custom SerDe, arbitrary formats can be used (a directory listing is sketched
below)
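For instance, listing the warehouse directory for the hypothetical page_views table
above would show one subdirectory per partition (the paths are illustrative):
$ hadoop fs -ls /user/hive/warehouse/page_views
# one subdirectory per partition value, e.g.:
#   /user/hive/warehouse/page_views/dt=2013-01-01
#   /user/hive/warehouse/page_views/dt=2013-01-02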
Hive Command Line Interface:
To launch the Hive shell, start a terminal and run $ hive
Note: example is the table name used in all of the following queries.
hive>
Hive : Creating Tables
hive> CREATE TABLE example (id INT, name STRING) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t' STORED AS TEXTFILE;
hive> DESCRIBE example;
hive> SHOW TABLES;
Hive : Loading Data Into Hive
Data is loaded into Hive with the LOAD DATA INPATH statement – Assumes that the data is
already in HDFS
hive> LOAD DATA INPATH "file_txtdata.txt" INTO TABLE example;
If the data is on the local filesystem, use LOAD DATA LOCAL INPATH – Automatically loads
it into HDFS
hive> LOAD DATA LOCAL INPATH "file_txtdata.txt" INTO TABLE example;
Hive : SELECT Queries
Hive supports most familiar SELECT syntax
hive> SELECT * FROM example LIMIT 10;
hive> SELECT * FROM example WHERE id > 100 ORDER BY name ASC LIMIT 10;
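Aggregations also work as in SQL – for instance, counting rows per name against the
same example table:
hive> SELECT name, COUNT(*) FROM example GROUP BY name;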
Joining Tables
SELECT e.name, e.dep, s.id FROM example e JOIN sample s ON (e.dep = s.dep) WHERE e.id
>= 20;
Creating User-Defined Functions
INSERT OVERWRITE TABLE u_data_new
SELECT TRANSFORM (userid, movieid, rating, unixtime) USING 'python
weekday_mapper.py' AS (userid, movieid, rating, weekday) FROM u_data;
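The weekday_mapper.py script is referenced above but not shown; a minimal sketch of
what such a streaming script could look like, assuming tab-separated input rows of
(userid, movieid, rating, unixtime):
import sys
from datetime import datetime

# Read tab-separated records from stdin; Hive streams table rows this way.
for line in sys.stdin:
    userid, movieid, rating, unixtime = line.strip().split('\t')
    # Replace the epoch timestamp with an ISO weekday (1 = Monday .. 7 = Sunday).
    weekday = datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([userid, movieid, rating, str(weekday)]))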
Join Query: sample
1. Create the table:
CREATE TABLE example (ID INT, SUBJECT STRING, PRODUCT STRING, PERIOD INT,
START_TIME INT, OPERATION STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
2. Load the data (save the file in the related folder):
hive> LOAD DATA LOCAL INPATH "file_txtdata.txt" INTO TABLE example;
3. Join query – for each id, return the full row holding its latest start_time
below 25 (the classic greatest-n-per-group pattern):
select A.*
from example A
join (
select id, max(start_time) as start_time
from example B
where start_time < 25
group by id ) MAXSP
ON A.id=MAXSP.id and A.start_time = MAXSP.start_time;
Using NOT IN / IN in a Hive query
SELECT * FROM example WHERE NOT array_contains(array(7,6,5,4,2,12), id)
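The positive IN form is analogous; array_contains is the usual workaround in older
Hive versions that lack IN/NOT IN subqueries:
hive> SELECT * FROM example WHERE array_contains(array(7,6,5,4,2,12), id);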
---THANK YOU---