Apache Hive

• Hive is a data warehouse application used for
summarizing, querying, and analyzing large amounts of
data stored on HDFS.

• Used to run batch queries on structured data, with a
syntax similar to SQL.
• Its query language is non-procedural (declarative).
• It sits on top of Hadoop to summarize Big Data and
makes querying and analysis easy.
• Used mainly by data analysts.
• The query language is called HQL (Hive Query Language).

• External Table
Use this type of table when the data files must remain in
HDFS even after the table is dropped.
Dropping an external table removes only the metadata.
Hive no longer knows about the data, but the data files
themselves are not touched.

• Syntax:
CREATE EXTERNAL TABLE Table_Name
(Column_Name Datatype)
[ROW FORMAT row_format]
[STORED AS file_format];
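
As a rough sketch of the syntax above, an external table is usually pointed at an existing HDFS directory with the LOCATION clause. The table name, columns, and path below are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table, columns, and HDFS path (not from the slides).
CREATE EXTERNAL TABLE web_logs (
  ip     STRING,
  url    STRING,
  status INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/web_logs';

-- Removes only the table definition from the metastore;
-- the files under /data/web_logs remain in HDFS.
DROP TABLE web_logs;
```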

• Internal Table
Use this type of table (also called a managed table) when
the data files should not remain in HDFS after the table is
dropped.
Dropping an internal table removes both the data and the
metadata.

• Syntax:
CREATE TABLE Table_Name(Column_Name Datatype)
[ROW FORMAT row_format]
[STORED AS file_format];
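
A minimal sketch of the internal-table lifecycle, assuming a hypothetical table named staging_orders. DESCRIBE FORMATTED is one way to check whether a table is managed or external; whether dropped files go to the HDFS trash first depends on cluster configuration.

```sql
-- Hypothetical table and columns, for illustration only.
CREATE TABLE staging_orders (
  order_id INT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- The "Table Type" row reports MANAGED_TABLE for an internal table
-- and EXTERNAL_TABLE for an external one.
DESCRIBE FORMATTED staging_orders;

-- Removes the metadata and the data files in the table's
-- warehouse directory.
DROP TABLE staging_orders;
```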

• Partitioned Table:
Big Data workloads deal with large datasets, which take a
long time to process and query. To speed up queries, Hive
organizes tables into partitions. Partitioning divides a
table into related parts based on the values of partition
columns such as date, city, and department. With
partitions, it is easy to query only a portion of the data.

• Example:
CREATE TABLE Client(id INT,Name STRING, City STRING)
PARTITIONED BY(country STRING)
STORED AS TEXTFILE;
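
To illustrate how the partition column is used, the sketch below loads one row into a single partition of the Client table and then reads it back. The row values are made up, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- Static partition insert: the partition value goes in the
-- PARTITION clause, not in the VALUES list.
INSERT INTO TABLE Client PARTITION (country = 'IN')
VALUES (1, 'Asha', 'Pune');

-- Filtering on the partition column lets Hive read only the
-- country=IN partition instead of scanning the whole table.
SELECT id, Name, City
FROM Client
WHERE country = 'IN';

-- Lists the partitions that currently exist for the table.
SHOW PARTITIONS Client;
```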

• Bucketed Table:
Partitioning can produce many small partitions, depending
on the column values. Bucketing instead fixes the number
of buckets used to store the data. This number is defined
when the table is created.

• Example:
CREATE TABLE Client(id INT, Name STRING, City STRING)
PARTITIONED BY(country STRING)
CLUSTERED BY (City) INTO 32 BUCKETS
STORED AS TEXTFILE;
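
One common use of buckets is sampling. As a sketch, assuming the Client table above has been populated, the query below reads roughly one bucket's worth of rows rather than the full table.

```sql
-- Sample bucket 1 of the 32 buckets defined on City.
-- The alias c is arbitrary.
SELECT c.id, c.Name, c.City
FROM Client TABLESAMPLE (BUCKET 1 OUT OF 32 ON City) c;
```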

• CREATE:
Syntax : CREATE TABLE
Table_Name(Column_Name DATATYPE);
• ALTER:
Syntax : ALTER TABLE Table_Name Operations;
A partition can be added to an existing partitioned table
with the ALTER operation.
Syntax : ALTER TABLE Table_Name ADD PARTITION(
Col_Name = 'value');
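
As a sketch of ADD PARTITION in practice: the clause takes a partition column together with a value for it, and the column must already be declared in the table's PARTITIONED BY clause. The example assumes the partitioned Client table from the earlier slide; the HDFS path is hypothetical.

```sql
-- Add a partition for one country value; IF NOT EXISTS avoids an
-- error when the partition is already registered.
ALTER TABLE Client ADD IF NOT EXISTS PARTITION (country = 'US');

-- Optionally point the new partition at an existing directory
-- (hypothetical path).
ALTER TABLE Client ADD PARTITION (country = 'FR')
LOCATION '/data/client/fr';

SHOW PARTITIONS Client;
```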

• DROP:
Syntax : DROP TABLE Table_Name;
• SELECT Query:
Syntax : SELECT [ALL | DISTINCT] * FROM
Table_Name WHERE where_condition ;
The number of rows returned by a SELECT statement can be
limited with the LIMIT clause.
Syntax : SELECT [ALL | DISTINCT] * FROM
Table_Name WHERE where_condition
LIMIT 10;

• GROUP BY:
The GROUP BY clause groups the result rows by the
columns named in the clause.
Syntax : SELECT Col_list FROM Table_Name
WHERE where_condition
GROUP BY col_list ;
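
A small illustrative query, assuming the Client table defined earlier: GROUP BY is typically paired with an aggregate function such as COUNT. The alias num_clients is made up.

```sql
-- Count clients per city within one country.
SELECT City, COUNT(*) AS num_clients
FROM Client
WHERE country = 'IN'
GROUP BY City;
```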

• ORDER BY:
The ORDER BY clause sorts the result in ascending or
descending order by the column named in the clause.
Ascending order is the default. To sort in descending
order, use the DESC keyword.
Syntax : SELECT Col_list FROM Table_Name
WHERE where_condition
GROUP BY col_list
ORDER BY col_name DESC ;
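
As a sketch combining the clauses shown so far, again assuming the Client table from the earlier slide, the query below ranks cities by client count and keeps the top ten. The alias num_clients is made up.

```sql
-- Cities with the most clients first.
SELECT City, COUNT(*) AS num_clients
FROM Client
GROUP BY City
ORDER BY num_clients DESC
LIMIT 10;
```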

• JOIN:
The JOIN clause combines rows from two or more tables
that share a common column.
Syntax: SELECT a.col_name,b.col_name
FROM Table1_Name a
JOIN Table2_Name b
ON a.col_Name=b.col_Name;
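
For a concrete case, the sketch below joins the Client table to a hypothetical Orders table. Orders and its columns client_id and amount are assumptions made for illustration and do not appear in the slides.

```sql
-- Inner join: only clients that have at least one order appear.
SELECT c.Name, o.amount
FROM Client c
JOIN Orders o
  ON c.id = o.client_id;
```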
