1. Page 1 © Hortonworks Inc. 2014
Discover HDP 2.1
Interactive SQL Query in Hadoop with Apache Hive
Hortonworks. We do Hadoop.
2. Speakers
Justin Sears, Hortonworks Product Marketing Manager
Carter Shanklin, Hortonworks Director of Product Management & PM for Apache Hive in Hortonworks Data Platform
Owen O’Malley, Hortonworks Co-Founder, Engineer & Committer for Apache Hive project
3. A Modern Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM:
• REPOSITORIES: RDBMS, EDW, MPP
• ENTERPRISE HADOOP: Governance & Integration, Security, Operations, Data Access, Data Management
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
4. HDP 2.1: Enterprise Hadoop
HDP 2.1 Hortonworks Data Platform
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS: Batch (MapReduce); Script (Pig); SQL (Hive/Tez, HCatalog); NoSQL (HBase, Accumulo); Stream (Storm); Search (Solr); Others (In-Memory Analytics, ISV engines)
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System); 1 ... N
SECURITY: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
5. HDP 2.1: Enterprise Hadoop
HDP 2.1 Hortonworks Data Platform
[Diagram: the HDP 2.1 component stack from the previous slide, with YARN: Data Operating System and DATA ACCESS: SQL (Hive/Tez, HCatalog) set apart.]
6. Apache Hive After the Stinger Initiative: Speed, Scale & SQL Compliance
7. Hive: SQL Analytics For Any Data Size
Sensor, Mobile, Weblog, Operational / MPP
Store and Query all Data in Hive
Use Existing SQL Tools and Existing SQL Processes
SQL Queries
8. The Stinger Initiative: Complete
• Community initiative around Hive
• Enables Hive to support interactive workloads
• Enhances Hive’s standard SQL interface for Hadoop
• Improves existing tools & preserves investments
Query Processing (Vectorized Query) + Execution Engine (Tez) + File Format (ORCFile) = 100X+
9. New in Hive HDP 2.1: Speed
New Features for Speed
Interactive query using Hive on Tez
Vectorized query execution
Cost-based optimizer
10. New in HDP 2.1: More Than 10 New SQL Features
New SQL Features
Subquery for IN / NOT IN
Support for EXISTS and NOT EXISTS
Common table expressions (CTEs)
Support for CHAR datatype
Scale and precision support for DECIMAL datatype
JOIN conditions in the WHERE clause
Cancel jobs via ODBC / JDBC
Support for Unicode column names
Permanent functions
Stream data into Hive from Flume (Experimental feature)
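Several of the SQL features listed above can be illustrated with a minimal sketch. It does not run against Hive: it uses Python's built-in sqlite3 module as a stand-in, on the assumption that these constructs (IN subqueries, NOT EXISTS, common table expressions, and join conditions in the WHERE clause) are standard SQL that both engines accept. The tables `customer` and `sales` and their columns are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical toy tables. SQLite stands in for Hive here (an assumption),
# because the constructs below are standard SQL that SQLite also accepts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, name TEXT);
    CREATE TABLE sales (customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")


def first_column(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Subquery for IN: customers with at least one sale.
in_subquery = first_column("""
    SELECT name FROM customer
    WHERE customer_id IN (SELECT customer_id FROM sales)
    ORDER BY name
""")

# NOT EXISTS with a correlated subquery: customers with no sales.
not_exists = first_column("""
    SELECT name FROM customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM sales s WHERE s.customer_id = c.customer_id
    )
    ORDER BY name
""")

# Common table expression (WITH clause): per-customer totals, then a filter.
cte = first_column("""
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM sales
        GROUP BY customer_id
    )
    SELECT customer_id FROM totals WHERE total > 10 ORDER BY customer_id
""")

# JOIN condition written in the WHERE clause (implicit join).
where_join = first_column("""
    SELECT c.name FROM customer c, sales s
    WHERE c.customer_id = s.customer_id AND s.amount > 6
    ORDER BY c.name
""")
```

The query text is the part that carries over; against a real cluster the same statements would be submitted through a Hive client rather than sqlite3, and Hive-specific behavior may differ.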
11. Hive’s Journey to SQL Compliance
Evolution of SQL Compliance in Hive
SQL Datatypes:
• INT/TINYINT/SMALLINT/BIGINT
• FLOAT/DOUBLE
• BOOLEAN
• ARRAY, MAP, STRUCT, UNION
• STRING
• BINARY
• TIMESTAMP
• DECIMAL
• DATE
• VARCHAR
• CHAR
• Interval Types
SQL Semantics:
• SELECT, INSERT
• GROUP BY, ORDER BY, HAVING
• JOIN on explicit join key
• Inner, outer, cross and semi joins
• Sub-queries in the FROM clause
• ROLLUP and CUBE
• UNION
• Standard aggregations (sum, avg, etc.)
• Custom Java UDFs
• Windowing functions (OVER, RANK, etc.)
• Advanced UDFs (ngram, XPath, URL)
• Sub-queries for IN/NOT IN, HAVING
• JOINs in WHERE Clause
• Common Table Expressions (WITH Clause)
• INSERT / UPDATE / DELETE
Legend: Available; Roadmap; Hive 11; Hive 12; Hive 13
12. New in HDP 2.1: Other Improvements
Other New Hive Features
SQL standard authorization
Hive job visualizer in Ambari
PAM authentication support
SSL encryption support in HiveServer2
Dynamic partition scalability
14. FoodMart Dataset
• FoodMart Dataset, replicated 275 times (~10 GB of data)
• Queries run locally on an HDP 2.1 Sandbox.
• Queries perform some customer analytics.
Tables: sales_fact_1997, customer, time_by_day, and other dimension tables
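As a rough illustration of the kind of customer analytics such a demo might run, the sketch below joins the three named tables and totals sales per customer per quarter. It is not one of the webinar's queries. It uses Python's built-in sqlite3 module as a stand-in for Hive, and the column names (`customer_id`, `fullname`, `time_id`, `quarter`, `store_sales`) and the sample rows are assumptions made for this example.

```python
import sqlite3

# SQLite stands in for Hive (an assumption). Table names come from the slide;
# the columns and the three sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, fullname TEXT);
    CREATE TABLE time_by_day (time_id INTEGER, quarter TEXT);
    CREATE TABLE sales_fact_1997 (
        customer_id INTEGER, time_id INTEGER, store_sales REAL
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO time_by_day VALUES (100, 'Q1'), (200, 'Q2');
    INSERT INTO sales_fact_1997 VALUES
        (1, 100, 4.0), (1, 200, 6.0), (2, 100, 3.0);
""")

# Join the sales table to two dimension tables, then aggregate:
# total sales per customer per quarter, largest totals first.
spend_by_quarter = conn.execute("""
    SELECT c.fullname, t.quarter, SUM(f.store_sales) AS total_sales
    FROM sales_fact_1997 f
    JOIN customer c ON f.customer_id = c.customer_id
    JOIN time_by_day t ON f.time_id = t.time_id
    GROUP BY c.fullname, t.quarter
    ORDER BY total_sales DESC
""").fetchall()
```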
15. Learn More About Hive & The Stinger Initiative
Hortonworks.com/labs/stinger/
Register for the remaining 5 Discover HDP 2.1 Webinars: Hortonworks.com/webinars
Next Webinar: Apache Falcon for Data Governance in Hadoop
Wednesday, May 21, 10am Pacific