1. Online Data with HBase
© Hortonworks Inc. 2014
2. What is HBase?
• HBase is a NoSQL database that stores its data in HDFS
• Inherits the characteristics of HDFS:
– Distributed
– Linearly scalable
– Reliable
– Big Data!
• Column-oriented
• Use HBase when you need random, real-time (as opposed to batch) read/write access to your Big Data
3. HBase is not…
• Not a relational database
• Not a standalone solution – it relies on HDFS
• Not a replacement for a traditional RDBMS
• Not optimized for classic, traditional applications
• Not fully ACID compliant (write atomicity is guaranteed at the row level only)
4. HBase Use Cases
• Flexible Schema
• Huge Data Volume
• High Read Rate
• High Write Rate
• Machine-Generated Data
• Distributed Messaging
• Real-Time Analytics
• Object Store
• User Profile Management
5. Other Example HBase Use Cases
• Facebook messaging and counts
• Salesforce dashboards
• Time series data (OpenTSDB)
• Oil companies streaming sensor data for real-time comparisons
• Exposing machine learning models (like risk sets)
• Enabling Storm to change models and access thousands of them without going offline
• High-performance blob storage
• ImageShack
• Geospatial indexing
• Farm field analysis (down to the square foot)
• Indexing the Internet
• Graph problems
• Ticker plants
• Streaming stock ticker data to tens of thousands of end clients
6. What data semantics does HBase provide?
• GET, PUT, DELETE key-value operations
• SCAN for queries
• INCREMENT server-side atomic operations
• Row-level write atomicity
• MapReduce integration
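The operations listed above can be illustrated with a toy in-memory table. This is only a sketch of the semantics, not the HBase client API: the class name ToyTable, its method names, and the "family:qualifier" column strings are all made up for illustration, and locking the whole table is a much coarser stand-in for HBase's per-row atomicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory stand-in for a single table. Hypothetical; NOT the HBase API. */
final class ToyTable {
    // rowkey -> (column -> value), with rows kept sorted by rowkey.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** PUT: writes all given columns of one row as a single atomic step. */
    synchronized void put(String rowkey, Map<String, String> columns) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>()).putAll(columns);
    }

    /** GET: returns a copy of the row, or an empty map if the row is absent. */
    synchronized Map<String, String> get(String rowkey) {
        TreeMap<String, String> row = rows.get(rowkey);
        return row == null ? new TreeMap<>() : new TreeMap<>(row);
    }

    /** DELETE: removes the whole row. */
    synchronized void delete(String rowkey) {
        rows.remove(rowkey);
    }

    /** SCAN: rowkeys in sorted order, start inclusive, stop exclusive. */
    synchronized List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    /** INCREMENT: read-modify-write done where the data lives, in one step. */
    synchronized long increment(String rowkey, String column, long amount) {
        TreeMap<String, String> row = rows.computeIfAbsent(rowkey, k -> new TreeMap<>());
        long updated = Long.parseLong(row.getOrDefault(column, "0")) + amount;
        row.put(column, Long.toString(updated));
        return updated;
    }
}
```

Because rows are held in sorted order, a SCAN is just a range read over adjacent keys, which is why rowkey design matters so much for query patterns.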
7. Logical Architecture
Distributed, persistent partitions of a BigTable
[Figure: Table A, with rowkeys a through p, split into Region 1, Region 2, Region 3 and Region 4. The regions are hosted as follows.]
Region Server 7
– Table A, Region 1
– Table A, Region 2
– Table G, Region 1070
– Table L, Region 25
Region Server 86
– Table A, Region 3
– Table C, Region 30
– Table F, Region 160
– Table F, Region 776
Region Server 367
– Table A, Region 4
– Table C, Region 17
– Table E, Region 52
– Table P, Region 1116
Legend:
- A single table is partitioned into Regions of roughly equal size.
- Regions are assigned to Region Servers across the cluster.
- Region Servers host roughly the same number of regions.
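Finding the region that holds a given rowkey amounts to a floor lookup over sorted region start keys. The sketch below shows that idea with a plain TreeMap. The class ToyRegionMap is hypothetical, and the split points in the usage example (e, i, m) are assumptions, since the figure does not say where Table A is split.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy region directory for one table. Hypothetical; NOT the HBase API. */
final class ToyRegionMap {
    // Region start key -> name of the server hosting that region.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    /** Registers a region that begins at startKey ("" means start of table). */
    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    /** Returns the server hosting the region whose key range contains rowkey. */
    String locate(String rowkey) {
        // The owning region is the one with the greatest start key <= rowkey.
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowkey);
        if (region == null) {
            throw new IllegalStateException("no region covers rowkey " + rowkey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        ToyRegionMap tableA = new ToyRegionMap();
        tableA.addRegion("", "Region Server 7");    // Region 1 (assumed split)
        tableA.addRegion("e", "Region Server 7");   // Region 2 (assumed split)
        tableA.addRegion("i", "Region Server 86");  // Region 3 (assumed split)
        tableA.addRegion("m", "Region Server 367"); // Region 4 (assumed split)
        System.out.println(tableA.locate("k"));
    }
}
```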
8. Physical Architecture
Distribution and Data Path
[Figure: a ZooKeeper ensemble; HBase clients embedded in Java applications, the HBase Shell, and a REST/Thrift gateway; a row of nodes, each running a Region Server alongside a DataNode; and the HBase Master alongside the NameNode.]
Legend:
- An HBase RegionServer is collocated with an HDFS DataNode.
- HBase clients communicate directly with Region Servers for sending and receiving data.
- HMaster manages Region assignment and handles DDL operations.
- Online configuration state is maintained in ZooKeeper.
- HMaster and ZooKeeper are NOT involved in the data path.
9. Logical Data Model
A sparse, multi-dimensional, sorted map
Legend:
- Rows are sorted by rowkey.
- Within a row, values are located by column family and qualifier.
- Values also carry a timestamp; there can be multiple versions of a value.
- Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.
Table A
rowkey  column family  column qualifier  timestamp   value
a       cf1            "bar"             1368394583  7
a       cf1            "bar"             1368394261  "hello"
a       cf1            "foo"             1368394583  22
a       cf1            "foo"             1368394925  13.6
a       cf1            "foo"             1368393847  "world"
a       cf2            "2011-07-04"      1368396302  "fourth of July"
a       cf2            1.0001            1368387684  "almost the loneliest number"
b       cf2            "thumb"           1368387247  [3.6 kb png data]
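One way to picture the "sparse, multi-dimensional, sorted map" is as nested sorted maps: rowkey, then column family, then qualifier, then timestamp. The sketch below is a hypothetical in-memory model (ToyCellStore is not an HBase class). It uses String values for readability, whereas HBase treats qualifiers and values as arbitrary bytes.

```java
import java.util.Comparator;
import java.util.TreeMap;

/** Toy model of the logical data model. Hypothetical; NOT the HBase API. */
final class ToyCellStore {
    // rowkey -> family -> qualifier -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell; absent cells take no space (sparse). */
    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .computeIfAbsent(qualifier, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the value with the highest timestamp, or null if the cell is absent. */
    String getLatest(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<String, TreeMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        TreeMap<String, TreeMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) return null;
        TreeMap<Long, String> versions = qualifiers.get(qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Returns how many versions are stored for a cell (0 if absent). */
    int versionCount(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<String, TreeMap<Long, String>>> families = rows.get(row);
        if (families == null) return 0;
        TreeMap<String, TreeMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) return 0;
        TreeMap<Long, String> versions = qualifiers.get(qualifier);
        return versions == null ? 0 : versions.size();
    }
}
```

Loading the cf1 cells of row a from the figure into this model, the newest timestamp wins on read while older versions remain stored.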
10. Apache Phoenix
A SQL Skin for HBase
• Provides a SQL interface for managing data in HBase.
• Supports a large subset of the SQL:1999 mandatory feature set.
• Create tables, insert and update data and perform low-latency point lookups through JDBC.
• Phoenix JDBC driver easily embeddable in any app that supports JDBC.
Phoenix Makes HBase Better
• Oriented toward online / semi-transactional apps.
• If HBase is a good fit for your app, Phoenix makes it even better.
• Phoenix gets you out of the “one table per query” model many other NoSQL stores force you into.
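As a rough sketch of what access through JDBC can look like, the class below writes and reads one row of the us_population table that appears later in this deck. It uses only java.sql interfaces and expects the caller to supply an open Connection, typically obtained from a URL of the form jdbc:phoenix:<ZooKeeper quorum> with the Phoenix driver on the classpath. The class and method names are hypothetical. Phoenix writes rows with UPSERT rather than separate INSERT and UPDATE statements, and the explicit commit assumes auto-commit is off.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper showing Phoenix-style SQL over plain JDBC. */
final class PhoenixSketch {
    // Phoenix uses UPSERT to insert a row or overwrite an existing one.
    static final String UPSERT_SQL =
            "UPSERT INTO us_population (state, city, population) VALUES (?, ?, ?)";
    // Filtering on the full primary key makes this a point lookup.
    static final String LOOKUP_SQL =
            "SELECT population FROM us_population WHERE state = ? AND city = ?";

    /** Writes one row and commits it (assumes auto-commit is disabled). */
    static void upsert(Connection conn, String state, String city, long population)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, state);
            ps.setString(2, city);
            ps.setLong(3, population);
            ps.executeUpdate();
        }
        conn.commit();
    }

    /** Returns the population for (state, city), or null if no such row exists. */
    static Long lookup(Connection conn, String state, String city) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(LOOKUP_SQL)) {
            ps.setString(1, state);
            ps.setString(2, city);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    return rs.getLong(1);
                }
                return null;
            }
        }
    }
}
```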
11. Apache Phoenix: Current Capabilities
Feature                                Supported?
Common SQL Datatypes                   Yes
Inserts and Updates                    Yes
SELECT, DISTINCT, GROUP BY, HAVING     Yes
NOT NULL and Primary Key constraints   Yes
Inner and Outer JOINs                  Yes
Views                                  Yes
Subqueries                             Yes
Robust Secondary Indexes               Yes
12. Phoenix Provides Familiar SQL Constructs
Compare: Phoenix versus Native API
Code:
// HBase Native API.
HBaseAdmin hbase = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("us_population");
HColumnDescriptor state = new HColumnDescriptor("state".getBytes());
HColumnDescriptor city = new HColumnDescriptor("city".getBytes());
HColumnDescriptor population = new HColumnDescriptor("population".getBytes());
desc.addFamily(state);
desc.addFamily(city);
desc.addFamily(population);
hbase.createTable(desc);

// Phoenix DDL.
CREATE TABLE us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT my_pk PRIMARY KEY (state, city));

Notes:
• Familiar SQL syntax.
• Provides additional constraint checking.