1. HBase status: 0.94, 0.96, 0.98, and future releases
Matteo Bertozzi | @Cloudera
17 February 2014 (HBase London Meetup)
2. What is HBase?
Apache HBase is an open source, distributed, consistent, non-relational database that provides low-latency, random read/write operations on top of HDFS.
[Diagram labels: App, MR, ZooKeeper, HDFS]
3. Open Source - Developer Community
• Vibrant, highly active community!
• We're growing!
4. Non-relational
• Key:Column/Value interface
• Dynamic columns (qualifiers), "no schema required"
• "Fixed" column groups (families)
• table[row:family:column] = value

Key     Family  Qualifier  Value
User-A  info    name       Theo
User-A  info    address    3 Abbey Rd - London NW8 9AY
User-B  info    name       Dave
User-C  info    . . .      . . .
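The table[row:family:column] = value model above can be sketched as a nested map. This is a toy in-memory illustration only, not the HBase client API; the `Table` class and its `put`/`get` methods are hypothetical names, and real HBase also versions each cell with a timestamp, which is left out here.

```python
from collections import defaultdict


class Table:
    """Toy in-memory model of table[row:family:column] = value (hypothetical)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = frozenset(families)
        # row key -> {(family, qualifier): value}
        self.rows = defaultdict(dict)

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are dynamic: any name is accepted without a schema change.
        self.rows[row][(family, qualifier)] = value

    def get(self, row, family, qualifier):
        # Missing rows or columns simply return None.
        return self.rows.get(row, {}).get((family, qualifier))


users = Table(families=["info"])
users.put("User-A", "info", "name", "Theo")
users.put("User-A", "info", "address", "3 Abbey Rd - London NW8 9AY")
users.put("User-B", "info", "name", "Dave")

print(users.get("User-A", "info", "name"))     # Theo
print(users.get("User-B", "info", "address"))  # None
```

User-B has no address column at all, which is what "no schema required" means in practice: rows in the same family need not share the same qualifiers.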
5. Distributed
• Region Server
  • Server that contains a set of Regions
  • Handles read and write requests
• Region
  • Basic unit of scalability
  • Subset of the table's data
  • Contiguous, sorted range of rows stored together
• Master
  • Coordinates the cluster (e.g. balancing)
  • Admin operations (create/delete table, …)
[Diagram: Client/App sends create/delete table operations to the HMaster and put/get/scan to the Region Servers; ZooKeeper; three Region Servers, each holding several Regions, on top of HDFS]
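Because each Region holds a contiguous, sorted range of rows, finding the Region for a row key is a search over the sorted region start keys. The following is a minimal sketch under stated assumptions: the start keys, server names, and `locate_region` function are hypothetical, and Python string comparison stands in for the byte-wise key comparison HBase uses.

```python
import bisect

# Hypothetical layout: region i covers [start_key[i], start_key[i + 1]).
# The empty string marks the start of the first region.
REGION_START_KEYS = ["", "User-H", "User-P"]
REGION_SERVERS = ["rs1.example.com", "rs2.example.com", "rs3.example.com"]


def locate_region(row_key):
    """Return (region index, hosting server) for a row key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position contains the row.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]


print(locate_region("User-A"))  # (0, 'rs1.example.com')
print(locate_region("User-H"))  # (1, 'rs2.example.com')
print(locate_region("User-Z"))  # (2, 'rs3.example.com')
```

Splitting a hot region then amounts to inserting one more start key, which is why the Region is the basic unit of scalability.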
10. 0.96: Major Changes, Minimal disturbance
• …more than a year in the making
• Lots of changes under the hood
• Hadoop Writables replaced with protobuf (RPC, metadata, …)
• -ROOT- table removed
• /hbase dir layout changes
• Minimal disturbance to the API
• Improved stability
• Mean Time To Recovery (MTTR)
11. 0.96: New Features
• Online Region Merge
• Online "Schema" Change
• Snapshots
• MTTR
• Favored Nodes
• New Balancers
• Namespaces
https://blogs.apache.org/hbase/entry/hbase_0_96_0_released
12. Namespaces
Abstraction for multiple tenants to create and manage their own tables within a large HBase instance.
• Separate ACLs
• Performance Isolation *
• Region Server groups *
[Diagram labels: RSG blue (Namespace blue); RSG green orange (Namespace green, Namespace orange)]
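As a rough illustration of per-tenant separation, the sketch below resolves a table name of the form namespace:table and checks a per-namespace permission. Everything here is hypothetical (the `NAMESPACE_ADMINS` map, the user names, and both functions); it is not how HBase implements ACLs. The only HBase conventions assumed are the `namespace:table` naming and the `default` namespace for unqualified names.

```python
DEFAULT_NAMESPACE = "default"


def split_table_name(name):
    """Split 'namespace:table'; bare names fall into the default namespace."""
    namespace, sep, qualifier = name.partition(":")
    if not sep:
        return DEFAULT_NAMESPACE, name
    return namespace, qualifier


# Hypothetical per-namespace ACLs: namespace -> users allowed to create tables.
NAMESPACE_ADMINS = {
    "blue": {"alice"},
    "green": {"bob"},
    "orange": {"bob", "carol"},
}


def can_create_table(user, table_name):
    """A tenant may only create tables inside namespaces it administers."""
    namespace, _ = split_table_name(table_name)
    return user in NAMESPACE_ADMINS.get(namespace, set())


print(split_table_name("blue:events"))             # ('blue', 'events')
print(split_table_name("users"))                   # ('default', 'users')
print(can_create_table("alice", "blue:events"))    # True
print(can_create_table("alice", "green:events"))   # False
```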
13. Mean Time to Recovery (MTTR)
• Machine failures happen in distributed systems
• Repair == split, assign, replay
• Distributed log replay with fast write recovery
• Writes in HBase do not incur reads
• Regions are open for writes during distributed log replay
[Timeline labels: Region available for RW; Region unavailable; detect; split; replay; assign; recovered; hdfs]
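The split, assign, replay sequence can be sketched in a few lines. This is a sequential toy model under stated assumptions, not HBase's recovery code: the log format, the round-robin assignment, and the `split_log`/`recover` names are all hypothetical, and in HBase these phases are distributed across the surviving Region Servers.

```python
from collections import defaultdict

# Hypothetical write-ahead log of a failed server: (region, row, value) in write order.
wal = [
    ("region-1", "User-A", "Theo"),
    ("region-2", "User-P", "Paul"),
    ("region-1", "User-A", "Theodore"),
]


def split_log(entries):
    """Split: group the failed server's edits by region, preserving write order."""
    per_region = defaultdict(list)
    for region, row, value in entries:
        per_region[region].append((row, value))
    return per_region


def recover(entries, live_servers):
    """Assign each region to a live server, then replay its edits there."""
    assignment, stores = {}, {}
    for i, (region, edits) in enumerate(sorted(split_log(entries).items())):
        # Assign: simple round-robin over the surviving servers.
        assignment[region] = live_servers[i % len(live_servers)]
        store = stores.setdefault(region, {})
        # Replay: edits are blind writes, so no read of the old value is needed.
        for row, value in edits:
            store[row] = value
    return assignment, stores


assignment, stores = recover(wal, ["rs2", "rs3"])
print(assignment)  # {'region-1': 'rs2', 'region-2': 'rs3'}
print(stores)      # {'region-1': {'User-A': 'Theodore'}, 'region-2': {'User-P': 'Paul'}}
```

Since replay only ever writes, new client writes to a region do not have to wait for the old edits to be applied first, which is the intuition behind keeping regions open for writes during replay.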