Following the success of the "HBase, dances on the elephant back" presentation, I have prepared an updated version for JavaDay 2014 Kyiv. Again, it is about a product that is changing everything inside the Hadoop infrastructure in a revolutionary way: Apache HBase. This time the focus shifts to integration and more advanced topics, while keeping the presentation understandable for technology newcomers.
3. FIRST EVER
DATA OS
A 10,000-node computer...
Recent technology changes are focused on
higher scale: better resource usage and
control, lower MTTR, higher security,
redundancy, fault tolerance.
www.vitech.com.ua 3
4. ● Hadoop is an open source
framework for big data,
providing both distributed
storage and processing.
● Hadoop is reliable and
fault tolerant without
relying on hardware for
these properties.
● Hadoop has unique
horizontal scalability:
currently from a
single computer up to
thousands of cluster
nodes.
5. What is HADOOP INDEED?
[Slide graphic: BIG DATA = BIG DATA + BIG DATA + ... x MAX, spread over many machines]
Why hadoop?
6. HBase
motivation
Beware...
● Hadoop is designed for
throughput, not for latency.
● HDFS blocks are expected to be
large. There is an issue with lots of
small files.
● Write once, read many times
ideology.
● MapReduce is not very flexible,
and neither is any database built on top of it.
● How about realtime?
7. HBase motivation
BUT WE OFTEN NEED...
LATENCY, SPEED and all the Hadoop properties.
8. Agenda
HBASE as is: architecture, data model, features.
Something special: but we are always special, aren't we?
INTEGRATION: it's not only about HBase.
9. MANIFEST
● Open source Google BigTable implementation, with
the corresponding place in the infrastructure.
● Limited but strict ACID guarantees.
● Realtime, low latency, linear scalability.
● Distributed, reliable and fault tolerant.
● Natural integration with the Hadoop infrastructure.
● Really good for massive scans.
● Server-side user operations.
● No SQL at all.
● Secondary indexing is pretty complex.
10. High-layer applications
Resource management: YARN
Distributed file system: HDFS
12. HBase: the story begins with ...
2006 2007 2008 2009 2010 … 2014 … future
2006: Google BigTable paper is published. HBase development starts.
2007: First code is released as part of Hadoop 0.15. Focus is on offline crawl data storage.
2008: HBase goes OLTP (online transaction processing). 0.20 is the first performance release.
2010: HBase becomes an Apache top-level project.
November 2010: Facebook selects HBase to implement its new messaging platform.
HBase 0.92 is considered the production-ready release.
13. Loose data structure
HBase: it is NoSQL
Book: title, author,
pages, price
Ball: color, size,
material, price
Toy car: color, type,
radio control, price
Kind      Price  Title  Author  Pages  Color  Size  Material  Type  Radio control
Book        +      +      +       +
Ball        +                             +      +      +
Toy car     +                             +                      +        +
Book #1: Kind, Price, Title, Author, Pages
Book #2: Kind, Price, Title, Author
Ball #1: Kind, Price, Color, Size, Material
Toy car #1: Price, Color, Type +Radio control
● Data looks like tables with a large number of columns.
● The column set can vary from row to row.
● No table modification is needed to add a column to a row.
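The loose structure above can be sketched as rows that are simply maps from column name to value; a minimal in-memory model, not the HBase API (all names here are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparseRows {
    // Each row is just a map from column name to value; rows in the
    // same "table" may carry completely different column sets.
    public static Map<String, String> row(String... kv) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) r.put(kv[i], kv[i + 1]);
        return r;
    }

    public static void main(String[] args) {
        Map<String, String> book = row("kind", "book", "price", "10", "title", "HBase", "pages", "550");
        Map<String, String> ball = row("kind", "ball", "price", "3", "color", "red", "size", "5");
        // No shared schema: adding a column touches only one row.
        ball.put("material", "rubber");
        System.out.println(book.keySet());
        System.out.println(ball.keySet());
    }
}
```

There is no `ALTER TABLE` step anywhere: the "schema" lives entirely in each row.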
www.vitech.com.ua 13
14. Logical data model
● Data is placed in tables.
● Tables are split into regions based on row key ranges.
● Every table row is identified by a unique row key.
● Every row consists of columns.
● Columns are grouped into families.
Table → Region → Row: Key | Family #1 | Family #2 | ...
15. Real data model
Table → Region → one HFile per family
● Data is stored in HFiles.
● Families are stored on disk in separate files.
● Row keys are indexed in memory.
● A column includes key, qualifier, value and timestamp.
● No column limit.
● Storage is block based.
● A delete is just another marker record.
● Periodic compaction is required.
HFile layout: Row key | Column | Value | TS
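The sorted, versioned cell layout can be sketched as follows; a toy model assuming string keys and a row/qualifier/inverted-timestamp sort order, not the real KeyValue format:

```java
import java.util.TreeMap;

public class CellModel {
    // A cell key in the spirit of HBase's on-disk layout: row key,
    // column qualifier and timestamp. The timestamp is inverted so
    // that newer versions sort first. Simplified illustrative sketch.
    public static String key(String row, String qualifier, long ts) {
        return row + "/" + qualifier + "/" + (Long.MAX_VALUE - ts);
    }

    public static void main(String[] args) {
        TreeMap<String, String> hfile = new TreeMap<>(); // HFiles keep cells sorted by key
        hfile.put(key("row1", "title", 100), "HBase");
        hfile.put(key("row1", "title", 200), "HBase, 2nd ed."); // a newer version of the same cell
        hfile.put(key("row2", "title", 100), "Hadoop");
        // The first entry for row1/title is the newest version.
        System.out.println(hfile.firstEntry().getValue()); // prints "HBase, 2nd ed."
    }
}
```

Because everything is sorted, finding the latest version of a cell is a single seek, and old versions stay in place until a compaction cleans them up.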
16. HBase: infrastructure view
Zookeeper coordinates
distributed elements and
is primary contact point
for client.
META
DATA
Master server keeps metadata and
manages data distribution over
Region servers.
Zookeeper Master
RS RS RS RS
Client
Region servers
manage data
table regions.
Clients communicate
directly with the
region server for data.
Clients locate the master
through ZooKeeper,
then the needed regions
through the master.
17. Together with HDFS
● Zookeeper coordinates distributed elements and is the primary contact point for clients.
● The master server keeps metadata and manages data distribution over region servers.
● Region servers manage data table regions.
● The actual data storage service, including replication, is provided by HDFS data nodes.
● Clients communicate directly with the region server for data.
● Clients locate the master through ZooKeeper, then the needed regions through the master.
[Diagram: NameNode, Master and Zookeeper coordinate racks of co-located region server (RS) and data node (DN) pairs; the client talks to the region servers.]
18. KEY OPERATIONS
● PUT: no difference whether we add data or replace existing data.
● GET: get a data element by key: rows, columns.
● SCAN: a massive GET with a key range.
● DELETE: delete a single object.
BATCH OPERATIONS ARE POSSIBLE
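These four operations can be modeled on a single sorted map, which is roughly how a region orders its rows. A toy sketch, not the HBase client API; note that real HBase deletes write markers instead of removing data, as a later slide explains:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class MiniStore {
    // A toy region: rows sorted by key, as in HBase.
    private final TreeMap<String, String> rows = new TreeMap<>();

    public void put(String key, String value) { rows.put(key, value); } // add or replace, no difference
    public String get(String key) { return rows.get(key); }             // single element by key
    public SortedMap<String, String> scan(String from, String to) {     // massive GET over a key range
        return rows.subMap(from, to);
    }
    public void delete(String key) { rows.remove(key); }                // single object

    public static void main(String[] args) {
        MiniStore s = new MiniStore();
        s.put("user#1", "Alice");
        s.put("user#1", "Alicia");                              // PUT silently replaces
        s.put("user#2", "Bob");
        s.put("user#3", "Carol");
        System.out.println(s.get("user#1"));                    // Alicia
        System.out.println(s.scan("user#1", "user#3").size());  // 2 (range end is exclusive)
        s.delete("user#2");
        System.out.println(s.get("user#2"));                    // null
    }
}
```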
20. ● The actual write goes to a region server. The master is not involved.
● All requests first go to the WAL (write ahead log) to
provide recovery.
● The region server keeps the MemStore as temporary storage.
● Only when needed is the write flushed to disk (into an HFile).
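A minimal sketch of that write path, with an invented flush threshold standing in for HBase's MemStore size limit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePath {
    // Sketch of the region-server write path: every write is appended to
    // the WAL for recovery, kept in the in-memory MemStore, and flushed
    // to an immutable "HFile" only when the MemStore grows big enough.
    final List<String> wal = new ArrayList<>();
    final TreeMap<String, String> memStore = new TreeMap<>();
    final List<TreeMap<String, String>> hfiles = new ArrayList<>();
    final int flushThreshold; // invented knob for this sketch

    WritePath(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String key, String value) {
        wal.add(key + "=" + value);          // 1. log first, for crash recovery
        memStore.put(key, value);            // 2. then cache in sorted memory
        if (memStore.size() >= flushThreshold) flush();
    }

    void flush() {                           // 3. write sorted data out as a new file
        hfiles.add(new TreeMap<>(memStore));
        memStore.clear();
    }

    public static void main(String[] args) {
        WritePath rs = new WritePath(2);
        rs.put("a", "1");
        rs.put("b", "2");                    // triggers a flush
        rs.put("c", "3");
        System.out.println(rs.hfiles.size() + " hfile(s), " + rs.memStore.size() + " cell(s) in memory");
    }
}
```

Since disk writes are sequential (WAL append, then whole-file flush), the random-write cost is paid only in memory.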
21. CRUD: Put and Delete
WHY IS IT FAST?
Memory is intensively used. Writes are logged and cached in
memory. Reads are just cached.
● The lower layer is a write-once (append-only) filesystem
(HDFS), so the PUT and DELETE paths are identical.
A DELETE is just another marker added.
● Both PUT and DELETE requests are per
row key. There is no row key range for DELETE.
● The actual DELETE is performed during
compactions.
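The marker mechanics can be sketched like this (a toy model; the tombstone constant is invented for illustration):

```java
import java.util.TreeMap;

public class DeleteMarkers {
    // On an append-only filesystem a DELETE cannot rewrite files: it just
    // adds a tombstone record via the same write path as a PUT. Compaction
    // later rewrites the store and drops the tombstoned cells for real.
    static final String TOMBSTONE = "\u0000DELETED"; // invented marker value
    final TreeMap<String, String> store = new TreeMap<>();

    void put(String key, String value) { store.put(key, value); }
    void delete(String key) { store.put(key, TOMBSTONE); } // identical path to put

    String get(String key) {
        String v = store.get(key);
        return TOMBSTONE.equals(v) ? null : v;             // reads must skip tombstones
    }

    void compact() {                                       // the real delete happens here
        store.values().removeIf(TOMBSTONE::equals);
    }

    public static void main(String[] args) {
        DeleteMarkers d = new DeleteMarkers();
        d.put("row1", "x");
        d.delete("row1");
        System.out.println(d.get("row1"));   // null, but the marker is still stored
        System.out.println(d.store.size()); // 1
        d.compact();
        System.out.println(d.store.size()); // 0
    }
}
```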
22. CRUD: Get and Scan
The Get operation is implemented through Scan.
● A Get operation is a simple data request by row key.
● A Scan operation is performed over a row key range,
which can involve several table regions.
● Both Get and Scan can include client filters: expressions
that are processed on the server side and can seriously
limit the results, and therefore the traffic.
● Both Scan and Get operations can be performed
on several column families.
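The traffic-saving effect of filters can be sketched with a predicate applied next to the data (illustrative only; real HBase filters are Filter subclasses evaluated by the region server):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class FilteredScan {
    // The predicate runs where the data lives, so only matching
    // rows travel back to the client.
    public static Map<String, String> scan(TreeMap<String, String> region,
                                           String from, String to,
                                           Predicate<String> filter) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : region.subMap(from, to).entrySet()) {
            if (filter.test(e.getValue())) result.put(e.getKey(), e.getValue()); // "server side"
        }
        return result; // only the filtered rows cross the wire
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        region.put("order#1", "status=NEW");
        region.put("order#2", "status=DONE");
        region.put("order#3", "status=NEW");
        System.out.println(scan(region, "order#1", "order#9", v -> v.endsWith("NEW")).keySet());
    }
}
```

A Get is just this with a range of exactly one row key.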
23. SERVER SIDE TRICKS
● Coprocessors are a feature that allows extending
HBase without modifying the product code.
● A RegionObserver can attach code to operations
at the region level.
● Similar functionality exists for the Master.
● Endpoints are a way to provide functionality
equivalent to stored procedures.
● Together, the coprocessor infrastructure can provide
a realtime distributed processing framework
(a lightweight MapReduce).
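The observer idea can be modeled as a chain of hooks applied before a put (conceptual sketch only; the real HBase coprocessor API differs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ObserverChain {
    // Sketch of the RegionObserver idea: hooks attached around an
    // operation, stacked in order, each able to inspect or rewrite it.
    final List<UnaryOperator<String>> prePutHooks = new ArrayList<>();

    String put(String value) {
        for (UnaryOperator<String> hook : prePutHooks) value = hook.apply(value);
        return value; // what actually gets stored after all observers ran
    }

    public static void main(String[] args) {
        ObserverChain region = new ObserverChain();
        region.prePutHooks.add(String::trim);         // observer #1: normalize
        region.prePutHooks.add(v -> v.toLowerCase()); // observer #2: canonicalize
        System.out.println(region.put("  HeLLo  "));  // hello
    }
}
```

Each observer sees the output of the previous one, which is exactly why stacking order matters on a real region server.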
24. Coprocessors: Region observer
● A region observer works like a hook on region operations.
● Region observers can be stacked.
[Diagram: a client request to a table passes through the stacked region observers on each region (RegionServer); the result flows back through them.]
25. Coprocessors: Endpoints
● Direct communication via a separate protocol (RPC request / response).
● Your commands can have an effect on table regions.
[Diagram: the client calls an Endpoint on each Region of each RegionServer.]
26. WHY SERVER SIDE
IS BLACK MAGIC?
YOU ARE MODIFYING REGION
SERVER OR MASTER CODE
ANY MISTAKE
LEADS TO HELL
JAVA CLASS LOADER REQUIRES
SERVICE RESTART ON RELOAD
ANY MODIFICATION
LEADS TO HELL
28. MAP+REDUCE + HBASE
Integration with MapReduce
● HBase provides a number of classes for native
MapReduce integration. The main point is data locality.
● TableInputFormat allows massive MapReduce table
processing (it maps the table with one region per mapper).
● HBase classes like Result (a Get / Scan result) or Put (a Put
request) can be passed between MapReduce job stages.
● There is not much difference between MR1 and YARN here.
[Diagram: HMaster, JobTracker and NameNode, often on a single node; each RegionServer runs next to a TaskTracker and DataNode, so data is local.]
29. MAP REDUCE CLASSICS
● HBase table data is mapped with one mapper per table
region, so the mapped data is processed locally.
● After the local (!) mapping, the data is reduced. This can be
non-local processing, but it is much lighter.
● So we get almost 100% distributed, local data
processing across the Hadoop cluster.
[Diagram: HBase table with three table regions → three mappers → a reducer.]
30. BULK LOAD
● There is a way to load data into a table MUCH FASTER.
● HBase internal storage files (HFiles) are prepared directly.
● It is preferable to generate one HFile per table
region; MapReduce can be used for this.
● The prepared HFiles are merged into the table storage at
maximum speed.
[Diagram: data importers feed mappers; reducers act as HFile generators, producing one HFile per table region.]
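The per-region preparation step can be sketched as routing pre-sorted data into one file per region by split points (an invented mini example, not the real HFileOutputFormat pipeline):

```java
import java.util.List;
import java.util.TreeMap;

public class BulkLoad {
    // Bulk-load sketch: pre-sort incoming data into one file per region
    // (the reducers' job), then adopt each file into the matching region
    // instead of pushing rows through the normal write path.
    public static int regionFor(String key, List<String> splitPoints) {
        int r = 0;
        for (String split : splitPoints) if (key.compareTo(split) >= 0) r++;
        return r;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("g", "p"); // 3 regions: [..g) [g..p) [p..]
        List<TreeMap<String, String>> files =
                List.of(new TreeMap<>(), new TreeMap<>(), new TreeMap<>());
        for (String key : List.of("apple", "grape", "plum", "fig", "pear")) {
            files.get(regionFor(key, splits)).put(key, "v"); // one sorted "HFile" per region
        }
        // Each pre-built file can now be merged into its region at full speed.
        System.out.println(files.get(0).keySet() + " " + files.get(1).keySet() + " " + files.get(2).keySet());
    }
}
```

Because the files arrive already sorted and region-aligned, the merge step is a cheap file adoption rather than millions of individual puts.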
31. SECONDARY INDEX THROUGH COPROCESSORS
[Diagram: a Put / Delete observer on the table regions updates the index table; a Scan with a filter triggers an index search.]
● HBase has no secondary indexing out of the box.
● A coprocessor (RegionObserver) is used to track Put
and Delete operations and update the index table.
● Scan operations with a filter on an indexed column are
intercepted and processed based on the index table
content.
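The observer-maintained index can be sketched as a second map kept in sync on every put (a toy model; a real implementation would use a RegionObserver writing to a separate HBase table):

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SecondaryIndex {
    // Every put to the data table also updates an index table mapping
    // value -> row keys, so a "scan with filter" becomes an index lookup.
    final TreeMap<String, String> data = new TreeMap<>();       // row key -> city
    final TreeMap<String, Set<String>> index = new TreeMap<>(); // city -> row keys

    void put(String rowKey, String city) {
        String old = data.put(rowKey, city);
        if (old != null) index.get(old).remove(rowKey); // observer: drop the stale index entry
        index.computeIfAbsent(city, c -> new TreeSet<>()).add(rowKey);
    }

    Set<String> findByCity(String city) {               // index search instead of a full scan
        return index.getOrDefault(city, Set.of());
    }

    public static void main(String[] args) {
        SecondaryIndex t = new SecondaryIndex();
        t.put("user#1", "Kyiv");
        t.put("user#2", "Lviv");
        t.put("user#3", "Kyiv");
        System.out.println(t.findByCity("Kyiv")); // [user#1, user#3]
    }
}
```

The hard parts in a real cluster, which this sketch skips, are keeping both tables consistent under failures and concurrent writers; that is why the slide calls secondary indexing "pretty complex".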
32. INDEX ALTERNATIVE: SOLR
An index update request is analyzed, tokenized,
transformed... and the same happens to queries.
● SOLR indexes documents. What is stored in the
SOLR index is not what you index. SOLR is NOT A
STORAGE, ONLY AN INDEX.
● But it can index ANYTHING. A search result is a
document ID.
33. ● HBase handles online user data change
requests.
● The NGData Lily indexer handles the stream of changes
and transforms it into SOLR index change
requests.
● Indexes are built in SOLR, so the HBase data
becomes searchable.
34. HBase: Data and search integration
● The user just puts (or deletes) data; the client sends data updates to the HBase cluster.
● Replication can be set up at column family level.
● The Lily HBase NRT indexer translates data changes into SOLR index updates.
● The SOLR cloud finally provides search; search requests and responses go over HTTP.
● Apache Zookeeper does all the coordination.
● HDFS serves as the low-level file system under the HBase regions.