HBase is an open source, distributed, column-oriented database modeled after Google's BigTable. It sits atop Hadoop, using HDFS for storage. HBase scales horizontally and supports fast random reads and writes. It is well-suited for large tables and high throughput access. Facebook uses HBase extensively for messaging and other applications due to its high write throughput and low latency reads. Other users include Flurry and Yahoo.
2. What is HBase?
- HBase is an open source, distributed, sorted map modeled after Google's BigTable
- A NoSQL solution built atop Apache Hadoop
- A top-level Apache project
3. CAP Theorem
- Consistency: all nodes see the same data at the same time
- Availability: every request receives a response indicating whether it succeeded or failed
- Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system
According to the theorem, a distributed system can satisfy at most two of these guarantees at the same time, never all three.
5. Usage Scenarios
- Lots of data: hundreds of gigabytes to petabytes
- High throughput: thousands of records/sec
- Scalable cache capacity: adding nodes adds to the available cache
- Data layout: excels at key lookup, with no penalty for sparse columns
6. Column-Oriented Databases
- HBase belongs to the family of databases called column-oriented
- Column-oriented databases store their data grouped by columns
- Storing values on a per-column basis rests on the assumption that, for specific queries, not all of the values are needed
- Reduced I/O is one of the primary benefits of this layout
- Specialized algorithms, for example delta and/or prefix compression, selected based on the type of the column (i.e., on the data stored) can yield huge improvements in compression ratios; better ratios result in more efficient bandwidth usage
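The I/O saving can be illustrated with a tiny sketch (illustrative Python with made-up data, not HBase code): a query that needs only one column touches every record in a row-oriented layout, but only that column's values in a column-oriented layout.

```python
# Illustrative: row-oriented vs. column-oriented layouts for the same data.
rows = [
    {"id": 1, "name": "alice", "age": 30, "city": "oslo"},
    {"id": 2, "name": "bob",   "age": 25, "city": "lima"},
    {"id": 3, "name": "carol", "age": 41, "city": "pune"},
]

# Row-oriented: all values of one record are stored together.
row_store = rows

# Column-oriented: all values of one column are stored together.
col_store = {col: [r[col] for r in rows] for col in rows[0]}

# Computing the average age from the column store reads only the
# "age" values; names and cities are never touched.
ages = col_store["age"]
print(sum(ages) / len(ages))  # 32.0
```

The per-column grouping is also what enables the column-type-specific compression mentioned above: a run of similar values compresses far better than interleaved heterogeneous records.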
7. HBase as a Column-Oriented Database
- HBase is not a column-oriented database in the typical RDBMS sense, but it uses an on-disk column storage format
- This is also where most of the similarities end: although HBase stores data on disk in a column-oriented format, it is distinctly different from traditional columnar databases
- Whereas columnar databases excel at providing real-time analytical access to data, HBase excels at providing key-based access to a specific cell of data, or a sequential range of cells
8. HBase and Hadoop
- Hadoop excels at storing data of arbitrary, semi-structured, or even unstructured formats, since it lets you decide how to interpret the data at analysis time, allowing you to change the way you classify the data at any time: once you have updated the algorithms, you simply run the analysis again
- HBase sits atop Hadoop, using the best features of HDFS, such as scalability and data replication
9. When Not to Use HBase
- When data access patterns are unknown: HBase follows a data-centric model rather than a relationship-centric one, so it does not make sense to build an ER model for HBase
- Small amounts of data: just use an RDBMS
- Limited or no random reads and writes: just use HDFS directly
10. HBase Use Cases - Facebook
- One of the earliest and largest users of HBase
- Facebook's messaging platform was built atop HBase in 2010
- Chosen for its high write throughput and low-latency random reads
- Other deciding features included horizontal scalability, strong consistency, and high availability via automatic failover
11. HBase Use Cases - Facebook
- In addition to online transaction processing workloads like messages, it is also used for online analytic processing workloads where large data scans are prevalent
- Also used in production by other Facebook services, including the internal monitoring system, the recently launched Nearby Friends feature, search indexing, streaming data analysis, and data scraping for their internal data warehouses
12. Seek vs. Transfer
- One of the fundamental differences between typical RDBMSs and NoSQL stores is the use of B or B+ trees versus Log-Structured Merge (LSM) trees, the latter being the basis of Google's BigTable
13. B+ Trees
- B+ trees allow for efficient insertion, lookup, and deletion of records that are identified by keys
- They represent dynamic, multilevel indexes with lower and upper bounds on entries per segment or page
- This allows for a higher fanout compared to binary trees, resulting in a lower number of I/O operations
- Range scans are also very efficient
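The fanout-to-I/O relationship is quick arithmetic (a sketch with made-up numbers): the height of a balanced tree, and hence the number of page reads per lookup, grows with log base fanout of the key count.

```python
import math

def tree_height(num_keys: int, fanout: int) -> int:
    """Levels (roughly, page reads per lookup) needed to index
    num_keys entries when each node holds `fanout` children."""
    return math.ceil(math.log(num_keys, fanout))

n = 100_000_000  # 100 million keys
print(tree_height(n, 2))    # binary tree: ~27 page reads per lookup
print(tree_height(n, 500))  # B+ tree with fanout 500: 3 page reads
```

A disk-backed B+ tree node is sized to a page, so one node can hold hundreds of keys; that is why the same data set costs 3 page reads instead of 27.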
14. LSM Trees
- Incoming data is first stored in a logfile, completely sequentially
- Once the log has the modification saved, the update is applied to an in-memory store
- Once enough updates have accrued in the in-memory store, it flushes a sorted list of key->record pairs to disk, creating store files
- At this point the corresponding updates in the log can be deleted, since the modifications have been persisted
15. Fundamental Difference
- B+ trees and disk drives:
- Too many modifications force costly optimizations
- More data added at random locations causes faster fragmentation
- Updates and deletes are done at disk seek rates rather than disk transfer rates
16. Fundamental Difference (Contd.)
- LSM trees work at disk transfer rates
- They scale better to handle large amounts of data
- They guarantee a consistent insert rate
- They transform random writes into sequential writes using logfiles plus an in-memory store
- Reads are independent from writes, so there is no contention between the two
17. HBase Basics
S When data is added to HBase, it is first written to the WAL(Write
ahead log) called HLog.
S Once the write is done, it is then written to an in memory called
MemStore
S Once the memory exceeds a certain threshold, it flushes to disk as
an HFile
S Over time HBase merges smaller HFiles into larger ones. This process
is called compaction
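Compaction can be sketched as a merge of sorted runs (illustrative Python; real HFile compaction also handles cell versions, deletes, and bloom filters): because each HFile is sorted by key, smaller files combine into one larger sorted file in a single sequential pass, with the newest value per key winning.

```python
import heapq

def compact(hfiles):
    """Merge sorted (key, value) runs into one sorted run.
    hfiles are ordered oldest to newest; the newest value per key wins."""
    # Tag each entry with the file's age so the newest sorts first per key.
    tagged = [
        [(key, -idx, value) for key, value in hfile]
        for idx, hfile in enumerate(hfiles)  # idx 0 = oldest file
    ]
    merged = []
    # heapq.merge streams the sorted runs in one sequential pass.
    for key, _, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:  # keep first = newest
            merged.append((key, value))
    return merged

older = [("a", 1), ("c", 3)]
newer = [("b", 20), ("c", 30)]
print(compact([older, newer]))  # [('a', 1), ('b', 20), ('c', 30)]
```

This is the same seek-vs-transfer trade again: compaction reads and writes whole files sequentially rather than updating records in place.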
22. Facebook - HydraBase
- In HBase, when a regionserver fails, all regions hosted by that regionserver are moved to another regionserver
- Depending on how HBase has been set up, this typically entails splitting and replaying the WAL files, which can take time and lengthens the failover
- HydraBase differs from HBase here: instead of a region being hosted by a single regionserver, it is hosted by a set of regionservers
- When a regionserver fails, there are standby regionservers ready to take over
23. Facebook-Hydrabase
S The standby region servers can be spread across different
racks or even data centers, providing availability.
S The set of region servers serving each region form a quorum.
Each quorum has a leader that services read and write
requests from the client.
S HydraBase uses the RAFT consensus protocol to ensure
consistency across the quorum.
S With a quorum of 2F+1, HydraBase can tolerate up to F
failures.
S Increases reliability from 99.99% to 99.999% ~ 5 mins
downtime/year.
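The 2F+1 figure is simple majority arithmetic (a quick sketch): a Raft-style quorum needs a strict majority to agree, so a group of 2F+1 replicas still has a majority of F+1 after losing F members.

```python
def tolerated_failures(quorum_size: int) -> int:
    """Largest F such that a strict majority (F+1 nodes)
    remains after F of quorum_size replicas fail."""
    return (quorum_size - 1) // 2

for size in (3, 5, 7):
    print(size, "replicas tolerate", tolerated_failures(size), "failures")
# 3 replicas tolerate 1, 5 tolerate 2, 7 tolerate 3
```

Growing the quorum buys fault tolerance at the cost of more replicas to keep in sync per write.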
24. HBase Users - Flurry
- Mobile analytics, monetization, and advertising company founded in 2005
- Recently acquired by Yahoo
- 2 data centers with 2 clusters each, bidirectional replication
- 1000 slave nodes per cluster: 32 GB RAM, 4 drives (1 or 2 TB), 1 GigE, dual quad-core processors x 2 HT = 16 logical processors
- ~30 tables, 250k regions, 430 TB (after LZO compression)
- 2 big tables make up approximately 90% of that: one wide table with 3 column families and 4 billion rows with 1 million cells per row, and one tall table with 1 column family, 1 trillion rows, and 1 cell per row
25. HBase Security - 0.98
- Cell tags: all values in HBase are now written as cells, which can also carry an arbitrary number of tags, such as metadata
- Cell ACLs: enable checking of (R)ead, (W)rite, e(X)ecute, (A)dmin, and (C)reate permissions
- Cell labels: visibility expression support via a new security coprocessor
- Transparent encryption: data is encrypted on disk; HFiles are encrypted when written and decrypted when read
- RBAC: implemented using the Hadoop Group Mapping Service and ACLs
26. Apache Phoenix
- SQL layer atop HBase: has a query engine, a metadata repository, and an embedded JDBC driver; a top-level Apache project, currently only for HBase
- Fastest way to access HBase data: HBase-specific push-down, compiles queries into native, direct HBase calls (no MapReduce), executes scans in parallel
- Integrates with Pig, Flume, and Sqoop
- Phoenix maps the HBase data model to the relational world
28. OpenTSDB 2.0
- Distributed, scalable time series database on top of HBase
- Time series: data points for an identity over time
- Stores trillions of data points, never loses precision, scales using HBase
- Good for system monitoring and measurement (servers and networks), sensor data (the Internet of Things, SCADA), financial data, results of scientific experiments, etc.
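OpenTSDB's scalability comes from its rowkey design: all data points for one series within an hour share a single HBase row. The widths below (3-byte UIDs, hour-wide rows) are OpenTSDB's documented defaults, but the helper itself is an illustrative sketch, not OpenTSDB code.

```python
import struct

UID_WIDTH = 3    # OpenTSDB's default UID width in bytes
ROW_SPAN = 3600  # each row covers one hour of data points

def make_row_key(metric_uid, timestamp, tag_uids):
    """Sketch of an OpenTSDB-style row key:
    [metric UID][4-byte base timestamp][tagk UID + tagv UID]..."""
    base_ts = timestamp - (timestamp % ROW_SPAN)  # align to the hour
    key = metric_uid.to_bytes(UID_WIDTH, "big")
    key += struct.pack(">I", base_ts)             # 4-byte base time
    for tagk, tagv in sorted(tag_uids.items()):   # tags sorted by tagk UID
        key += tagk.to_bytes(UID_WIDTH, "big") + tagv.to_bytes(UID_WIDTH, "big")
    return key

# Two points 77 seconds apart in the same hour share one row key,
# so an hour of a series comes back as a single HBase row read.
k1 = make_row_key(1, 1_400_000_123, {2: 3})
k2 = make_row_key(1, 1_400_000_200, {2: 3})
print(k1 == k2, len(k1))  # True 13
```

Because rows sort by metric then time, reading a time range for one metric is exactly the sequential key-range scan HBase is good at.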
29. OpenTSDB 2.0
- Users: OVH (3rd largest cloud/hosting provider) uses it to monitor everything from networking, temperature, and voltage to resource utilization, etc.
- Yahoo uses it to monitor application performance and statistics
- Arista Networks uses it for high-performance networking
- Other users include Pinterest, eBay, Box, etc.
30. Apache Slider (Incubator)
- A YARN application to deploy existing distributed applications on YARN, monitor them, and make them larger or smaller as desired, even while the application is running
- An incubating Apache project; similar in spirit to Tez for Hive/Pig
- Applications can be stopped ("frozen") and restarted ("thawed") later; it allows users to create and run multiple instances of applications, even with different application versions if needed
- Applications such as HBase, Accumulo, and Storm can run atop it
31. Thanks!!
- Credits: Apache, Cloudera, Hortonworks, MapR, Facebook, Flurry & HBaseCon
- @sawjd22
- www.linkedin.com/in/sawjd/
- Q & A
Editor's Notes
How their architectures make use of modern hardware, especially disk drives.
B+ Trees work well until there are too many modifications, because they force you to perform costly optimizations to retain that advantage for a limited amount of time.
The more and faster you add data at random locations, the faster the pages become fragmented again. Eventually, you may take in data at a higher rate than the optimization process takes to rewrite the existing files. The updates and deletes are done at disk seek rates, rather than disk transfer rates.
LSM-trees work at disk transfer rates and scale much better to handle large amounts of data. They also guarantee a very consistent insert rate, as they transform random writes into sequential writes using the logfile plus in-memory store. The reads are independent from the writes, so you also get no contention between these two operations.