HBase
RDBMS Scaling 
• Cannot scale for large distributed data sets 
• Vendors offer replication and partitioning solutions to 
grow the database beyond the confines of a single node, 
but these are generally complicated to install and maintain 
• Such techniques compromise RDBMS features such as 
– Joins, complex queries, views, triggers and foreign key 
constraints 
– These queries become expensive
Why BigTable? 
• The performance of an RDBMS is good for transaction 
processing, but for very large scale analytic processing the 
solutions are expensive and specialized. 
• Very large scale analytic processing 
– Big queries – typically range or table scans. 
– Big databases (100s of TB) 
• Map-reduce on BigTable, optionally with Cascading on top to 
support some relational algebra, can be a cost-effective 
solution. 
• Sharding (shared-nothing horizontal partitioning) is not a 
solution for scaling open-source RDBMS platforms: 
– Application specific 
– Labor-intensive (re)partitioning
Key concept 
HBase is a distributed column-oriented database built 
on top of HDFS. 
• At its core, HBase / BigTable is a map. 
• It is persistent storage. 
• HBase and BigTable are built upon distributed file-systems. 
• Unlike most map implementations, in 
HBase/BigTable the key/value pairs are kept in strict 
alphabetical order. 
• Multidimensional map. 
• Sparse.
Map 
• A map is "an abstract data type composed of a 
collection of keys and a collection of values, where 
each key is associated with one value." 
{ 
"Name" : "Subhas", 
"Mail" : "subhas.ghosh@siemens.com", 
"Location" : "9F-TA-WS-21", 
"Phone" : "+918025113529", 
"Sal" : ************ 
} 
In this example "Name" is a key, and "Subhas" is the 
corresponding value.
Persistent 
• Persistence merely means that the data you put 
in this special map "persists" after the program 
that created or accessed it is finished. 
• This is no different in concept than any other 
kind of persistent storage such as a file on a file-system. 
• Each value can be versioned in HBase
Distributed 
• Built upon distributed file-systems 
– file storage can be spread out among an array of 
independent machines. 
– HBase sits atop either Hadoop's Distributed File System 
(HDFS) or Amazon's Simple Storage Service (S3), 
– BigTable makes use of the Google File System (GFS). 
• Data is replicated across a number of participating 
nodes in an analogous manner to how data is striped 
across discs in a RAID system.
Sorted 
Continuing our example, the sorted version looks like this: 
{ 
"Location" : "9F-TA-WS-21", 
"Mail" : "subhas.ghosh@siemens.com", 
"Name" : "Subhas", 
"Phone" : "+918025113529", 
"Sal" : ************ 
} 
Sorting can ensure that items of greatest interest to you are 
near each other
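
The ordering guarantee is the same one a plain Java SortedMap gives. The minimal sketch below (ordinary Java, not the HBase API) shows that a TreeMap iterates its keys in sorted order regardless of insertion order: 

import java.util.Map; 
import java.util.SortedMap; 
import java.util.TreeMap; 

public class SortedMapDemo { 
    public static void main(String[] args) { 
        SortedMap<String, String> row = new TreeMap<String, String>(); 
        row.put("Name", "Subhas"); 
        row.put("Mail", "subhas.ghosh@siemens.com"); 
        row.put("Location", "9F-TA-WS-21"); 
        // Prints Location, Mail, Name - sorted by key, not by insertion order. 
        for (Map.Entry<String, String> e : row.entrySet()) { 
            System.out.println(e.getKey() + " : " + e.getValue()); 
        } 
    } 
} 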
Multidimensional 
A map of maps 
{ 
"Location" : 
{ 
"FL" : "9F", 
"TOWER" : "A", 
"WS" : "21“ 
}, 
"Mail" : "subhas@xyz.com", 
"Name" : 
{ 
"FIRST": "Subhas", 
"MID" : "Kumar", 
"LAST" : "Ghosh“ 
}, 
"Phone" : "+918025113529", 
"Sal" : ************ 
} 
Each top-level key points to a map with one or more keys, e.g. "FL", 
"TOWER", "WS". A top-level key/map pair is a "row". Also, in 
BigTable/HBase nomenclature, the "FL" and "TOWER" mappings would 
be called "Column Families".
Multidimensional 
• A table's column families are specified when 
the table is created, and are difficult or 
impossible to modify later. 
• It can also be expensive to add new column 
families, so it's a good idea to specify all the 
ones you'll need up front. 
• Fortunately, a column family may have any 
number of columns, denoted by a column 
"qualifier" or "label".
Multidimensional 
… 
"aaaaa" : { 
"A" : { 
"foo" : "y", 
"bar" : "d" 
}, 
"B" : { 
"" : "w" } 
}, 
"aaaab" : { 
"A" : { 
"foo" : "world", 
"bar" : "domination" 
}, 
"B" : { 
"" : "ocean" } 
} 
}, 
… 
This shows a column family "A" with two columns, "foo" and "bar". 
When asking HBase/BigTable for data, provide the full column name 
in the form "<family>:<qualifier>", e.g. "A:foo", "A:bar" and "B:". 
The "B" column family has just one column, whose qualifier is the 
empty string ("").
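
Below is a minimal sketch of the "<family>:<qualifier>" addressing in the Java client API, assuming an already-opened HTable named table holding the rows above (row and column names are taken from the example): 

Get get = new Get(Bytes.toBytes("aaaaa")); 
get.addColumn(Bytes.toBytes("A"), Bytes.toBytes("foo"));  // "A:foo" 
get.addColumn(Bytes.toBytes("B"), Bytes.toBytes(""));     // "B:" (empty qualifier) 
Result result = table.get(get); 
byte[] foo = result.getValue(Bytes.toBytes("A"), Bytes.toBytes("foo"));  // "y" 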
Multidimensional 
• Labeled tables of rows X columns X timestamp 
– Cells addressed by row/column/timestamp 
– As (perverse) java declaration: 
SortedMap<byte[], SortedMap<byte[], List<Cell>>> hbase = 
new TreeMap<ditto>(new RawByteComparator()); 
• Row keys are uninterpreted byte arrays, e.g. a URL 
– Rows are ordered by a Comparator (default: byte order) 
– Row updates are atomic, even across hundreds of columns 
• Columns grouped into column families 
– Columns have a column-family prefix and then a qualifier 
• E.g. webpage:mimetype, webpage:language 
– Column-family names must be printable; qualifiers can be arbitrary bytes 
– Column families are part of the table schema, but qualifiers are not
Multidimensional 
• Cell is uninterpreted byte array and a 
timestamp 
– E.g. webpage content 
• Tables partitioned into Regions 
– Region defined by start & end row 
– Regions are the 'atoms' of distribution 
deployed around the cluster. 
– start < end - in lexicographic sense
Multidimensional 
A cell is addressed by row key, column family, qualifier and timestamp, 
and holds a value.
Sparse 
• Not all columns in all rows are filled
What HBase Is Not 
• Tables have one primary index, the row key. 
• No join operators. 
• Scans and queries can select a subset of available columns, 
perhaps by using a wildcard. 
• There are three types of lookups: 
– Fast lookup using row key and optional timestamp. 
– Full table scan 
– Range scan from region start to end. 
• Limited atomicity and transaction support. 
– HBase supports multiple batched mutations of single rows only. 
– Data is unstructured and untyped. 
• Not accessed or manipulated via SQL. 
– Programmatic access via Java, REST, or Thrift APIs. 
– Scripting via JRuby. 
– No JOIN, No sophisticated query engine, No column typing, no 
ODBC/JDBC, No Crystal Reports, No transactions, No secondary indices
Map-Reduce With HBase 
• When we use a map-reduce framework with HBase 
table, a map function is executed for each region 
independently in parallel. 
• Within each map, the query is answered by scanning the 
rows in order, from the lowest key to the highest. 
• Optionally, certain rows and columns (column families) 
can be filtered out for better performance.
Architecture
Elements 
– Table : a list of tuples sorted by row key ascending, column 
name ascending and timestamp descending. 
– Regions: A Table is broken up into row ranges called regions. 
Each row range contains rows from start-key to end-key. (A set 
of regions, sorted appropriately, forms an entire table.) 
– HStore: Each column family in a region is managed by an 
HStore. 
– HFile: Each HStore may have one or more HFiles (HBase's storage 
file format, kept in HDFS).
Components 
• Master 
o Responsible for monitoring region servers 
o Load balancing for regions 
o Redirect client to correct region servers 
o The current SPOF (single point of failure) 
• RegionServer slaves 
o Serve client requests (write/read/scan) 
o Send heartbeats to the Master 
o Throughput and the number of regions scale with the number of 
region servers
Components 
• ZooKeeper 
– centralized service for maintaining 
• configuration information, 
• naming, 
• providing distributed synchronization, and 
• providing group services. 
– ZooKeeper allows distributed processes to coordinate with each other 
through a shared hierarchical namespace 
• organized similarly to a standard file system. 
• The namespace consists of data registers - called znodes 
• in ZooKeeper parlance - and these are similar to files and directories. 
• Unlike a typical file system, which is designed for storage, ZooKeeper 
data is kept in-memory, which means ZooKeeper can achieve high 
throughput and low latency numbers.
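
HBase clients never have to touch znodes directly, but a small, hedged sketch of the plain ZooKeeper client API illustrates what a znode is. The quorum address and the /demo path are made-up values, and exception handling is omitted: 

import org.apache.zookeeper.CreateMode; 
import org.apache.zookeeper.ZooDefs; 
import org.apache.zookeeper.ZooKeeper; 

ZooKeeper zk = new ZooKeeper("zkhost:2181", 3000, null);       // connect to the quorum 
zk.create("/demo", "hello".getBytes(), 
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); // create a znode 
byte[] data = zk.getData("/demo", false, null);                // read it back 
zk.close(); 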
Distributed Coordination 
Data model and the hierarchical namespace
Distributed Coordination 
• The replicated database is in-memory. 
• Updates are logged to disk for recoverability. 
• Writes are serialized to disk before they are applied to the in-memory 
database. 
• Clients connect to exactly one server to submit requests. 
• Read requests are serviced from the local replica of each server database. 
• Requests that change the state of the service, write requests, are 
processed by an agreement protocol.
Distributed Coordination 
• As part of the agreement protocol all write requests from 
clients are forwarded to a single server, called the leader. 
• The rest of the ZooKeeper servers, called followers, receive 
message proposals from the leader and agree upon message 
delivery. 
• The messaging layer takes care of replacing leaders on failures 
and syncing followers with leaders. 
• ZooKeeper uses a custom atomic messaging protocol. 
– ZooKeeper can guarantee that the local replicas never diverge. 
– When the leader receives a write request, it calculates what the state of 
the system is when the write is to be applied and transforms this into a 
transaction that captures this new state.
The general protocol flow
The general protocol flow 
1. The client contacts ZooKeeper to find where it should put the data. 
2. For this purpose, HBase maintains two catalog tables, namely -ROOT- and 
.META.. 
3. First HBase finds information in the -ROOT- table about the location of the 
.META. table. 
4. Subsequently, the server location of the assigned region of a table is found 
in the .META. table. 
5. The client caches this information and contacts the HRegionServer. 
6. Next the HRegionServer creates an HRegion object corresponding to the 
opened region. 
1. When the HRegion is "opened" it sets up an HStore instance for each 
HColumnFamily of the table, as defined by the user beforehand. 
2. Each of the Store instances has one or more StoreFile instances. 
3. StoreFiles are lightweight wrappers around the actual storage file, called HFile.
Where is my data? 
The client follows a lookup chain: ZooKeeper points to the -ROOT- catalog 
region; -ROOT- holds a row per .META. region; .META. holds a row per table 
region; that row names the region server currently serving MyRow of MyTable.
The general protocol flow 
7. The client issues an HTable.put(Put) request to the HRegionServer, which hands 
the details to the matching HRegion instance. 
8. The first step is to decide whether the data should first be written to the 
"Write-Ahead Log" (WAL), represented by the HLog class. The WAL is a standard 
Hadoop SequenceFile and it stores HLogKeys. 
9. These keys contain a sequence number as well as the actual data and are used 
to replay not-yet-persisted data after a server crash. 
10. Once the data is written (or not) to the WAL it is placed in the MemStore. At the 
same time it is checked whether the MemStore is full, and in that case a flush to 
disk is requested. 
11. The store files created on disk are immutable. Periodically the store files are 
merged together; this is done by a process called compaction. This buffer-flush-merge 
strategy is the common pattern described for Log-Structured Merge-Trees. 
12. After a compaction, if a newly written store file is larger than the size 
specified in hbase.hregion.max.filesize (default 256 MB), the region is split into 
two new regions. 
(Timeline: flush, flush, flush, compact; flush, flush, compact; flush, flush, flush, compact.)
Log Structured Merge Trees 
• Random IO for writes is bad in HDFS. 
• LSM Trees convert random writes to sequential writes. 
• Writes go to a commit log and in-memory storage 
(MemStore) 
• The MemStore is occasionally flushed to disk 
(StoreFile) 
• The disk stores are periodically compacted to HFile (on 
HDFS) 
• Use Bloom Filters with merge.
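
Below is a toy illustration, not HBase code, of the buffer-flush-compact idea: writes land in a sorted in-memory buffer, full buffers become immutable sorted "files", and files are periodically merged. Versions, deletes, the WAL and Bloom filters are deliberately left out: 

import java.util.ArrayList; 
import java.util.List; 
import java.util.TreeMap; 

public class ToyLsmStore { 
    private TreeMap<String, String> memstore = new TreeMap<String, String>(); 
    private final List<TreeMap<String, String>> storeFiles = 
        new ArrayList<TreeMap<String, String>>(); 
    private final int flushThreshold; 

    public ToyLsmStore(int flushThreshold) { this.flushThreshold = flushThreshold; } 

    public void put(String key, String value) { 
        // In HBase the edit would also be appended to the WAL (HLog) first. 
        memstore.put(key, value); 
        if (memstore.size() >= flushThreshold) flush(); 
    } 

    private void flush() { 
        // Sequential write of an already-sorted buffer; the result is immutable. 
        storeFiles.add(memstore); 
        memstore = new TreeMap<String, String>(); 
    } 

    public void compact() { 
        // Merge all sorted files into a single sorted file. A real compaction 
        // would also resolve versions and drop deleted cells. 
        TreeMap<String, String> merged = new TreeMap<String, String>(); 
        for (TreeMap<String, String> f : storeFiles) merged.putAll(f); 
        storeFiles.clear(); 
        storeFiles.add(merged); 
    } 
} 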
Buffer-Flush-Compact (minor) 
Within a region, writes are buffered in the MemStore and appended to the 
HLog, an append-only WAL on HDFS (a SequenceFile, one per region server). 
On flush, the MemStore is written out as a StoreFile (an HFile on HDFS); 
reads consult the MemStore plus the StoreFiles. A minor compaction merges 
several HFiles on HDFS into one. 
HFile: an immutable sorted map (byte[] -> byte[]), i.e. 
(row, column, timestamp) -> cell value
Compaction 
• Major compaction: 
– The most important difference between minor and major compactions is 
that major compactions process delete markers, max versions, etc., 
while minor compactions don't. 
– This is because delete markers might also affect data in the non-merged 
files, so it is only possible to do this when merging all files. 
• When a delete is performed on an HBase table, nothing gets 
deleted immediately; rather, a delete marker (a.k.a. tombstone) 
is written. 
– This is because HBase does not modify files once they are written. 
– The deletes are processed during the major compaction process, at 
which point the data they hide and the delete markers themselves will 
not be present in the merged file.
In Short
Java Example 
HBaseConfiguration config = new HBaseConfiguration(); 
HTable table = new HTable(config, "myTable"); 
Cell cell = table.get("myRow", 
"myColumnFamily:columnQualifier1");
Java Example: A Table Mapper 
Scan scan = new Scan(); 
scan.addColumns(COLUMN_FAMILY_NAME); 
// add some more filters to the scan here, e.g. scan.setFilter(...); 
TableMapReduceUtil.initTableMapperJob(TABLE_NAME, scan, MyTableMapper.class, 
ImmutableBytesWritable.class, IntWritable.class, job); 

// MyTableMapper is an illustrative class name for the mapper registered above. 
public class MyTableMapper extends TableMapper<ImmutableBytesWritable, IntWritable> 
{ 
@Override 
public void map(ImmutableBytesWritable row, Result values, Context context) throws 
IOException 
{ 
ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get()); 
for (KeyValue value : values.list()) 
{ 
ByteBuffer b = ByteBuffer.wrap(value.getValue()); 
String column = Bytes.toString(value.getColumn()); 
// compute something from b and column, and put the result in res 
IntWritable res = new IntWritable(0); // placeholder result 
try { context.write(userKey, res); } 
catch (InterruptedException e) { throw new IOException(e); } 
} 
} 
} 
KeyValue in the HFile is a low-level byte array that allows for "zero-copy" access to the data, 
even with lazy or custom parsing if necessary.
Map-Reduce with HBase
Map-Reduce with HBase - Classes
InputFormat 
• The InputFormat class is responsible for the actual splitting of the input data, as 
well as returning a RecordReader instance that defines the classes of 
the key and value objects and provides a next() method used to 
iterate over each input record. 
• In HBase the implementation is called TableInputFormatBase, along with its 
subclass TableInputFormat. 
• TableInputFormat is a lightweight concrete version. 
• You provide the name of the table to scan and the columns you want to 
process during the Map phase. 
• It splits the table into proper pieces for you and hands them over to the 
subsequent classes.
Mapper 
• The Mapper class(es) handle the next stage of the MapReduce. 
• In this step each record read using the RecordReader is processed using 
the map() method. 
• TableMap is a class specific to iterating over an HBase table. 
• One specific implementation is IdentityTableMap, which is also a good 
example of how to add your own functionality to the supplied classes. 
• The TableMap class itself does not implement anything but only adds the 
signatures of the actual key/value pair classes. 
• The IdentityTableMap simply passes the records on to the next stage of 
the processing.
Reducer 
• The Reduce stage and class layout is very similar to the Mapper 
one explained above. 
• This time we get the output of a Mapper class and process it 
after the data was shuffled and sorted.
OutputFormat 
• The final stage is the OutputFormat class, whose job is to persist the data in 
various locations. 
• There are specific implementations that allow output to files, or to HBase 
tables in the case of TableOutputFormat. 
• TableOutputFormat uses a RecordWriter to write the data into the specified 
HBase output table. 
• It is important to note the cardinality as well: 
• while there are many Mappers handing records to many Reducers, there is 
only one OutputFormat, which takes each output record from its Reducer 
in turn. 
• It is the final class handling the key/value pairs; it writes them to their final 
destination, be it a file or a table. 
• The name of the output table is specified when the job is created, as the 
sketch below shows.
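
A hedged sketch of how these pieces are wired together for a table-in, table-out job using TableMapReduceUtil; the table names and the MyMapper/MyReducer classes are placeholders, not classes from this deck: 

Configuration conf = HBaseConfiguration.create(); 
Job job = new Job(conf, "table-to-table example"); 
job.setJarByClass(MyMapper.class); 

Scan scan = new Scan(); 
scan.setCaching(100);        // see the scan-caching notes later in this deck 
scan.setCacheBlocks(false);  // don't fill the block cache from a full scan 

TableMapReduceUtil.initTableMapperJob("source_table", scan, 
    MyMapper.class, ImmutableBytesWritable.class, IntWritable.class, job); 
TableMapReduceUtil.initTableReducerJob("target_table", MyReducer.class, job); 
job.waitForCompletion(true); 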
Map-reduce options with HBase 
Input \ Output     Raw data                   Table-A                    Table-B 
Raw data           Map + Reduce (Hadoop)      Map only or Map + Reduce   Map only or Map + Reduce 
Table-A            Map only or Map + Reduce   Map + Reduce               Map 
Table-B            Map only or Map + Reduce   Map                        Map + Reduce 
Reading from and writing into the same table hinders the proper distribution of 
regions across the servers (open scanners block region splits), and the job may or 
may not see the new data as it scans; writes must happen in TableReduce.reduce(). 
Reading from one table and writing to another: updates can be written directly in 
TableMap.map(). 
Alternatively, the Map stage completely reads a table and then passes the data on 
in intermediate files to the Reduce stage; the Reducer reads from the DFS and 
writes into the now-idle HBase table.
Usage
Classes 
• HBaseAdmin 
• HBaseConfiguration 
• HTable 
• HTableDescriptor 
• Put 
• Get 
• Scanner 
• Filters 
These cover the database admin, table, family and column qualifier levels.
Using HBase API 
HBaseConfiguration: Adds HBase configuration files to a Configuration 
new HBaseConfiguration ( ) 
new HBaseConfiguration (Configuration c) 
<property> 
<name> name 
</name> 
<value> value 
</value> 
</property> 
HBaseAdmin: new HBaseAdmin( HBaseConfiguration conf ) 
• Ex: 
HBaseAdmin admin = new HBaseAdmin(config); 
admin.disableTable ("tablename");
Using HBase API 
HTableDescriptor: HTableDescriptor contains the name of an HTable, and its 
column families. 
new HTableDescriptor() 
new HTableDescriptor(String name) 
• Ex: HTableDescriptor htd = new HTableDescriptor(tablename); 
htd.addFamily ( new HColumnDescriptor ("Family")); 
HColumnDescriptor: An HColumnDescriptor contains information about a column family 
new HColumnDescriptor(String familyname) 
• Ex: 
HTableDescriptor htd = new HTableDescriptor(tablename); 
HColumnDescriptor col = new HColumnDescriptor("content:"); 
htd.addFamily(col);
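
A hedged sketch completing the flow: the descriptor from the example above is handed to HBaseAdmin to actually create the table (config and tablename as in the surrounding examples): 

HBaseAdmin admin = new HBaseAdmin(config); 
HTableDescriptor htd = new HTableDescriptor(tablename); 
htd.addFamily(new HColumnDescriptor("content")); 
if (!admin.tableExists(tablename)) { 
    admin.createTable(htd);   // the new table starts out with a single region 
} 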
Using HBase API 
HTable: Used for communication with a single HBase table. 
new HTable(HBaseConfiguration conf, String tableName) 
• Ex: 
HTable table = new HTable (conf, Bytes.toBytes ( tablename )); 
ResultScanner scanner = table.getScanner ( family ); 
Put: Used to perform Put operations for a single row. 
new Put(byte[] row) 
new Put(byte[] row, RowLock rowLock) 
• Ex: 
HTable table = new HTable (conf, Bytes.toBytes ( tablename )); 
Put p = new Put ( brow ); 
p.add (family, qualifier, value); 
table.put ( p );
Using HBase API 
Get: Used to perform Get operations on a single row. 
new Get (byte[] row) 
new Get (byte[] row, RowLock rowLock) 
• Ex: 
HTable table = new HTable(conf, Bytes.toBytes(tablename)); 
Get g = new Get(Bytes.toBytes(row)); 
Result: Single row result of a Get or Scan query. 
new Result() 
• Ex: 
HTable table = new HTable(conf, Bytes.toBytes(tablename)); 
Get g = new Get(Bytes.toBytes(row)); 
Result rowResult = table.get(g); 
byte[] ret = rowResult.getValue( Bytes.toBytes(family), Bytes.toBytes(column) );
Using HBase API 
Scanner 
• All operations are identical to Get 
– Rather than specifying a single row, an optional startRow and stopRow 
may be defined. 
• If rows are not specified, the Scanner will iterate over all rows. 
– = new Scan () 
– = new Scan (byte[] startRow, byte[] stopRow) 
– = new Scan (byte[] startRow, Filter filter)
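
A minimal sketch of iterating a range scan, assuming an open HTable named table and byte[] variables startRow, stopRow, family and qualifier: 

Scan scan = new Scan(startRow, stopRow); 
scan.addColumn(family, qualifier); 
ResultScanner scanner = table.getScanner(scan); 
try { 
    for (Result r : scanner) { 
        byte[] value = r.getValue(family, qualifier); 
        // process the row here 
    } 
} finally { 
    scanner.close();   // always release the scanner 
} 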
HBase Shell 
• Non-SQL (intentional) “DSL” 
• list : List all tables in hbase 
• get : Get row or cell contents; pass table name, row, and optionally a 
dictionary of column(s), timestamp and versions. 
• put : Put a cell 'value' at specified table/row/column and optionally 
timestamp coordinates. 
• create : hbase> create 't1', {NAME => 'f1', VERSIONS => 5} 
• scan : Scan a table; pass table name and optionally a dictionary of 
scanner specifications. 
• delete : Put a delete cell value at specified table/row/column and 
optionally timestamp coordinates. 
• enable : Enable the named table 
• disable : Disable the named table: e.g. "hbase> disable 't1'" 
• drop : Drop the named table.
HBase non-java access 
• Languages talking to the JVM: 
– Jython interface to HBase 
– Groovy DSL for HBase 
– Scala interface to HBase 
• Languages with a custom protocol 
– REST gateway specification for HBase 
– Thrift gateway specification for HBase
Example: Frequency Counter 
• HBase has records of web access logs - we record each web page access by 
a user. 
• The schema looks like this: 
userID_timestamp => { 
details => { 
page: 
} 
} 
• We want to count how many times 
we have seen each user 
Input rows: 
row        details:page 
user1_t1   a.html 
user2_t2   b.html 
user3_t4   a.html 
user1_t5   c.html 
user1_t6   b.html 
user2_t7   c.html 
user4_t8   a.html 

Expected counts: 
user    count (frequency) 
user1   3 
user2   2 
user3   1 
user4   1
Tutorial 
• hbase shell 
create 'access_logs', 'details' 
create 'summary_user', {NAME=>'details', VERSIONS=>1} 
• Add some data using Importer 
• scan 'access_logs', {LIMIT => 5} 
• Run 'FreqCounter' 
• scan 'summary_user', {LIMIT => 5} 
• Show output with PrintUserCount
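
The 'FreqCounter' job itself is not shown in the deck; below is a hedged sketch of what it might look like, using the access_logs and summary_user tables and the details family from the tutorial. The class names, the summary column details:total, and the key-parsing logic are assumptions, not the original code: 

import java.io.IOException; 
import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.hbase.HBaseConfiguration; 
import org.apache.hadoop.hbase.client.Put; 
import org.apache.hadoop.hbase.client.Result; 
import org.apache.hadoop.hbase.client.Scan; 
import org.apache.hadoop.hbase.io.ImmutableBytesWritable; 
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil; 
import org.apache.hadoop.hbase.mapreduce.TableMapper; 
import org.apache.hadoop.hbase.mapreduce.TableReducer; 
import org.apache.hadoop.hbase.util.Bytes; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapreduce.Job; 

public class FreqCounter { 

  // Mapper: strip the timestamp from keys of the form userID_timestamp, emit (userID, 1). 
  static class UserMapper extends TableMapper<Text, IntWritable> { 
    private static final IntWritable ONE = new IntWritable(1); 
    @Override 
    public void map(ImmutableBytesWritable row, Result values, Context context) 
        throws IOException, InterruptedException { 
      String key = Bytes.toString(row.get());            // e.g. "user1_t5" 
      String userId = key.substring(0, key.lastIndexOf('_')); 
      context.write(new Text(userId), ONE); 
    } 
  } 

  // Reducer: sum the counts and write them into the summary_user table. 
  static class UserReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> { 
    @Override 
    public void reduce(Text userId, Iterable<IntWritable> counts, Context context) 
        throws IOException, InterruptedException { 
      int total = 0; 
      for (IntWritable c : counts) total += c.get(); 
      Put put = new Put(Bytes.toBytes(userId.toString())); 
      put.add(Bytes.toBytes("details"), Bytes.toBytes("total"), Bytes.toBytes(total)); 
      context.write(null, put);                          // row key is taken from the Put 
    } 
  } 

  public static void main(String[] args) throws Exception { 
    Configuration conf = HBaseConfiguration.create(); 
    Job job = new Job(conf, "freq-counter"); 
    job.setJarByClass(FreqCounter.class); 
    Scan scan = new Scan(); 
    scan.addFamily(Bytes.toBytes("details")); 
    TableMapReduceUtil.initTableMapperJob("access_logs", scan, 
        UserMapper.class, Text.class, IntWritable.class, job); 
    TableMapReduceUtil.initTableReducerJob("summary_user", UserReducer.class, job); 
    job.waitForCompletion(true); 
  } 
} 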
Coprocessors 
• The HBase 0.92 release provides coprocessor functionality, which includes 
– observers (similar to triggers, for certain events) and 
– endpoints (similar to stored procedures, invoked from the client) 
• Observers can be at the region, master or WAL (Write Ahead Log) 
level. 
• Once a RegionObserver has been created, it can be specified in 
hbase-default.xml, in which case it applies to all regions and tables, or it 
can be specified on a single table, in which case it applies only to 
that table. 
• Arbitrary code can run at each tablet in the tablet server 
• High-level call interface for clients 
– Calls are addressed to rows or ranges of rows, and the coprocessor client library 
resolves them to actual locations; 
– Calls across multiple rows are automatically split into multiple parallelized RPCs 
• Provides a very flexible model for building distributed services 
• Automatic scaling, load balancing, request routing for applications
Three observer interfaces 
• RegionObserver: Provides hooks for data manipulation events, Get, Put, 
Delete, Scan, and so on. There is an instance of a RegionObserver 
coprocessor for every table region and the scope of the observations they 
can make is constrained to that region. 
• WALObserver: Provides hooks for write-ahead log (WAL) related operations. 
This is a way to observe or intercept WAL writing and reconstruction events. 
A WALObserver runs in the context of WAL processing. There is one such 
context per region server. 
• MasterObserver: Provides hooks for DDL-type operation, i.e., create, delete, 
modify table, etc. The MasterObserver runs within the context of the HBase 
master.
Example 
package org.apache.hadoop.hbase.coprocessor; 

import java.io.IOException; 
import java.util.List; 
import org.apache.hadoop.hbase.KeyValue; 
import org.apache.hadoop.hbase.client.Get; 

// Sample access-control coprocessor. It extends BaseRegionObserver 
// and intercepts the preXXX() hooks to check user privileges for the given table 
// and column family. 
public class AccessControlCoprocessor extends BaseRegionObserver { 
@Override 
public void preGet(final ObserverContext<RegionCoprocessorEnvironment> c, 
final Get get, final List<KeyValue> result) throws IOException { 
// check permissions.. 
if (!permissionGranted()) { 
throw new AccessDeniedException("User is not allowed to access."); 
} 
} 
// override prePut(), preDelete(), etc. 
}
Avoiding long pauses from the garbage collector 
• Stop-the-world garbage collections are common in HBase, 
especially during loading. 
• There are two issues to be addressed: 
– concurrent mark and sweep (CMS) performance, and 
– fragmentation of the memstore. 
• To address the first, start the CMS earlier than the default by adding 
-XX:CMSInitiatingOccupancyFraction and setting it below the 
default. Start at 60 or 70 percent (the lower you bring the 
threshold, the more GC work is done and the more CPU is used). 
• To address the second, fragmentation, there is an 
experimental facility, hbase.hregion.memstore.mslab.enabled 
(memstore-local allocation buffers), to be set to true in the 
configuration.
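
A hedged example of where these settings go; it assumes CMS is enabled via -XX:+UseConcMarkSweepGC, and the 70 percent figure is only a starting point to tune, not a verified recommendation. In conf/hbase-env.sh: 

export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70" 

and in hbase-site.xml, to enable the memstore-local allocation buffers: 

<property> 
<name> hbase.hregion.memstore.mslab.enabled 
</name> 
<value> true 
</value> 
</property> 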
For loading data Pre-Create Regions 
• Tables in HBase are initially created with one region by 
default. 
• For bulk imports, this means that all clients will write 
to the same region until it is large enough to split and 
become distributed across the cluster. 
• A useful pattern to speed up the bulk import process is 
to pre-create empty regions. 
• Note that too many regions can actually degrade 
performance; a sketch of pre-splitting is given below.
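
A hedged sketch of pre-creating regions by passing split keys at table creation time; the table name, the family and the split points are arbitrary examples: 

HBaseAdmin admin = new HBaseAdmin(config); 
HTableDescriptor htd = new HTableDescriptor("mytable"); 
htd.addFamily(new HColumnDescriptor("details")); 
byte[][] splits = new byte[][] { 
    Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t") 
}; 
admin.createTable(htd, splits);   // the table starts with 4 regions instead of 1 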
Enable Scan Caching 
• When HBase is used as an input source for a MapReduce job, 
set setCaching to something greater than the default (which is 
1). 
• With the default value, the map task makes a call back to the 
region server for every record processed. 
– Setting this value to 80, for example, will transfer 80 rows at a time to 
the client to be processed, as in the snippet below. 
• There is a cost/benefit trade-off in making the cache value large, 
because it costs more memory on both the client and the RegionServer, 
so bigger isn't always better. 
• Experimentation suggests that a value 
between 50 and 100 gives good performance in our setup.
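
For example (the value 80 mirrors the figure above; setCacheBlocks(false) is a common companion setting for full scans, added here as an assumption): 

Scan scan = new Scan(); 
scan.setCaching(80);          // fetch 80 rows per round trip instead of 1 
scan.setCacheBlocks(false);   // avoid churning the block cache during a full scan 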
Right Scan Attribute Selection 
• Whenever a Scan is used to process large numbers of 
rows (and especially when used as a MapReduce 
source), select the right set of attributes. 
• If scan.addFamily is called, then all of the attributes in 
the specified ColumnFamily will be returned to the 
client. 
• If only a small number of the available attributes are to 
be processed, then only those attributes should be 
specified in the input scan, because attribute over-selection 
is a non-trivial performance penalty over 
large datasets (see the snippet below).
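
An illustrative sketch, reusing the details:page column from the frequency-counter example: 

Scan scan = new Scan(); 
// scan.addFamily(Bytes.toBytes("details"));                       // returns every column in the family 
scan.addColumn(Bytes.toBytes("details"), Bytes.toBytes("page"));   // returns only details:page 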
Optimize handler.count 
• The count of RPC Listener instances spun up on 
RegionServers. The same property is used by the Master 
for the count of master handlers. 
– Default is 10. 
• This setting in essence determines how many requests are 
processed concurrently inside a RegionServer 
at any one time. 
• If multiple map-reduce jobs are running in the cluster 
and there is enough map capacity to handle the jobs 
concurrently, then this parameter needs to be tuned.
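
The property is hbase.regionserver.handler.count; a hedged hbase-site.xml snippet follows (the value 30 is only an example, the right number depends on the workload and available memory): 

<property> 
<name> hbase.regionserver.handler.count 
</name> 
<value> 30 
</value> 
</property> 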
End of session 
Day – 4: HBase

Contenu connexe

Tendances

Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
Nisanth Simon
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
Dilip Reddy
 

Tendances (19)

Hadoop architecture by ajay
Hadoop architecture by ajayHadoop architecture by ajay
Hadoop architecture by ajay
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
6.hive
6.hive6.hive
6.hive
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
An Introduction to Hadoop
An Introduction to HadoopAn Introduction to Hadoop
An Introduction to Hadoop
 
Apache hive
Apache hiveApache hive
Apache hive
 
Hadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduceHadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduce
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Unit 1
Unit 1Unit 1
Unit 1
 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuning
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 

En vedette

En vedette (6)

Hadoop map reduce v2
Hadoop map reduce v2Hadoop map reduce v2
Hadoop map reduce v2
 
Simplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query ToolSimplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query Tool
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Hadoop exercise
Hadoop exerciseHadoop exercise
Hadoop exercise
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 

Similaire à 01 hbase

Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
Chris Huang
 
Hbasepreso 111116185419-phpapp02
Hbasepreso 111116185419-phpapp02Hbasepreso 111116185419-phpapp02
Hbasepreso 111116185419-phpapp02
Gokuldas Pillai
 

Similaire à 01 hbase (20)

CCS334 BIG DATA ANALYTICS UNIT 5 PPT ELECTIVE PAPER
CCS334 BIG DATA ANALYTICS UNIT 5 PPT  ELECTIVE PAPERCCS334 BIG DATA ANALYTICS UNIT 5 PPT  ELECTIVE PAPER
CCS334 BIG DATA ANALYTICS UNIT 5 PPT ELECTIVE PAPER
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBase
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
 
4. hbase overview
4. hbase overview4. hbase overview
4. hbase overview
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
 
Hbasepreso 111116185419-phpapp02
Hbasepreso 111116185419-phpapp02Hbasepreso 111116185419-phpapp02
Hbasepreso 111116185419-phpapp02
 
HBase.pptx
HBase.pptxHBase.pptx
HBase.pptx
 
Big data hbase
Big data hbase Big data hbase
Big data hbase
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Hbase
HbaseHbase
Hbase
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
 
Hbase
HbaseHbase
Hbase
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
 
Hbase
HbaseHbase
Hbase
 
Hbase Quick Review Guide for Interviews
Hbase Quick Review Guide for InterviewsHbase Quick Review Guide for Interviews
Hbase Quick Review Guide for Interviews
 

Plus de Subhas Kumar Ghosh

07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent
Subhas Kumar Ghosh
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
Subhas Kumar Ghosh
 
02 naive bays classifier and sentiment analysis
02 naive bays classifier and sentiment analysis02 naive bays classifier and sentiment analysis
02 naive bays classifier and sentiment analysis
Subhas Kumar Ghosh
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
Subhas Kumar Ghosh
 
Hadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepHadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by step
Subhas Kumar Ghosh
 

Plus de Subhas Kumar Ghosh (17)

07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
05 pig user defined functions (udfs)
05 pig user defined functions (udfs)05 pig user defined functions (udfs)
05 pig user defined functions (udfs)
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
02 naive bays classifier and sentiment analysis
02 naive bays classifier and sentiment analysis02 naive bays classifier and sentiment analysis
02 naive bays classifier and sentiment analysis
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
Hadoop Day 3
Hadoop Day 3Hadoop Day 3
Hadoop Day 3
 
Hadoop job chaining
Hadoop job chainingHadoop job chaining
Hadoop job chaining
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 
Hadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepHadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by step
 
Hadoop map reduce in operation
Hadoop map reduce in operationHadoop map reduce in operation
Hadoop map reduce in operation
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
Hadoop availability
Hadoop availabilityHadoop availability
Hadoop availability
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
 
Greedy embedding problem
Greedy embedding problemGreedy embedding problem
Greedy embedding problem
 

Dernier

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 

Dernier (20)

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 

01 hbase

  • 2. RDBMS Scaling • Cannot scale for large distributed data sets • Vendors Offers replication and partition solutions to grow the database beyond the confines of single node, but generally complicated to install and maintain • Such techniques compromise RDBMS features such as – Joins, Complex queries, Views, Triggers and foreign key constraints – These queries becomes expensive
  • 3. Why BigTable? • Performance of RDBMS system is good for transaction processing but for very large scale analytic processing, the solutions are expensive, and specialized. • Very large scale analytic processing – Big queries – typically range or table scans. – Big databases (100s of TB) • Map reduce on Bigtable with optionally Cascading on top to support some relational algebras may be a cost effective solution. • Sharding (Shared nothing horizontal partitioning) is not a solution to scale open source RDBMS platforms • Application specific • Labor intensive (re)partitionaing
  • 4. Key concept HBase is a distributed column-oriented database built on top of HDFS. • At its core, HBase / BigTable is a map. • It is a persistent storage. • HBase and BigTable are built upon distributed file-systems. • Unlike most map implementations, in HBase/BigTable the key/value pairs are kept in strict alphabetical order. • Multidimensional map. • Sparse.
  • 5. Map • A map is "an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value." { "Name" : "Subhas", "Mail" : "subhas.ghosh@siemens.com", "Location" : "9F-TA-WS-21", "Phone" : "+918025113529", "Sal" : ************ } In this example "Name" is a key, and "Subhas" is the corresponding value.
  • 6. Persistent • Persistence merely means that the data you put in this special map "persists" after the program that created or accessed it is finished. • This is no different in concept than any other kind of persistent storage such as a file on a file-system. • Each value can be versioned in HBase
  • 7. Distributed • Built upon distributed file-systems – file storage can be spread out among an array of independent machines. – HBase sits atop either Hadoop's Distributed File System (HDFS) or Amazon's Simple Storage Service (S3), – BigTable makes use of the Google File System (GFS). • Data is replicated across a number of participating nodes in an analogous manner to how data is striped across discs in a RAID system.
  • 8. Sorted Continuing our example, the sorted version looks like this: { "Location" : "9F-TA-WS-21", "Mail" : "subhas.ghosh@siemens.com", "Name" : "Subhas", "Phone" : "+918025113529", "Sal" : ************ } Sorting can ensure that items of greatest interest to you are near each other
  • 9. Multidimensional A map of maps { "Location" : { "FL" : "9F", "TOWER" : "A", "WS" : "21“ }, "Mail" : "subhas@xyz.com", "Name" : { "FIRST": "Subhas", "MID" : "Kumar", "LAST" : "Ghosh“ }, "Phone" : "+918025113529", "Sal" : ************ } Each key points to a map with one or more keys: "FL", "TOWER", "WS" e.g. Top-level key/map pair is a "row". Also, in BigTable/HBase nomenclature, the "FL" and "TOWER" mappings would be called "Column Families".
  • 10. Multidimensional • A table's column families are specified when the table is created, and are difficult or impossible to modify later. • It can also be expensive to add new column families, so it's a good idea to specify all the ones you'll need up front. • Fortunately, a column family may have any number of columns, denoted by a column "qualifier" or "label".
  • 11. Multidimensional … "aaaaa" : { "A" : { "foo" : "y", "bar" : "d" }, "B" : { "" : "w" } }, "aaaab" : { "A" : { "foo" : "world", "bar" : "domination" }, "B" : { "" : "ocean" } } }, … Column family with two columns: "foo" and "bar", When asking HBase/BigTable for data provide the full column name in the form "<family>:<qualifier>“, e.g. "A:foo", "A:bar" and "B:". "B" column family has just one column whose qualifier is the empty string ("").
  • 12. Multidimensional • Labeled tables of rows X columns X timestamp – Cells addressed by row/column/timestamp – As (perverse) java declaration: SortedMap<byte [], SortedMap<byte [], List<Cell>>>> hbase = new TreeMap<ditto>(new RawByteComparator()); • Row keys uninterpreted byte arrays: E.g. an URL – Rows are ordered by Comparator (Default: byte-order) – Row updates are atomic; even if hundreds of columns • Columns grouped into column-families – Columns have column-family prefix and then qualifier • E.g. webpage:mimetype, webpage:language – Column-family 'printable', qualifier arbitrary bytes – Column-families in table schema but not qualifiers
  • 13. Multidimensional • Cell is uninterpreted byte array and a timestamp – E.g. webpage content • Tables partitioned into Regions – Region defined by start & end row – Regions are the 'atoms' of distribution deployed around the cluster. – start < end - in lexicographic sense
  • 14. Multidimensional Time Stamp value Column Family Row key Qualifier
  • 15. Sparse • Not all columns in all rows are filled
  • 16. What HBase Is Not • Tables have one primary index, the row key. • No join operators. • Scans and queries can select a subset of available columns, perhaps by using a wildcard. • There are three types of lookups: – Fast lookup using row key and optional timestamp. – Full table scan – Range scan from region start to end. • Limited atomicity and transaction support. – HBase supports multiple batched mutations of single rows only. – Data is unstructured and untyped. • Not accessed or manipulated via SQL. – Programmatic access via Java, REST, or Thrift APIs. – Scripting via JRuby. – No JOIN, No sophisticated query engine, No column typing, no ODBC/JDBC, No Crystal Reports, No transactions, No secondary indices
  • 17. Map-Reduce With HBase • When we use a map-reduce framework with HBase table, a map function is executed for each region independently in parallel. • Within each map query is answered by scanning the rows in a ordered manner starting with low ordered key to higher ordered key. • Optionally, certain rows and columns (column families) can be filtered out for better performance.
  • 19. Elements – Table : a list of tuples sorted by row key ascending, column name ascending and timestamp descending. – Regions: A Table is broken up into row ranges called regions. Each row range contains rows from start-key to end-key. (A set of regions, sorted appropriately, forms an entire table.) – HStore: Each column family in a region is managed by an HStore. – HFile: Each HStore may have one or more HFile (a Hadoop HDFS file type).
  • 20. Components • Master o Responsible for monitoring region servers o Load balancing for regions o Redirect client to correct region servers o The current SPOF (single point of failure) • Regionserver slaves o Serving requests(Write/Read/Scan) of Client o Send HeartBeat to Master o Throughput and Region numbers are scalable by region servers
  • 21. Components • ZooKeeper – centralized service for maintaining • configuration information, • naming, • providing distributed synchronization, and • providing group services. – ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchal namespace • organized similarly to a standard file system. • The name space consists of data registers - called znodes • in ZooKeeper parlance - and these are similar to files and directories. • Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can acheive high throughput and low latency numbers.
  • 22. Distributed Coordination Data model and the hierarchical namespace
  • 23. Distributed Coordination • The replicated database is in-memory. • Updates are logged to disk for recoverability. • Writes are serialized to disk before they are applied to the in-memory database. • Clients connect to exactly one server to submit requests. • Read requests are serviced from the local replica of each server database. • Requests that change the state of the service, write requests, are processed by an agreement protocol.
  • 24. Distributed Coordination • As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. • The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. • The messaging layer takes care of replacing leaders on failures and syncing followers with leaders. • ZooKeeper uses a custom atomic messaging protocol. – ZooKeeper can guarantee that the local replicas never diverge. – When the leader receives a write request, it calculates what the state of the system is when the write is to be applied and transforms this into a transaction that captures this new state.
  • 26. The general protocol flow 1. Client contacts the Zookeeper to find where it shall put the data. 2. For this purpose, HBase maintains two catalog tables, namely, -ROOT-, and .META.. 3. First HBase finds information from the -ROOT- table about location of .META. Table. 4. Subsequently about the server location of the assigned region of a table from the .META. table. 5. Client caches this information and contacts the HRegionServer. 6. Next the HRegionServer creates a HRegion object corresponding to the opened region. 1. When the HRegion is "opened" it sets up a HStore instance for each HColumnFamily for every table as defined by the user beforehand. 2. Each of the Store instances have one or more StoreFile instances 3. StoreFile are lightweight wrappers around the actual storage file called HFile.
  • 27. Where is my data? Zookeeper .META. -ROOT- MyRow MyTable Row per table region Row per META region Client
  • 28. The general protocol flow 7. The client issues a HTable.put(Put) request to the HRegionServer which hands the details to the matching HRegion instance. 8. The first step is to decide if the data should be first written to the "Write-Ahead- Log" (WAL) represented by the HLog class. The WAL is a standard Hadoop SequenceFile and it stores HLogKey's. 9. These keys contain a sequential number as well as the actual data and are used to replay not yet persisted data after a server crash. 10. Once the data is written (or not) to the WAL it is placed in the MemStore. At the same time it is checked if the MemStore is full and in that case a flush to disk is requested. 11. The store files created on disk are immutable. Sometimes the store files are merged together; this is done by a process called compaction. This buffer-flush-merge strategy is a common pattern described in Log-Structured Merge-Tree. 12. After a compaction, if a newly written store file size is greater than the size specified in hbase.hregion.max.filesize (default 256 MB), the region is split into two new regions. Flush Flush Flush Compact Flush Flush Compact Flush Flush Flush Compact
  • 29. Log Structured Merge Trees • Random IO for writes is bad in HDFS. • LSM Trees convert random writes to sequential writes. • Writes go to a commit log and in-memory storage (MemStore) • The MemStore is occasionally flushed to disk (StoreFile) • The disk stores are periodically compacted to HFile (on HDFS) • Use Bloom Filters with merge.
  • 30. Buffer-Flush-Compact (minor) Region Memstore HLog (Append only WAL on HDFS) (Sequence file) (One per region) HFile on HDFS Compact HFile on HDFS StoreFile HFile on HDFS Buffer Read Flush HFile: immutable sorted map (byte[]  byte[]) (row, column, timestamp  cell value)
  • 31. Compaction • Major compaction: – The most important difference between minor and major compactions is that major compactions processes delete markers, max versions, etc, while minor compactions don't. – This is because delete markers might also affect data in the non-merged files, so it is only possible to do this when merging all files. • When a delete is performed in HBase table, nothing gets deleted immediately, rather a delete marker (a.k.a. tombstone) is written. – This is because HBase does not modify files once they are written. – The deletes are processed during the major compaction process; at which point the data they hide and the delete marker itself will not be present in the merged file.
  • 33. Java Example HBaseConfiguration config = new HBaseConfiguration(); HTable table = new HTable(config, "myTable"); Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
  • 34. Java Example: A Table Mapper Scan scan = new Scan(); scan.addColumns(COLUMN_FAMILIY_NAME); //add some more filters to acan here as scan.setFilter(...); TableMapReduceUtil.initTableMapperJob(TABLE_NAME, scan, Mapper.class, ImmutableBytesWritable.class, IntWritable.class, job); TableMapper<ImmutableBytesWritable, IntWritable> { @Override public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException { ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get()); for (KeyValue value: values.list()) { ByteBuffer b = ByteBuffer.wrap(value.getValue()); String column = Bytes.toString(value.getColumn()); //compute something and put in the int res try { context.write(userKey, res); } catch (InterruptedException e) { throw new IOException(e); } } } } KeyValue in the HFile is a low-level byte array that allows for "zero-copy" access to the data, even with lazy or custom parsing if necessary.
  • 37. InputFormat • InputFormat class is responsible for the actual splitting of the input data as well as returning a RecordReader instance that defines the classes of the key and value objects as well as providing a next() method that is used to iterate over each input record. • In HBase implementation is called TableInputFormatBase as well as its subclass TableInputFormat. • TableInputFormat is a light-weight concrete version. • You can provide the name of the table to scan and the columns you want to process during the Map phase. • It splits the table into proper pieces for you and hands them over to the subsequent classes.
  • 38. Mapper • The Mapper class(es) are for the next stage of the MapReduce. • In this step each record read using the RecordReader is processed using the map() method. • A TableMap class that is specific to iterating over a HBase table. • Once specific implementation is the IdentityTableMap which is also a good example on how to add your own functionality to the supplied classes. • The TableMap class itself does not implement anything but only adds the signatures of what the actual key/value pair classes are. • The IdentityTableMap is simply passing on the records to the next stage of the processing.
  • 39. Reducer • The Reduce stage and class layout is very similar to the Mapper one explained above. • This time we get the output of a Mapper class and process it after the data was shuffled and sorted.
  • 40. OutputFormat • The final stage is the OutputFormat class and its job to persist the data in various locations. • There are specific implementations that allow output to files or to HBase tables in case of the TableOutputFormat. • It uses a RecordWriter to write the data into the specific HBase output table. • It is important to note the cardinality as well. • While there are many Mappers handing records to many Reducers, there is only one OutputFormat that takes each output record from its Reducer subsequently. • It is the final class handling the key/value pairs and writes them to their final destination, this being a file or a table. • The name of the output table is specified when the job is created.
  • 41. Map-reduce options with HBase Raw data Table-A Table-B Raw Data Map + Reduce (Hadoop) Map only or Map + Reduce Map only or Map + Reduce Table-A Map only or Map + Reduce Map + Reduce Map Table-B Map only or Map + Reduce Map Map + Reduce Output Input Reading and writing into same table: hinder the proper distribution of regions across the servers (open scanners block regions splits) and may or may not see the new data as you scan. must write in the TableReduce.reduce() Read from one table and write to another: can write updates directly in the TableMap.map() Map stage completely reads a table and then passes the data on in intermediate files to the Reduce stage. Reducer reads from DFS and writes into the now idle HBase table
  • 42. Usage
• 43. Classes
• HBaseAdmin
• HBaseConfiguration
• HTable
• HTableDescriptor
• Put
• Get
• Scanner
• Filters
(Diagram labels from the slide: Database, Admin, Table, Family, Column Qualifier)
• 44. Using HBase API
HBaseConfiguration: adds HBase configuration files to a Configuration.
  new HBaseConfiguration()
  new HBaseConfiguration(Configuration c)
Configuration values are supplied as properties, e.g.:
  <property> <name> name </name> <value> value </value> </property>
HBaseAdmin: performs administrative operations (create, enable, disable, drop tables).
  new HBaseAdmin(HBaseConfiguration conf)
• Ex:
  HBaseAdmin admin = new HBaseAdmin(config);
  admin.disableTable("tablename");
• 45. Using HBase API
HTableDescriptor: contains the name of an HTable and its column families.
  new HTableDescriptor()
  new HTableDescriptor(String name)
• Ex:
  HTableDescriptor htd = new HTableDescriptor(tablename);
  htd.addFamily(new HColumnDescriptor("Family"));
HColumnDescriptor: contains information about a column family.
  new HColumnDescriptor(String familyname)
• Ex:
  HTableDescriptor htd = new HTableDescriptor(tablename);
  HColumnDescriptor col = new HColumnDescriptor("content:");
  htd.addFamily(col);
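Putting the pieces together, a minimal sketch of creating and later dropping a table (names are illustrative) with HBaseAdmin, HTableDescriptor and HColumnDescriptor:

  HBaseConfiguration conf = new HBaseConfiguration();
  HBaseAdmin admin = new HBaseAdmin(conf);
  HTableDescriptor htd = new HTableDescriptor("access_logs");
  htd.addFamily(new HColumnDescriptor("details"));
  admin.createTable(htd);                 // the table is created enabled
  // ... later, to remove it again:
  admin.disableTable("access_logs");      // a table must be disabled before dropping
  admin.deleteTable("access_logs");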
• 46. Using HBase API
HTable: used for communication with a single HBase table.
  new HTable(HBaseConfiguration conf, String tableName)
• Ex:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  ResultScanner scanner = table.getScanner(family);
Put: used to perform Put operations for a single row.
  new Put(byte[] row)
  new Put(byte[] row, RowLock rowLock)
• Ex:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Put p = new Put(brow);
  p.add(family, qualifier, value);
  table.put(p);
• 47. Using HBase API
Get: used to perform Get operations on a single row.
  new Get(byte[] row)
  new Get(byte[] row, RowLock rowLock)
• Ex:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Get g = new Get(Bytes.toBytes(row));
Result: single row result of a Get or Scan query.
  new Result()
• Ex:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Get g = new Get(Bytes.toBytes(row));
  Result rowResult = table.get(g);
  byte[] ret = rowResult.getValue(Bytes.toBytes(family), Bytes.toBytes(column));
  • 48. Using HBase API Scanner • All operations are identical to Get – Rather than specifying a single row, an optional startRow and stopRow may be defined. • If rows are not specified, the Scanner will iterate over all rows. – = new Scan () – = new Scan (byte[] startRow, byte[] stopRow) – = new Scan (byte[] startRow, Filter filter)
• 49. HBase Shell
• Non-SQL (intentional) "DSL"
• list : List all tables in HBase
• get : Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp and versions.
• put : Put a cell 'value' at specified table/row/column and optionally timestamp coordinates.
• create : hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
• scan : Scan a table; pass table name and optionally a dictionary of scanner specifications.
• delete : Put a delete cell value at specified table/row/column and optionally timestamp coordinates.
• enable : Enable the named table
• disable : Disable the named table: e.g. "hbase> disable 't1'"
• drop : Drop the named table.
• 50. HBase non-Java access
• Languages talking to the JVM:
  – Jython interface to HBase
  – Groovy DSL for HBase
  – Scala interface to HBase
• Languages with a custom protocol:
  – REST gateway specification for HBase
  – Thrift gateway specification for HBase
• 51. Example: Frequency Counter
• HBase has records of web access logs in a table web_access_logs: we record each web page access by a user.
• The schema looks like this: userID_timestamp => { details => { page: ... } }
• We want to count how many times we have seen each user.

  Input (access logs):
  row        details:page
  user1_t1   a.html
  user2_t2   b.html
  user3_t4   a.html
  user1_t5   c.html
  user1_t6   b.html
  user2_t7   c.html
  user4_t8   a.html

  Output (per user):
  user    count (frequency)
  user1   3
  user2   2
  user3   1
  user4   1
• 52. Tutorial
• hbase shell
  create 'access_logs', 'details'
  create 'summary_user', {NAME=>'details', VERSIONS=>1}
• Add some data using Importer
• scan 'access_logs', {LIMIT => 5}
• Run 'FreqCounter' (a sketch of what its mapper might look like follows below)
• scan 'summary_user', {LIMIT => 5}
• Show output with PrintUserCount
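A minimal sketch of the mapper side of such a FreqCounter job (the class name, key format and output wiring are assumptions based on the schema above; the SumReducer sketched earlier would sum the emitted counts into 'summary_user'):

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.IntWritable;

  // Emits (userID, 1) for every row of 'access_logs'; row keys look like "user1_t1".
  public class FreqCounterMapper extends TableMapper<ImmutableBytesWritable, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    public void map(ImmutableBytesWritable rowKey, Result columns, Context context)
        throws IOException, InterruptedException {
      String key = Bytes.toString(rowKey.get());
      String user = key.substring(0, key.indexOf('_'));   // strip the "_timestamp" suffix
      context.write(new ImmutableBytesWritable(Bytes.toBytes(user)), ONE);
    }
  }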
• 53. Coprocessors
• The HBase 0.92 release provides coprocessor functionality, which includes
  – observers (similar to triggers, fired on certain events), and
  – endpoints (similar to stored procedures, invoked from the client).
• Observers can be at the region, master or WAL (Write Ahead Log) level.
• Once a Region Observer has been created, it can be specified in the hbase-default.xml, in which case it applies to all regions and the tables in them, or it can be specified on a table, in which case it applies only to that table.
• Arbitrary code can run at each tablet (region) in the tablet (region) server.
• High-level call interface for clients:
  – Calls are addressed to rows or ranges of rows, and the coprocessor client library resolves them to actual locations;
  – Calls across multiple rows are automatically split into multiple parallelized RPCs.
• Provides a very flexible model for building distributed services.
• Automatic scaling, load balancing, request routing for applications.
• 54. Three observer interfaces
• RegionObserver: provides hooks for data manipulation events such as Get, Put, Delete, Scan, and so on. There is an instance of a RegionObserver coprocessor for every table region, and the scope of the observations it can make is constrained to that region.
• WALObserver: provides hooks for write-ahead log (WAL) related operations. This is a way to observe or intercept WAL writing and reconstruction events. A WALObserver runs in the context of WAL processing; there is one such context per region server.
• MasterObserver: provides hooks for DDL-type operations, i.e., create, delete, modify table, etc. The MasterObserver runs within the context of the HBase master.
• 55. Example

  package org.apache.hadoop.hbase.coprocessor;

  import java.io.IOException;
  import java.util.List;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.Get;

  // Sample access-control coprocessor. It uses RegionObserver and intercepts
  // the preXXX() methods to check user privileges for the given table
  // and column family.
  public class AccessControlCoprocessor extends BaseRegionObserver {
    @Override
    public void preGet(final ObserverContext<RegionCoprocessorEnvironment> c,
        final Get get, final List<KeyValue> result) throws IOException {
      // check permissions (permissionGranted() is the application's own check)
      if (!permissionGranted()) {
        throw new AccessDeniedException("User is not allowed to access.");
      }
    }
    // override prePut(), preDelete(), etc.
  }
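The observer does nothing until it is registered. One way (a sketch, using the standard region coprocessor property) is to list the class in the site configuration so it is loaded for every region:

  <property>
    <name> hbase.coprocessor.region.classes </name>
    <value> org.apache.hadoop.hbase.coprocessor.AccessControlCoprocessor </value>
  </property>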
• 56. Avoiding long pauses from the garbage collector
• Stop-the-world garbage collections are common in HBase, especially during loading.
• There are two issues to be addressed:
  – concurrent mark and sweep (CMS) performance, and
  – fragmentation of the memstore.
• To address the first, start the CMS earlier than the default by adding -XX:CMSInitiatingOccupancyFraction and setting it down from the default. Start at 60 or 70 percent (the lower you bring the threshold, the more GC is done and the more CPU is used).
• To address the second (fragmentation), there is an experimental facility, hbase.hregion.memstore.mslab.enabled (memstore-local allocation buffers), to be set to true in the configuration.
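A sketch of the corresponding settings (the occupancy value is a starting point to tune, not a recommendation): the JVM flags go into HBASE_OPTS in hbase-env.sh, and the MSLAB switch into the site configuration.

  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"

  <property>
    <name> hbase.hregion.memstore.mslab.enabled </name>
    <value> true </value>
  </property>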
• 57. For loading data: pre-create regions
• Tables in HBase are initially created with one region by default.
• For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster.
• A useful pattern to speed up the bulk import process is to pre-create empty regions (see the sketch below).
• Note that too many regions can actually degrade performance.
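A minimal sketch (split keys are illustrative) of pre-creating a table with several regions using HBaseAdmin.createTable(desc, splits), so that bulk-import writes spread across the cluster from the start:

  HBaseAdmin admin = new HBaseAdmin(conf);
  HTableDescriptor desc = new HTableDescriptor("access_logs");
  desc.addFamily(new HColumnDescriptor("details"));
  byte[][] splits = new byte[][] {
      Bytes.toBytes("user2"), Bytes.toBytes("user4"), Bytes.toBytes("user6")
  };
  admin.createTable(desc, splits);   // 3 split points => 4 initial regions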
• 58. Enable scan caching
• When HBase is used as an input source for a MapReduce job, set setCaching to something greater than the default (which is 1).
• Using the default value means the map task makes a call back to the region server for every record processed.
  – Setting this value to 80, for example, will transfer 80 rows at a time to the client to be processed.
• There is a cost/benefit to a large cache value because it costs more memory for both client and RegionServer, so bigger isn't always better.
• Experimentation suggests that a value between 50 and 100 gives good performance in our setup.
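A minimal sketch, reusing the mapper job setup from the earlier example (table and class names are illustrative):

  Scan scan = new Scan();
  scan.setCaching(80);          // fetch 80 rows per RPC instead of the default 1
  scan.setCacheBlocks(false);   // commonly disabled for one-pass MapReduce scans
  TableMapReduceUtil.initTableMapperJob("access_logs", scan, FreqCounterMapper.class,
      ImmutableBytesWritable.class, IntWritable.class, job);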
• 59. Right scan attribute selection
• Whenever a Scan is used to process large numbers of rows (and especially when used as a MapReduce source), be selective about which attributes are requested.
• If scan.addFamily is called, then all of the attributes in the specified ColumnFamily will be returned to the client.
• If only a small number of the available attributes are to be processed, then only those attributes should be specified in the input scan, because attribute over-selection is a non-trivial performance penalty over large datasets.
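A minimal sketch of the difference (family and qualifier names are illustrative):

  // Narrow scan: only the one column the job actually needs is shipped back.
  Scan narrow = new Scan();
  narrow.addColumn(Bytes.toBytes("details"), Bytes.toBytes("page"));

  // Wide scan: every column in the family is returned to the client.
  Scan wide = new Scan();
  wide.addFamily(Bytes.toBytes("details"));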
• 60. Optimize handler.count
• The count of RPC Listener instances spun up on RegionServers. The same property is used by the Master for the count of master handlers.
  – Default is 10.
• This setting in essence determines how many requests are concurrently processed inside a RegionServer at any one time.
• If multiple MapReduce jobs are running in the cluster and there is enough map capacity to handle the jobs concurrently, then this parameter needs to be tuned.
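The property lives in the site configuration; a sketch follows (the value 30 is purely illustrative, not a recommendation):

  <property>
    <name> hbase.regionserver.handler.count </name>
    <value> 30 </value>
  </property>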
  • 61. End of session Day – 4: HBase