4. Bigtable at Google
• "Bigtable is a distributed storage system for
managing structured data that is designed to scale to
a very large size: petabytes of data across thousands
of commodity servers. Many projects at Google store
data in Bigtable including web indexing, Google
Earth, and Google Finance."
6. The map is indexed by
– <row key, column key, timestamp>
• Each value in the map is an uninterpreted array of bytes.
(row key, column key, timestamp) => value
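The mapping above can be pictured as nested sorted maps. The following is a minimal in-memory sketch, not Bigtable or HBase code: the class `SparseSortedMap`, its method names, the timestamps 1 and 2, and the second author value are all hypothetical, and row keys are modeled as Java strings for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of (row key, column key, timestamp) => value.
final class SparseSortedMap {
    // row key -> column key -> timestamp (newest first) -> uninterpreted bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the value stored at exactly (row, column, timestamp), or null.
    byte[] get(String row, String column, long timestamp) {
        NavigableMap<Long, byte[]> versions = versionsOf(row, column);
        return versions == null ? null : versions.get(timestamp);
    }

    // Returns the newest value for (row, column), or null if the cell is empty.
    byte[] getLatest(String row, String column) {
        NavigableMap<Long, byte[]> versions = versionsOf(row, column);
        return versions == null || versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    private NavigableMap<Long, byte[]> versionsOf(String row, String column) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        SparseSortedMap blog = new SparseSortedMap();
        // Timestamps 1 and 2 and the second value are made up for the example.
        blog.put("20120320162535", "info:author", 1L, "Bob Smith".getBytes(StandardCharsets.UTF_8));
        blog.put("20120320162535", "info:author", 2L, "Robert Smith".getBytes(StandardCharsets.UTF_8));
        byte[] latest = blog.getLatest("20120320162535", "info:author");
        System.out.println(new String(latest, StandardCharsets.UTF_8)); // prints "Robert Smith"
    }
}
```

The map is sparse because a row only holds entries for the columns that were actually written to it.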
10. HBase
• Use HBase when you need random, realtime read/write
access to your Big Data. This project's goal is
the hosting of very large tables -- billions of rows X
millions of columns -- atop clusters of commodity
hardware. HBase is an open-source, distributed,
versioned, column-oriented store modeled after
Google's Bigtable.
http://hbase.apache.org
11. HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
0 row(s) in 0.0030 seconds
12. HBase Shell
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN           CELL
 content:        timestamp=1239135042862, value=CouchDB is a doc...
 info:author     timestamp=1239135042755, value=Bob Smith
 info:category   timestamp=1239135042982, value=Persistence
 info:title      timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
15. Admin API
// Create a new table
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
String tableName = "people";
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);
System.out.printf("%s is available? %b\n", tableName,
admin.isTableAvailable(tableName));
16. Client API
import static org.apache.hadoop.hbase.util.Bytes.toBytes;
// Add some data into 'people' table
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
// One Put carries every cell written to this row key
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("givenName"),
toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"),
toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"),
toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"),
toBytes("john.connor@gmail.com"));
table.put(put);
table.flushCommits();
table.close();
17. Finding Data
• Get (by row key)
• Scan (by row key ranges, filtering)
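Both access paths follow from rows being kept sorted by row key. The sketch below imitates them with a plain `TreeMap`; it is an illustration only, not HBase code. The class `RowLookupSketch`, the two extra row keys, and the e-mail values for them are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a table whose rows are sorted by row key.
final class RowLookupSketch {
    // Get: point lookup of one exact row key.
    static String get(NavigableMap<String, String> table, String rowKey) {
        return table.get(rowKey);
    }

    // Scan: every row key with startRow <= key < stopRow, in sorted order.
    static List<String> scan(NavigableMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> people = new TreeMap<>();
        people.put("connor-john-m-43299", "john.connor@gmail.com");
        people.put("connor-sarah-j-10001", "sarah@example.com"); // made-up row
        people.put("smith-bob-a-20002", "bob@example.com");      // made-up row

        System.out.println(get(people, "connor-john-m-43299"));
        // prints "john.connor@gmail.com"

        // '.' is the character right after '-', so "connor." bounds the "connor-" prefix.
        System.out.println(scan(people, "connor-", "connor."));
        // prints "[connor-john-m-43299, connor-sarah-j-10001]"
    }
}
```

Because only the row key is indexed in this model, a lookup by any other attribute has to be expressed as a scan plus filtering.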
18. Get
// Get a row. Ask for only the data you need.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);
get.addFamily(toBytes("personal"));
get.addColumn(toBytes("contactinfo"),
toBytes("email"));
Result result = table.get(get);
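The `setMaxVersions(2)` call asks for at most the two newest versions of each requested cell. A rough sketch of that selection over one cell's timestamped values, assuming the cell still holds that many versions; the class `VersionedCellSketch`, the timestamps, and the values are hypothetical and this is not HBase code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one cell: timestamp -> value.
final class VersionedCellSketch {
    // Returns up to maxVersions values, newest timestamp first.
    static List<String> newest(NavigableMap<Long, String> versions, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.descendingMap().values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> email = new TreeMap<>();
        email.put(100L, "jc@example.com");         // oldest, made-up value
        email.put(200L, "john.connor@gmail.com");
        email.put(300L, "john.m.smith@gmail.com"); // newest
        System.out.println(newest(email, 2));
        // prints "[john.m.smith@gmail.com, john.connor@gmail.com]"
    }
}
```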
19. Update
// Update existing values, and add a new one
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"),
toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"),
toBytes("john.m.smith@gmail.com"));
put.add(toBytes("contactinfo"), toBytes("address"),
toBytes("San Diego, CA"));
table.put(put);
table.flushCommits();
table.close();
20. Scans
// Scan rows...
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Scan scan = new Scan(toBytes("john-"));
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage));
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    // process result...
}
scanner.close(); // release the scanner when done
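A scan given only a start row keeps reading past rows that share the prefix until the table ends or a filter stops it, so a prefix scan usually also wants an exclusive stop row. One common way to derive it is to increment the last byte of the prefix. The helper below is a sketch using only the JDK; `PrefixStopRow` and `stopRowForPrefix` are hypothetical names, not part of the HBase API, and the prefix "connor-" is chosen to match the row key used on the earlier slides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: computes an exclusive upper bound for a row key prefix.
final class PrefixStopRow {
    // Returns the smallest byte string greater than every key that starts with
    // prefix, or an empty array when no bound exists (empty or all-0xFF prefix).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("connor-".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // prints "connor."
    }
}
```

With the client API generation shown on these slides, the result would presumably be passed as the second argument of the two-argument `Scan(startRow, stopRow)` constructor, where an empty stop row is treated as "scan to the end of the table".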