APACHE HBASE

Scott Leberknight
BACKGROUND

Google Bigtable
"Bigtable is a distributed storage
system for managing structured data
that is designed to scale to a very
large size: petabytes of data across
thousands of commodity
servers. Many projects at Google
store data in Bigtable including web
indexing, Google Earth, and Google
Finance."


                  - Bigtable: A Distributed Storage System
                                        for Structured Data
                                 http://labs.google.com/papers/bigtable.html
"A Bigtable is a sparse, distributed, persistent
                    multidimensional sorted map"



               - Bigtable: A Distributed Storage System
                                     for Structured Data
                              http://labs.google.com/papers/bigtable.html
wtf?

distributed

sparse

column-oriented

versioned
"The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes."

                  - Bigtable: A Distributed Storage System for Structured Data
                    http://labs.google.com/papers/bigtable.html

(row key, column key, timestamp) => value
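One way to internalize that signature is to model the whole table as nested sorted maps. A conceptual sketch in Java (illustrative only, not how HBase stores data internally):

import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Conceptual sketch: row key -> column key -> timestamp -> value,
// with every level kept sorted.
public class SortedMapModel {
    public static void main(String[] args) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> table =
                new TreeMap<>();

        // (row key, column key, timestamp) => value
        table.computeIfAbsent("20120407152657", k -> new TreeMap<>())
             .computeIfAbsent("personal:givenName",
                     k -> new TreeMap<>(Collections.<Long>reverseOrder()))
             .put(1239124584398L, "John".getBytes());

        // Timestamps sort descending, so the first entry is the newest version.
        byte[] newest = table.get("20120407152657")
                             .get("personal:givenName")
                             .firstEntry()
                             .getValue();
        System.out.println(new String(newest)); // John
    }
}

The sorted row keys are also what make the range scans shown later cheap.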
Key Concepts:

row key       => 20120407152657

column family => "personal:"

column key    => "personal:givenName",
                 "personal:surname"

timestamp     => 1239124584398
Row Key          Timestamp    Column Family "info:"               Column Family "content:"
20120407145045      t7       "info:summary"     "An intro to..."
                    t6        "info:author"       "John Doe"
                    t5                                               "Google's Bigtable is..."
                    t4                                               "Google Bigtable is..."
                    t3       "info:category"     "Persistence"
                    t2        "info:author"          "John"
                    t1         "info:title"    "Intro to Bigtable"
20120320162535      t4       "info:category"     "Persistence"
                    t3                                                   "CouchDB is..."
                    t2        "info:author"       "Bob Smith"
                    t1         "info:title"    "Doc-oriented..."
Get row 20120407145045...
Row Key          Timestamp    Column Family "info:"               Column Family "content:"
20120407145045      t7       "info:summary"     "An intro to..."
                    t6        "info:author"       "John Doe"
                    t5                                               "Google's Bigtable is..."
                    t4                                               "Google Bigtable is..."
                    t3       "info:category"     "Persistence"
                    t2        "info:author"          "John"
                    t1         "info:title"    "Intro to Bigtable"
20120320162535      t4       "info:category"     "Persistence"
                    t3                                                   "CouchDB is..."
                    t2        "info:author"       "Bob Smith"
                    t1         "info:title"    "Doc-oriented..."
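A get with no options returns only the newest version of each cell, so this fetch yields the t7 summary, the t6 author ("John Doe"), the t5 content, the t3 category, and the t1 title. In the shell (same syntax as the session on the next slides):

hbase> get 'blog', '20120407145045'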
Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable.

                                   - http://hbase.apache.org/
HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented
storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a
document-oriented...'
0 row(s) in 0.0030 seconds
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN                       CELL
 content:                    timestamp=1239135042862, value=CouchDB is a doc...
 info:author                 timestamp=1239135042755, value=Bob Smith
 info:category               timestamp=1239135042982, value=Persistence
 info:title                  timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
HBase Shell



hbase(main):015:0> get 'blog', '20120407145045', {COLUMN=>'info:author', VERSIONS=>3 }
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
2 row(s) in 0.0060 seconds
hbase(main):016:0> scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }
ROW                     COLUMN+CELL
 20120320162535         column=content:, timestamp=1239135042862, value=CouchDB is...
 20120320162535         column=info:author, timestamp=1239135042755, value=Bob Smith
 20120320162535         column=info:category, timestamp=1239135042982, value=Persistence
 20120320162535         column=info:title, timestamp=1239135042623, value=Document...
4 row(s) in 0.0230 seconds
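Note that STARTROW is inclusive and STOPROW is exclusive, so the scan above returns the 20120320... row but would stop before a row keyed exactly '20120400'. The shell can also fetch one specific version directly; an illustrative command using the shell's TIMESTAMP option with a timestamp from the versions output above:

hbase> get 'blog', '20120407145045', { COLUMN => 'info:author', TIMESTAMP => 1239135324741 }
timestamp=1239135324741, value=John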
Got byte[]?
// Create a new table
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);

String tableName = "people";
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);

System.out.printf("%s is available? %b%n",
  tableName, admin.isTableAvailable(tableName));
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

// Add some data into 'people' table
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("givenName"),
        toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"),
        toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"),
        toBytes("john.connor@gmail.com"));
table.put(put);
table.flushCommits();
table.close();
Finding data:

    get  (by row key)

    scan (by row key ranges, filtering)
// Get a row. Ask for only the data you need.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);
get.addFamily(toBytes("personal"));
get.addColumn(toBytes("contactinfo"), toBytes("email"));
Result result = table.get(get);
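The slide stops at the Result; pulling values back out is one more step. A minimal sketch against the same API (getValue returns the newest version of a cell, or null if the cell is absent):

// Read cells out of the Result; Bytes is org.apache.hadoop.hbase.util.Bytes.
byte[] email = result.getValue(toBytes("contactinfo"), toBytes("email"));
if (email != null) {
  System.out.println(Bytes.toString(email)); // john.connor@gmail.com
}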
// Update existing values, and add a new one
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"),
        toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"),
        toBytes("john.m.smith@gmail.com"));
put.add(toBytes("contactinfo"), toBytes("address"),
        toBytes("San Diego, CA"));
table.put(put);
table.flushCommits();
table.close();
// Scan rows...
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Scan scan = new Scan(toBytes("smith-"));
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage));
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
  // process result...
}
scanner.close();
table.close();
Data Modeling

   Row key design (see the sketch below)

   Match to data access patterns

   Wide vs. narrow rows
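The "people" examples above already embody this: composite keys like "connor-john-m-43299" lead with the surname so related rows sort, and therefore scan, together. A hypothetical helper (the name and format are illustrative, not from the slides):

import static org.apache.hadoop.hbase.util.Bytes.toBytes;

// Hypothetical helper: builds a composite row key like "connor-john-m-43299".
// Rows are stored sorted by key, so leading with surname lets a prefix scan
// such as new Scan(toBytes("smith-")) pick up all the Smiths together.
static byte[] personRowKey(String surname, String givenName,
                           String middleInitial, int id) {
    return toBytes(String.format("%s-%s-%s-%d",
            surname.toLowerCase(), givenName.toLowerCase(),
            middleInitial.toLowerCase(), id));
}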
References

shop.oreilly.com/product/0636920014348.do

http://shop.oreilly.com/product/0636920021773.do
(3rd edition pub date is May 29, 2012)

hbase.apache.org
(my info)

scott.leberknight at nearinfinity.com
www.nearinfinity.com/blogs/
twitter: sleberknight