Introduction to HBase
Gokuldas K Pillai
@gokool
HBase - The Hadoop Database
• Based on Google’s BigTable (OSDI’06)
• Runs on top of Hadoop but provides real time
read/write access
• Distributed Column Oriented Database
HBase Strengths
• Can scale to billions of rows X millions of
columns
• Relatively cheap & easy to scale
• Random, real-time read/write access to
very large data sets
• Support for update, delete
Who is using it
• StumbleUpon / su.pr
– Uses HBase as a real-time data storage and analytics platform
• Twitter
– Distributed read/write backup of all MySQL instances. Powers
“people search”.
• Powerset (Now part of MS)
• Adobe
• Yahoo
• Ning
• Meetup
• More at http://wiki.apache.org/hadoop/Hbase/PoweredBy
Key features
• Column Oriented store
– Table costs only for the data stored
– NULLs in rows are free
• Rows stored in sorted order
• Can scale to Petabytes (At Google)
Comparing to RDBMS
• No Joins
• No Query engine
• No transactions
• No column typing
• No SQL, no ODBC/JDBC (HBql is there now)
Data Model - Tables
• Tables consisting of rows and columns
• Table cells are versioned (by timestamp)
• Tables are sorted by row keys
• Table access is via primary key
• Row updates lock the row no matter how
many columns are involved
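
The data model above can be sketched with a small in-memory toy. This is not the HBase API: the class `ToyTable`, its method names and the `max_versions` parameter are hypothetical, chosen only to illustrate cells versioned by timestamp and rows kept sorted by row key.

```python
import bisect


class ToyTable:
    """Toy in-memory model of versioned, row-sorted cells (not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # assumed cap on versions kept per cell
        self._row_keys = []               # row keys, always kept sorted
        self._rows = {}                   # row -> {column -> [(timestamp, value)]}

    def put(self, row, column, value, timestamp):
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        # Newest version first; drop anything beyond the version cap.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None)."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, stop_row):
        """Row keys in [start_row, stop_row), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, stop_row)
        return self._row_keys[lo:hi]
```

Because row keys stay sorted, a range scan is just a slice between two binary-search positions, which is why row key design matters so much when access is by key only.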
Column Families
• Row’s columns are grouped into families
• Column family members identified by a
common ‘printable’ prefix
• Column family should be predefined
– but column family members can be added
dynamically
– member name can be bytes
• All column family members are collocated on
disk
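
A minimal sketch of the family rules above, assuming column names of the form `family:qualifier` (the colon-separated prefix HBase uses). The class `ToySchemaTable` and its methods are hypothetical and only model the idea: families are fixed at table creation, qualifiers appear on first write, and each family keeps its cells in its own store.

```python
class ToySchemaTable:
    """Toy model of predefined column families with dynamic members."""

    def __init__(self, families):
        # One store per predefined family; a family's cells live together.
        self._stores = {family: {} for family in families}

    def put(self, row, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self._stores:
            # Families must be declared up front; members need not be.
            raise KeyError(f"unknown column family: {family!r}")
        self._stores[family][(row, qualifier)] = value

    def qualifiers(self, family):
        """All member names seen so far in one family, sorted."""
        return sorted({q for (_, q) in self._stores[family]})
```

Writing `info:email` to a table created with only the `info` family succeeds without any schema change, while writing to an undeclared family fails.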
Server Architecture
• Similar to HDFS
– HBaseMaster ~ NameNode
– RegionServer ~ DataNode
• HBase stores state via the Hadoop FS API
• Can persist to :
– Local
– Amazon S3
– HDFS (Default)
HBaseMaster
What it does:
• Bootstrapping a new instance
• Assignment and handling RegionServer problems
– Each region from every table is assigned to a RegionServer
• When machines fail, move regions
• When regions split, move regions to balance
What it does NOT do:
– Handle write requests (Not a DB Master)
– Handle location finding requests (handled by RegionServer)
RegionServer
• Carry the regions
• Handle client read/write requests
• Manage region splits (inform the Master)
Regions
• Horizontal Partitioning
• Every region has a subset of the table’s rows
• Region identified as
– [table, first row(+), last row(-)]
• Table starts on a single region
• Splits into two equal-sized regions as the
original region grows, and so on
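
The splitting behaviour above can be illustrated with a toy model. It is a sketch under simplifying assumptions: the split is triggered by a row count (`max_rows_per_region`, a hypothetical parameter) rather than by stored size, and the names `Region` and `ToyRegionedTable` are made up for this example.

```python
import bisect


class Region:
    """A key range [start_key, end_key) holding sorted row keys."""

    def __init__(self, start_key, end_key, rows=None):
        self.start_key = start_key  # inclusive; "" means start of table
        self.end_key = end_key      # exclusive; None means end of table
        self.rows = rows or []

    def contains(self, row):
        return row >= self.start_key and (self.end_key is None or row < self.end_key)


class ToyRegionedTable:
    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region
        self.regions = [Region("", None)]  # a table starts as a single region

    def put(self, row):
        region = next(r for r in self.regions if r.contains(row))
        if row not in region.rows:
            bisect.insort(region.rows, row)
        if len(region.rows) > self.max_rows:
            self._split(region)

    def _split(self, region):
        # Split at the middle row key so both halves are about the same size.
        mid = len(region.rows) // 2
        split_key = region.rows[mid]
        left = Region(region.start_key, split_key, region.rows[:mid])
        right = Region(split_key, region.end_key, region.rows[mid:])
        i = self.regions.index(region)
        self.regions[i:i + 1] = [left, right]
```

After a split the two regions cover adjacent key ranges, so every row key still belongs to exactly one region.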
ZooKeeper
• Master election and server availability
• Cluster management
– Assignment transaction state management
• Client contacts ZooKeeper to bootstrap
connection to the HBase cluster
• Region key ranges, region server addresses
• Guarantees consistency of data across clients
Workflow (Client connecting first time)
• Client → ZooKeeper (returns -ROOT-)
• Client → -ROOT- (returns .META.)
• Client → .META. (returns RegionServer)
• To avoid three lookups every time, the client caches
this info.
– Recache on fault
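
The lookup chain and client-side cache above can be sketched as follows. Everything here is a stand-in: `ToyCluster` fakes ZooKeeper, -ROOT- and .META. with plain methods, server names are invented strings, and locations are keyed by table only (a real lookup is per region, by table and row key).

```python
class ToyCluster:
    """Stand-in for ZooKeeper, -ROOT- and .META.; counts remote lookups."""

    def __init__(self, region_locations):
        self.region_locations = dict(region_locations)  # table -> server name
        self.lookups = 0

    def ask_zookeeper(self):
        self.lookups += 1
        return "root-server"  # where -ROOT- lives (invented name)

    def ask_root(self, root_server):
        self.lookups += 1
        return "meta-server"  # where .META. lives (invented name)

    def ask_meta(self, meta_server, table):
        self.lookups += 1
        return self.region_locations[table]


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self._cache = {}

    def locate(self, table):
        # Only a cache miss pays for the three-step lookup.
        if table not in self._cache:
            root = self.cluster.ask_zookeeper()
            meta = self.cluster.ask_root(root)
            self._cache[table] = self.cluster.ask_meta(meta, table)
        return self._cache[table]

    def on_fault(self, table):
        """Drop a stale entry so the next locate() repeats the lookups."""
        self._cache.pop(table, None)
```

A client would call `on_fault` when a request to the cached server fails, for example after the region has moved, and then retry with a fresh location.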
Write/Read Operation
• Write request: Client → RegionServer
→ Commit log (on HDFS), then memstore
• Flush to filesystem when memstore fills
• Read request: Client → RegionServer
– Look up the memstore first
– If not found, look up the flush files (reverse chronological order)
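
The write and read paths above can be sketched with a toy store. It assumes a memstore that flushes after a fixed number of keys (`memstore_limit`, a hypothetical parameter) and uses plain Python lists and dicts in place of the HDFS commit log and flush files; `ToyRegionStore` is not an HBase class.

```python
class ToyRegionStore:
    """Toy model of the commit log, memstore and flush files."""

    def __init__(self, memstore_limit=2):
        self.memstore_limit = memstore_limit
        self.commit_log = []   # stands in for the log kept on HDFS
        self.memstore = {}     # in-memory buffer of recent writes
        self.flush_files = []  # oldest first; each is an immutable snapshot

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, for crash recovery
        self.memstore[key] = value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        self.flush_files.append(dict(self.memstore))
        self.memstore.clear()

    def read(self, key):
        if key in self.memstore:
            return self.memstore[key]
        # Newest flush file first, so the latest written value wins.
        for flush_file in reversed(self.flush_files):
            if key in flush_file:
                return flush_file[key]
        return None
```

Reading newest to oldest is what lets an overwrite take effect without ever modifying an older flush file.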
Integration
• Java HBase Client API
• High performance Thrift gateway
• A REST-ful Web service gateway (Stargate)
– Supports XML, binary data encoding options
• Cascading, Hive and Pig integration
• HBase shell (JRuby)
• TableInput/TableOutputFormat for MR
Main Classes
• HBaseAdmin
– Create table, drop table, list and alter table
• HTable
– Put
– Get
– Scan
Alternatives to HBase
• Cassandra (From Facebook)
– Based on Amazon’s Dynamo
– No Master-slave but P2P
– Tunable: Consistency Vs Latency
• Yahoo’s PNUTS
– Not Open source
– Works well for multi-DC/geographically dispersed servers
References
• Hadoop – The Definitive Guide
• Cloudera website
• http://wiki.hbase.apache.org
• Lars George
– http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
• Comparing HBase, Cassandra and PNUTS
– http://blog.amandeepkhurana.com/2010/05/comparing-pnuts-hbase-and-cassandra.html
• ACID compliance of HBase
– http://hbase.apache.org/docs/r0.89.20100621/acid-semantics.html

Editor's notes

  1. Some are also contributors
  2. Introduce Regions from Tables.
  3. -ROOT- stores the location of the .META. table regions. .META. stores the location of all user regions. Entries have keys as regionName, made up as [tableName, start row, timestamp, hash(1,2,3)].
  4. Writes arriving at a regionserver are first appended to a commit log and then added to an in-memory memstore. When a memstore fills, its content is flushed to the filesystem. The commit log is hosted on HDFS, so it remains available through a regionserver crash. When reading, the region’s memstore is consulted first. If sufficient versions are found reading the memstore alone, we return. Otherwise, flush files are consulted in order, from newest to oldest, until versions sufficient to satisfy the query are found or until we run out of flush files. Compaction merges multiple flush files into one, removes versions beyond the maximum and deletes expired cells.
  5. Add content one row at a time using HTable.put(Put): create an instance of the Put object and specify the value, target column and optional timestamp. Read using the get method, HTable.get(Get). Broad: get everything in a row. Narrow: return only a single cell value. Scan a table using the Scan class, which gives cursor-like access: call HTable.getScanner(Scan) and invoke next on the returned object. Get and Scan return a Result object, which is a list of KeyValue objects. Delete using HTable.delete(Delete) to remove individual cells, entire families, etc. Put, Get and Delete lock the row.
  6. Cassandra's weak consistency comes in the form of eventual consistency, which means the database eventually reaches a consistent state. As the data is replicated, the latest version of something is sitting on some node in the cluster while older versions are still out there on other nodes, but eventually all nodes will see the latest version. The CAP theorem (Brewer) states that you have to pick two of Consistency, Availability and Partition tolerance: you can't have all three at the same time and get an acceptable latency.
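
The compaction step mentioned in note 4 can be sketched as a pure function. This is an illustration under assumptions, not HBase's implementation: flush files are modelled as lists of `(row, column, timestamp, value)` tuples, and the `max_versions`, `now` and `ttl` parameters are hypothetical names for the version cap and the cell expiry rule.

```python
def compact(flush_files, max_versions, now, ttl):
    """Merge flush files into one, dropping extra versions and expired cells."""
    merged = {}
    for flush_file in flush_files:
        for row, column, ts, value in flush_file:
            merged.setdefault((row, column), []).append((ts, value))

    result = []
    for (row, column) in sorted(merged):  # output stays sorted by row key
        versions = sorted(merged[(row, column)], reverse=True)  # newest first
        live = [(ts, v) for ts, v in versions if now - ts <= ttl]
        for ts, value in live[:max_versions]:
            result.append((row, column, ts, value))
    return result
```

The result is a single sorted file, so later reads have fewer flush files to consult.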