Introduction to HBase
Anil Gupta
@bigdatanoob
Outline
What is NoSQL?
RDBMS vs NoSQL
HBase
HBase Components
Architecture
HBase Cluster
HBase Data Model
Key -> Value
Region
NoSQL is commonly expanded as "Not Only SQL". These databases are non-relational. The term was coined in 1998.
They do not use SQL as their primary query language.
NoSQL is not a replacement for relational databases.
NoSQL is designed for distributed data stores.
NoSQL is designed to store semi-structured and sparse data.
Feature | NoSQL | RDBMS
Hardware | Farm of commodity machines (up to several thousand) | 1-3 high-end or proprietary machines (costly)
Data type | Semi-structured and sparse | Structured and dense
Data size | Petabytes (10^15 bytes) | Terabytes (10^12 bytes)
Auto-sharding | Yes | No
Flexible schema | Yes | No
Referential integrity | No | Yes
Support for joins | No | Yes
Support for aggregations | Basic | Advanced
HBase is an open-source, distributed, versioned,
key-value database modeled after Google's
Bigtable.
is optional for
HBase supports real-time reads and writes (latency in milliseconds).
HBase is highly fault-tolerant (HA) and scalable.
[Figure: graphic with the labels "+ Random Read/Write access =" and "+ Apache ZooKeeper"]
Selling Points of HBase
Highly Scalable
Auto-sharding
Strongly Consistent
Out of the box support for Historical Data
Very high read throughput
Readily compatible with Hadoop
Highly fault-tolerant (HA)
HBase Components
1. HBase Master (HMaster): HMaster is the master server.
• Monitors all RegionServers
• Performs load balancing of regions across RegionServers
• Assigns regions to RegionServers
• All metadata changes go through the Master
• Periodically checks and cleans up the .META. table
• Multiple HMasters can run in a cluster, but only one HMaster is active at any time
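The assignment and balancing duties above can be pictured with a toy model. This is only a sketch: the function names, the server names, and the round-robin policy are assumptions made for illustration, and HBase's real balancer is considerably more involved.

```python
from collections import defaultdict


def assign_regions(regions, servers):
    """Spread regions over servers round-robin (toy stand-in for a balancer)."""
    assignment = defaultdict(list)
    for i, region in enumerate(sorted(regions)):
        assignment[servers[i % len(servers)]].append(region)
    return dict(assignment)


def reassign_on_failure(assignment, dead_server):
    """Move a dead server's regions to the least-loaded surviving servers.

    Assumes at least one server survives.
    """
    survivors = {s: list(r) for s, r in assignment.items() if s != dead_server}
    for region in assignment.get(dead_server, []):
        # Pick the survivor with the fewest regions; break ties by name.
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(region)
    return survivors


# Hypothetical region and server names.
plan = assign_regions(["r1", "r2", "r3", "r4", "r5", "r6"], ["rs-a", "rs-b", "rs-c"])
after_failure = reassign_on_failure(plan, "rs-c")
print(plan)
print(after_failure)
```

In this model the "master" is just the code that owns the assignment map, which mirrors the idea that assignment changes go through a single active HMaster.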
HBase Components (cont.)
2. RegionServer (HRegionServer): HRegionServer is the implementation of the worker module.
• Runs as a Java service on worker nodes
• A machine running a RegionServer is considered a worker node
• Serves get/put/scan requests
• Responsible for splitting and compacting regions
• Runs on the same machine as an HDFS DataNode
• Multiple RegionServers run in a cluster
ZooKeeper in HBase
ZooKeeper lets distributed processes coordinate with each other through a shared hierarchical namespace. It is a distributed and highly reliable service.
In HBase it is responsible for the following:
• Providing the availability status of RegionServers
• Ensuring a single active HMaster in the cluster
• Providing the location of the “-ROOT-” table
• Selecting a new HMaster if the active HMaster fails
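The "single active HMaster" rule can be sketched with an in-memory stand-in for the coordination service. The class below is hypothetical and is not the ZooKeeper API; it only illustrates the idea that the first master to register becomes active and a standby takes over when the active one fails.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self.active = None      # name of the current active master
        self.candidates = []    # live masters, in registration order

    def register(self, master):
        """Register a master; return True if it became the active one."""
        self.candidates.append(master)
        if self.active is None:
            self.active = master
        return self.active == master

    def fail(self, master):
        """Remove a failed master and promote a standby if needed."""
        self.candidates.remove(master)
        if self.active == master:
            self.active = self.candidates[0] if self.candidates else None
        return self.active


# Hypothetical master names.
coordinator = ToyCoordinator()
first_is_active = coordinator.register("master-1")
second_is_active = coordinator.register("master-2")
new_active = coordinator.fail("master-1")
print(first_is_active, second_is_active, new_active)
```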
HBase Architecture
[Figure: HBase architecture diagram]
HBase Cluster
[Figure: cluster diagram. Labels: NameNode, HMaster, ZooKeeper, and multiple Worker Nodes, each with a DataNode and a RegionServer (HRegionServer); some Worker Nodes also show a TaskTracker.]
Column Family and Column Qualifier
Column Family: column qualifiers in HBase are grouped into column families.
The colon character (:) delimits the column family from the column qualifier.
The combination <Column Family>:<Column Qualifier> is equivalent to a column name.
Physically, all column qualifiers of a column family are stored together on the file system.
• Column qualifiers within a family are sorted lexicographically and stored together
Example: txn:amt. Here “txn” is the column family and “amt” is the column qualifier.
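A small sketch of the naming rule, using plain Python bytes rather than any HBase client. The helper name and the extra qualifiers are made up for illustration; the split on the first colon assumes the family name itself contains no colon.

```python
def split_column(name: bytes):
    """Split b"family:qualifier" at the first colon."""
    family, sep, qualifier = name.partition(b":")
    if not sep:
        raise ValueError("expected <family>:<qualifier>")
    return family, qualifier


family, qualifier = split_column(b"txn:amt")
print(family, qualifier)

# Lexicographic order is byte order, so uppercase letters sort before lowercase.
qualifiers = sorted([b"amt", b"Date", b"currency"])
print(qualifiers)
```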
HBase Data Model
• A table keeps its data in lexicographic order by RowKey.
• Everything except table names is stored as a byte array.
• Only column families are defined when the table is created.
  • Each family can have any number of columns (up to a few million)
  • Each row can have different columns in a column family
  • Each column can hold any number of versions
  • Columns exist only when inserted, because HBase does not have NULL values
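The points above can be pictured as a sparse, nested map. This is a toy in-memory model with hypothetical helper names, not how HBase stores data on disk; it only shows byte-ordered row keys and rows that carry different columns.

```python
def put(table, row, column, value, ts):
    """Store one cell version; rows and columns appear only when written."""
    table.setdefault(row, {}).setdefault(column, {})[ts] = value


def scan(table):
    """Yield (row, columns) in lexicographic (byte) order of the row key."""
    for row in sorted(table):
        yield row, sorted(table[row])


table = {}
put(table, b"row2", b"txn:amt", b"15", ts=1)
put(table, b"row10", b"txn:amt", b"7", ts=1)
put(table, b"row10", b"txn:currency", b"USD", ts=1)
put(table, b"row1", b"txn:note", b"refund", ts=1)

for row, columns in scan(table):
    print(row, columns)
```

Note that b"row10" sorts before b"row2" because the comparison is byte by byte, which is why row keys with numeric parts are often zero-padded.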
Key -> Value in HBase
(RowKey, Column Family:Column Qualifier, Timestamp) is a “Key” in HBase.
A “Value” is stored for each “Key”.
The Timestamp supports storing historical data (multiple versions of a value).
A table is always indexed on RowKey.
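The key structure can be sketched as a dictionary keyed by (row key, column, timestamp). The class and its as_of parameter are hypothetical names for this illustration and do not correspond to the HBase client API.

```python
class ToyCellStore:
    """Toy versioned store: (row, column, timestamp) -> value."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, ts, value):
        self.cells[(row, column, ts)] = value

    def get(self, row, column, as_of=None):
        """Return the newest value, optionally no newer than `as_of`."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self.cells.items()
            if r == row and c == column and (as_of is None or ts <= as_of)
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]


store = ToyCellStore()
store.put(b"acct-1", b"txn:amt", 100, b"10")
store.put(b"acct-1", b"txn:amt", 200, b"25")

print(store.get(b"acct-1", b"txn:amt"))             # newest version
print(store.get(b"acct-1", b"txn:amt", as_of=150))  # historical read
```

Writing the same cell again with a new timestamp adds a version instead of overwriting, which is what makes historical reads possible in this model.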
Region
Tables in HBase are divided into multiple Regions.
1 Region = 1 partition of a table
Regions are hosted by RegionServers.
One RegionServer can host hundreds of Regions.
A RegionServer can host Regions from multiple tables.
After a major compaction, every Region has one HFile for each column family.
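Because a table is sorted by RowKey, each Region can be thought of as a contiguous key range, and finding the Region for a row becomes a binary search over the Region start keys. The sketch below assumes inclusive start keys and uses made-up split points and server names.

```python
import bisect


class ToyRegionMap:
    """Map a row key to the region whose key range contains it."""

    def __init__(self, regions):
        # regions: list of (start_key, server); the first start key must be b"".
        self.regions = sorted(regions)
        self.starts = [start for start, _ in self.regions]

    def locate(self, row):
        # Rightmost region whose start key is <= row (start keys are inclusive).
        index = bisect.bisect_right(self.starts, row) - 1
        return self.regions[index]


# Hypothetical split points and server names; one server hosts two regions.
region_map = ToyRegionMap([(b"", "rs-a"), (b"g", "rs-b"), (b"p", "rs-a")])
print(region_map.locate(b"apple"))
print(region_map.locate(b"g"))
print(region_map.locate(b"zebra"))
```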
Random Facts About HBase
Data in HBase is stored in the HFile format.
Values are stored as byte arrays in HFiles.
HLog is the file format used for HBase's write-ahead log (WAL).
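One way to picture write-ahead logging: every change is appended to a log before it is applied in memory, so the in-memory state can be rebuilt by replaying the log after a crash. The class below is a toy model with hypothetical names; a Python list stands in for the durable log file, and nothing here reflects the actual HLog format.

```python
class ToyWalStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, log=None):
        # `log` stands in for a durable file that survives a crash.
        self.log = log if log is not None else []
        self.memory = {}
        for key, value in self.log:
            # Recovery: replay logged writes in their original order.
            self.memory[key] = value

    def put(self, key, value):
        self.log.append((key, value))  # 1. record the write in the log
        self.memory[key] = value       # 2. then apply it in memory


store = ToyWalStore()
store.put(b"row1", b"a")
store.put(b"row1", b"b")
store.put(b"row2", b"c")

# Simulate a crash: in-memory state is lost, only the log survives.
recovered = ToyWalStore(log=list(store.log))
print(recovered.memory)
```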
Questions?
