Prepared by,
Vetri.V
WHAT IS HBASE?
 HBase is a database: the Hadoop database. It is indexed by rowkey, column key, and
timestamp.
 HBase stores structured and semi-structured data naturally, so you can load it with
tweets, parsed log files, and a catalog of all your products right along with their
customer reviews.
 It can store unstructured data too, as long as it’s not too large.
 HBase is designed to run on a cluster of computers instead of a single computer. The
cluster can be built using commodity hardware; HBase scales horizontally as you
add more machines to the cluster.
 Each node in the cluster provides a bit of storage, a bit of cache, and a bit of
computation as well. This makes HBase incredibly flexible and forgiving. No node is
unique, so if one of those machines breaks down, you simply replace it with another.
 This adds up to a powerful, scalable approach to data that, until now, hasn’t been
commonly available to mere mortals.
HBASE DATA MODEL:
HBase data model - These six concepts form the foundation of HBase.
Table:
 HBase organizes data into tables. Table names are Strings and composed of
characters that are safe for use in a file system path.
Row:
 Within a table, data is stored according to its row. Rows are identified uniquely by
their rowkey. Rowkeys don’t have a data type and are always treated as a
byte[].
Column family:
 Data within a row is grouped by column family. Column families also impact the
physical arrangement of data stored in HBase.
 For this reason, they must be defined up front and aren’t easily modified. Every row
in a table has the same column families, although a row need not store data in all its
families. Column family names are Strings and composed of characters that are safe
for use in a file system path.
Column qualifier:
 Data within a column family is addressed via its column qualifier, or column. Column
qualifiers need not be specified in advance, and they need not be consistent
between rows.
 Like rowkeys, column qualifiers don’t have a data type and are always treated as a
byte[].
Cell:
 A combination of rowkey, column family, and column qualifier uniquely identifies a
cell. The data stored in a cell is referred to as that cell’s value. Values
also don’t have a data type and are always treated as a byte[].
Version:
 Values within a cell are versioned. Versions are identified by their timestamp, a long.
When a version isn’t specified, the current timestamp is used as the
basis for the operation. The number of cell value versions retained by HBase is
configured per column family. The default number of cell versions is three.
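Taken together, the six concepts describe one big sorted, versioned map. The following is a toy Python sketch of that model, not HBase's implementation; the names (ToyTable, MAX_VERSIONS) are invented for illustration:

```python
from collections import defaultdict
import time

MAX_VERSIONS = 3  # mirrors HBase's default of three cell versions

class ToyTable:
    """Toy model of the HBase data model: (rowkey, family, qualifier) -> versioned values."""
    def __init__(self):
        # cell coordinates map to a list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, family, qualifier, value, ts=None):
        ts = ts if ts is not None else int(time.time() * 1000)
        versions = self.cells[(row, family, qualifier)]
        versions.append((ts, value))
        versions.sort(reverse=True)    # newest version first
        del versions[MAX_VERSIONS:]    # retain at most MAX_VERSIONS per cell

    def get(self, row, family, qualifier):
        """Return the newest value for a cell, like a Get with no version specified."""
        versions = self.cells.get((row, family, qualifier))
        return versions[0][1] if versions else None

t = ToyTable()
t.put('first', 'cf', 'message', b'hello', ts=1)
t.put('first', 'cf', 'message', b'hello HBase', ts=2)
print(t.get('first', 'cf', 'message'))  # newest version wins: b'hello HBase'
```

Note how a cell only exists once something is put into it, which is why NULLs cost nothing.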
HBase Architecture
HBase Tables and Regions
A table is made up of any number of regions.
A region is specified by its startKey and endKey.
 Empty table: (Table, NULL, NULL)
 Two-region table: (Table, NULL, “com.ABC.www”) and (Table, “com.ABC.www”,
NULL)
Each region may live on a different node and is made up of several HDFS files and blocks,
each of which is replicated by Hadoop.
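The region layout above amounts to a key-range lookup. A hypothetical Python sketch (find_region and the sample regions are invented; None stands in for the NULL boundaries):

```python
# Regions as (startKey, endKey) pairs; None marks an open boundary (NULL above).
regions = [
    (None, 'com.ABC.www'),   # first region: every rowkey before 'com.ABC.www'
    ('com.ABC.www', None),   # second region: 'com.ABC.www' and everything after
]

def find_region(rowkey):
    """Return the region whose half-open range [startKey, endKey) contains rowkey."""
    for start, end in regions:
        after_start = start is None or rowkey >= start
        before_end = end is None or rowkey < end
        if after_start and before_end:
            return (start, end)
    return None

print(find_region('com.AAA.www'))  # falls in the first region
print(find_region('com.XYZ.www'))  # falls in the second region
```

Because rowkeys sort lexicographically, a region is fully determined by its two boundary keys.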
HBase Tables:-
 Tables are sorted by row in lexicographical order
 A table schema only defines its column families
 Each family consists of any number of columns
 Each column consists of any number of versions
 Columns only exist when inserted; NULLs are free
 Columns within a family are sorted and stored together
 Everything except table names is byte[]
 HBase table format: (Row, Family:Column, Timestamp) -> Value
HBase uses HDFS as its reliable storage layer. It handles checksums, replication, and failover.
HBase consists of:
 Java API, gateways for REST, Thrift, and Avro
 Master, which manages the cluster
 RegionServers, which manage data
 ZooKeeper, which acts as the “neural network” and coordinates the cluster
Data is stored in memory and flushed to disk at regular intervals or based on size
 Small flushes are merged in the background to keep the number of files small
 Reads check the memory store first, then the disk-based files
 Deletes are handled with “tombstone” markers
MemStores:-
After data is written to the WAL, the RegionServer saves KeyValues in the memory store (MemStore)
 Flushed to disk based on size, set by hbase.hregion.memstore.flush.size
 Default size is 64MB
 Uses a snapshot mechanism to write the flush to disk while still serving from it and
accepting new data at the same time
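The flush-on-size behavior can be sketched roughly like this (a toy model, not HBase code; the threshold is shrunk from the 64MB default so the demo flushes immediately, and the snapshot mechanism is elided):

```python
class ToyMemStore:
    """Toy MemStore: buffer writes in memory, flush to an on-disk file when too big."""
    def __init__(self, flush_size=64):  # stands in for hbase.hregion.memstore.flush.size
        self.flush_size = flush_size
        self.buffer = {}
        self.bytes_used = 0
        self.store_files = []  # each flush produces one immutable store file

    def put(self, key, value):
        self.buffer[key] = value
        self.bytes_used += len(key) + len(value)
        if self.bytes_used >= self.flush_size:
            self.flush()

    def flush(self):
        # write the buffered KeyValues out as a new store file, then reset
        self.store_files.append(dict(self.buffer))
        self.buffer.clear()
        self.bytes_used = 0

m = ToyMemStore(flush_size=32)
m.put(b'row1', b'x' * 30)   # 34 bytes buffered, exceeds 32 -> triggers a flush
print(len(m.store_files))   # 1
```

Each flush creates another store file, which is what eventually makes compactions necessary.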
Compactions:-
Two types: minor and major compactions
Minor Compactions
 Combine the last “few” flushes
 Triggered by the number of storage files
Major Compactions
 Rewrite all storage files
 Drop deleted data and values exceeding the TTL and/or number of versions
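The difference between the two compaction types can be sketched as follows (illustrative only: minor_compact merely merges files, while major_compact also drops tombstones and TTL-expired values, mirroring the description above):

```python
TOMBSTONE = object()  # delete marker left behind by a Delete
TTL = 18000           # seconds; example value, configured per column family in real HBase

def minor_compact(store_files):
    """Merge a few store files into one; keeps tombstones and expired values."""
    merged = {}
    for sf in store_files:           # later files win for duplicate keys
        merged.update(sf)
    return merged

def major_compact(store_files, now):
    """Rewrite all store files, dropping tombstones and TTL-expired values."""
    merged = minor_compact(store_files)
    return {k: (ts, v) for k, (ts, v) in merged.items()
            if v is not TOMBSTONE and now - ts < TTL}

files = [
    {'a': (100, 'old'), 'b': (100, 'keep')},
    {'a': (200, TOMBSTONE)},          # 'a' was deleted after the first flush
]
print(major_compact(files, now=300))  # only 'b' survives
```

This is why deleted data lingers on disk until the next major compaction actually rewrites the files.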
Key Cardinality:-
The best performance is gained by making reads as selective as possible, starting with the row key
 Time range bound reads can skip store files
 So can Bloom Filters
 Selecting column families reduces the amount of data to be scanned
Fold, Store, and Shift:-
All values are stored with their full coordinates, including Row Key, Column Family, Column
Qualifier, and Timestamp
 Folds columns into a “row per column” layout
 NULLs are cost free, as nothing is stored
 Versions are multiple “rows” in the folded table
DDI:-
Stands for Denormalization, Duplication, and Intelligent Keys
Block Cache
Region Splits
HBase shell and commands
HBase Install
$ mkdir hbase-install
$ cd hbase-install
$ wget http://apache.claz.org/hbase/hbase-0.92.1/hbase-0.92.1.tar.gz
$ tar xvfz hbase-0.92.1.tar.gz
$HBASE_HOME/bin/start-hbase.sh
Configuration changes in HBase
 Go to hbase-env.sh
 Edit JAVA_HOME
 Next go to hbase-site.xml and edit the following:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://eattributes:54310/hbase</value>
<description>The directory shared by region servers.
Should be fully-qualified to include the filesystem to use.
E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>
<!--
<property>
<name>hbase.master</name>
<value>master:60000</value>
<description>The host and port that the HBase master runs at.
</description>
</property>
-->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in: false for standalone,
true for a fully-distributed cluster.
</description>
</property>
</configuration>
Starting the HBase shell:
$ hbase shell
hbase(main):001:0> list
TABLE
0 row(s) in 0.5710 seconds
General HBase shell commands:
 Show cluster status. Can be 'summary', 'simple', or 'detailed'. The
default is 'summary'.
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
hbase> version
hbase> whoami
Tables Management commands:
Create a table
hbase(main):002:0> create 'mytable', 'cf'
hbase(main):003:0> list
TABLE
mytable
1 row(s) in 0.0080 seconds
WRITING DATA
hbase(main):004:0> put 'mytable', 'first', 'cf:message', 'hello HBase'
READING DATA
hbase(main):007:0> get 'mytable', 'first'
hbase(main):008:0> scan 'mytable'
Describe a table
hbase(main):003:0> describe 'users'
DESCRIPTION                                                ENABLED
 {NAME => 'users', FAMILIES => [{NAME => 'info',           true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0330 seconds
Disable:
hbase> disable 'users'
disable_all:
Disable all tables matching the given regex
hbase> disable_all 'users.*'
is_disabled:
Verifies whether the named table is disabled
hbase> is_disabled 'users'
drop:
Drop the named table. The table must first be disabled
hbase> drop 'users'
drop_all:
Drop all tables matching the given regex
hbase> drop_all 'users.*'
enable:
hbase> enable 'users'
enable_all:
hbase> enable_all 'users.*'
is_enabled:
hbase> is_enabled 'users'
exists:
hbase> exists 'users'
list:
hbase> list
hbase> list 'abc.*'
show_filters:
Show all the filters in hbase.
count:
 Count the number of rows in a table. The return value is the number of rows.
This operation may take a LONG time (run '$HADOOP_HOME/bin/hadoop jar
hbase.jar rowcount' to run a counting MapReduce job instead).
 The current count is shown every 1000 rows by default. The count interval may
optionally be specified. Scan caching is enabled on count scans by default; the default cache
size is 10 rows. If your rows are small in size, you may want to increase this
parameter. Examples:
hbase> count 'users'
hbase> count 'users', INTERVAL => 100000
hbase> count 'users', CACHE => 1000
hbase> count 'users', INTERVAL => 10, CACHE => 1000
put:
hbase> put 'users', 'r1', 'c1', 'value', ts1
Configurable block size
hbase(main):002:0> create 'mytable',{NAME => 'colfam1', BLOCKSIZE => '65536'}
Block cache:
 Some workloads don’t benefit from putting data into a read cache: for instance, if a
certain table or column family is only accessed for sequential scans, or
isn’t accessed often and you don’t care whether Gets or Scans take a little longer.
 By default, the block cache is enabled. You can disable it at table-creation
time or by altering the table:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKCACHE => 'false'}
Aggressive caching:
 You can choose some column families to have a higher priority in the block
cache (LRU cache).
 This comes in handy if you expect more random reads on one column
family compared to another. This configuration is also done at table-instantiation time:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', IN_MEMORY => 'true'}
The default value for the IN_MEMORY parameter is false.
Bloom filters:
hbase(main):007:0> create 'mytable',{NAME => 'colfam1', BLOOMFILTER =>
'ROWCOL'}
 The default value for the BLOOMFILTER parameter is NONE.
 A row-level bloom filter is enabled with ROW, and a qualifier-level bloom filter is
enabled with ROWCOL.
 The row-level bloom filter checks for the non-existence of the particular rowkey in
the block, and the qualifier-level bloom filter checks for the non-existence of the row
and column qualifier combination.
 The overhead of the ROWCOL bloom filter is higher than that of the ROW bloom
filter.
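The non-existence check can be illustrated with a toy bloom filter (the general idea only; HBase's per-store-file bloom filters use different parameters and hashing, and the class below is invented for illustration):

```python
import hashlib

class ToyBloomFilter:
    """Toy bloom filter: answers 'definitely absent' or 'possibly present'; no false negatives."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bitmask standing in for the bit array

    def _positions(self, key):
        # derive num_hashes bit positions from the key
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f'{i}:{key}'.encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means the key is definitely not present, so the block can be skipped
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bf = ToyBloomFilter()
bf.add('row1')            # ROW-level: keys are rowkeys
bf.add('row2:colA')       # ROWCOL-level: keys are rowkey + qualifier combinations
print(bf.might_contain('row1'))     # True
print(bf.might_contain('no-such'))  # almost certainly False
```

ROWCOL filters track one entry per row-and-qualifier combination rather than per row, which is where their extra overhead comes from.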
TTL (Time To Live):
 You can set the TTL while creating the table like this:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}
This command sets the TTL on the column family colfam1 to 18,000 seconds = 5
hours. Data in colfam1 that is older than 5 hours is deleted during the next major
compaction.
Compression
 You can enable compression on a column family when creating a table like this:
hbase(main):002:0> create 'mytable',
{NAME => 'colfam1', COMPRESSION => 'SNAPPY'}
Note that data is compressed only on disk. It’s kept uncompressed in memory
(MemStore or block cache) and while transferring over the network.
Cell versioning:
 Versions are also configurable at a column family level and can be specified at
the time of table instantiation:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 1}
hbase(main):002:0> create 'mytable',
{NAME => 'colfam1', VERSIONS => 1, TTL => '18000'}
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 5,
MIN_VERSIONS => '1'}
Description of a table:
hbase(main):004:0> describe 'follows'
DESCRIPTION                                                ENABLED
 {NAME => 'follows', coprocessor$1 =>                      true
 'file:///Users/ndimiduk/repos/hbaseia twitbase/target/twitbase-
 1.0.0.jar|HBaseIA.TwitBase.coprocessors.FollowsObserver|1001|',
 FAMILIES => [{NAME => 'f', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0330 seconds
Tuning HBase:
hbase(main):003:0> help 'status'
SPLITTING TABLES:
hbase(main):019:0> split 'mytable' , 'G'
Alter table
hbase(main):020:0> alter 't', NAME => 'f', VERSIONS => 1
TRUNCATING TABLES:
hbase(main):023:0> truncate 't'
Truncating 't' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 14.3190 seconds
THANK YOU…

More Related Content

What's hot

What's hot (19)

Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 
Introduction Mysql
Introduction Mysql Introduction Mysql
Introduction Mysql
 
White paper on cassandra
White paper on cassandraWhite paper on cassandra
White paper on cassandra
 
Postgresql
PostgresqlPostgresql
Postgresql
 
lab56_db
lab56_dblab56_db
lab56_db
 
MySQL lecture
MySQL lectureMySQL lecture
MySQL lecture
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slides
 
Mysql all
Mysql allMysql all
Mysql all
 
Hbase an introduction
Hbase an introductionHbase an introduction
Hbase an introduction
 
MySQL
MySQLMySQL
MySQL
 
MYSQL - PHP Database Connectivity
MYSQL - PHP Database ConnectivityMYSQL - PHP Database Connectivity
MYSQL - PHP Database Connectivity
 
Mysql
MysqlMysql
Mysql
 
My sql technical reference manual
My sql technical reference manualMy sql technical reference manual
My sql technical reference manual
 
Mysql ppt
Mysql pptMysql ppt
Mysql ppt
 
Postgresql Database Administration- Day3
Postgresql Database Administration- Day3Postgresql Database Administration- Day3
Postgresql Database Administration- Day3
 
April 2013 HUG: HBase as a Service at Yahoo!
April 2013 HUG: HBase as a Service at Yahoo!April 2013 HUG: HBase as a Service at Yahoo!
April 2013 HUG: HBase as a Service at Yahoo!
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
Advanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsAdvanced MySQL Query Optimizations
Advanced MySQL Query Optimizations
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDB
 

Similar to Hbase

Hbase Quick Review Guide for Interviews
Hbase Quick Review Guide for InterviewsHbase Quick Review Guide for Interviews
Hbase Quick Review Guide for InterviewsRavindra kumar
 
HBase.pptx
HBase.pptxHBase.pptx
HBase.pptxSadhik7
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Rohit Agrawal
 
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designphanleson
 
Introduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityIntroduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityMapR Technologies
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comEdward D. Kim
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase Cynthia Saracco
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 

Similar to Hbase (20)

HBASE Overview
HBASE OverviewHBASE Overview
HBASE Overview
 
Hbase
HbaseHbase
Hbase
 
Hbase Quick Review Guide for Interviews
Hbase Quick Review Guide for InterviewsHbase Quick Review Guide for Interviews
Hbase Quick Review Guide for Interviews
 
HBase.pptx
HBase.pptxHBase.pptx
HBase.pptx
 
01 hbase
01 hbase01 hbase
01 hbase
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7
 
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
 
Introduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityIntroduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and Security
 
Big data hbase
Big data hbase Big data hbase
Big data hbase
 
Hive
HiveHive
Hive
 
Introduction to HBase
Introduction to HBaseIntroduction to HBase
Introduction to HBase
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
 
Hbase
HbaseHbase
Hbase
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase
 
HBase.pptx
HBase.pptxHBase.pptx
HBase.pptx
 
Hadoop - Apache Hbase
Hadoop - Apache HbaseHadoop - Apache Hbase
Hadoop - Apache Hbase
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Hbase

  • 1. Prepared by, Vetri.V WHAT IS HBASE?  HBase is a database: the Hadoop database.It is indexed by rowkey, column key, and timestamp.  HBase stores structured and semistructured data naturally so you can load it with tweets and parsed log files and a catalog of all your products right along with their customer reviews.  It can store unstructured data too, as long as it’s not too large  HBase is designed to run on a cluster of computers instead of a single computer.The cluster can be built using commodity hardware; HBase scales horizontally as you add more machines to the cluster.  Each node in the cluster provides a bit of storage, a bit of cache, and a bit of computation as well. This makes HBase incredibly flexible and forgiving. No node is unique, so if one of those machines breaks down, you simply replace it with another.  This adds up to a powerful, scalable approach to data that,until now, hasn’t been commonly available to mere mortals. HBASE DATA MODEL: Hbase Data model - These six concepts form the foundation of HBase. Table:  HBase organizes data into tables. Table names are Strings and composed of characters that are safe for use in a file system path. Row :  Within a table, data is stored according to its row. Rows are identified uniquely by their rowkey. Rowkeys don’t have a data type and are always treated as a byte[]. Column family:  Data within a row is grouped by column family. Column families also impact the physical arrangement of data stored in HBase.  For this reason, they must be defined up front and aren’t easily modified. Every row in a table has the same column families, although a row need not store data in all its families. Column family names are Strings and composed of characters that are safe for use in a file system path. Column qualifier:  Data within a column family is addressed via its column qualifier,or column. Column qualifiers need not be specified in advance. Column qualifiers need not be consistent between rows. 
 Like rowkeys, column qualifiers don’t have a data type and are always treated as a byte[].
  • 2. Prepared by, Vetri.V Cell:  A combination of rowkey, column family, and column qualifier uniquely identifies a cell. The data stored in a cell is referred to as that cell’s value. Values also don’t have a data type and are always treated as a byte[]. Version:  Values within a cell are versioned. Versions are identified by their timestamp,a long. When a version isn’t specified, the current timestamp is used as the basis for the operation. The number of cell value versions retained by HBase is configured via the column family. The default number of cell versions is three. Hbase Architecture HBase Tables and Regions Table is made up of any number of regions. Region is specified by its startKey and endKey.  Empty table: (Table, NULL, NULL)  Two-region table: (Table, NULL, “com.ABC.www”) and (Table, “com.ABC.www”, NULL) Each region may live on a different node and is made up of several HDFS files and blocks, each of which is replicated by Hadoop HBase Tables:-  Tables are sorted by Row in lexicographical order  Table schema only defines its column families  Each family consists of any number of columns  Each column consists of any number of versions  Columns only exist when inserted, NULLs are free  Columns within a family are sorted and stored together  Everything except table names are byte[]
  • 3. Prepared by, Vetri.V  Hbase Table format (Row, Family:Column, Timestamp) -> Value HBase uses HDFS as its reliable storage layer.It Handles checksums, replication, failover Hbase consists of,  Java API, Gateway for REST, Thrift, Avro  Master manages cluster  RegionServer manage data  ZooKeeper is used the “neural network” and coordinates cluster Data is stored in memory and flushed to disk on regular intervals or based on size  Small flushes are merged in the background to keep number of files small  Reads read memory stores first and then disk based files second  Deletes are handled with “tombstone” markers MemStores:- After data is written to the WAL the RegionServer saves KeyValues in memory store  Flush to disk based on size, is hbase.hregion.memstore.flush.size  Default size is 64MB  Uses snapshot mechanism to write flush to disk while still serving from it and accepting new data at the same time Compactions:- Two types: Minor and Major Compactions Minor Compactions  Combine last “few” flushes  Triggered by number of storage files Major Compactions  Rewrite all storage files  Drop deleted data and those values exceeding TTL and/or number of versions Key Cardinality:- The best performance is gained from using row keys
 Time-range-bound reads can skip store files
 So can Bloom filters
 Selecting column families reduces the amount of data to be scanned
Fold, Store, and Shift:
 All values are stored with their full coordinates, including: row key, column family, column qualifier, and timestamp
 Folds columns into a “row per column” layout
 NULLs are cost free, as nothing is stored
 Versions are multiple “rows” in the folded table
DDI:
 Stands for Denormalization, Duplication, and Intelligent Keys
Block Cache
Region Splits
HBASE SHELL AND COMMANDS:
HBase install:
$ mkdir hbase-install
$ cd hbase-install
$ wget http://apache.claz.org/hbase/hbase-0.92.1/hbase-0.92.1.tar.gz
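"Intelligent Keys" in the DDI pattern above refers to designing rowkeys around HBase's lexicographic sort order. Two common rowkey-design patterns can be sketched in a few lines — both are illustrative examples chosen for this sketch, not prescribed by the slides: a short hash prefix ("salt") to spread monotonically increasing keys across regions, and a reversed timestamp so the newest entry sorts first in a scan:

```python
import hashlib

MAX_LONG = 2**63 - 1  # largest value of a Java long (HBase timestamps are longs)

def salted_key(user_id):
    # a short hash prefix spreads sequential ids across regions,
    # avoiding a hot region at the end of the key space
    prefix = hashlib.md5(user_id.encode()).hexdigest()[:4]
    return "%s-%s" % (prefix, user_id)

def reverse_ts_key(user_id, ts_millis):
    # (MAX_LONG - ts), zero-padded, makes later timestamps sort
    # earlier lexicographically, so a scan sees newest entries first
    return "%s-%019d" % (user_id, MAX_LONG - ts_millis)

keys = sorted(reverse_ts_key('u1', ts) for ts in (100, 200, 300))
# the first key in sorted order belongs to the newest timestamp (300)
```

The trade-off with salting is that a plain range scan over raw ids is no longer possible; readers must fan out one scan per salt prefix.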
$ tar xvfz hbase-0.92.1.tar.gz
$ $HBASE_HOME/bin/start-hbase.sh
Configuration changes in HBase:
 Go to hbase-env.sh and edit JAVA_HOME
 Next, go to hbase-site.xml and edit the following:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://eattributes:54310/hbase</value>
    <description>The directory shared by region servers.
      Should be fully-qualified to include the filesystem to use,
      e.g. hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
    </description>
  </property>
  <!--
  <property>
    <name>hbase.master</name>
    <value>master:60000</value>
    <description>The host and port that the HBase master runs at.
    </description>
  </property>
  -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in: false for
      standalone/pseudo-distributed, true for fully-distributed.
    </description>
  </property>
</configuration>
Starting the HBase shell:
$ hbase shell
hbase(main):001:0> list
TABLE
0 row(s) in 0.5710 seconds
General HBase shell commands:
 Show cluster status. The format can be ‘summary’, ‘simple’, or ‘detailed’. The default is ‘summary’.
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
hbase> version
hbase> whoami
Table management commands:
Create a table:
hbase(main):002:0> create 'mytable', 'cf'
hbase(main):003:0> list
TABLE
mytable
1 row(s) in 0.0080 seconds
WRITING DATA
hbase(main):004:0> put 'mytable', 'first', 'cf:message', 'hello HBase'
READING DATA
hbase(main):007:0> get 'mytable', 'first'
hbase(main):008:0> scan 'mytable'
Describe a table:
hbase(main):003:0> describe 'users'
DESCRIPTION ENABLED
{NAME => 'users', FAMILIES => [{NAME => 'info', true
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0330 seconds
Disable a table:
hbase> disable 'users'
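The `scan` shown above always returns rows in rowkey order, because HBase keeps rows sorted lexicographically as bytes. A minimal sketch of that semantics (a toy model, not the HBase shell or API; names are invented for the sketch):

```python
# Rows are kept sorted lexicographically as byte strings, so a scan over
# [startrow, stoprow) is just a contiguous slice of the sorted key space.

rows = {}  # rowkey (bytes) -> {column: value}

def put(rowkey, column, value):
    rows.setdefault(rowkey, {})[column] = value

def scan(startrow=b'', stoprow=None):
    for key in sorted(rows):                        # lexicographic byte order
        if key < startrow:
            continue
        if stoprow is not None and key >= stoprow:  # stoprow is exclusive
            break
        yield key, rows[key]

put(b'row-10', b'cf:a', b'x')
put(b'row-2', b'cf:a', b'y')
put(b'row-1', b'cf:a', b'z')
result = [k for k, _ in scan()]
# byte-wise order, not numeric: b'row-1' < b'row-10' < b'row-2'
```

Note the classic surprise in the last comment: `row-10` sorts before `row-2`, which is why numeric rowkeys are usually zero-padded or stored as fixed-width binary.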
disable_all: Disable all tables matching the given regex
hbase> disable_all 'users.*'
is_disabled: Verify whether the named table is disabled
hbase> is_disabled 'users'
drop: Drop the named table. The table must first be disabled
hbase> drop 'users'
drop_all: Drop all tables matching the given regex
hbase> drop_all 'users.*'
enable:
hbase> enable 'users'
enable_all:
hbase> enable_all 'users.*'
is_enabled:
hbase> is_enabled 'users'
exists:
hbase> exists 'users'
list:
hbase> list
hbase> list 'abc.*'
show_filters: Show all the filters in HBase.
count:
 Count the number of rows in a table. The return value is the number of rows. This operation may take a LONG time (run ‘$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount’ to run a counting MapReduce job instead).
 The current count is shown every 1000 rows by default. The count interval may optionally be specified. Scan caching is enabled on count scans by default; the default cache size is 10 rows. If your rows are small, you may want to increase this parameter.
Examples:
hbase> count 'users'
hbase> count 'users', INTERVAL => 100000
hbase> count 'users', CACHE => 1000
hbase> count 'users', INTERVAL => 10, CACHE => 1000
put:
hbase> put 'users', 'r1', 'c1', 'value', ts1
Configurable block size:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKSIZE => '65536'}
Block cache:
 Some workloads don’t benefit from putting data into a read cache: for instance, if a certain table or column family is only accessed by sequential scans, or isn’t accessed much and you don’t care if Gets or Scans take a little longer.
 By default, the block cache is enabled. You can disable it at table creation or by altering the table:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKCACHE => 'false'}
Aggressive caching:
 You can give some column families a higher priority in the block cache (LRU cache).
 This comes in handy if you expect more random reads on one column family than on another. This configuration is also done at table-instantiation time:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', IN_MEMORY => 'true'}
The default value of the IN_MEMORY parameter is false.
Bloom filters:
hbase(main):007:0> create 'mytable', {NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}
 The default value of the BLOOMFILTER parameter is NONE.
 A row-level bloom filter is enabled with ROW, and a qualifier-level bloom filter is enabled with ROWCOL.
 The row-level bloom filter checks for the non-existence of the particular rowkey in a block, and the qualifier-level bloom filter checks for the non-existence of the row and column qualifier combination.
 The overhead of the ROWCOL bloom filter is higher than that of the ROW bloom filter.
TTL (Time To Live):
 You can set the TTL while creating the table like this:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}
This command sets the TTL on the column family colfam1 to 18,000 seconds = 5 hours. Data in colfam1 that is older than 5 hours is deleted during the next major compaction.
Compression:
 You can enable compression on a column family when creating a table like this:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}
Note that data is compressed only on disk. It is kept uncompressed in memory (MemStore or block cache) and while transferring over the network.
Cell versioning:
 Versions are also configurable at the column family level and can be specified at table-instantiation time:
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 1}
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 1, TTL => '18000'}
hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 5, MIN_VERSIONS => '1'}
Description of a table:
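The TTL and VERSIONS settings above both take effect during a major compaction: versions older than the TTL are dropped, and at most VERSIONS recent values survive. A minimal sketch of that pruning rule, as a toy model rather than HBase internals:

```python
def major_compact_cell(versions, now_ms, ttl_seconds, max_versions):
    """Toy model of major-compaction pruning for one cell.
    versions: list of (timestamp_ms, value) pairs.
    Returns the surviving versions, newest first."""
    cutoff = now_ms - ttl_seconds * 1000
    live = [(ts, v) for ts, v in versions if ts >= cutoff]  # TTL check
    live.sort(key=lambda tv: tv[0], reverse=True)           # newest first
    return live[:max_versions]                              # VERSIONS limit

now = 100_000_000
cell = [(now - 30_000_000, 'old'),   # ~30,000 s old: exceeds an 18,000 s TTL
        (now - 10_000_000, 'mid'),
        (now - 1_000, 'new')]
kept = major_compact_cell(cell, now, ttl_seconds=18_000, max_versions=3)
# 'old' is dropped for exceeding the TTL; 'new' and 'mid' survive
```

MIN_VERSIONS (shown in the last `create` above) refines this rule in real HBase by keeping a floor of versions even past the TTL; the sketch omits that for brevity.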
hbase(main):004:0> describe 'follows'
DESCRIPTION ENABLED
{NAME => 'follows', coprocessor$1 => 'file:///U true
sers/ndimiduk/repos/hbaseia-twitbase/target/twitbase-
1.0.0.jar|HBaseIA.TwitBase.coprocessors.FollowsObserver|1001|',
FAMILIES => [{NAME => 'f', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}
1 row(s) in 0.0330 seconds
Getting help on a command:
hbase(main):003:0> help 'status'
SPLITTING TABLES:
hbase(main):019:0> split 'mytable', 'G'
Altering a table:
hbase(main):020:0> alter 't', NAME => 'f', VERSIONS => 1
TRUNCATING TABLES:
hbase(main):023:0> truncate 't'
Truncating 't' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 14.3190 seconds
THANK YOU…