Breaking with Relational DBMS and Dating with HBase

Gaurav Kohli
Xebia

me

Gaurav Kohli
gaurav.in@gmail.com

Consultant
Xebia IT Architects

 Why are we here?
 Something about RDBMS
 Limitations of RDBMS
 Why HBase or any NoSQL solution
 Overview of HBase
 Specific use cases
 Paradigm shift in schema design
 Architecture of HBase
 HBase interface – Java API, Thrift
 Conclusion

Relational databases have a lot of limitations

 Data sets are growing into petabytes
 RDBMS don't scale inherently
 Scale up / scale out (load balancing + replication)
 Hard to shard / partition
 High read and write throughput at the same time is not possible
 Transactional / analytical databases
 Specialized hardware ... is very expensive
 Oracle clustering
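
To make "hard to shard" concrete, here is a minimal sketch of application-level sharding. It assumes a simple hash-modulo scheme (one common approach, not something the slides prescribe); the names shard_for and fraction_moved are made up for illustration. It counts how many keys land on a different shard when one shard is added.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key and taking it modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical key set: 10,000 user ids.
keys = [f"user{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_shards=4, new_shards=5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 shards")
```

With this scheme most keys have to be physically moved on every resize, which is the kind of manual work a store with built-in partitioning takes off the application.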

[Figure: a Master replicating to a Slave]

[Figure: writes go to the Master; reads are served by the Slave nodes]

 MySQL master becomes a problem
 All Slaves must have the same write capacity as master
 Single point of failure, no easy failover
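
A back-of-the-envelope sketch of the second point. It assumes every slave has to replay every write the master accepts, so only the leftover capacity serves reads. The function name and the numbers are illustrative, not measurements.

```python
def spare_read_capacity(n_slaves: int, node_ops_per_sec: int, write_ops_per_sec: int) -> int:
    """Total ops/sec left for reads when each slave replays all writes."""
    per_slave = max(node_ops_per_sec - write_ops_per_sec, 0)
    return n_slaves * per_slave


# Hypothetical nodes that can each do 10,000 ops/sec.
few = spare_read_capacity(2, 10_000, 8_000)         # 2 slaves, 8,000 writes/sec
many = spare_read_capacity(8, 10_000, 8_000)        # 8 slaves, same write load
saturated = spare_read_capacity(8, 10_000, 10_000)  # writes fill a whole node

print(few, many, saturated)
```

Adding slaves multiplies read capacity, but the write ceiling stays at what a single node can absorb; once writes saturate one node, no number of slaves helps.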

[Figure: master-master replication, with a Slave]

 2006.11
 Google releases paper on BigTable

 2007.2
 Initial HBase prototype created as Hadoop contrib.

 2007.10
 First usable HBase

 2008.1
 Hadoop become Apache top-level project and HBase becomes
subproject
 2010.5~
 Hbase becomes Apache top-level project

 2010.6
 Hbase 0.26.5 released.
 2010.10
12
 HBase 0.89.2010092 – third developer release

 Distributed (uses HDFS for storage)
 Column-oriented
 Multi-dimensional (versions)
 High availability
 High performance
 Storage system

HBase is not
 A SQL database
 No joins, no query engine, no datatypes, no SQL
 No schema
 Denormalized data
 Wide and sparsely populated data structure (key-value)
 No DBA needed

 Bigness
 Big data, big number of users, big number of computers
 Massive write performance
 Facebook needs to store 135 billion messages a month
 Twitter stores 7 TB data per day
 Fast key-value access
 Write availability
 No Single point of failure

Specific use cases
 Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.
 Real-time inserts, updates, and queries
 Fraud detection by comparing transactions to known patterns in real time
 Analytics: use MapReduce, Hive, or Pig to perform analytical queries

 Column-oriented database
 Tables are sorted by row key
 Table schema only defines column families
 A column family can have any number of columns
 Each cell value has a timestamp

SortedMap(
  RowKey, List(
    SortedMap(
      Column, List(
        Value, Timestamp
      )
    )
  )
)

SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))

 A BIG SORTED MAP
 Row Key + Column Key + Timestamp => Value

Student table (sorted by row key, then column key)

Row Key   Column Key   Timestamp       Value
1         info:age     1273871824184   28
1         info:age     1273871823022   34
1         info:name    1273516197868   Gaurav
1         info:sex     1273746281432   Male
2         info:name    1273863723227   Harsh
3         info:name    1273822456433   Raman

 In the column key info:age, "info" is the column family and "age" is the column qualifier (name)
 The two info:age entries are 2 versions of the same cell in row 1
 The timestamp is a long value
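
The "big sorted map" can be sketched in a few lines. This is a toy in-memory model, not the HBase client API: the cells dictionary reuses the Student table values, while sort_key, sorted_cells and get are names invented here. Values are plain strings for readability; HBase itself stores uninterpreted bytes.

```python
# (row key, column key, timestamp) -> value
cells = {
    ("1", "info:name", 1273516197868): "Gaurav",
    ("1", "info:age", 1273871824184): "28",
    ("1", "info:age", 1273871823022): "34",
    ("1", "info:sex", 1273746281432): "Male",
    ("2", "info:name", 1273863723227): "Harsh",
    ("3", "info:name", 1273822456433): "Raman",
}


def sort_key(cell_key):
    """Order by row key, then column key, then newest timestamp first."""
    row, column, timestamp = cell_key
    return (row, column, -timestamp)


def sorted_cells(table):
    """Return (key, value) pairs in the order a sorted store would hold them."""
    return sorted(table.items(), key=lambda item: sort_key(item[0]))


def get(table, row, column, max_versions=1):
    """Return up to max_versions values of one cell, newest first."""
    versions = [(ts, v) for (r, c, ts), v in table.items() if r == row and c == column]
    versions.sort(reverse=True)
    return [value for _, value in versions[:max_versions]]


for key, value in sorted_cells(cells):
    print(key, "=>", value)

print(get(cells, "1", "info:age"))                  # newest version only
print(get(cells, "1", "info:age", max_versions=2))  # both versions
```

A real store keeps the data physically sorted instead of sorting on every call; the point here is only the shape of the key and the newest-first ordering of versions.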

 Example of a Student and Subject

Student table: id (PK), name, age, sex
Subject table: id (PK), title, introduction, teacher_id
Student-Subject table: student_id, subject_id, type

Student and Subject are in an m:n relationship through the Student-Subject table.

RDBMS

 Example of a Student and Subject

Student table
key   name     age   sex
1     Gaurav   28    Male

Subject table
id   title   introduction    teacher_id
1    Hbase   Hbase is cool   10

Student-Subject table
student_id   subject_id   type
1            1            elective

HBase

 Student-Subject schema - HBase

Student table
Row Key      Column Family   Column Keys
student_id   info            name, age, sex
student_id   subjects        subject IDs as qualifiers (keys)

Subject table
Row Key      Column Family   Column Keys
subject_id   info            title, introduction, teacher_id
subject_id   students        student IDs as qualifiers (keys)

HBase

 Student-Subject schema - HBase

Student table
key   info               subjects
1     info:name=Gaurav   subjects:1="elective"
      info:age=28        subjects:2="main"
      info:sex=Male

Subject table
key   info                              students
1     info:title=Hbase                  students:1
      info:introduction=Hbase is cool   students:2
      info:teacher_id=10
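
A small sketch of how this denormalized layout replaces the join table. It uses plain dictionaries rather than the HBase client, and the helper names (new_table, enroll, qualifiers) and the enrollment type for student 2 are assumptions made for illustration.

```python
from collections import defaultdict


def new_table():
    """Row key -> {"family:qualifier": value}, a stand-in for a table."""
    return defaultdict(dict)


students = new_table()
subjects = new_table()

students["1"].update({"info:name": "Gaurav", "info:age": "28", "info:sex": "Male"})
subjects["1"].update({"info:title": "Hbase", "info:introduction": "Hbase is cool",
                      "info:teacher_id": "10"})


def enroll(student_id, subject_id, kind):
    """Record the relation on both sides; the application keeps them in sync."""
    students[student_id][f"subjects:{subject_id}"] = kind
    subjects[subject_id][f"students:{student_id}"] = ""


def qualifiers(row, family):
    """List the qualifiers stored under one column family of a row."""
    prefix = family + ":"
    return sorted(col[len(prefix):] for col in row if col.startswith(prefix))


enroll("1", "1", "elective")
enroll("1", "2", "main")
enroll("2", "1", "elective")  # hypothetical second student

print(qualifiers(students["1"], "subjects"))  # subjects taken by student 1
print(qualifiers(subjects["1"], "students"))  # students taking subject 1
```

Each question is answered by reading a single row, with no join. The cost is that every enrollment is written twice and nothing enforces that the two sides agree.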

Attribute     Possible Values            Default
COMPRESSION   NONE, GZ, LZO              NONE
VERSIONS      1+                         3
TTL           1 – 2147483647 (seconds)   2147483647
BLOCKSIZE     1 byte – 2 GB              64 KB
IN_MEMORY     true, false                false
BLOCKCACHE    true, false                true
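
To illustrate how VERSIONS and TTL interact, here is a simplified sketch of which versions of a single cell stay visible. It only models the two attributes from the table above; the function name and the sample timestamps are made up, and the exact expiry rules in HBase may differ in detail.

```python
def visible_versions(versions, now_ms, max_versions=3, ttl_seconds=2147483647):
    """Keep the newest max_versions entries that are younger than the TTL.

    versions is a list of (timestamp_ms, value) pairs for one cell.
    """
    cutoff_ms = now_ms - ttl_seconds * 1000
    live = [(ts, value) for ts, value in versions if ts > cutoff_ms]
    live.sort(reverse=True)  # newest first
    return live[:max_versions]


# Four hypothetical writes to the same cell, one per second.
history = [(1000, "a"), (2000, "b"), (3000, "c"), (4000, "d")]

print(visible_versions(history, now_ms=5000))                 # defaults: 3 newest
print(visible_versions(history, now_ms=5000, ttl_seconds=2))  # only the last 2 seconds
print(visible_versions(history, now_ms=5000, max_versions=1)) # newest only
```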

 Region: contiguous set of lexicographically sorted rows
 hbase.hregion.max.filesize (default: 256 MB)
 Regions are hosted by Region Servers
 Each table is partitioned into Regions
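
Because each region covers a contiguous key range, finding the region for a row is a search over sorted start keys. The sketch below assumes two regions and invented server names; it also uses zero-padded row keys, since lexicographic order is not numeric order.

```python
import bisect

# Start key of each region, in sorted order; "" marks the start of the table.
region_start_keys = ["", "row0201"]
# Hypothetical region servers hosting those regions.
region_servers = ["regionserver-a", "regionserver-b"]


def locate(row_key):
    """Return (region index, server) for the region whose range covers row_key."""
    index = bisect.bisect_right(region_start_keys, row_key) - 1
    return index, region_servers[index]


print(locate("row0001"))
print(locate("row0201"))
print(locate("row0500"))

# Lexicographic sorting compares character by character, so without
# zero padding "row3" would sort after "row201".
print("row3" > "row201")
```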

Regions and

[Figure: a table in two regions, row1 to row200 and row201 to row500, with a new row arriving]

Regions and

[Figure: the same table in three regions: row1 to row200, row201 to row350, and row351 to row501]
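
The two region pictures can be replayed with a toy model. It makes two simplifying assumptions: a row-count limit stands in for the byte-based hbase.hregion.max.filesize, and a region that grows past the limit is cut at its middle row. The function add_row and the zero-padded keys are illustrative only.

```python
import bisect


def add_row(regions, row_key, max_rows):
    """Insert row_key into the region covering it; split that region if it
    grows past max_rows. Each region is a sorted, non-empty list of row keys."""
    start_keys = [region[0] for region in regions]
    index = max(bisect.bisect_right(start_keys, row_key) - 1, 0)
    bisect.insort(regions[index], row_key)
    if len(regions[index]) > max_rows:
        middle = len(regions[index]) // 2
        # Replace the oversized region with its two halves.
        regions[index:index + 1] = [regions[index][:middle], regions[index][middle:]]
    return regions


regions = [
    [f"row{n:04d}" for n in range(1, 201)],    # row0001 .. row0200
    [f"row{n:04d}" for n in range(201, 501)],  # row0201 .. row0500
]

add_row(regions, "row0501", max_rows=300)
summary = [(region[0], region[-1], len(region)) for region in regions]
print(summary)
```

The new row pushes the second region over the limit, so it is replaced by two smaller regions while the first region is untouched.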

 Master
 Zookeeper
 RegionServers
 HDFS
 MapReduce

HBase Interface – Java API, Thrift...

 Java
 Thrift (Ruby, PHP, Python, Perl, C++...)
 REST
 Groovy DSL
 MapReduce
 HBase Shell

 Java
 Get
 Put
 Delete
 Scan
 incrementColumnValue
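
To show what these operations mean, here is a toy in-memory table with the same five verbs. It is a stand-in written for this deck, not the HBase Java client: the class ToyTable, its method signatures, and the stats:logins column are all assumptions. Like an HBase scan, the scan here includes the start row and excludes the stop row.

```python
class ToyTable:
    """In-memory stand-in for a table, keyed by row key."""

    def __init__(self):
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        return dict(self.rows.get(row, {}))

    def delete(self, row):
        self.rows.pop(row, None)

    def scan(self, start_row="", stop_row=None):
        """Yield rows in key order from start_row (inclusive) to stop_row (exclusive)."""
        for row in sorted(self.rows):
            if row >= start_row and (stop_row is None or row < stop_row):
                yield row, dict(self.rows[row])

    def increment_column_value(self, row, column, amount=1):
        """Add amount to a counter column and return the new value."""
        columns = self.rows.setdefault(row, {})
        columns[column] = columns.get(column, 0) + amount
        return columns[column]


table = ToyTable()
table.put("1", "info:name", "Gaurav")
table.put("2", "info:name", "Harsh")
table.put("3", "info:name", "Raman")

print(table.get("1"))
print([row for row, _ in table.scan(start_row="2")])
print(table.increment_column_value("1", "stats:logins"))
print(table.increment_column_value("1", "stats:logins"))
table.delete("2")
print(table.get("2"))
```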

 HBase vs. RDBMS
 Not a replacement
 Solves only a small subset (~5%)

 Where SQL makes life easy
 Joining
 Secondary indexing
 Referential integrity (updates)
 ACID
 Where HBase makes life easy
 Dataset scale
 Read/write scale
 Replication
 Batch analysis

 HBase Apache (http://hbase.apache.org/)
 HBase Wiki (wiki.apache.org/hadoop/Hbase)
 HBase blog (blog.hbase.org)
 Images from Google Search
 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
 http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
