Breaking with Relational DBMS and Dating with HBase
Gaurav Kohli
Xebia




                1
About me




Gaurav Kohli
gaurav.in@gmail.com

Consultant
Xebia IT Architects

                      2
   Why are we here?
   Something about RDBMS
   Limitations of RDBMS
   Why HBase or any NoSQL solution
   Overview of HBase
   Specific use cases
   Paradigm shift in schema design
   Architecture of HBase
   HBase interface – Java API, Thrift
   Conclusion

                          3
Databases




            4
Relational Databases have a lot of




                        5
   Data sets growing into petabytes
   RDBMSs don't scale inherently
       Scale up / scale out (load balancing + replication)
   Hard to shard / partition
   High read and write throughput at the same time is not possible
       Transactional / analytical databases
   Specialized hardware is very expensive
       Oracle clustering


                              6
[Diagram: master-slave replication; the master replicates to a slave]



         7
[Diagram: writes go to the master; reads are served by the slave nodes]



                 The MySQL master becomes a bottleneck
                 All slaves must have the same write capacity as the master
                 Single point of failure, no easy failover


                             8
[Diagram: master-master replication, with a slave]



              9
10
11
   2006.11
      Google releases paper on BigTable

   2007.2
      Initial HBase prototype created as Hadoop contrib.

   2007.10
      First usable HBase

   2008.1
      Hadoop become Apache top-level project and HBase becomes
       subproject
   2010.5~
      Hbase becomes Apache top-level project

   2010.6
       Hbase 0.26.5 released.
   2010.10
                                 12
       HBase 0.89.2010092 – third developer release
   Distributed
       uses HDFS for storage
   Column-Oriented
   Multi-Dimensional
       versions
   High-Availability
   High-Performance
   Storage System

                                13
HBase is not a SQL database
     No joins, no query engine, no data types, no SQL
     No schema
     Denormalized data
     Wide and sparsely populated data structure (key-value)
     No DBA needed



                             14
   Bigness
       Big data, large numbers of users, large numbers of computers
   Massive write performance
       Facebook needs to store 135 billion messages a month
       Twitter stores 7 TB of data per day
   Fast key-value access
   Write availability
   No single point of failure


                              15
Specific use cases
     Managing large streams of non-transactional data: Apache
      logs, application logs, MySQL logs, etc.
     Real-time inserts, updates, and queries.
     Fraud detection by comparing transactions to known
      patterns in real-time.
     Analytics - Use MapReduce, Hive, or Pig to perform
      analytical queries




                               16
   Column-oriented database
   Tables are sorted by row key
   Table schema only defines column families
       A column family can have any number of columns
   Each cell value has a timestamp




                            17
18
19
SortedMap(
    RowKey, List(
        SortedMap(
          Column, List(
             Value, Timestamp
          )
        )
    )
)
SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))

                           20
 A BIG SORTED MAP
     Row key + column key + timestamp => value

              Student table (column family: info)
              Row Key     Column Key        Timestamp         Value
              1           info:name         1273516197868     Gaurav
              1           info:age          1273871824184     28
              1           info:age          1273871823022     34
              1           info:sex          1273746281432     Male
              2           info:name         1273863723227     Harsh
              3           info:name         1273822456433     Raman

     Sorted by row key and column key
     The two info:age entries are 2 versions of the same cell
     The part of the column key after "info:" is the column qualifier/name
     Timestamp is a long value
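
The "big sorted map" view can be sketched in a few lines of Python. This is only an illustration of the data model, not HBase code: the `cells` dict and the `latest` helper are hypothetical names, and a plain dict plus `sorted()` stands in for HBase's sorted storage. The sketch assumes newer versions sort ahead of older ones, which is how HBase orders the versions of a cell.

```python
# Illustrative model of the "big sorted map":
# (row key, column key, timestamp) => value.
# Not HBase code; `cells` and `latest` are hypothetical names for this sketch.

cells = {
    ("1", "info:name", 1273516197868): "Gaurav",
    ("1", "info:age", 1273871824184): "28",
    ("1", "info:age", 1273871823022): "34",
    ("1", "info:sex", 1273746281432): "Male",
    ("2", "info:name", 1273863723227): "Harsh",
    ("3", "info:name", 1273822456433): "Raman",
}


def sort_key(cell_key):
    row, column, timestamp = cell_key
    # Rows and columns ascend; timestamps descend so the newest version comes first.
    return (row, column, -timestamp)


ordered_keys = sorted(cells, key=sort_key)


def latest(row, column):
    """Return the newest value stored for a row/column pair, or None if absent."""
    for key in ordered_keys:
        if key[0] == row and key[1] == column:
            return cells[key]
    return None


for key in ordered_keys:
    print(key, "=>", cells[key])
print("latest info:age for row 1:", latest("1", "info:age"))
```

Because the key includes the timestamp, writing a new age does not overwrite the old one; it adds another entry that sorts ahead of it.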
                                       21
   Example of a Student and Subject

      Student table:          PK id; name, age, sex
      Subject table:          PK id; title, introduction, teacher_id
      Relationship:           Student (m) to Subject (n)
      Student-Subject table:  student_id, subject_id, type


                               22
RDBMS

       Example of a Student and Subject
Student table

    key     name             age               sex
    1       Gaurav           28                Male

Subject table

    id       title          introduction              teacher_id
    1        Hbase          Hbase is cool             10

Student-Subject table

    student_id       subject_id         type
    1                1                  elective


                                   23
HBase

   Student-Subject schema - HBase
Student table

Row Key           Column Family   Column Keys
student_id        info            name, age, sex
student_id        subjects        Subject IDs as qualifiers (keys)

Subject table

Row Key           Column Family   Column Keys
subject_id        info            title, introduction, teacher_id
subject_id        students        Student IDs as qualifiers (keys)




                             24
HBase

       Student-Subject schema - HBase
Student table
key               info                              subjects
1                 info:name=Gaurav                  subjects:1="elective"
                  info:age=28                       subjects:2="main"
                  info:sex=Male

Subject table
key               info                              students
1                 info:title=Hbase                  students:1
                  info:introduction=Hbase is cool   students:2
                  info:teacher_id=10
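
A rough Python sketch of this denormalized layout follows. It is not HBase client code: nested dicts stand in for the two tables, the helper names (`subjects_of`, `students_of`, `enrol`) are made up for illustration, and the empty string stored under `students:<id>` is an assumption, since the slide shows no value for those cells.

```python
# Hypothetical in-memory stand-ins for the two HBase tables.
# Layout: table[row_key][column_family][qualifier] = value
student_table = {
    "1": {
        "info": {"name": "Gaurav", "age": "28", "sex": "Male"},
        "subjects": {"1": "elective", "2": "main"},
    },
}
subject_table = {
    "1": {
        "info": {"title": "Hbase", "introduction": "Hbase is cool", "teacher_id": "10"},
        "students": {"1": "", "2": ""},  # assumed empty values; only the qualifier matters
    },
}


def subjects_of(student_id):
    """Subject ids and enrolment types for one student: a single-row read, no join."""
    return dict(student_table[student_id]["subjects"])


def students_of(subject_id):
    """Student ids enrolled in one subject, read from the subject's own row."""
    return sorted(subject_table[subject_id]["students"])


def enrol(student_id, subject_id, kind):
    """Record an enrolment on both sides; the application keeps both copies in sync."""
    student_row = student_table.setdefault(student_id, {"info": {}, "subjects": {}})
    student_row["subjects"][subject_id] = kind
    subject_row = subject_table.setdefault(subject_id, {"info": {}, "students": {}})
    subject_row["students"][student_id] = ""


print(subjects_of("1"))
print(students_of("1"))
```

The join table from the relational design disappears: each side carries its own copy of the relationship, at the cost of writing it twice.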




                                   25
Attribute     Possible Values            Default
COMPRESSION   NONE, GZ, LZO              NONE
VERSIONS      1+                         3
TTL           1-2147483647 (seconds)     2147483647
BLOCKSIZE     1 byte – 2 GB              64 KB
IN_MEMORY     true, false                false
BLOCKCACHE    true, false                true
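
To make VERSIONS and TTL concrete, here is a small Python sketch of how the two attributes limit what a read of one cell returns. `visible_versions` is a hypothetical helper written for this illustration, not an HBase API; its defaults are the ones in the table above, and the exact handling of a version whose age equals the TTL is an assumption.

```python
# Illustrative only: how VERSIONS and TTL bound the versions of a single cell.
# `visible_versions` is a hypothetical helper, not part of any HBase API.

def visible_versions(versions, max_versions=3, ttl_seconds=2147483647, now_ms=0):
    """versions: list of (timestamp_ms, value). Returns what a read would still see."""
    cutoff_ms = now_ms - ttl_seconds * 1000
    # Drop versions older than the TTL (boundary handling is an assumption).
    alive = [(ts, value) for ts, value in versions if ts > cutoff_ms]
    # Newest first, then keep at most max_versions of them.
    alive.sort(key=lambda pair: pair[0], reverse=True)
    return alive[:max_versions]


history = [(1000, "a"), (2000, "b"), (3000, "c"), (4000, "d")]
print(visible_versions(history, now_ms=5500))                 # default VERSIONS=3
print(visible_versions(history, ttl_seconds=2, now_ms=5500))  # short TTL
```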




                      26
   Region: contiguous set of lexicographically sorted
    rows
       hbase.hregion.max.filesize (default: 256 MB)
   Regions are hosted by Region Servers
   Each table is partitioned into Regions
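
Because each region covers a contiguous, sorted key range, finding the region for a row key is a binary search over region start keys. The Python sketch below illustrates the idea only; the start keys, the server names, and the `locate` function are all made up for this example and do not reflect how the HBase client is implemented.

```python
import bisect

# Hypothetical layout: each region starts at its start key and runs up to
# (but not including) the next region's start key.
region_start_keys = ["", "row201"]                    # "" = from the first possible key
region_servers = ["regionserver-1", "regionserver-2"]  # made-up server names


def locate(row_key):
    """Return (region index, hosting server) for a row key."""
    index = bisect.bisect_right(region_start_keys, row_key) - 1
    return index, region_servers[index]


for key in ["row1", "row200", "row201", "row500", "row1000"]:
    print(key, "->", locate(key))
```

Note that the comparison is lexicographic, so "row1000" sorts before "row201" and lands in the first region. Zero-padding numeric keys avoids that surprise.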




                          27
Regions and


[Diagram: two regions, row1 to row200 and row201 to row500; a new row arrives]




               28
Regions and


[Diagram: three regions after the split: row1 to row200, row201 to row350, row351 to row501]
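
The row ranges shown above can be reproduced with a short Python sketch of a region split. It is a simplification under stated assumptions: row count stands in for the store file size that `hbase.hregion.max.filesize` limits, the split point is simply the middle row, and `split_region` is a hypothetical function, not HBase code.

```python
import bisect


def split_region(rows, max_rows):
    """Split a sorted list of row keys in two once it grows past max_rows.

    Simplification: row count stands in for store file size, and the
    split point is the middle row.
    """
    if len(rows) <= max_rows:
        return [rows]
    middle = len(rows) // 2
    return [rows[:middle], rows[middle:]]


# Second region before the insert: row201 .. row500 (300 rows).
region = ["row%03d" % n for n in range(201, 501)]
bisect.insort(region, "row501")  # the new row arrives and keeps the list sorted

halves = split_region(region, max_rows=300)
for half in halves:
    print(half[0], "to", half[-1], "(%d rows)" % len(half))
```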




               29
   Master
   Zookeeper
   RegionServers
   HDFS
   MapReduce




                    30
31
HBase Interface – Java API, Thrift...




            32
HBase Interface – Java API, Thrift...
   Java
   Thrift (Ruby, PHP, Python, Perl, C++...)
   REST
   Groovy DSL
   MapReduce
   HBase shell




                          33
HBase Interface – Java API, Thrift...
   Java
       Get
       Put
       Delete
       Scan
       incrementColumnValue
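
The shape of these five operations can be shown with a toy in-memory table in Python. This is a sketch of the semantics only: `ToyTable` and its method names are invented for this illustration and are not the HBase Java client API, which works on byte arrays and talks to Region Servers. The `stats:visits` column is likewise a made-up example.

```python
class ToyTable:
    """In-memory stand-in for a table; a sketch, not the HBase client API."""

    def __init__(self):
        self._rows = {}  # row key -> {column key -> value}

    def put(self, row, column, value):
        self._rows.setdefault(row, {})[column] = value

    def get(self, row):
        """Return a copy of all columns stored for one row (empty if absent)."""
        return dict(self._rows.get(row, {}))

    def delete(self, row):
        self._rows.pop(row, None)

    def scan(self, start_row="", stop_row=None):
        """Yield (row, columns) in row-key order for start_row <= row < stop_row."""
        for row in sorted(self._rows):
            if row < start_row:
                continue
            if stop_row is not None and row >= stop_row:
                break
            yield row, dict(self._rows[row])

    def increment(self, row, column, amount=1):
        """Add amount to a counter cell and return the new value."""
        new_value = self._rows.setdefault(row, {}).get(column, 0) + amount
        self._rows[row][column] = new_value
        return new_value


table = ToyTable()
table.put("1", "info:name", "Gaurav")
table.put("2", "info:name", "Harsh")
table.put("3", "info:name", "Raman")
table.delete("2")
print(table.get("1"))
print(list(table.scan(start_row="1", stop_row="3")))
print(table.increment("1", "stats:visits"))
```

Every operation is addressed by row key, and a scan is a walk over a sorted key range; there is no way to ask for rows by value without reading them.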




                           34
35
   HBase vs. RDBMS
       Not a replacement
       Solves only a small subset (~5%)




                              36
   Where SQL makes life easy
       Joining
       Secondary indexing
       Referential integrity (updates)
       ACID
   Where HBase makes life easy
       Dataset scale
       Read/write scale
       Replication
       Batch analysis
                              37
38
39
   HBase Apache (http://hbase.apache.org/)
   HBase wiki (wiki.apache.org/hadoop/Hbase)
   HBase blog (blog.hbase.org)
   Images from Google Search
   http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
   http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html




                            40
