Building applications with HBase
Amandeep Khurana | Solutions Architect
Big Data TechCon, Boston, April 2013
Friday, April 12, 13

About me
• Solutions Architect, Cloudera Inc
• Amazon Web Services
• Interested in large scale distributed systems
• Co-author, HBase In Action
• Twitter: amansk
[Book cover: HBase in Action, Nick Dimiduk and Amandeep Khurana, Manning]




About the talk
• Set up local HBase environment
• Simple client
• Deeper dive on API
• Data model




What is HBase?
• Scalable, distributed data store
• Open source avatar of Google's Bigtable
• Sorted map of maps
• Sparse
• Multi dimensional
• Tightly integrated with Hadoop
• Not an RDBMS



So, what's the big deal?
• Give up system complexity and sophisticated features for ease of scale and manageability
• Simple system != easy to build applications on top
• Goal: predictable read and write performance at scale
    • Effective API usage
    • Optimal schema design
    • Understand internals and leverage at scale


Use cases
• Canonical use case: storing crawl data and indices for search

[Figure: Web Search powered by Bigtable. Internets -> Crawlers -> Bigtable -> MapReduce -> Search Index -> Web Search -> You]

Indexing the Internet
1 Crawlers constantly scour the Internet for new pages. Those pages are stored as individual records in Bigtable.
2 A MapReduce job runs over the entire table, generating search indexes for the Web Search application.

Searching the Internet
3 The user initiates a Web Search request.
4 The Web Search application queries the Search Indexes and retrieves matching documents directly from Bigtable.
5 Search results are presented to the user.




Use cases (continued)
• Capturing incremental trickle data
• Content serving / OLTP
• Blob store




Installing HBase
• Get it here -> http://hbase.apache.org/
• Downloads -> Choose a mirror -> Choose a release
• Download the tarball
• Untar it and run it




Basics
• Modes of operation
    • Standalone
    • Pseudo-distributed
    • Fully distributed
• Start up in standalone mode
    • cd hbase-0.94.1
    • bin/start-hbase.sh
    • Logs are in ./logs/
• Check out http://localhost:60010 in your browser



Ways to interact
• Shell
• Java client
• Thrift interface
• REST interface
• Asynchbase




Shell
• Basic interaction tool
• Written in JRuby
• Uses the Java client library
• Good place to start poking around




Shell (continued)
• bin/hbase shell
• Create table
    • create 'users', 'info'
• List tables
    • list
• Describe table
    • describe 'users'



Shell (continued)
• Put a row
    • put 'users', 'u1', 'info:name', 'Mr. Rogers'
• Get a row
    • get 'users', 'u1'
• Put more
    • put 'users', 'u1', 'info:email', 'mail@rogers.com'
    • put 'users', 'u1', 'info:password', 'foobar'
• Get a row
    • get 'users', 'u1'
• Scan table
    • scan 'users'

Demo
- Standalone mode
- Shell



Java API
... can't really build an application just with the shell




Important terms
• Table
    • Consists of rows and columns
• Row
    • Has a bunch of columns.
    • Identified by a rowkey (primary key)
• Column Qualifier
    • Dynamic column name
• Column Family
    • Column groups - logical and physical (similar access pattern)
• Cell
    • The actual element that contains the data for a row-column intersection
• Version
    • Every cell has multiple versions.


Introducing TwitBase
• Twitter clone built on HBase
• Designed to scale
• It's an example, not a production app
• Start with users and twits for now
• Users
    • Have profiles
    • Post twits
• Twits
    • Have text posted by users
• Get sample code at: https://github.com/hbaseinaction/twitbase


Java API
• Configuration - holds config information about the cluster. Roughly equivalent to JDBC connection string
• HConnection - represents connection to a cluster
• HBaseAdmin - admin tool to handle DDL operations
• HTablePool - pool of connections to cluster
• HTable (HTableInterface) - handle on a single HBase table. Sends commands to table

Connecting to HBase
• Using default confs:
    HTableInterface usersTable = new HTable("users");
• Passing custom conf:
    Configuration myConf = HBaseConfiguration.create();
    myConf.set("parameter_name", "parameter_value");
    HTableInterface usersTable = new HTable(myConf, "users");
• Connection pooling:
    HTablePool pool = new HTablePool();
    HTableInterface usersTable = pool.getTable("users");
    // work with the table
    usersTable.close();   // hands the table back to the pool



Inserting data
• Put call on HTable
    Put p = new Put(Bytes.toBytes("TheRealMT"));   // rowkey
    p.add(Bytes.toBytes("info"),                   // column family
          Bytes.toBytes("name"),                   // column qualifier
          Bytes.toBytes("Mark Twain"));            // cell value
    p.add(Bytes.toBytes("info"),
          Bytes.toBytes("email"),
          Bytes.toBytes("samuel@clemens.org"));
    p.add(Bytes.toBytes("info"),
          Bytes.toBytes("password"),
          Bytes.toBytes("Langhorne"));
    usersTable.put(p);




Data coordinates
• Row is addressed using rowkey
• Cell is addressed using [rowkey + family + qualifier]




Updating data
• Update == Write
    Put p = new Put(Bytes.toBytes("TheRealMT"));
    p.add(Bytes.toBytes("info"),
          Bytes.toBytes("password"),
          Bytes.toBytes("abc123"));
    usersTable.put(p);




Write path

[Figure: Client writing to the WAL and the Memstore, with existing HFiles on disk]
Write goes to the write-ahead log (WAL) as well as the in memory write buffer called the Memstore. Clients don't interact directly with the underlying HFiles during writes.

[Figure: Memstore flushed to a new HFile next to the existing HFiles]
Memstore gets flushed to disk to form a new HFile when filled up with writes.



Reading data
• Get the entire row
    Get g = new Get(Bytes.toBytes("TheRealMT"));
    Result r = usersTable.get(g);
• Get selected columns
    Get g = new Get(Bytes.toBytes("TheRealMT"));
    g.addColumn(Bytes.toBytes("info"),
                Bytes.toBytes("password"));
    Result r = usersTable.get(g);
• Extract the value out of the Result object
    byte[] b = r.getValue(Bytes.toBytes("info"),
                          Bytes.toBytes("email"));
    String email = Bytes.toString(b);


Reading data (continued)
• Scan
    Scan s = new Scan(startRow, stopRow);   // byte[] bounds; stopRow is exclusive
    ResultScanner rs = twits.getScanner(s);
    for (Result r : rs) {
        // iterate over scanner results
    }
    rs.close();                             // release the scanner when done
• Single threaded iteration over defined set of rows. Could be the entire table too.




What happens when you read data?

[Figure: Client reading through the BlockCache, the Memstore and the HFiles]
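
The diagram names the three places a read can be served from. A minimal sketch of that lookup order follows, under simplifying assumptions that are not from the slides: the class `ReadPathSketch` is hypothetical, the cache holds single values instead of HFile blocks, and a read stops at the first hit rather than merging all sources.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy read path: Memstore first, then HFiles from newest to oldest, with a cache in front of them. */
class ReadPathSketch {
    final TreeMap<String, String> memstore = new TreeMap<>();
    final List<TreeMap<String, String>> hfiles = new ArrayList<>();   // oldest first
    final Map<String, String> blockCache = new HashMap<>();           // keyed per HFile, so never stale
    int hfileReads = 0;                                               // counts simulated disk reads

    String get(String key) {
        String buffered = memstore.get(key);
        if (buffered != null) {
            return buffered;                       // the newest data lives in the Memstore
        }
        for (int i = hfiles.size() - 1; i >= 0; i--) {
            String cacheKey = i + "/" + key;
            String cached = blockCache.get(cacheKey);
            if (cached != null) {
                return cached;                     // served from the cache, no disk access
            }
            hfileReads++;
            String onDisk = hfiles.get(i).get(key);
            if (onDisk != null) {
                blockCache.put(cacheKey, onDisk);  // remember it for the next read
                return onDisk;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ReadPathSketch store = new ReadPathSketch();
        TreeMap<String, String> hfile = new TreeMap<>();
        hfile.put("TheRealMT/info:password", "Langhorne");
        store.hfiles.add(hfile);

        System.out.println(store.get("TheRealMT/info:password"));   // read from the HFile
        System.out.println(store.get("TheRealMT/info:password"));   // served from the cache
        store.memstore.put("TheRealMT/info:password", "abc123");
        System.out.println(store.get("TheRealMT/info:password"));   // newer value in the Memstore
        System.out.println("HFile reads: " + store.hfileReads);
    }
}
```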




Deleting data
• Another simple API call
• Delete entire row
    Delete d = new Delete(Bytes.toBytes("TheRealMT"));
    usersTable.delete(d);
• Delete selected columns
    Delete d = new Delete(Bytes.toBytes("TheRealMT"));
    d.deleteColumns(Bytes.toBytes("info"),
                    Bytes.toBytes("email"));
    usersTable.delete(d);




Delete internals
• Tombstone markers
• Deletion only happens during major compactions
• Compactions
    • Minor
    • Major




Compactions

[Figure: several HFiles under a Memstore merged into one compacted HFile]
Two or more HFiles get combined to form a single HFile during a compaction.
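
Tombstones and the two kinds of compaction can be illustrated together. The sketch below is an assumption-laden toy: `CompactionSketch` and the `TOMBSTONE` marker string are made up for the example, and each HFile is reduced to a sorted map holding one value per key. A minor compaction merges the files it is given and keeps tombstones, while a major compaction also drops the deleted entries.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Toy compaction: merge sorted HFiles, optionally purging tombstoned cells. */
class CompactionSketch {
    // Hypothetical marker standing in for a delete tombstone.
    static final String TOMBSTONE = "<tombstone>";

    /** Merges HFiles given oldest first, so entries from newer files win. */
    static TreeMap<String, String> compact(List<TreeMap<String, String>> hfiles, boolean major) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> hfile : hfiles) {
            merged.putAll(hfile);
        }
        if (major) {
            // Only a major compaction physically removes deleted data.
            merged.values().removeIf(TOMBSTONE::equals);
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> older = new TreeMap<>();
        older.put("TheRealMT/info:email", "samuel@clemens.org");
        older.put("TheRealMT/info:name", "Mark Twain");

        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("TheRealMT/info:email", TOMBSTONE);   // the email column was deleted later

        System.out.println("minor: " + compact(Arrays.asList(older, newer), false));
        System.out.println("major: " + compact(Arrays.asList(older, newer), true));
    }
}
```

The minor result still carries the tombstone because other HFiles outside the merge might hold older values that it must keep masking.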




DAO
• Like any other application
• Makes data access management easy
    • Cleaner code
    • Table info at a single place
• Optimizations like connection pooling etc
• Easier management of database configs




Admin
HBaseAdmin
• Administrative API to the cluster.
• Provides an interface to manage HBase database tables metadata and general administrative functions.
Table Operations
• createTable()
• deleteTable()
• addColumn()
• modifyColumn()
• deleteColumn()
• listTables()



Demo
- Java client




More API features
• Filters
• Compare and Swap
• Coprocessors




Data Model
It's not a relational database system




HBase is ...
• Column family oriented database
    • Column family oriented
    • Tables consisting of rows and columns
• Persisted Map
    • Sparse
    • Multi dimensional
    • Sorted
    • Indexed by rowkey, column and timestamp
• Key Value store
    • [rowkey, col family, col qualifier, timestamp] -> cell value

HBase is not ...
• A relational database
    • No SQL query language
    • No joins
    • No secondary indexing
    • No transactions




Tabular representation

Column Family - Info (2), with column qualifiers (3) name, email and password:

    Rowkey (1)     name                     email                    password
    GrandpaD       Mark Twain               samuel@clemens.org       abc123
    HMS_Surprise   Patrick O'Brien          aubrey@sea.com           abc123
    SirDoyle       Fyodor Dostoyevsky       fyodor@brothers.net      abc123
    TheRealMT      Sir Arthur Conan Doyle   art@TheQueensMen.co.uk   Langhorne (ts1=1329088321289), abc123 (ts2=1329088818321) (4)

The table is lexicographically sorted on the rowkeys.
Each cell has multiple versions, typically represented by the timestamp of when they were inserted into the table (ts2>ts1).
The coordinates used to identify data in an HBase table are: (1) rowkey, (2) column family, (3) column qualifier, (4) version.




Key-Value store

    Keys                                            Values
    [TheRealMT, info, password, 1329088818321]  ->  abc123
    [TheRealMT, info, password, 1329088321289]  ->  Langhorne

Each line is a single KeyValue instance.




Key-Value store

1 Start with coordinates of full precision
    [TheRealMT, info, password, 1329088818321] -> "abc123"

2 Drop version and you're left with a map of version to values
    [TheRealMT, info, password] ->
    {
        1329088818321 : "abc123",
        1329088321289 : "Langhorne"
    }

3 Omit qualifier and you have a map of qualifiers to the previous maps
    [TheRealMT, info] ->
    {
        "email" : {
            1329088321289 : "samuel@clemens.org"
        },
        "name" : {
            1329088321289 : "Mark Twain"
        },
        "password" : {
            1329088818321 : "abc123",
            1329088321289 : "Langhorne"
        }
    }

4 Finally, drop the column family and you have a row, a map of maps
    [TheRealMT] ->
    {
        "info" : {
            "email" : {
                1329088321289 : "samuel@clemens.org"
            },
            "name" : {
                1329088321289 : "Mark Twain"
            },
            "password" : {
                1329088818321 : "abc123",
                1329088321289 : "Langhorne"
            }
        }
    }




Sorted map of maps

Nesting, from the outside in: rowkey -> column family -> column qualifiers -> versions -> values

    {
        "TheRealMT" : {
            "info" : {
                "email" : {
                    1329088321289 : "samuel@clemens.org"
                },
                "name" : {
                    1329088321289 : "Mark Twain"
                },
                "password" : {
                    1329088818321 : "abc123",
                    1329088321289 : "Langhorne"
                }
            }
        }
    }
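
The nested structure above maps directly onto sorted maps in Java. The following is an in-memory sketch only: the class `SortedMapOfMaps` and its methods are assumptions for illustration, and strings stand in for the byte arrays HBase actually stores. Versions are kept newest first so the latest value is the first entry.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy data model: rowkey -> column family -> qualifier -> timestamp -> value. */
class SortedMapOfMaps {
    final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    void put(String rowkey, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(rowkey, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            // Timestamps sort in descending order: newest version first.
            .computeIfAbsent(qualifier, q -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** All versions of one cell, newest first; empty if the cell does not exist (sparse). */
    NavigableMap<Long, String> versions(String rowkey, String family, String qualifier) {
        return rows.getOrDefault(rowkey, Collections.emptyNavigableMap())
                   .getOrDefault(family, Collections.emptyNavigableMap())
                   .getOrDefault(qualifier, Collections.emptyNavigableMap());
    }

    String latest(String rowkey, String family, String qualifier) {
        NavigableMap<Long, String> v = versions(rowkey, family, qualifier);
        return v.isEmpty() ? null : v.firstEntry().getValue();
    }

    public static void main(String[] args) {
        SortedMapOfMaps table = new SortedMapOfMaps();
        table.put("TheRealMT", "info", "name", 1329088321289L, "Mark Twain");
        table.put("TheRealMT", "info", "email", 1329088321289L, "samuel@clemens.org");
        table.put("TheRealMT", "info", "password", 1329088321289L, "Langhorne");
        table.put("TheRealMT", "info", "password", 1329088818321L, "abc123");

        System.out.println(table.latest("TheRealMT", "info", "password"));    // newest version
        System.out.println(table.versions("TheRealMT", "info", "password"));  // both versions
    }
}
```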




HFiles and physical data model
• HFiles are
    • Immutable
    • Sorted on rowkey + qualifier + timestamp
    • In the context of a column family per region

    "TheRealMT", "info", "email",    1329088321289, "samuel@clemens.org"
    "TheRealMT", "info", "name",     1329088321289, "Mark Twain"
    "TheRealMT", "info", "password", 1329088818321, "abc123"
    "TheRealMT", "info", "password", 1329088321289, "Langhorne"

    HFile for the info column family in the users table
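
The ordering in the listing above (rowkey, then qualifier, then newest timestamp first) can be expressed as a comparator. This is a sketch under assumptions: `HFileOrderSketch` and its `KeyValue` class are hypothetical stand-ins, and Java string comparison is used where HBase compares raw bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy sort order for the cells of one column family: rowkey, qualifier, then newest timestamp first. */
class HFileOrderSketch {
    static final class KeyValue {
        final String rowkey;
        final String qualifier;
        final long timestamp;
        final String value;

        KeyValue(String rowkey, String qualifier, long timestamp, String value) {
            this.rowkey = rowkey;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        @Override
        public String toString() {
            return rowkey + ", " + qualifier + ", " + timestamp + ", " + value;
        }
    }

    static final Comparator<KeyValue> HFILE_ORDER = Comparator
            .comparing((KeyValue kv) -> kv.rowkey)
            .thenComparing(kv -> kv.qualifier)
            // Reversed so that the most recent version of a cell comes first.
            .thenComparing(Comparator.comparingLong((KeyValue kv) -> kv.timestamp).reversed());

    public static void main(String[] args) {
        List<KeyValue> cells = new ArrayList<>();
        cells.add(new KeyValue("TheRealMT", "password", 1329088321289L, "Langhorne"));
        cells.add(new KeyValue("TheRealMT", "name", 1329088321289L, "Mark Twain"));
        cells.add(new KeyValue("TheRealMT", "password", 1329088818321L, "abc123"));
        cells.add(new KeyValue("TheRealMT", "email", 1329088321289L, "samuel@clemens.org"));

        cells.sort(HFILE_ORDER);
        for (KeyValue kv : cells) {
            System.out.println(kv);
        }
    }
}
```

Sorted this way, a reader that wants the current value of a cell can stop at the first matching entry.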




HBase in distributed mode
... what really goes on under the hood




Distributed mode
• Table is split into Regions

Full table T1, and the same table split into 3 regions - R1, R2 and R3:

    Rowkey   Name    Phone          Region
    00001    John    415-111-1234   T1R1
    00002    Paul    408-432-9922   T1R1
    00003    Ron     415-993-2124   T1R1
    00004    Bob     818-243-9988   T1R1
    00005    Carly   206-221-9123   T1R2
    00006    Scott   818-231-2566   T1R2
    00007    Simon   425-112-9877   T1R2
    00008    Lucas   415-992-4432   T1R2
    00009    Steve   530-288-9832   T1R3
    00010    Kelly   916-992-1234   T1R3
    00011    Betty   650-241-1192   T1R3
    00012    Anne    206-294-1298   T1R3
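
Because the table is sorted by rowkey, splitting it into regions amounts to cutting the key space at a few boundaries. A minimal sketch of that idea follows; `RegionSplitSketch`, its `split` method and the choice of split points are assumptions for the example, not how HBase picks them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy region split: cut a sorted table into contiguous rowkey ranges. */
class RegionSplitSketch {
    /** Each split point becomes the first rowkey of the next region. */
    static List<SortedMap<String, String>> split(TreeMap<String, String> table, List<String> splitPoints) {
        List<SortedMap<String, String>> regions = new ArrayList<>();
        String start = null;
        for (String splitPoint : splitPoints) {
            if (start == null) {
                regions.add(table.headMap(splitPoint));         // everything before the first split
            } else {
                regions.add(table.subMap(start, splitPoint));   // [start, splitPoint)
            }
            start = splitPoint;
        }
        if (start == null) {
            regions.add(table);                                 // no split points: one region
        } else {
            regions.add(table.tailMap(start));                  // last region runs to the end
        }
        return regions;
    }

    public static void main(String[] args) {
        TreeMap<String, String> t1 = new TreeMap<>();
        String[] names = {"John", "Paul", "Ron", "Bob", "Carly", "Scott",
                          "Simon", "Lucas", "Steve", "Kelly", "Betty", "Anne"};
        for (int i = 0; i < names.length; i++) {
            t1.put(String.format("%05d", i + 1), names[i]);
        }
        List<SortedMap<String, String>> regions = split(t1, Arrays.asList("00005", "00009"));
        for (int i = 0; i < regions.size(); i++) {
            System.out.println("R" + (i + 1) + ": " + regions.get(i).keySet());
        }
    }
}
```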




Distributed mode (continued)
• Regions are served by RegionServers
• RegionServers are Java processes, typically collocated with the DataNodes

[Figure: Host1, Host2 and Host3 each run a DataNode and a RegionServer; regions T1R1, T1R2 and T1R3 of table T1 are served by the RegionServers]




Finding your region
• Two special tables: -ROOT- and .META.

-ROOT- table, consisting of only 1 region hosted on region server RS1

.META. table, consisting of 3 regions, hosted on RS1, RS2 and RS3

    Region   M1    M2    M3
    Server   RS2   RS3   RS1

User tables T1 and T2, consisting of 3 and 4 regions respectively, distributed amongst RS1, RS2 and RS3

    Region   T1R1   T1R2   T2R1   T1R3   T2R2   T2R3   T2R4
    Server   RS3    RS1    RS2    RS3    RS2    RS1    RS1
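
The lookup through the two catalog tables can be sketched as two nested range lookups. Everything concrete here is an assumption for illustration: the class `RegionLookupSketch`, the `table:rowkey` key format, the region boundaries and the server assignments in `main` are invented and cover a single user table only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy lookup: -ROOT- points to a .META. region, which points to the user region. */
class RegionLookupSketch {
    static final class Location {
        final String region;
        final String server;

        Location(String region, String server) {
            this.region = region;
            this.server = server;
        }
    }

    // -ROOT-: first key covered by each .META. region -> location of that .META. region
    final TreeMap<String, Location> root = new TreeMap<>();
    // One sorted map per .META. region: "table:startRow" -> location of the user region
    final Map<String, TreeMap<String, Location>> meta = new HashMap<>();

    Location locate(String table, String rowkey) {
        String key = table + ":" + rowkey;
        // floorEntry finds the range whose start key is the greatest one <= key.
        Location metaRegion = root.floorEntry(key).getValue();
        return meta.get(metaRegion.region).floorEntry(key).getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch catalog = new RegionLookupSketch();
        catalog.root.put("", new Location("M1", "RS2"));          // empty start key: covers the lowest keys
        catalog.root.put("T1:00009", new Location("M2", "RS1"));

        TreeMap<String, Location> m1 = new TreeMap<>();
        m1.put("T1:", new Location("T1R1", "RS1"));
        m1.put("T1:00005", new Location("T1R2", "RS2"));
        catalog.meta.put("M1", m1);

        TreeMap<String, Location> m2 = new TreeMap<>();
        m2.put("T1:00009", new Location("T1R3", "RS2"));
        catalog.meta.put("M2", m2);

        Location found = catalog.locate("T1", "00007");
        System.out.println(found.region + " on " + found.server);
    }
}
```

A real client would also cache the locations it has resolved so that most requests skip the catalog tables; that step is left out here.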




  45
Friday, April 12, 13
Regions	
  distributed	
  across	
  the	
  cluster

[Diagram: three hosts, each running a DataNode and a RegionServer, with the regions they serve]
         • Host1: RegionServer 1 (RS1), hosting region R1 of the user table T1 and M2 of .META.
            • T1-R1
                 00001   John    415-111-1234
                 00002   Paul    408-432-9922
                 00003   Ron     415-993-2124
                 00004   Bob     818-243-9988
            • .META. - M2
                 T1:00009-T1:00012   R3   RS2
         • Host2: RegionServer 2 (RS2), hosting regions R2 and R3 of the user table and M1 of .META.
            • T1-R2
                 00005   Carly   206-221-9123
                 00006   Scott   818-231-2566
                 00007   Simon   425-112-9877
                 00008   Lucas   415-992-4432
            • T1-R3
                 00009   Steve   530-288-9832
                 00010   Kelly   916-992-1234
                 00011   Betty   650-241-1192
                 00012   Anne    206-294-1298
            • .META. - M1
                 T1:00001-T1:00004   R1   RS1
                 T1:00005-T1:00008   R2   RS2
         • Host3: RegionServer 3 (RS3), hosting just -ROOT-
            • -ROOT-
                 M:T100001-M:T100009   M1   RS2
                 M:T100010-M:T100012   M2   RS1
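
The .META. rows on this slide map a range of rowkeys to a region and the region server hosting it. A minimal sketch of how such a sorted catalog can be queried is given here, assuming each row is stored under the start key of its range and found with a floor lookup. The class `MetaCatalogSketch` and its methods are hypothetical names for illustration, and a `TreeMap` stands in for the real catalog table.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: an in-memory stand-in for the .META. rows
// shown on the slide, not the HBase catalog implementation.
final class MetaCatalogSketch {

    /** One catalog row: the region, the last rowkey it covers, and its server. */
    static final class RegionLocation {
        final String region;
        final String endKey; // inclusive, as drawn on the slide
        final String server;

        RegionLocation(String region, String endKey, String server) {
            this.region = region;
            this.endKey = endKey;
            this.server = server;
        }
    }

    // Sorted by start key, so floorEntry() returns the only candidate region.
    private final TreeMap<String, RegionLocation> byStartKey = new TreeMap<>();

    void addRegion(String startKey, String endKey, String region, String server) {
        byStartKey.put(startKey, new RegionLocation(region, endKey, server));
    }

    /** Returns the location covering rowkey, or null if no region covers it. */
    RegionLocation locate(String rowkey) {
        Map.Entry<String, RegionLocation> candidate = byStartKey.floorEntry(rowkey);
        if (candidate == null || rowkey.compareTo(candidate.getValue().endKey) > 0) {
            return null;
        }
        return candidate.getValue();
    }

    /** Builds the catalog for table T1 from the M1 and M2 rows on the slide. */
    static MetaCatalogSketch slideExample() {
        MetaCatalogSketch meta = new MetaCatalogSketch();
        meta.addRegion("00001", "00004", "R1", "RS1");
        meta.addRegion("00005", "00008", "R2", "RS2");
        meta.addRegion("00009", "00012", "R3", "RS2");
        return meta;
    }

    public static void main(String[] args) {
        RegionLocation loc = slideExample().locate("00007");
        System.out.println("00007 -> " + loc.region + " on " + loc.server);
    }
}
```

Looking up rowkey 00007 (Simon) resolves to region R2 on RS2, which matches where the slide places that row. The lookup works only because the rowkeys are sorted, so one floor search replaces a scan over all regions.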




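What the client does
[Diagram: lookup chain from the client to the row TheRealMT in MyTable]
         • ZooKeeper -> -ROOT- -> .META. -> MyTable (row TheRealMT)
         • -ROOT-: row per META region
         • .META.: row per table region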
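
The chain in the diagram can be walked through in a small simulation: ask ZooKeeper where -ROOT- lives, read -ROOT- to find the right .META. region, read .META. to find the region holding the row, then send the request to that region server. This is only a sketch under stated assumptions: sorted maps stand in for ZooKeeper and the two catalog tables, the `table:startRow` key format follows the notation on the previous slide, and the class `ClientLookupSketch`, its methods and the region placement in `example()` are all hypothetical. None of it is the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the lookup chain on the slide. Sorted maps stand
// in for ZooKeeper, -ROOT- and .META.; this is not the HBase client API.
final class ClientLookupSketch {

    // ZooKeeper stand-in: it only tells the client where -ROOT- lives.
    private final String rootServer;
    // -ROOT- stand-in: one row per .META. region (start key -> hosting server).
    private final TreeMap<String, String> root = new TreeMap<>();
    // .META. stand-in: for each .META. region, one row per user-table region
    // (catalog key "table:startRow" -> hosting server).
    private final Map<String, TreeMap<String, String>> metaRegions = new TreeMap<>();
    // Hops taken by locate() calls so far, in order.
    final List<String> hops = new ArrayList<>();

    ClientLookupSketch(String rootServer) {
        this.rootServer = rootServer;
    }

    // Register .META. regions before adding the user regions they index.
    void addMetaRegion(String startKey, String server) {
        root.put(startKey, server);
        metaRegions.put(startKey, new TreeMap<>());
    }

    void addUserRegion(String table, String startRow, String server) {
        String key = table + ":" + startRow;
        String metaStart = root.floorKey(key);
        if (metaStart == null) {
            throw new IllegalStateException("no .META. region covers " + key);
        }
        metaRegions.get(metaStart).put(key, server);
    }

    /** Walks ZooKeeper -> -ROOT- -> .META. and returns the server for the row. */
    String locate(String table, String rowkey) {
        String key = table + ":" + rowkey;
        hops.add("ZooKeeper: -ROOT- is on " + rootServer);

        Map.Entry<String, String> metaRegion = root.floorEntry(key);
        if (metaRegion == null) {
            throw new IllegalStateException("no .META. region covers " + key);
        }
        hops.add("-ROOT-: .META. region for " + key + " is on " + metaRegion.getValue());

        Map.Entry<String, String> userRegion =
                metaRegions.get(metaRegion.getKey()).floorEntry(key);
        if (userRegion == null || !userRegion.getKey().startsWith(table + ":")) {
            throw new IllegalStateException("no region of " + table + " covers " + rowkey);
        }
        hops.add(".META.: region for " + key + " is on " + userRegion.getValue());
        return userRegion.getValue();
    }

    /** Made-up placement: one .META. region, MyTable split into two regions. */
    static ClientLookupSketch example() {
        ClientLookupSketch client = new ClientLookupSketch("RS1");
        client.addMetaRegion("", "RS2");
        client.addUserRegion("MyTable", "", "RS3");
        client.addUserRegion("MyTable", "M", "RS1");
        return client;
    }

    public static void main(String[] args) {
        ClientLookupSketch client = example();
        String server = client.locate("MyTable", "TheRealMT");
        client.hops.forEach(System.out::println);
        System.out.println("Send the request for TheRealMT to " + server);
    }
}
```

Each `locate()` call records three hops, one per level of the hierarchy, before the client knows which region server to contact for the row.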



