HBase and M7 Technical Overview

Keys Botzum
Senior Principal Technologist
MapR Technologies

March 2013
  
Agenda

HBase
MapR
M7
Containers
  




 

HBase

A sparse, distributed, persistent, indexed, and sorted map
OR
A NoSQL database
OR
A columnar data store
  
                                               	
  
Key-Value Store

§ Row key
  – Binary sortable value
§ Row content key (analogous to a column)
  – Column family (string)
  – Column qualifier (binary)
  – Version/timestamp (number)
§ A row key, column family, column qualifier, and version uniquely identify a particular cell
  – A cell contains a single binary value (see the sketch below)
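To make the cell-coordinate model concrete, here is a minimal sketch using the standard HBase Java client of that era (the table name "mytable" and the family/qualifier/value names are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CellCoordinates {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // illustrative table name

        // A cell is addressed by (row key, column family, qualifier, version).
        Put put = new Put(Bytes.toBytes("row1"));     // row key: binary, sortable
        put.add(Bytes.toBytes("cf1"),                 // column family (string)
                Bytes.toBytes("qual1"),               // column qualifier (binary)
                42L,                                  // version (defaults to a timestamp)
                Bytes.toBytes("value1"));             // the single binary cell value
        table.put(put);
        table.close();
      }
    }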
  




A Row

Row Key → Value1 | Value2 | Value3 | Value4 | … | ValueN    (columns C0 … CN)

Each value is individually addressed:

Row Key | Column Family | Column Qualifier | Version → Value1
Row Key | Column Family | Column Qualifier | Version → Value2
…
Row Key | Column Family | Column Qualifier | Version → ValueN
  


Not A Traditional RDBMS

§ Weakly typed and schema-less (unstructured, or perhaps semi-structured)
  – Almost everything is binary
§ No constraints
  – You can put any binary value in any cell
  – You can even put incompatible types in two different instances of the same column family:column qualifier
§ Columns (qualifiers) are created implicitly
§ Different rows can have different columns
§ No transactions/no ACID
  – The only unit of atomic operation is a single row
  




API

§ APIs for querying (get), scanning, and updating (put) – see the sketch below
  – Operate on row key, column family, qualifier, version, and values
  – Can be partially specified, and will retrieve the union of results
    • If you specify just the row key, you get all values for it (with column family and qualifier)
      – By default only the largest version (the most recent, if a timestamp) is returned
    • Specifying a row key and column family retrieves all values for that row and column family
  – Scanning is just a get over a range of row keys
§ Version
  – While it defaults to a timestamp, any integer is acceptable
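A minimal sketch of these calls with the HBase Java client (table, key, and column names are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetAndScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");

        // Partially specified get: row key only -> union of all cells in the row,
        // newest version of each (cf, qualifier) by default.
        Result row = table.get(new Get(Bytes.toBytes("row1")));
        byte[] v = row.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("qual1"));

        // Narrow to one column family: all values for that row and family.
        Get get = new Get(Bytes.toBytes("row1"));
        get.addFamily(Bytes.toBytes("cf1"));
        Result familyOnly = table.get(get);

        // A scan is a get over a range of row keys: [startRow, stopRow).
        Scan scan = new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row9"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
          // rows arrive in sorted row-key order
        }
        scanner.close();
        table.close();
      }
    }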
  




Columnar

§ Rather than storing table rows linearly on disk, with each row stored as a single byte range with fixed-size fields, store the columns of a row separately
  – Very efficient storage for sparse data sets (NULL is free)
  – Compression works better on similar data
  – Fetches of only a subset of a row are very efficient (less disk IO)
  – No fixed size on column values
  – No requirement to even define columns
§ Columns are grouped together into column families
  – Basically a file on disk
  – A unit of optimization
  – In HBase, adding a column is implicit; adding a column family is explicit (see the sketch below)
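A minimal sketch of that explicit/implicit split with the admin API of the era (table and family names are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("cf1")); // explicit: families declared up front
        desc.addFamily(new HColumnDescriptor("cf2"));
        admin.createTable(desc);  // qualifiers within cf1/cf2 appear implicitly on first put
      }
    }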
  


HBase Table Architecture

§ Tables are divided into key ranges (regions)
§ Regions are served by nodes (RegionServers)
§ Columns are divided into access groups (column families)

[Diagram: rows R1-R4 partitioned across column families CF1-CF5]
  


Storage Model Highlights

§ Data is stored in sorted order
  – A table contains rows
  – A sequence of rows is grouped together into a region
    • A region consists of various files related to those rows and is loaded into a region server
    • Regions are stored in HDFS for high availability
  – A single region server manages multiple regions
    • Region assignment can change – load balancing, failures, etc.
§ Clients connect to tables
  – The HBase runtime transparently determines the region (based on key ranges) and contacts the appropriate region server, as sketched below
§ At any given time exactly one region server provides access to a region
  – Master region servers (with ZooKeeper) manage that
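An illustrative sketch (not actual HBase client code) of that key-range routing: with regions defined by sorted start keys, the region whose start key is the greatest one at or below the row key serves the row. The real client caches this mapping from the META table.

    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class RegionRouter {
      // start key of each region -> server hosting it (names are illustrative)
      private final NavigableMap<String, String> serverByStartKey = new TreeMap<>();

      public void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
      }

      public String serverFor(String rowKey) {
        // the greatest start key <= rowKey identifies the owning region
        return serverByStartKey.floorEntry(rowKey).getValue();
      }
    }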
  

What's Great About This?

§ Very scalable
§ Easy to add region servers
§ Easy to move regions around
§ Scans are efficient
  – Unlike hashing-based models
§ Access via row key is very efficient
  – Note: there are no secondary indexes
§ No schema; you can store whatever you want, when you want
§ Strong consistency
§ Integrated with Hadoop
  – MapReduce on HBase is straightforward
  – HDFS/MapR-FS provides data replication
  
Data Storage Architecture

§ Data from a region's column family is stored in an HFile
  – An HFile contains row key:column qualifier:version:value entries
  – An index at the end points into the data – 64KB "blocks" by default
§ Update
  – The new value is written persistently to the Write-Ahead Log (WAL)
  – Cached in memory
  – When memory fills, write out a new HFile
§ Read
  – Checks memory, then all of the HFiles
  – Read data is cached in memory
§ Delete
  – Creates a tombstone record (purged at major compaction)
  
	
  
Apache HBase HFile Structure

Each cell is an individual key + value
  – a row repeats the key for each column

[Diagram: key-value pairs are laid out in increasing order; 64KB blocks are compressed; an index into the compressed blocks is created as a B-tree]
  


HBase Region Operation

§ Typical region size is a few GB, sometimes even 10G or 20G
§ A RegionServer holds data in memory until full, then writes a new HFile
  – The logical view of the database is constructed by layering these files, with the latest on top

[Diagram: HFiles stacked from newest to oldest over the key range represented by this region]
  


HBase Read Amplification

§ When a get/scan comes in, all the files have to be examined
  – schema-less, so where is the column?
  – Done in memory; does not change what's on disk
    • Bloom filters do not help with scans

[Diagram: HFiles stacked from newest to oldest]

With 7 files, a 1K-record get() potentially takes about 30 seeks, 7 block fetches, and 7 decompressions, from HDFS. Even with the index in memory, 7 seeks and 7 block fetches are required. (An illustrative sketch of this layered lookup follows.)
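An illustrative sketch (plain Java, not HBase internals) of why each extra HFile adds work to a get():

    import java.util.List;
    import java.util.NavigableMap;

    public class LayeredLookup {
      // Check the in-memory store first, then every HFile from newest to oldest.
      static String get(String key, NavigableMap<String, String> memstore,
                        List<NavigableMap<String, String>> hfilesNewestFirst) {
        String v = memstore.get(key);
        if (v != null) return v;
        for (NavigableMap<String, String> hfile : hfilesNewestFirst) {
          v = hfile.get(key);        // on disk: a seek, a 64KB block fetch, a decompression
          if (v != null) return v;   // the newest file wins; older values are shadowed
        }
        return null;                 // with 7 HFiles this is up to 7 probes per get()
      }
    }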
  

HBase Write Amplification

§ To reduce the read amplification, HBase merges the HFiles periodically
  – a process called compaction
  – runs automatically when there are too many files
  – usually turned off due to I/O storms that interfere with client access
  – and kicked off manually on weekends

Major compaction reads all files and merges them into a single HFile.
  




HBase Server Architecture

[Diagram: the client looks up region locations via ZooKeeper, which coordinates with the HBase Master; data requests go to an HBase RegionServer, whose HFiles and WAL live in an HDFS server on the Linux filesystem]
  


WAL File

§ A persistent record of every update/insert, in sequence order
  – Shared by all regions on one region server
  – WAL files are periodically rolled to limit size, but older WALs are still needed
  – A WAL file is no longer needed once every region with updates in that WAL file has flushed them from memory to an HFile
    • Remember that more HFiles slow the read path!
§ Must be replayed as part of the recovery process, since in-memory updates are "lost"
  – This is very expensive and delays bringing a region back online
  




What's Not So Good

Reliability
  • Complex coordination between ZooKeeper, HDFS, the HBase Master, and RegionServers during region movement
  • Compactions disrupt operations
  • Very slow crash recovery because of
    • coordination complexity
    • WAL log reading (one log per server)

Business continuity
  • Many administrative actions require downtime
  • Not well integrated into MapR-FS mirroring and snapshot functionality
  

What's Not So Good

Performance
  • Very long read/write path
  • Significant read and write amplification
  • Multiple JVMs in the read/write path – GC delays!

Manageability
  • Compactions, splits, and merges must (in practice) be done manually
  • Lots of "well known" problems maintaining a reliable cluster – splitting, compactions, region assignment, etc.
  • Practical limits on the number of regions per region server and on region size – can make it hard to fully utilize hardware
  
Region Assignment in Apache HBase

[Diagram of the multi-step region assignment protocol – image not preserved]




Apache HBase on MapR

[Diagram – image not preserved]

Limited data management, data protection, and disaster recovery for tables.

Agenda

HBase
MapR
M7
Containers




MapR

A provider of enterprise-grade Hadoop with uniquely differentiated features




MapR: The Enterprise Grade Distribution

[Diagram – image not preserved]




One Platform for Big Data

Broad range of applications: recommendation engines, fraud detection, billing, logistics, risk modeling, market segmentation, inventory forecasting, …

Batch, interactive, and real-time workloads: MapReduce, file-based applications, SQL, database, search, stream processing, …

Foundation: 99.999% HA, data protection, disaster recovery, scalability & performance, enterprise integration, multi-tenancy
  


Dependable: Lights Out Data Center Ready

Reliable Compute
  § Automated stateful failover
  § Automated re-replication
  § Self-healing from HW and SW failures
  § Load balancing
  § No lost jobs or data
  § 99999's of uptime

Dependable Storage
  § Business continuity with snapshots and mirrors
  § Recover to a point in time
  § End-to-end checksumming
  § Strong consistency
  § Data safe
  § Mirror across sites to meet Recovery Time Objectives
  
Fast: World Record Performance

Benchmark                        MapR 2.1.1        CDH 4.1.1         MapR Speed Increase

Terasort (1x replication, compression disabled)
  Total                          13m 35s           26m 6s            2X
  Map                            7m 58s            21m 8s            3X
  Reduce                         13m 32s           23m 37s           1.8X

DFSIO throughput/node
  Read                           1003 MB/s         656 MB/s          1.5X
  Write                          924 MB/s          654 MB/s          1.4X

YCSB (50% read, 50% update)
  Throughput                     36,584.4 op/s     12,500.5 op/s     2.9X
  Runtime                        3.80 hr           11.11 hr          2.9X

YCSB (95% read, 5% update)
  Throughput                     24,704.3 op/s     10,776.4 op/s     2.3X
  Runtime                        0.56 hr           1.29 hr           2.3X

MinuteSort record: 1.5 TB in 60 seconds on 2103 nodes

Benchmark hardware configuration:
10 servers, 12 x 2 cores (2.4 GHz), 12 x 2TB, 48 GB, 1 x 10GbE


The Cloud Leaders Pick MapR

Amazon EMR is the largest Hadoop provider in revenue and number of clusters.
Google chose MapR to provide Hadoop on Google Compute Engine.




MapR Supports Broad Set of Customers

§ Global credit card issuer: recommendation engine, fraud detection and prevention
§ Leading retailer: customer behavior analysis, brand monitoring
§ Customer targeting; viewer behavioral analytics
§ Intrusion detection & prevention; forensic analysis; global threat analytics; virus analysis
§ Recommendation engine; family tree connections
§ Clickstream analysis; patient care; log analysis; quality profiling/field monitoring; HBase failure analysis
§ Fraud detection; channel analytics
§ Advertising exchange analysis and optimization; monitoring and measuring online behavior
§ Customer revenue analytics; customer targeting; ETL offload
§ Enterprise-grade platform; social media analysis; COOP features
  
MapR Editions

M3                          M5                          M7
§ Control System            § Control System            § All the features of M5
§ NFS Access                § NFS Access                § Simplified administration for HBase
§ Performance               § Performance               § Increased performance
§ Unlimited Nodes           § High Availability         § Consistent low latency
§ Free                      § Snapshots & Mirroring     § Unified snapshots, mirroring
                            § 24 x 7 Support
                            § Annual Subscription

Also available through: Google Compute Engine
  


Agenda

HBase
MapR
M7
Containers
  




M7

An integrated system for unstructured and structured data
                                                   	
  




Introducing MapR M7

§ An integrated system
  – Unified namespace for files and tables
  – Built-in data management & protection
  – No extra administration
§ Architected for reliability and performance
  – Fewer layers
  – Single hop to data
  – No compactions, low I/O amplification
  – Seamless splits, automatic merges
  – Instant recovery
  



Binary Compatible with HBase APIs

§ HBase applications work "as is" with M7
  – No need to recompile (binary compatible)
§ Can run M7 and HBase side-by-side on the same cluster
  – e.g., during a migration
  – can access both an M7 table and an HBase table in the same program (see the sketch below)
§ Use the standard Apache HBase CopyTable tool to copy a table from HBase to M7, or vice versa:

    % hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
          --new.name=/user/srivas/mytable oldtable
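A minimal sketch of that side-by-side access (paths and names are illustrative): because M7 is binary compatible, the same HBase client API opens an M7 table by filesystem path and an Apache HBase table by name.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SideBySide {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable m7Table = new HTable(conf, "/user/dave/table3"); // M7 table, named by path
        HTable hbTable = new HTable(conf, "oldtable");          // Apache HBase table
        m7Table.get(new Get(Bytes.toBytes("row1")));
        hbTable.get(new Get(Bytes.toBytes("row1")));
        m7Table.close();
        hbTable.close();
      }
    }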
  




M7: Remove Layers, Simplify

Take note! No JVM!

[Diagram comparing the layered Apache HBase stack with MapR M7 – image not preserved]
  




M7: No Master and No RegionServers

No JVM problems. One hop to data. Unified cache. No extra daemons to manage.
  

Region Assignment in Apache HBase

None of this complexity is present in MapR M7.

[Diagram – image not preserved]
  




Unified Namespace for Files and Tables

$ pwd
/mapr/default/user/dave

$ ls
file1  file2  table1  table2

$ hbase shell
hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds

$ ls
file1  file2  table1  table2  table3

$ hadoop fs -ls /user/dave
Found 5 items
-rw-r--r--   3 mapr mapr         16 2012-09-28 08:34 /user/dave/file1
-rw-r--r--   3 mapr mapr         22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:38 /user/dave/table3
  

Tables for End Users

§ Users can create and manage their own tables
  – Unlimited # of tables
§ Tables can be created in any directory
  – Tables count towards volume and user quotas
§ No admin intervention needed
  – I can create a file or a directory without opening a ticket with the admin team – why not a table?
  – Do stuff on the fly; no stopping/restarting of servers
§ Automatic data protection and disaster recovery
  – Users can recover from snapshots/mirrors on their own
  

M7 – An Integrated System

[Diagram – image not preserved]
  




M7

Comparative analysis with Apache HBase, LevelDB, and a B-tree
  




HBase Write Amplification Analysis

§ Assume 10G per region, write 10% per day, grow 10% per week
  – 1G of writes per day
  – after 7 days: 7 files of 1G and 1 file of 10G (only 1G of which is growth)
§ IO cost
  – Wrote 7G to the WAL + 7G to HFiles
  – Compaction adds still more
    • read: 17G (= 7 x 1G + 1 x 10G)
    • write: 11G written to the new HFile
  – Write amplification: wrote 7G "for real," but the actual disk IO after compaction is 17G read + 25G written – and that's assuming no application reads! (A quick arithmetic check follows.)
§ The IO cost of 1000 regions like this scales accordingly
  – read 17T, write 25T → major impact on the node
§ Best practice is to limit the # of regions per node → can't fully utilize storage
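A back-of-the-envelope check of those totals (the numbers are the slide's assumptions):

    public class WriteAmplification {
      public static void main(String[] args) {
        double logicalGb = 7.0;   // application writes: 1 GB/day x 7 days
        double walGb     = 7.0;   // every write also lands in the WAL
        double flushGb   = 7.0;   // memstore flushes produce 7 new 1G HFiles
        double compactGb = 11.0;  // compaction rewrites 7x1G + 10G into one 11G HFile
        double readGb    = 17.0;  // compaction must first read all 8 files
        double totalWrites = walGb + flushGb + compactGb; // = 25 GB
        System.out.printf("writes: %.0fG, reads: %.0fG, amplification: %.1fx%n",
                          totalWrites, readGb, totalWrites / logicalGb); // ~3.6x
      }
    }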
  
Alternative: LevelDB

§ Tiered, logarithmic increase
  – L1: 2 x 1M files
  – L2: 10 x 1M
  – L3: 100 x 1M
  – L4: 1,000 x 1M, etc.
§ Compaction overhead
  – avoids IO storms (I/O is done in smaller increments of ~10M)
  – but uses significantly more bandwidth than HBase
§ Read overhead is still high
  – 10-15 seeks, perhaps more if the lowest level is very large
  – 40K-60K read from disk to retrieve a 1K record
  



BTree Analysis

§ A read finds data directly; proven to be fastest
  – interior nodes hold only keys
  – very large branching factor
  – values only at leaves
  – thus index caches work
  – R = logN seeks, if no caching
  – a 1K record read will transfer about logN blocks from disk
§ Writes are slow on inserts
  – data is inserted into the correct place right away
  – otherwise a read would not find it
  – requires the B-tree to be continuously rebalanced
  – causes extreme random I/O on the insert path
  – W = 2.5x + logN seeks, if no caching
  



Log-Structured Merge Trees

§ LSM trees reduce insert cost by deferring and batching index changes
  – If you don't compact often, read performance is impacted
  – If you compact too often, write performance is impacted
§ B-trees are great for reads
  – but expensive to update in real time

Can we combine both ideas?

Writes cannot be done better than W = 2.5x:
  write to log + write data to somewhere + update meta-data

[Diagram: writes go to an in-memory index and an on-disk log; reads consult the in-memory index and an on-disk index]




M7 from MapR

§ Twisting B-trees
  – leaves are variable size (8K - 8M or larger)
  – the tree can stay unbalanced for long periods of time
    • more inserts will balance it eventually
    • automatically throttles updates to interior B-tree nodes
  – M7 inserts "close to" where the data is supposed to go
§ Reads
  – Use the B-tree structure to get "close" very fast
    • very high branching with key-prefix compression
  – Utilize a separate lower-level index to find it exactly
    • updated "in-place": bloom filters for gets, range maps for scans
§ Overhead
  – a 1K record read will transfer about 32K from disk in logN seeks
  

M7 Provides Instant Recovery

§ Instead of having one WAL per region server, or even one per region, we have many micro-WALs per region
§ 0-40 microWALs per region
  – idle WALs are "compacted," so most are empty
  – a region is up before all its microWALs are recovered
  – the region is recovered in the background, in parallel
  – when a key is accessed, that microWAL is recovered inline
  – 1000-10000x faster recovery
§ Never performs the equivalent of an HBase major or minor compaction
§ Why doesn't HBase do this? M7 uses MapR-FS, not HDFS
  – No limit on the # of files on disk
  – No limit on the # of open files
  – The I/O path translates random writes into sequential writes on disk
  


Summary

                         1K record read amplification   Compaction                        Recovery
HBase with 7 hfiles      30 seeks, 130K xfer            IO storms, good bandwidth         Huge WAL to recover
HBase with 3 hfiles      15 seeks, 70K xfer             IO storms, high bandwidth         Huge WAL to recover
LevelDB with 5 levels    13 seeks, 48K xfer             No i/o storms, very high b/w      WAL is tiny
BTree                    logN seeks, logN xfer          No i/o storms, but 100% random    WAL is proportional to concurrency + cache
MapR M7                  logN seeks, 32K xfer           No i/o storms, low bandwidth      microWALs allow recovery < 100ms
  

M7: Fileservers Serve Regions

§ A region lives entirely inside a container
  – Does not coordinate through ZooKeeper
§ Containers support distributed transactions
  – with replication built in
§ The only coordination in the system is for splits
  – Between the region-map and the data container
  – this problem was already solved for files and their chunks
  

               	
  



Agenda

HBase
MapR
M7
Containers
  




 
	
  
	
  

What's a MapR container?
  




MapR's Containers

Files/directories are sharded into blocks, which are placed in containers on disks.

Containers are ~32 GB segments of disk, placed on nodes.

Each container contains:
  • Directories & files
  • Data blocks
  • BTrees
  • 100% random writes

Patent pending.
  


M7 Containers

§ A container holds many files
  – regular, dir, symlink, btree, chunk-map, region-map, …
  – all random-write capable
§ A container is replicated to servers
  – the unit of resynchronization
§ A region lives entirely inside 1 container
  – all files + WALs + btrees + bloom filters + range maps
  



Read-Write Replication

§ Writes are synchronous
  – All copies have the same data
§ Data is replicated in a "chain" fashion
  – better bandwidth; utilizes full-duplex network links well
§ Meta-data is replicated in a "star" manner
  – better response time; bandwidth is not a concern
  – data can also be done this way

[Diagram: client1 … clientN writing through a replication chain]
  	
  



Random Writing in MapR

[Diagram: a client writing data asks the CLDB for a 64M block; the CLDB creates a container and picks a master and 2 replica slaves among servers S1-S5 (e.g., S1,S2,S4); the client attaches and writes the next chunk to the chosen master, e.g., S2]



Container Balancing

• Servers keep a bunch of containers "ready to go."
• Writes get distributed around the cluster.

  • As data size increases, writes spread more, like dropping a pebble in a pond
  • Larger pebbles spread the ripples farther
  • Space is balanced by moving idle containers
  
                                                            	
  
                                                            	
  



Failure Handling

Containers are managed at the CLDB – the Container Location DataBase – via heartbeats and container reports.

  • Heartbeat loss + an upstream entity reporting failure => server dead
  • Increment the epoch at the CLDB
  • Rearrange the replication path
  • Exact same code for files and M7 tables
  • No ZooKeeper
  ZK	
  



Architectural Parameters

§ Unit of I/O
  – 4K/8K (8K in MapR)
§ Unit of chunking (a map-reduce split)
  – 10-100's of megabytes
§ Unit of resync (a replica)
  – 10-100's of gigabytes
  – a container in MapR
§ Unit of administration (snapshot, replication, mirror, quota, backup)
  – 1 gigabyte - 1000's of terabytes
  – a volume in MapR
  – what data is affected by my missing blocks?

[Diagram: a scale from 10^3 (i/o) through 10^6 (map-reduce) to 10^9 (resync) and beyond (admin), with the HDFS 'block' marked on it]
  
                                                                                     	
  
        	
  


Other M7 Features

§ Smaller disk footprint
  – M7 never repeats the key or column name
§ Columnar layout
  – M7 supports 64 column families
  – in-memory column families
§ Online administration
  – M7 schema changes on the fly
  – delete/rename/redistribute tables
  	
  
                                       	
  
Thank You!

Questions?
  


Examples: Reliability Issues

§ Compactions disrupt HBase operations: I/O bursts overwhelm nodes. (http://hbase.apache.org/book.html#compaction)

§ Very slow crash recovery: a RegionServer crash can cause data to be unavailable for up to 30 minutes while WALs are replayed for impacted regions. (HBASE-1111)

§ Unreliable splitting: region splitting may cause data to be inconsistent and unavailable. (http://chilinglam.blogspot.com/2011/12/my-experience-with-hbase-dynamic.html)

§ No client throttling: the HBase client can easily overwhelm RegionServers and cause downtime. (HBASE-5161, HBASE-5162)
  


Examples: Business Continuity Issues

§ No snapshots: MapR provides all-or-nothing snapshots for HBase. The WALs are shared among tables, so single-table and selective multi-table snapshots are not possible. (HDFS-2802, HDFS-3370, HBASE-50, HBASE-6055)

§ Complex backup process: complex, unreliable, and inefficient. (http://bruteforcedata.blogspot.com/2012/08/hbase-disaster-recovery-and-whisky.html)

§ Administration requires downtime: the entire cluster must be taken down in order to merge regions. Tables must be disabled to change schema, replication, and other properties. (HBASE-420, HBASE-1621, HBASE-5504, HBASE-5335, HBASE-3909)
  



Examples: Performance Issues

§ Limited support for multiple column families: HBase has issues handling multiple column families due to compactions. The standard HBase documentation recommends no more than 2-3 column families. (HBASE-3149)

§ Limited data locality: HBase does not take block locations into account when assigning regions. After a reboot, RegionServers are often reading data over the network rather than from the local drives. (HBASE-4755, HBASE-4491)

§ Cannot utilize disk space: HBase RegionServers struggle with more than 50-150 regions per RegionServer, so a commodity server can only handle about 1TB of HBase data, wasting disk space. (http://hbase.apache.org/book/important_configurations.html, http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/)

§ Limited # of tables: a single cluster can only handle several tens of tables effectively. (http://hbase.apache.org/book/important_configurations.html)
  

©MapR	
  Technologies	
  	
                                                  64	
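To put rough numbers on the disk-space point, reusing figures quoted elsewhere in this deck as illustrative assumptions (10 GB regions; benchmark nodes with 12 x 2 TB drives): 150 regions x 10 GB ≈ 1.5 TB of table data per RegionServer on a node with ~24 TB of raw disk, i.e. under 7% raw utilization, and still only around 20% even after charging 3x replication against the node.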
  
Examples: Manageability Issues

§  Manual major compacAons: HBase major compactions are disruptive, so production clusters keep them disabled and rely on the administrator to trigger compactions manually. (http://hbase.apache.org/book.html#compaction)

§  Manual splibng: HBase auto-splitting does not work properly in a busy cluster, so users must pre-split a table based on their estimate of data size/growth; both manual steps are sketched in the shell example below. (http://chilinglam.blogspot.com/2011/12/my-experience-with-hbase-dynamic.html)

§  Manual merging: HBase does not automatically merge regions that are too small. The administrator must take down the cluster and trigger the merges manually.

§  Basic administraAon is complex: renaming a table requires copying all the data. Backing up a cluster is a complex process. (HBASE-643)

©MapR Technologies          65
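A minimal hbase shell sketch of the manual workflow described above (table names and split points are hypothetical):

   # manually trigger a major compaction during an off-peak window
   hbase(main):001:0> major_compact 'mytable'

   # pre-split a new table at estimated row key boundaries
   hbase(main):002:0> create 'newtable', 'cf1', SPLITS => ['g', 'm', 't']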
  
