June 13, 2012
  

HBase Consistency and
Performance Improvements
Esteban Gutierrez, Gregory Chanan
{esteban, gchanan}@cloudera.com
  
HBase Consistency

    •  ACID guarantees within a single row
    •  “Any row returned by the scan will be a
       consistent view (i.e. that version of the
       complete row existed at some point in
       time)”[1]

    [1] http://hbase.apache.org/acid-semantics.html



                       ©2012 Cloudera, Inc. All Rights Reserved.
HBase Consistency Issues

    •  Write Consistency Issues
    •  Read Consistency Issues




Write Consistency

    HBASE-4552

    •  Importing HFiles for multiple CFs
       was not an atomic operation

Write Consistency

    HBASE-4552: HRegion.bulkLoadHFile(), < HBase 0.90.5

    Row 1        HFile1:      HFile2:      HFile3:      HFile4:
                 fam1:col1    fam2:col2    fam3:col3    fam4:col4
    T1   Scan    val1
    T2   Scan    val1         val2
    T3   Scan    val1         val2         val3
    T4   Scan    val1         val2         val3         val4

Write Consistency

    HBASE-4552: HRegion.bulkLoadHFiles(), ≥ HBase 0.90.5

    public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) {
    ...
       startRegionOperation();        ← lock.writeLock().lock()
    } finally {
       closeBulkRegionOperation();    ← lock.writeLock().unlock()
    }
    ...

    Row 1        HFile1:      HFile2:      HFile3:      HFile4:
                 fam1:col1    fam2:col2    fam3:col3    fam4:col4
    T1   Scan
    T2   Scan
    T3   Scan
    T4   Scan    val1         val2         val3         val4

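
The locking idea on the slide above can be sketched as follows. This is a minimal model, not HBase's HRegion code: the class BulkLoadLockSketch and its methods are hypothetical, and a single in-memory row stands in for the loaded HFiles. The load of all families happens under the write lock and scans take the read lock, so a scan sees either none of the new families or all of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region; not HBase's HRegion.
class BulkLoadLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // family -> value for a single row, standing in for loaded HFiles
    private final Map<String, String> row = new LinkedHashMap<>();

    // Loads every family under the write lock, so no scan runs mid-load.
    void bulkLoad(Map<String, String> familyValues) {
        lock.writeLock().lock();
        try {
            for (Map.Entry<String, String> e : familyValues.entrySet()) {
                row.put(e.getKey(), e.getValue());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Scans take the read lock and return a snapshot of the row.
    Map<String, String> scanRow() {
        lock.readLock().lock();
        try {
            return new LinkedHashMap<>(row);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BulkLoadLockSketch region = new BulkLoadLockSketch();
        Map<String, String> files = new LinkedHashMap<>();
        files.put("fam1", "val1");
        files.put("fam2", "val2");
        files.put("fam3", "val3");
        files.put("fam4", "val4");

        Thread loader = new Thread(() -> region.bulkLoad(files));
        loader.start();
        // A concurrent scan runs before or after the load, never in the middle.
        int seen = region.scanRow().size();
        loader.join();
        System.out.println("families seen by concurrent scan: " + seen);
        System.out.println("families seen after load: " + region.scanRow().size());
    }
}
```

The trade-off is that scans wait while a multi-family load holds the write lock.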
Read Consistency

 HBASE-2856

 •  Seen only twice in the
    wild
 •  Hard to detect if application
    monitoring is not
    implemented


Read Consistency

 HBASE-2856
 •  Table size ≈ 50 M records
 •  Large number of CFs
 •  New records are continuously added to
    the table
 •  Concurrent MR Jobs on the same table
 •  Cluster has to meet strict SLAs


Read Consistency

 HBASE-2856
 Symptoms

                                                     Run 1       Run 2       Run 3
     …                       …                       …           …           …
                             SPLIT_RAW_FILES         …           …           …
     Map-Reduce Framework    Map output records      500,000     499,997     500,001

     cf1:col1        cf2:col2             cf3:col3
     cf1:col1
                     cf2:col2             cf3:col3
     cf1:col1

 Scale testing shows that between 0.5% and 2% of results are inconsistent between runs

Read Consistency

 HBASE-2856
 Impact
 •  Result is used to update user facing
    records
 •  Customer is not happy
     — “Where is my data?”




Read Consistency

 HBASE-2856
 Workarounds
 •  Re-try scan if not all CFs are present
 •  Re-submit job if any inconsistency is found
 •  Sometimes that is not possible: SLAs!
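
A rough sketch of the first workaround, assuming the application knows which column families every row must contain. The class RetryIncompleteRowSketch is hypothetical, and the Supplier stands in for re-reading one row from the table; no HBase client call is shown.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical client-side guard; the Supplier stands in for re-reading one row.
class RetryIncompleteRowSketch {
    // Re-reads the row until every expected column family is present,
    // giving up after maxAttempts reads.
    static Optional<Map<String, String>> readComplete(
            Supplier<Map<String, String>> readRow,
            Set<String> expectedFamilies,
            int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Map<String, String> row = readRow.get();
            if (row.keySet().containsAll(expectedFamilies)) {
                return Optional.of(row);
            }
        }
        return Optional.empty(); // caller decides: fail the task or re-submit the job
    }

    public static void main(String[] args) {
        // Simulated reads: the first one is missing cf2 and cf3.
        Iterator<Map<String, String>> reads = List.of(
                Map.of("cf1", "a"),
                Map.of("cf1", "a", "cf2", "b", "cf3", "c")).iterator();
        Optional<Map<String, String>> row =
                readComplete(reads::next, Set.of("cf1", "cf2", "cf3"), 3);
        System.out.println("complete row found: " + row.isPresent());
    }
}
```

Each retry costs another read, which is why this approach collides with strict SLAs.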




MVCC

 •  HBase maintains ACID semantics using
    Multiversion Concurrency Control
 •  Instead of overwriting state, create a new
    version of object with timestamp
     Timestamp   Row             fam1:col1                          fam2:col2
     t2          row1            val2                               val2
     t1          row1            val1                               val1
 •  Reads never have to block
 •  Note this timestamp is not externally visible!
    Internally called “memStoreTs”
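
The visibility rule can be illustrated with a small single-threaded model. Everything here (MvccSketch, put, complete, openScanner, read) is hypothetical naming for illustration, and real MVCC implementations publish writes in order; the sketch only shows that a scanner keeps seeing the versions that were complete when it opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-row store illustrating MVCC visibility; not HBase code.
class MvccSketch {
    static final class Cell {
        final long writeNumber; // internal version, never shown to clients
        final String column;
        final String value;
        Cell(long writeNumber, String column, String value) {
            this.writeNumber = writeNumber;
            this.column = column;
            this.value = value;
        }
    }

    private final List<Cell> cells = new ArrayList<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0; // highest completed write number

    // Adds a new version of each column; invisible to scans until complete() is called.
    long put(Map<String, String> columns) {
        long w = nextWriteNumber++;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            cells.add(new Cell(w, e.getKey(), e.getValue()));
        }
        return w;
    }

    // Publishes a finished put by advancing the read point.
    void complete(long writeNumber) {
        readPoint = Math.max(readPoint, writeNumber);
    }

    // A scanner remembers the read point that was current when it opened.
    long openScanner() {
        return readPoint;
    }

    // Returns the newest version of each column that the scanner may see.
    Map<String, String> read(long scannerReadPoint) {
        Map<String, String> row = new TreeMap<>();
        for (Cell c : cells) { // cells are appended in write-number order
            if (c.writeNumber <= scannerReadPoint) {
                row.put(c.column, c.value);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        MvccSketch store = new MvccSketch();
        store.complete(store.put(Map.of("fam1:col1", "val1", "fam2:col2", "val1")));
        long scanner = store.openScanner();
        store.complete(store.put(Map.of("fam1:col1", "val2", "fam2:col2", "val2")));
        System.out.println("old scanner: " + store.read(scanner));
        System.out.println("new scanner: " + store.read(store.openScanner()));
    }
}
```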


HBase Write Path

 1.  Write to WAL (per RegionServer)
 2.  Write to In-Memory Sorted Map (MemStore)
     (per Region+ColumnFamily)
 3.  Flush MemStore to disk as HFile when
     MemStore hits configurable
     hbase.hregion.memstore.flush.size
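
The three steps above can be modelled in a few lines. This is a simplified sketch under stated assumptions: WritePathSketch and its fields are hypothetical, the WAL is kept inside the store for brevity (in HBase it is per RegionServer), and the byte count is a rough estimate standing in for the hbase.hregion.memstore.flush.size check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of one store (one region + column family).
class WritePathSketch {
    private final List<String> wal = new ArrayList<>();                       // step 1
    private TreeMap<String, String> memStore = new TreeMap<>();               // step 2
    private final List<SortedMap<String, String>> hfiles = new ArrayList<>(); // step 3
    private final long flushSizeBytes; // stands in for hbase.hregion.memstore.flush.size
    private long memStoreBytes = 0;    // rough estimate: key length + value length

    WritePathSketch(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);   // 1. append to the write-ahead log first
        memStore.put(key, value);     // 2. insert into the sorted in-memory map
        memStoreBytes += key.length() + value.length();
        if (memStoreBytes >= flushSizeBytes) {
            flush();                  // 3. threshold reached: write out a sorted file
        }
    }

    private void flush() {
        hfiles.add(Collections.unmodifiableSortedMap(memStore)); // immutable once flushed
        memStore = new TreeMap<>();
        memStoreBytes = 0;
    }

    int walEntries() { return wal.size(); }
    int hfileCount() { return hfiles.size(); }
    int memStoreEntries() { return memStore.size(); }

    public static void main(String[] args) {
        WritePathSketch store = new WritePathSketch(8);
        store.put("r1", "aa");
        store.put("r2", "bb");
        System.out.println("wal=" + store.walEntries()
                + " hfiles=" + store.hfileCount()
                + " memstore=" + store.memStoreEntries());
    }
}
```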




Internals / Bug




     Now that we know the internals – what
               could go wrong?




Putting it together
 Let’s go back to the beginning…

                        MemStore
     Timestamp   Row             fam1:col1                        fam2:col2
     t2          row1            val2                             val2
     t1          row1            val1                             val1

 And start a scan.
 And concurrently put.
 Which causes a flush.

                         HFile
     Row           fam2:col2
     row1          val2
     row1          val1

Putting it together
 Now, scan needs to make sense of this…

                    MemStore
     Ts             Row           fam1:col1
     t2             row1          val2
     t1             row1          val1

                     HFile
     Row            fam2:col2
     row1           val2
     row1           val1
 But HFile has no timestamp!

                Inconsistent Result
     Row            fam1:col1     fam2:col2
     row1           val1          val2

Solution
 Store the timestamp in the HFile

                    MemStore
     Ts             Row           fam1:col1
     t2             row1          val2
     t1             row1          val1

                     HFile
     Ts             Row           fam2:col2
     t2             row1          val2
     t1             row1          val1

                  Correct Result
     Row            fam1:col1     fam2:col2
     row1           val1          val1

 Now we have all the information we need
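
A small sketch of why the stored timestamp matters, assuming a scan that opened at t1 before the put at t2. The class StoredTimestampSketch and the visible() helper are hypothetical and do not mirror HBase's scanner code; a store without timestamps is modelled by treating every cell as visible.

```java
import java.util.List;

// Hypothetical model of the scenario above; not HBase's scanner code.
class StoredTimestampSketch {
    static final class Cell {
        final long ts;       // internal MVCC timestamp
        final String value;
        Cell(long ts, String value) {
            this.ts = ts;
            this.value = value;
        }
    }

    // Returns the newest value visible at readPoint. Cells are ordered newest first.
    // When the store kept no timestamps, every cell looks visible.
    static String visible(List<Cell> newestFirst, long readPoint, boolean timestampsStored) {
        for (Cell c : newestFirst) {
            if (!timestampsStored || c.ts <= readPoint) {
                return c.value;
            }
        }
        return null; // nothing visible at this read point
    }

    public static void main(String[] args) {
        long readPoint = 1; // scan started at t1, before the put at t2
        List<Cell> memStoreFam1 = List.of(new Cell(2, "val2"), new Cell(1, "val1"));
        List<Cell> hfileFam2 = List.of(new Cell(2, "val2"), new Cell(1, "val1"));

        // HFile without timestamps: prints "val1 / val2" (mixed versions)
        System.out.println(visible(memStoreFam1, readPoint, true) + " / "
                + visible(hfileFam2, readPoint, false));
        // HFile with timestamps: prints "val1 / val1" (one consistent version)
        System.out.println(visible(memStoreFam1, readPoint, true) + " / "
                + visible(hfileFam2, readPoint, true));
    }
}
```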


Consistency
 •  These are only some of the consistency issues in 0.90
    –  e.g. HBASE-5121: MajorCompaction may
       affect scan's correctness
 •  Solution: Upgrade to 0.92 or 0.94




HBase 0.94




        “Performance Release”




Performance Improvements in 0.94
 •  HBASE-5047 Support checksums in HBase block cache
 •  HBASE-5199 Delete out of TTL store files before
    compaction selection
 •  HBASE-4608 HLog Compression
 •  HBASE-4465 Lazy-seek optimization for StoreFile
    scanners




HBASE-5047
 •  HDFS stores the checksum in a separate file
            HFile              Checksum

 •  So each file read actually requires two disk I/O operations
 •  HBase is often bottlenecked by random disk IOPS




HBASE-5047 Solution
 •  Solution: Store checksum in HFile block
              HFile                                   HFile Block
                                                            Chksum

                                                               Data




 •  On by default (“hbase.regionserver.checksum.verify”)
 •  Bytes per checksum (“hbase.hstore.bytes.per.checksum”) –
    default is 16K
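
To illustrate the idea of keeping the checksum inside the block, here is a minimal sketch using java.util.zip.CRC32. The layout (one 4-byte CRC per chunk, followed by the data) and the names InlineChecksumSketch, writeBlock and verifyBlock are assumptions for illustration, not the actual HFile block format; bytesPerChecksum plays the role of the bytes-per-checksum setting.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical block layout: [4-byte CRC32 per chunk ...][data]. Not the HFile format.
class InlineChecksumSketch {
    // Builds a block holding one CRC32 per bytesPerChecksum chunk, followed by the data.
    static byte[] writeBlock(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        ByteBuffer block = ByteBuffer.allocate(chunks * 4 + data.length);
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            CRC32 crc = new CRC32();
            crc.update(data, off, Math.min(bytesPerChecksum, data.length - off));
            block.putInt((int) crc.getValue());
        }
        block.put(data);
        return block.array();
    }

    // Verifies every chunk using only the bytes of the block itself,
    // so no second file has to be read.
    static boolean verifyBlock(byte[] block, int dataLength, int bytesPerChecksum) {
        int chunks = (dataLength + bytesPerChecksum - 1) / bytesPerChecksum;
        ByteBuffer checksums = ByteBuffer.wrap(block);
        int dataStart = chunks * 4;
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            CRC32 crc = new CRC32();
            crc.update(block, dataStart + off, Math.min(bytesPerChecksum, dataLength - off));
            if (checksums.getInt() != (int) crc.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "row1/fam1:col1=val1".getBytes(StandardCharsets.UTF_8);
        byte[] block = writeBlock(data, 8);
        System.out.println("intact block verifies: " + verifyBlock(block, data.length, 8));
        block[block.length - 1] ^= 1; // simulate a corrupted byte
        System.out.println("corrupted block verifies: " + verifyBlock(block, data.length, 8));
    }
}
```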




HBASE-5199
 •  User can specify TTL per column family
 •  If all values in the HFile are expired, delete HFile rather
    than compact




 •  Off by default, turn on via
    ("hbase.store.delete.expired.storefile")
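
The selection rule can be sketched as below. This is an illustration under assumptions: each store file is reduced to the newest cell timestamp it contains, the names ExpiredStoreFileSketch, fullyExpired and selectForCompaction are hypothetical, and the exact boundary comparison is a guess rather than HBase's rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a store file is reduced to the newest cell timestamp it holds.
class ExpiredStoreFileSketch {
    static final class StoreFile {
        final String name;
        final long maxTimestampMs;
        StoreFile(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // A file is fully expired when even its newest cell is older than now - ttl.
    static boolean fullyExpired(StoreFile file, long nowMs, long ttlMs) {
        return file.maxTimestampMs < nowMs - ttlMs;
    }

    // Fully expired files go to deletedOut; the rest remain compaction candidates.
    static List<StoreFile> selectForCompaction(List<StoreFile> files, long nowMs, long ttlMs,
                                               List<StoreFile> deletedOut) {
        List<StoreFile> candidates = new ArrayList<>();
        for (StoreFile file : files) {
            if (fullyExpired(file, nowMs, ttlMs)) {
                deletedOut.add(file); // dropped outright, never rewritten
            } else {
                candidates.add(file);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<StoreFile> files = List.of(new StoreFile("old", 100), new StoreFile("recent", 900));
        List<StoreFile> deleted = new ArrayList<>();
        List<StoreFile> candidates = selectForCompaction(files, 1000, 500, deleted);
        System.out.println("deleted=" + deleted.size() + " candidates=" + candidates.size());
    }
}
```

Deleting a fully expired file avoids reading and rewriting data that compaction would discard anyway.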


Conclusion
 •  Most consistency issues fixed in 0.92/
    CDH4
 •  Performance improvements in 0.94
 •  0.94 is wire compatible with 0.92, so will
    be in a CDH4 update




References
 •  HBase ACID Semantics,
    http://hbase.apache.org/acid-semantics.html
 •  Apache HBase Meetup @ SU, Michael Stack.
    http://files.meetup.com/1350427/20120327hbase_meetup.pdf
 •  HBase Internals, Lars Hofhansl.
    http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/




43
                      ©2012 Cloudera, Inc. All Rights Reserved.

More Related Content

What's hot

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
Chandler Huang
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Databricks
 

What's hot (20)

Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Hadoop
HadoopHadoop
Hadoop
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 

Viewers also liked

Viewers also liked (20)

Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsHadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
 
Apache HBase 0.98
Apache HBase 0.98Apache HBase 0.98
Apache HBase 0.98
 
Streaming map reduce
Streaming map reduceStreaming map reduce
Streaming map reduce
 
Chrome extensions
Chrome extensions Chrome extensions
Chrome extensions
 
阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践
 
Hbase Nosql
Hbase NosqlHbase Nosql
Hbase Nosql
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub
 
Content Identification using HBase
Content Identification using HBaseContent Identification using HBase
Content Identification using HBase
 
New Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's GuideNew Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's Guide
 
Design Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and Kiji
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
 

Similar to HBase Consistency and Performance Improvements

"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE
Ryosuke IWANAGA
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
DataWorks Summit
 
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Yukinori Suda
 
READPAST & Furious: Locking
READPAST & Furious: Locking READPAST & Furious: Locking
READPAST & Furious: Locking
Mark Broadbent
 
The Practice of Alluxio in Ctrip Bigdata Platform
The Practice of Alluxio in Ctrip Bigdata PlatformThe Practice of Alluxio in Ctrip Bigdata Platform
The Practice of Alluxio in Ctrip Bigdata Platform
Alluxio, Inc.
 

Similar to HBase Consistency and Performance Improvements (20)

"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
 
"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
MySQL Replication
MySQL ReplicationMySQL Replication
MySQL Replication
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
 
READPAST & Furious: Locking
READPAST & Furious: Locking READPAST & Furious: Locking
READPAST & Furious: Locking
 
The Practice of Alluxio in Ctrip Bigdata Platform
The Practice of Alluxio in Ctrip Bigdata PlatformThe Practice of Alluxio in Ctrip Bigdata Platform
The Practice of Alluxio in Ctrip Bigdata Platform
 
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
 
Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValuesColumn Stride Fields aka. DocValues
Column Stride Fields aka. DocValues
 
Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValues Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValues
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Steps to identify ONTAP latency related issues
Steps to identify ONTAP latency related issuesSteps to identify ONTAP latency related issues
Steps to identify ONTAP latency related issues
 
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
 
14 lab-planing
14 lab-planing14 lab-planing
14 lab-planing
 
14 lab-planing
14 lab-planing14 lab-planing
14 lab-planing
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
 
Oracle 12.2 sharding learning more
Oracle 12.2 sharding learning moreOracle 12.2 sharding learning more
Oracle 12.2 sharding learning more
 
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
 

HBase Consistency and Performance Improvements

  • 1. June 13, 2012 HBase Consistency and Performance Improvements Esteban Gutierrez, Gregory Chanan {esteban, gchanan}@cloudera.com
  • 2. HBase Consistency •  ACID guarantees within a single row •  “Any row returned by the scan will be a consistent view (i.e. that version of the complete row existed at some point in time)”[1] [1] http://hbase.apache.org/acid-semantics.html ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. HBase Consistency Issues •  Write Consistency Issues •  Read Consistency Issues
  • 4. Write Consistency HBASE-4552 •  Importing Multiple CFs HFiles is not an atomic operation
  • 5. Write Consistency HBASE-4552 •  Importing Multiple CFs HFiles was not an atomic operation
  • 6. Write Consistency HBASE-4552 HRegion.bulkLoadHFile() (HBase < 0.90.5) Row 1 columns: fam1:col1 (HFile1), fam2:col2 (HFile2), fam3:col3 (HFile3), fam4:col4 (HFile4). T1 Scan: val1. T2 Scan: val1, val2. T3 Scan: val1, val2, val3. T4 Scan: val1, val2, val3, val4.
  • 7. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles() (HBase ≥ 0.90.5) Row 1 columns: fam1:col1 (HFile1), fam2:col2 (HFile2), fam3:col3 (HFile3), fam4:col4 (HFile4). T1 to T4 Scans: no values shown. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); ← lock.writeLock().lock() } finally { closeBulkRegionOperation(); } ...
  • 8. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles() (HBase ≥ 0.90.5) Row 1 columns: fam1:col1 (HFile1), fam2:col2 (HFile2), fam3:col3 (HFile3), fam4:col4 (HFile4). T1 to T4 Scans: no values shown. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); } finally { closeBulkRegionOperation(); ← lock.writeLock().unlock() } ...
  • 9. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles() (HBase ≥ 0.90.5) Row 1 columns: fam1:col1 (HFile1), fam2:col2 (HFile2), fam3:col3 (HFile3), fam4:col4 (HFile4). T1 to T3 Scans: no values shown. T4 Scan: val1, val2, val3, val4. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); } finally { closeBulkRegionOperation(); } ...
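The locking idea on slides 7 to 9 can be sketched in plain Java. This is a minimal illustration, not HBase code: `BulkLoadLockSketch`, `scan()` and `sizesSeenDuringLoad()` are hypothetical names, and the single-row map stands in for a region. It assumes scans hold a shared read lock while the bulk load holds the exclusive write lock, so a scan observes either none or all of the four families.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region holding one row; not the real HRegion class.
class BulkLoadLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> row = new LinkedHashMap<>(); // column -> value

    // Scans share the read lock, so they never overlap a bulk load.
    Map<String, String> scan() {
        lock.readLock().lock();
        try {
            return new LinkedHashMap<>(row);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Every family is loaded while one write lock is held.
    void bulkLoadHFiles(Map<String, String> familyValues) {
        lock.writeLock().lock();
        try {
            for (Map.Entry<String, String> e : familyValues.entrySet()) {
                row.put(e.getKey(), e.getValue());
                Thread.yield(); // invite a concurrent scan to interleave (it cannot)
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Runs a scanner thread next to a bulk load and records the row sizes it saw.
    static Set<Integer> sizesSeenDuringLoad() {
        BulkLoadLockSketch region = new BulkLoadLockSketch();
        Map<String, String> hfiles = new LinkedHashMap<>();
        for (int i = 1; i <= 4; i++) {
            hfiles.put("fam" + i + ":col" + i, "val" + i);
        }
        Set<Integer> sizes = ConcurrentHashMap.newKeySet();
        Thread scanner = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                sizes.add(region.scan().size());
            }
        });
        scanner.start();
        region.bulkLoadHFiles(hfiles);
        try {
            scanner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        sizes.add(region.scan().size());
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println("row sizes seen by scans: " + sizesSeenDuringLoad());
    }
}
```

Removing the write lock from `bulkLoadHFiles` would allow intermediate sizes of 1, 2 or 3, which corresponds to the pre-0.90.5 picture on slide 6.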
  • 10. Read Consistency HBASE-2856 •  Seen only twice in the wild •  Hard to detect if application monitoring is not implemented
  • 11. Read Consistency HBASE-2856 •  Table size ≈ 50 M records •  Large number of CFs •  New records are continuously added to the table •  Concurrent MR Jobs on the same table •  Cluster has to meet strict SLAs
  • 12. Read Consistency HBASE-2856 Symptoms: job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000
  • 13. Read Consistency HBASE-2856 Symptoms: job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000; Run 2 = 499,997
  • 14. Read Consistency HBASE-2856 Symptoms: job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000; Run 2 = 499,997; Run 3 = 500,001
  • 15. Read Consistency HBASE-2856 Symptoms: job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000; Run 2 = 499,997; Run 3 = 500,001 cf1:col1 cf2:col2 cf3:col3 cf1:col1 cf2:col2 cf3:col3 cf1:col1
  • 16. Read Consistency HBASE-2856 Symptoms: job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000; Run 2 = 499,997; Run 3 = 500,001 cf1:col1 cf2:col2 cf3:col3 cf1:col1 cf2:col2 cf3:col3 cf1:col1 Scale testing shows between 0.5% and 2% inconsistent results between runs
  • 17. Read Consistency HBASE-2856 Impact •  Result is used to update user-facing records •  Customer is not happy
  • 18. Read Consistency HBASE-2856 Impact •  Result is used to update user-facing records •  Customer is not happy: “Where is my data?”
  • 19. Read Consistency HBASE-2856 Workarounds •  Retry scan if not all CFs are present •  Resubmit job if any inconsistency is found
  • 20. Read Consistency HBASE-2856 Workarounds •  Retry scan if not all CFs are present •  Resubmit job if any inconsistency is found •  Sometimes that is not possible
  • 21. Read Consistency HBASE-2856 Workarounds •  Retry scan if not all CFs are present •  Resubmit job if any inconsistency is found •  Sometimes that is not possible: SLAs!
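The first workaround (retry the read when column families are missing) might look roughly like the sketch below. It deliberately avoids the HBase client API: `readCompleteRow`, `rowWith` and the `Supplier` standing in for a scan are all hypothetical, and the attempt limit is an assumption added so the loop cannot spin forever.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Supplier;

class RetryScanSketch {
    // Re-reads the row until every expected family is present, up to maxAttempts.
    static Map<String, String> readCompleteRow(Supplier<Map<String, String>> readRow,
                                               Set<String> expectedFamilies,
                                               int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Map<String, String> row = readRow.get();
            if (row.keySet().containsAll(expectedFamilies)) {
                return row;
            }
        }
        throw new IllegalStateException(
                "row still incomplete after " + maxAttempts + " attempts");
    }

    // Builds a fake row containing the given families (illustration only).
    static Map<String, String> rowWith(String... families) {
        Map<String, String> row = new TreeMap<>();
        for (String family : families) {
            row.put(family, "val");
        }
        return row;
    }

    // Simulated reads: the first misses cf2 and cf3, the second is complete.
    static Map<String, String> demo() {
        Iterator<Map<String, String>> reads =
                Arrays.asList(rowWith("cf1"), rowWith("cf1", "cf2", "cf3")).iterator();
        Set<String> expected = new HashSet<>(Arrays.asList("cf1", "cf2", "cf3"));
        return readCompleteRow(reads::next, expected, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each retry costs another read, which is why the slides note that this is not always compatible with strict SLAs.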
  • 22. MVCC •  HBase maintains ACID semantics using Multiversion Concurrency Control •  Instead of overwriting state, create a new version of object with timestamp. Table (Timestamp, Row, fam1:col1, fam2:col2): (t1, row1, val1, val1)
  • 23. MVCC •  HBase maintains ACID semantics using Multiversion Concurrency Control •  Instead of overwriting state, create a new version of object with timestamp. Table (Timestamp, Row, fam1:col1, fam2:col2): (t2, row1, val2, val2); (t1, row1, val1, val1) •  Reads never have to block •  Note this timestamp is not externally visible! Internally called “memStoreTs”
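A toy version of the MVCC idea, under stated assumptions: `MvccSketch`, `putRow`, `beginRead` and `readRow` are hypothetical names, writers are serialized with `synchronized`, and the internal write number plays the role the slides call “memStoreTs”. Readers take no lock; they pick up the current read point and see the newest version at or below it.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

class MvccSketch {
    // column -> (write number -> value); old versions are kept, not overwritten
    private final Map<String, ConcurrentSkipListMap<Long, String>> columns =
            new ConcurrentHashMap<>();
    private long nextWriteNumber = 1;    // guarded by "this"
    private volatile long readPoint = 0; // highest fully written version

    // Writes all columns of the row under one write number.
    synchronized long putRow(Map<String, String> values) {
        long ts = nextWriteNumber++;
        for (Map.Entry<String, String> e : values.entrySet()) {
            columns.computeIfAbsent(e.getKey(),
                    k -> new ConcurrentSkipListMap<Long, String>()).put(ts, e.getValue());
        }
        readPoint = ts; // publish only after every column has been written
        return ts;
    }

    // A reader remembers the read point at the moment it starts.
    long beginRead() {
        return readPoint;
    }

    // Lock-free read: newest version of each column at or below the snapshot.
    Map<String, String> readRow(long snapshot) {
        Map<String, String> row = new TreeMap<>();
        for (Map.Entry<String, ConcurrentSkipListMap<Long, String>> c : columns.entrySet()) {
            Map.Entry<Long, String> version = c.getValue().floorEntry(snapshot);
            if (version != null) {
                row.put(c.getKey(), version.getValue());
            }
        }
        return row;
    }

    // Helper for the demo: the same value in both columns, as on the slide.
    static Map<String, String> row(String value) {
        Map<String, String> r = new TreeMap<>();
        r.put("fam1:col1", value);
        r.put("fam2:col2", value);
        return r;
    }

    public static void main(String[] args) {
        MvccSketch store = new MvccSketch();
        store.putRow(row("val1"));
        long snapshot = store.beginRead();            // scan starts here
        store.putRow(row("val2"));                    // concurrent put
        System.out.println(store.readRow(snapshot));  // still the val1 row
        System.out.println(store.readRow(store.beginRead()));
    }
}
```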
  • 24. HBase Write Path 1.  Write to WAL (per RegionServer) 2.  Write to In-Memory Sorted Map (MemStore) (per Region+ColumnFamily) 3.  Flush MemStore to disk as HFile when MemStore hits configurable hbase.hregion.memstore.flush.size
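The three write-path steps can be modelled in a few lines. This is a simplified sketch, not HBase internals: `WritePathSketch` and its members are hypothetical, there is one WAL and one MemStore instead of per-RegionServer and per-Region+ColumnFamily instances, “HFiles” are immutable in-memory maps, and size is approximated by string lengths. The constructor argument plays the role of hbase.hregion.memstore.flush.size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class WritePathSketch {
    private final List<String> wal = new ArrayList<>();                    // step 1
    private TreeMap<String, String> memStore = new TreeMap<>();            // step 2
    private final List<SortedMap<String, String>> hfiles = new ArrayList<>(); // step 3
    private final long flushSizeBytes;
    private long memStoreBytes = 0;

    WritePathSketch(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);   // 1. append to the write-ahead log first
        memStore.put(key, value);     // 2. insert into the sorted in-memory map
        memStoreBytes += key.length() + value.length(); // rough size estimate
        if (memStoreBytes >= flushSizeBytes) {
            flush();                  // 3. flush once the threshold is reached
        }
    }

    // Freezes the current MemStore as an immutable sorted "file" and starts a new one.
    private void flush() {
        hfiles.add(Collections.unmodifiableSortedMap(memStore));
        memStore = new TreeMap<>();
        memStoreBytes = 0;
    }

    int walSize() { return wal.size(); }
    int memStoreSize() { return memStore.size(); }
    int hfileCount() { return hfiles.size(); }

    public static void main(String[] args) {
        WritePathSketch region = new WritePathSketch(10);
        region.put("row1", "val1"); // 8 bytes, stays in the MemStore
        region.put("row2", "val2"); // 16 bytes in total, triggers a flush
        System.out.println("hfiles=" + region.hfileCount()
                + " memStore=" + region.memStoreSize()
                + " wal=" + region.walSize());
    }
}
```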
  • 25. Internals / Bug Now that we know the internals, what could go wrong?
  • 26. Putting it together Let’s go back to the beginning… MemStore (Timestamp, Row, fam1:col1, fam2:col2): (t1, row1, val1, val1)
  • 27. Putting it together Let’s go back to the beginning… MemStore (Timestamp, Row, fam1:col1, fam2:col2): (t1, row1, val1, val1) And start a scan.
  • 28. Putting it together Let’s go back to the beginning… MemStore (Timestamp, Row, fam1:col1, fam2:col2): (t2, row1, val2, val2); (t1, row1, val1, val1) And start a scan. And concurrently put.
  • 29. Putting it together Let’s go back to the beginning… MemStore (Timestamp, Row, fam1:col1, fam2:col2): (t2, row1, val2, val2); (t1, row1, val1, val1) And start a scan. And concurrently put. Which causes a flush. HFile (Row, fam2:col2): (row1, val2); (row1, val1)
  • 30. Putting it together Now, scan needs to make sense of this… MemStore (Ts, Row, fam1:col1): (t2, row1, val2); (t1, row1, val1) HFile (Row, fam2:col2): (row1, val2); (row1, val1)
  • 31. Putting it together Now, scan needs to make sense of this… MemStore (Ts, Row, fam1:col1): (t2, row1, val2); (t1, row1, val1) HFile (Row, fam2:col2): (row1, val2); (row1, val1) But HFile has no timestamp!
  • 32. Putting it together Now, scan needs to make sense of this… MemStore (Ts, Row, fam1:col1): (t2, row1, val2); (t1, row1, val1) HFile (Row, fam2:col2): (row1, val2); (row1, val1) But HFile has no timestamp! Inconsistent Result (Row, fam1:col1, fam2:col2): (row1, val1, val2)
  • 33. Solution Store the timestamp in the HFile MemStore (Ts, Row, fam1:col1): (t2, row1, val2); (t1, row1, val1) HFile (Ts, Row, fam2:col2): (t2, row1, val2); (t1, row1, val1) Correct Result (Row, fam1:col1, fam2:col2): (row1, val1, val1) Now we have all the information we need
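The flush scenario from slides 26 to 33 can be replayed with a small model. Everything here is hypothetical (`FlushReadPointSketch`, `Cell`, `flushDroppingTs`); in particular, a flush that loses the internal timestamp is modelled by setting it to 0, which makes every flushed version look visible to a scan that started at read point t1 = 1. Keeping the timestamp lets the scan filter the HFile the same way it filters the MemStore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlushReadPointSketch {
    // One cell version; ts is the internal MVCC number, not the user timestamp.
    static final class Cell {
        final long ts;
        final String value;

        Cell(long ts, String value) {
            this.ts = ts;
            this.value = value;
        }
    }

    // Newest version visible at readPoint; the list is ordered newest first.
    static String visible(List<Cell> newestFirst, long readPoint) {
        for (Cell c : newestFirst) {
            if (c.ts <= readPoint) {
                return c.value;
            }
        }
        return null;
    }

    // Models a flush that does not persist the MVCC number (assumed to read back as 0).
    static List<Cell> flushDroppingTs(List<Cell> cells) {
        List<Cell> flushed = new ArrayList<>();
        for (Cell c : cells) {
            flushed.add(new Cell(0, c.value));
        }
        return flushed;
    }

    // fam1 is read from the MemStore, fam2 from the flushed file.
    static String scanRow(List<Cell> fam1, List<Cell> fam2, long readPoint) {
        return visible(fam1, readPoint) + "," + visible(fam2, readPoint);
    }

    // Versions written at t2 = 2 and t1 = 1, newest first.
    static List<Cell> versions() {
        return Arrays.asList(new Cell(2, "val2"), new Cell(1, "val1"));
    }

    public static void main(String[] args) {
        long readPoint = 1; // the scan started before the put at t2
        System.out.println("without ts in file: "
                + scanRow(versions(), flushDroppingTs(versions()), readPoint));
        System.out.println("with ts in file:    "
                + scanRow(versions(), versions(), readPoint));
    }
}
```

In this model the first call mixes versions (val1 from the MemStore, val2 from the file), while the second returns val1 for both columns.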
  • 34. Consistency •  These are only some of the consistency issues in 0.90, e.g. HBASE-5121: MajorCompaction may affect scan's correctness •  Solution: Upgrade to 0.92 or 0.94
  • 35. HBase 0.94 “Performance Release”
  • 36. Performance Improvements in 0.94 •  HBASE-5047 Support checksums in HBase block cache •  HBASE-5199 Delete out of TTL store files before compaction selection •  HBASE-4608 HLog Compression •  HBASE-4465 Lazy-seek optimization for StoreFile scanners
  • 37. Performance Improvements in 0.94 •  HBASE-5047 Support checksums in HBase block cache •  HBASE-5199 Delete out of TTL store files before compaction selection •  HBASE-4608 HLog Compression •  HBASE-4465 Lazy-seek optimization for StoreFile scanners
  • 38. HBASE-5047 •  HDFS stores checksums in a separate file (diagram: HFile, Checksum) •  So each file read actually requires two disk IOPs •  HBase is often bottlenecked by random disk IOPs
  • 39. HBASE-5047 Solution •  Solution: Store checksum in HFile block (diagram: HFile Block containing Chksum and Data) •  On by default (“hbase.regionserver.checksum.verify”) •  Bytes per checksum (“hbase.hstore.bytes.per.checksum”) – default is 16K
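Storing checksums inside the block, so that data and checksum arrive in a single read, can be illustrated as follows. The layout used here (length, data, then one CRC32 per chunk) is an assumption for the sketch and not the real HFile block format; `InlineChecksumSketch`, `writeBlock` and `readBlock` are hypothetical names. The chunk size mirrors the 16K default mentioned on the slide.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

class InlineChecksumSketch {
    static final int BYTES_PER_CHECKSUM = 16 * 1024; // mirrors the 16K default

    // Assumed layout: [data length][data][one CRC32 per chunk]. Illustration only.
    static byte[] writeBlock(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        ByteBuffer buf = ByteBuffer.allocate(4 + data.length + 4 * chunks);
        buf.putInt(data.length).put(data);
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            buf.putInt((int) crc(data, off, len));
        }
        return buf.array();
    }

    // Verifies every chunk from the same buffer; no second file is consulted.
    static byte[] readBlock(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        int length = buf.getInt();
        byte[] data = new byte[length];
        buf.get(data);
        for (int off = 0; off < length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, length - off);
            int expected = buf.getInt();
            if (expected != (int) crc(data, off, len)) {
                throw new IllegalStateException("checksum mismatch at offset " + off);
            }
        }
        return data;
    }

    private static long crc(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[40_000];
        Arrays.fill(data, (byte) 7);
        byte[] block = writeBlock(data);
        System.out.println("round trip ok: " + Arrays.equals(data, readBlock(block)));
    }
}
```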
  • 40. Performance Improvements in 0.94 •  HBASE-5047 Support checksums in HBase block cache •  HBASE-5199 Delete out of TTL store files before compaction selection •  HBASE-4608 HLog Compression •  HBASE-4465 Lazy-seek optimization for StoreFile scanners
  • 41. HBASE-5199 •  User can specify TTL per column family •  If all values in the HFile are expired, delete HFile rather than compact •  Off by default, turn on via “hbase.store.delete.expired.storefile”
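The expiry check behind HBASE-5199 can be sketched like this. It assumes each store file records the timestamp of its newest cell, so a file whose newest cell is older than the TTL contains only expired values; `ExpiredStoreFileSketch`, `StoreFile` and `selectExpired` are hypothetical names, not the HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExpiredStoreFileSketch {
    // Hypothetical file descriptor: a name plus the timestamp of its newest cell.
    static final class StoreFile {
        final String name;
        final long maxTimestampMs;

        StoreFile(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // A file is fully expired when even its newest cell is older than the TTL.
    static boolean isFullyExpired(StoreFile file, long nowMs, long ttlMs) {
        return file.maxTimestampMs < nowMs - ttlMs;
    }

    // Files returned here can be deleted outright instead of being compacted.
    static List<String> selectExpired(List<StoreFile> files, long nowMs, long ttlMs) {
        List<String> expired = new ArrayList<>();
        for (StoreFile file : files) {
            if (isFullyExpired(file, nowMs, ttlMs)) {
                expired.add(file.name);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<StoreFile> files = Arrays.asList(
                new StoreFile("old", 1_000), new StoreFile("recent", 5_000));
        // now = 10000 ms and TTL = 6000 ms give a cutoff of 4000 ms
        System.out.println(selectExpired(files, 10_000, 6_000));
    }
}
```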
  • 42. Conclusion •  Most consistency issues fixed in 0.92/CDH4 •  Performance improvements in 0.94 •  0.94 is wire compatible with 0.92, so will be in a CDH4 update
  • 43. References •  HBase ACID Semantics, http://hbase.apache.org/acid-semantics.html •  Apache HBase Meetup @ SU, Michael Stack. http://files.meetup.com/1350427/20120327hbase_meetup.pdf •  HBase Internals, Lars Hofhansl. http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/