HBase and HDFS
    Todd Lipcon
 todd@cloudera.com
  Twitter: @tlipcon
#hbase IRC: tlipcon




  March 10, 2010
Outline

  HDFS Overview

  HDFS meets HBase

  Solving the HDFS-HBase problems
     Small Random Reads
     Single-Client Fault Tolerance
     Durable Record Appends

  Summary
HDFS Overview
What is HDFS?

         Hadoop’s Distributed File System
         Modeled after Google’s GFS
         Scalable, reliable data storage

         All persistent HBase storage is on HDFS
         HDFS reliability and performance are key to
         HBase reliability and performance
HDFS Architecture
(architecture diagram)
HDFS Design Goals
     Store large amounts of data
     Data should be reliable
     Storage and performance should scale with
     number of nodes.

     Primary use: bulk processing with MapReduce
Requirements for MapReduce
     MR Task Outputs
         Large streaming writes of entire files
     MR Task Inputs
         Medium-size partial reads
     Each task usually has 1 reader, 1 writer; 8-16
     tasks per node.
         DataNodes usually servicing few concurrent clients
     MapReduce can restart tasks with ease (they
     are idempotent)
Requirements for HBase
  All of the requirements of MapReduce, plus:
      Constantly append small records to an edit log
      (WAL)
      Small-size random reads
      Many concurrent readers
      Clients cannot restart → single-client fault
      tolerance is necessary.
HDFS Requirements Matrix

     Requirement                     MR    HBase   HDFS today
     Scalable storage                ✓     ✓       ☺
     System fault tolerance          ✓     ✓       ☺
     Large streaming writes          ✓     ✓       ☺
     Large streaming reads           ✓     ✓       ☺
     Small random reads              -     ✓       ☹
     Single client fault tolerance   -     ✓       ☹
     Durable record appends          -     ✓       ☹

     (- = not a requirement; ☺ = HDFS already handles this well;
      ☹ = problem area for HBase)
Solutions
...turn that frown upside-down

     Three approaches, ordered from easy to hard:
          Configuration Tuning
          HBase-side workarounds
          HDFS Development/Patching
Small Random Reads
Configuration Tuning

     HBase often has more concurrent clients than MapReduce.
     Typical problems:

          “xceiverCount 257 exceeds the limit of concurrent xcievers 256”
               Increase dfs.datanode.max.xcievers → 1024 (or greater)

          “Too many open files”
               Edit /etc/security/limits.conf to increase nofile → 32768
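
A minimal sketch of both fixes, using the values above (the right numbers depend on cluster size; the "hadoop" account name in limits.conf is an assumption):

    <!-- hdfs-site.xml on each DataNode. Note the property name
         really is spelled "xcievers" in Hadoop. -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>1024</value>
    </property>

    # /etc/security/limits.conf -- raise the open-file limit for the
    # account running the DataNode/RegionServer ("hadoop" assumed)
    hadoop  soft  nofile  32768
    hadoop  hard  nofile  32768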
Small Random Reads
HBase Features
          HBase block cache
                 Avoids the need to hit HDFS for many reads

          Finer-grained synchronization in HFile reads
          (HBASE-2180)
                 Allows concurrent clients to read in parallel for
                 higher throughput

          Seek-and-read vs pread API (HBASE-1505)
                 In current HDFS, these have different performance
                 characteristics
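
For illustration, a sketch of the two read paths on Hadoop's FSDataInputStream (the class and both read methods are real; the file path and offsets are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadPaths {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path("/hbase/example/hfile"));
        byte[] buf = new byte[4096];

        // Seek-and-read: stateful. It moves the stream's shared file
        // position, so concurrent readers of one stream must take a lock.
        in.seek(1234567L);
        in.read(buf, 0, buf.length);

        // pread (positioned read): stateless. It leaves the file position
        // untouched, so parallel clients can issue reads concurrently.
        in.read(1234567L, buf, 0, buf.length);

        in.close();
      }
    }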
Small Random Reads
HDFS Development in Progress
          Client↔DN connection reuse (HDFS-941,
          HDFS-380)
                Eliminates TCP handshake latency
                Avoids restarting TCP Slow-Start algorithm for
                each read

          Multiplexed BlockSender (HDFS-918)
                Reduces number of threads and open files in DN

          Netty DataNode (hack in progress)
                Non-blocking IO may be more efficient for high
                concurrency
Single-Client Fault Tolerance
What exactly do I mean?
          If a MapReduce task fails to write, the MR
          framework will restart the task.
                MR relies on idempotence → task failures are not
                a big deal.
                Thus, fault tolerance of a single client is not as
                important to MR

          If an HBase region server fails to write, it cannot
          easily recreate the data
          HBase may write to a single file for a day at a
          time → must ride out transient errors
Single-Client Fault Tolerance
HDFS Patches
         HDFS-127 / HDFS-927
                Clients used to give up after N read failures on a
                file, with no regard for time. This patch resets the
                failure count after successful reads (sketched below).

         HDFS-630
               Fixes block allocation to exclude nodes client
               knows to be bad
               Important for small clusters!
               Backported to 0.20 in CDH2

         Various other write pipeline recovery fixes in
         0.20.2 (HDFS-101, HDFS-793)
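
A hypothetical sketch of the HDFS-127/927 behavior described above (illustrative names, not the actual DFSClient code):

    import java.io.IOException;

    public class RetryWithReset {
      // Stand-in for "read the next chunk from the currently-selected
      // DataNode"; returns -1 at end of stream.
      interface ChunkReader { int read(byte[] buf) throws IOException; }

      static long readAll(ChunkReader reader, byte[] buf, int maxFailures)
          throws IOException {
        long total = 0;
        int failures = 0;
        while (true) {
          try {
            int n = reader.read(buf);
            if (n < 0) return total;  // end of stream
            failures = 0;             // the fix: success resets the count,
            total += n;               // so only consecutive failures matter
          } catch (IOException e) {
            // Before the patch, this counter was never reset, so a
            // long-lived reader eventually hit the limit no matter what.
            if (++failures >= maxFailures) throw e;
            // (a real client would also switch DataNodes, cf. HDFS-630)
          }
        }
      }
    }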
Durable Record Appends
What exactly is the infamous sync()/append()?

          Well, it’s really hflush()
          HBase accepts writes into memory (the
          MemStore)
          It also logs them to disk (the HLog / WAL)
          Each write needs to be on disk before claiming
          durability.
          hflush() provides this guarantee (almost)
          Unfortunately, it doesn’t work in Apache
          Hadoop 0.20.x
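
In API terms, the WAL path looks roughly like this (a sketch, not HBase's actual HLog code; on 0.20.x the call is FSDataOutputStream.sync(), which the trunk re-implementation renames hflush()):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WalAppend {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream wal = fs.create(new Path("/hbase/.logs/example"));

        byte[] edit = "row1/cf:col=value".getBytes();  // illustrative edit
        wal.write(edit);
        wal.sync();  // hflush() on trunk. Only after this returns may the
                     // write be acknowledged as durable -- and on stock
                     // Apache 0.20.x this call does not actually deliver
                     // the guarantee.
        wal.close();
      }
    }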
Durable Record Appends
HBase Workarounds

         HDFS files are durable once closed
         Currently, HBase rolls the edit log periodically
         After a roll, previous edits are safe

         Not much of a workaround ☹
               A crash will lose any edits since last roll.
               Rolling constantly results in small files
                     Bad for NN metadata efficiency.
                     Triggers frequent flushes → bad for region server
                     efficiency
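
In sketch form, the roll workaround amounts to the following (illustrative, not HBase's actual log roller; the path scheme is hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogRoller {
      private final FileSystem fs;
      private FSDataOutputStream current;
      private long seq = 0;

      public LogRoller(FileSystem fs) throws IOException {
        this.fs = fs;
        this.current = fs.create(nextPath());
      }

      private Path nextPath() {  // hypothetical naming scheme
        return new Path("/hbase/.logs/wal." + (seq++));
      }

      public synchronized void append(byte[] edit) throws IOException {
        current.write(edit);
      }

      // Close the open WAL file (making its edits durable) and start a
      // new one. Edits written after this call are again at risk until
      // the next roll -- hence "not much of a workaround".
      public synchronized void roll() throws IOException {
        current.close();
        current = fs.create(nextPath());
      }
    }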
Durable Record Appends
HDFS Development
         On Apache trunk: HDFS-265
              New append re-implementation for 0.21/0.22
              Will work great, but essentially a very large set of
              patches
              Not released yet - running unreleased Hadoop is
              “daring”

         In 0.20.x distributions: HDFS-200 patch
              Fixes bugs in old hflush() implementation
              Not quite as efficient as HDFS-265, but good
              enough and simpler
              Dhruba Borthakur from Facebook testing and
              improving
              Cloudera will test and merge this into CDH3
Summary
    HDFS’s original target workload was
    MapReduce, and HBase has different (harder)
    requirements.
    Engineers from the HBase team plus Facebook,
    Cloudera, and Yahoo are working together to
    improve things.
    Cloudera will integrate all necessary HDFS
    patches in CDH3, available for testing soon.
          Contact me if you’d like to help test in April.
todd@cloudera.com
  Twitter: @tlipcon
#hbase IRC: tlipcon

   P.S. we’re hiring!
