SlideShare a Scribd company logo
1 of 54
HBase Tuning
Performance and Correctness
Lars Hofhansl
Principal Architect, Salesforce (10 years!)
HBase, Phoenix Committer, PMC
Apache Incubator PMC
Apache Foundation Member
http://hadoop-hbase.blogspot.com/
Boring Topic
Experiment with Colorful Slides
Agenda
• HDFS
• HBase – Server
• HBase – Client
• Correctness
• Performance
HDFS
hdfs-site.xml
HDFS - Background
• Stores HBase WAL and HFiles
• No sync-to-disk by default
• Datanode writes tmp file, moves it into place
• Old data lost on power outage
HDFS Correctness Settings
• dfs.datanode.synconclose = true
(since Hadoop 1.1)
• mount ext4 with dirsync! Or use XFS
• You must do this!
HDFS Performance Settings
1. Sync behind writes
2. Stale Datanode Detection
3. Short Circuit Reads
4. Miscellaneous Settings
HDFS Sync Behind Writes
• Syncs partial blocks to disk – best effort
(OK, since blocks are immutable)
• Necessary with sync-on-close for performance
• Always enable this
• dfs.datanode.sync.behind.writes = true
(Since Hadoop 1.1)
Stale Datanodes - Background
• Datanodes (DNs) send block reports to the
Namenode (NN)
• After 10min(!) w/o a report, DN is declared dead
• NN will still direct reads and writes to those DNs
• Bad for recovery. Down by 1 DN by definition.
(every 3rd read/write goes to a bad DN)
Stale Datanodes - Detection
Don’t use a DN for read or write when it looks like it is
stale (default off)
• dfs.namenode.avoid.read.stale.datanode = true
• dfs.namenode.avoid.write.stale.datanode = true
• dfs.namenode.stale.datanode.interval = 30000
(default)
HDFS short circuit reads
Read local blocks directly without DN, when
RegionServers and DNs are co-located.
• dfs.client.read.shortcircuit = true
• dfs.client.read.shortcircuit.buffer.size = 131072
(important, OOM on direct buffers, default on 0.98+)
• hbase.regionserver.checksum.verify = true
(default on 0.98+)
• dfs.domain.socket.path
(local Unix domain socket, not group or world readable)
Misc HDFS tips
Keep DN running with some failed disks
• dfs.datanode.failed.volumes.tolerated = <N>
(tolerate losing this many disks)
Distribute data across disks at a DN
• dfs.datanode.fsdataset.volume.choosing.policy =
AvailableSpaceVolumeChoosingPolicy
(HDFS-1804 hit drives with more space with higher probability for writes when free space
differs by more than 10GB by default)
Misc HDFS settings
(just trust me on these)
• dfs.block.size = 268435456
(note that WAL is rolled at 95% of this)
• ipc.server.tcpnodelay = true
• ipc.client.tcpnodelay = true
Misc HDFS settings
(just trust me on these, really)
• dfs.datanode.max.xcievers = 8192
• dfs.namenode.handler.count = 64
• dfs.datanode.handler.count = 8
(match number of spindles)
HBase
RegionServer Settings
hbase-site.xml
Compactions
Compactions - Background
• Writes are buffered in the memstore
• Memstore contents flushed to disk as HFiles
• Need to limit # HFiles by rewriting small HFiles
into fewer larger ones
• Remove deleted and expired Cells
• Same data written multiple times => Write
Amplification!
Read vs. Write
• Read requires merging HFiles => fewer is
better
• Write throughput better with fewer
compactions => leads to more files
• Optimize for Read or Write, not both
Write Amplification
Vs.
Read Performance
Control the number of HFiles
• hbase.hstore.blockingStoreFiles = 10
(do not allow more flushes when there more than <N> files)
small for read, large for write, will stop flushes and writes
• hbase.hstore.compactionThreshold = 3
(number of files that starts a compaction)
small for read, large for write
• hbase.hregion.memstore.flush.size = 128
(max memstore size, default is good)
larger good for fewer compaction (watch Region Server heap)
Time Based Compactions
• HBase does time based major compactions
• expensive, always at wrong time
• hbase.hregion.majorcompaction = 604800000
(week, default)
• hbase.hregion.majorcompaction.jitter = 0.5 (½
week, default)
Memstore/Cache Sizing
• hbase.hregion.memstore.flush.size = 128
• hbase.hregion.memstore.block.multiplier
(allow single memstore to grow by this multiplier, good for heavy, bursty
writes)
• hbase.regionserver.global.memstore.upperLimit (0.98)
hbase.regionserver.global.memstore.size (1.0+)
(percent of heap, default 0.4, decrease for read heavy load)
• hfile.block.cache.size
(percent heap used for the block cache, default 0.4)
Autotune BlockCache vs. Memstores (1.0+)
HBASE-5349, not well tested, Must Experiment
• hbase.regionserver.global.memstore.size.{max|min}.range
• hfile.block.cache.size.{max|min}.range
• hbase.regionserver.heapmemory.tuner.class
• hbase.regionserver.heapmemory.tuner.period
Data Locality
• Essential for Short Circuit Reads
• hbase.hstore.min.locality.to.skip.major.compact
(compact even when unnecessary to restore locality)
• hbase.master.wait.on.regionservers.timeout
(allow master to wait a bit upon restart, so not all region go to the first servers
who sign in 30-90s is good. Default it 4.5s)
• Don’t use the HDFS balancer!
HBase
Column Family
Settings
Block Encoding
• NONE, FAST_DIFF, PREFIX, etc
• alter 'test', { NAME => 'cf',
DATA_BLOCK_ENCODING => 'FAST_DIFF' }
• Scan friendly, decodes as you scan
• Not so Get friendly (might need to decode many
previous Cells)
• Currently produces a lot of extra garbage
• Safe to enable, always
Compression
• NONE, GZIP, SNAPPY, etc
• create ’test', {NAME => ’cf', COMPRESSION => 'SNAPPY’}}
• Compresses entire blocks, not Scan or Get friendly
• Typically does not achieve much over block encoding
• Blocks cached decompressed, unless
hbase.block.data.cachecompressed = true
(more cache capacity, but every access needs decompressions)
• Need to test with your data
HFile Block Size
• Don’t confuse with HDFS block size!
• create ‘test′,{NAME => ‘cf′, BLOCKSIZE => ’4096'}
• Default 64k good compromise between Scans
and point Gets
• Increase for large Scans
• Decrease for many point gets
• Rarely want to change this, likely never > 1mb
RegionServer - Garbage Collection
(source: http://www.everystockphoto.com)
Weak Generational Hypothesis
Most Allocated Objects Die Young
Garbage Collection - Background
HotSpot manages four generations (CMS collector):
• Eden for all new objects
• Survivor I and II where surviving objects are promoted when
eden is collected
• Tenured space. Objects surviving a few rounds (16 by default)
of eden/survivor collection are promoted into the tenured
space
• Perm gen for classes, interned strings, and other more or less
permanent objects. (gone, finally, in JDK8)
Garbage Collection - HBase
• Garbage from operations is shortlived (single
RPC)
• Memstore is relatively long-lived
(allocated in 2mb chunks)
• Blockcache is long-lived
(allocation in 64k blocks)
• Deal with the “operational” garbage efficiently
Garbage Collection (CMS)
-Xmn512m
very small eden space
-XX:+UseParNewGC
collect eden in parallel
-XX:+UseConcMarkSweepGC
use the non-moving CMS collector
-XX:CMSInitiatingOccupancyFraction=70
start collecting when 70% of tenured gen is full, avoid collection under pressure
-XX:+UseCMSInitiatingOccupancyOnly
do not try to adjust CMS setting
RegionServer Machine Sizing
RegionServer Machine Sizing
• How much RAM/Heap?
• How many disks?
• What size of disk?
• Network?
• Number of cores?
RegionServer Disk/Java Heap ratio
• Disk/Heap ratio:
RegionSize / MemstoreSize *
ReplicationFactor *
HeapFractionForMemstores * 2
(assuming memstores on average ½ filled)
• 10gb/128mb * 3 * 0.4 * 2 = 192, with default
settings
RegionServer Disk/Java Heap ratio
• Each 192 bytes on disk need 1 byte of Heap
• With 32gb of heap, can barely fill 6T
disk/machine
(32gb * 192 = 6tb)
192?!
W.T.F.
How about 1gb regions?
1gb/128mb * 3 * 0.4 * 2 = 19
(source: http://www.everystockphoto.com)
RegionServer sizing configs
• hbase.hregion.max.filesize (default 10g is good)
• hbase.hregion.memstore.flush.size (default 128mb)
(decrease for read heavy loads)
• hbase.regionserver.maxlogs
(HDFS blocksize * 0.95 * <this> should larger than
0.4*JavaHeap)
RegionServer Hardware
• <= 6T disk space per machine
• Enough heap (~diskspace/200)
• Many cores are good. HBase is CPU intensive.
• Match network and disk throughput
(1ge and 24 disks is not good 125mb/s vs 2.4gb/s)
(10ge and 24 disks is OK, 1ge and 4 or 6 disks is OK)
• But… For reads with filters more disks are still better.
HBase Client Settings
Client/Server RPC chunk size
• No streaming RPC in HBase
• Can only asymptotically approach the
full network bandwidth
• Typical intra datacenter latency: 0.1ms-1ms
• Transmitting 2mb over 1ge: 150ms
• Transmitting 2mb over 10ge: 15ms
2mb chunks between Client and Server are good
But, how Should I do that?
Client Chunk Size Settings
Write:
• hbase.client.write.buffer = 2mb (default write buffer, good)
Read
• Scan.setCaching(<n>) (default 100 rows)
(but… how large are the rows? Must guess!)
• hbase.client.scanner.max.result.size = 2mb (default scan
buffer, 0.98.12+ only)
Client
Consider RPC size * hbase.regionserver.handler.count for
server GC
Need to be able to ride over splits and region moves:
hbase.client.pause = 100
hbase.client.retries.number = 35
hbase.ipc.client.tcpnodelay = true
Replication (trust me)
• hbase.zookeeper.useMulti = true (needs ZK 3.4)
this one is important for correctness
Other defaults are good:
• replication.sleep.before.failover = 30000
• replication.source.maxretriesmultiplier = 300
• replication.source.ratio = 0.10
Linux
• Turn THP (Transparent Huge Pages) OFF
• Set Swappiness to 0
• Set vm.min_free_kbytes to AT LEAST 1GB (8GB on
larger systems, server allocation immediately)
• Set zone_reclaim_mode to 0
(one cache on NUMA)
• dirsync mount option for EXT4, or use XFS
Not Covered
• Security/Kerberos
• HA NameNode/QJM
• ZK/Disk Layout
• Obscure Configs
• Offheap Caching, G1 GC
(source: http://www.morguefile.com)
TL;DR:
• Enable HDFS Sync on close, Sync behind writes
• Mount EXT4 with dirsync
• Enabled Stale Datanode detection
• Tune HBase read vs. write load
• Set HFile block size for your load
• Get RPC Client/Server chunk size right
Thank You!
http://hadoop-hbase.blogspot.com/

More Related Content

What's hot

HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponHBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponCloudera, Inc.
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars GeorgeJAX London
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBYugabyteDB
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impalamarkgrover
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 

What's hot (20)

HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
Scaling HBase for Big Data
Scaling HBase for Big DataScaling HBase for Big Data
Scaling HBase for Big Data
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUponHBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impala
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 

Viewers also liked

HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at XiaomiHBaseCon
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload managementBiju Nair
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceBiju Nair
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar DatabaseBiju Nair
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsBiju Nair
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk ManagementBiju Nair
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaBiju Nair
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 

Viewers also liked (11)

HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at Xiaomi
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload management
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve Performace
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar Database
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentals
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk Management
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezza
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 

Similar to Apache HBase Performance Tuning

HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014Nick Dimiduk
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Cosmin Lehene
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBaseCon
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedEqunix Business Solutions
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanNarayana B
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 
004 architecture andadvanceduse
004 architecture andadvanceduse004 architecture andadvanceduse
004 architecture andadvanceduseScott Miao
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
HBase: Extreme makeover
HBase: Extreme makeoverHBase: Extreme makeover
HBase: Extreme makeoverbigbase
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architectureHBaseCon
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guidelarsgeorge
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 

Similar to Apache HBase Performance Tuning (20)

HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on Mesos
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
004 architecture andadvanceduse
004 architecture andadvanceduse004 architecture andadvanceduse
004 architecture andadvanceduse
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
HBase: Extreme makeover
HBase: Extreme makeoverHBase: Extreme makeover
HBase: Extreme makeover
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 

Recently uploaded

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Recently uploaded (20)

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Apache HBase Performance Tuning

  • 1. HBase Tuning Performance and Correctness Lars Hofhansl Principal Architect, Salesforce (10 years!) HBase, Phoenix Committer, PMC Apache Incubator PMC Apache Foundation Member http://hadoop-hbase.blogspot.com/
  • 2.
  • 4. Agenda • HDFS • HBase – Server • HBase – Client • Correctness • Performance
  • 6. HDFS - Background • Stores HBase WAL and HFiles • No sync-to-disk by default • Datanode writes tmp file, moves it into place • Old data lost on power outage
  • 7. HDFS Correctness Settings • dfs.datanode.synconclose = true (since Hadoop 1.1) • mount ext4 with dirsync! Or use XFS • You must do this!
  • 8. HDFS Performance Settings 1. Sync behind writes 2. Stale Datanode Detection 3. Short Circuit Reads 4. Miscellaneous Settings
  • 9. HDFS Sync Behind Writes • Syncs partial blocks to disk – best effort (OK, since blocks are immutable) • Necessary with sync-on-close for performance • Always enable this • dfs.datanode.sync.behind.writes = true (Since Hadoop 1.1)
  • 10. Stale Datanodes - Background • Datanodes (DNs) send block reports to the Namenode (NN) • After 10min(!) w/o a report, DN is declared dead • NN will still direct reads and writes to those DNs • Bad for recovery. Down by 1 DN by definition. (every 3rd read/write goes to a bad DN)
  • 11. Stale Datanodes - Detection Don’t use a DN for read or write when it looks like it is stale (default off) • dfs.namenode.avoid.read.stale.datanode = true • dfs.namenode.avoid.write.stale.datanode = true • dfs.namenode.stale.datanode.interval = 30000 (default)
  • 12. HDFS short circuit reads Read local blocks directly without DN, when RegionServers and DNs are co-located. • dfs.client.read.shortcircuit = true • dfs.client.read.shortcircuit.buffer.size = 131072 (important, OOM on direct buffers, default on 0.98+) • hbase.regionserver.checksum.verify = true (default on 0.98+) • dfs.domain.socket.path (local Unix domain socket, not group or world readable)
  • 13. Misc HDFS tips Keep DN running with some failed disks • dfs.datanode.failed.volumes.tolerated = <N> (tolerate losing this many disks) Distribute data across disks at a DN • dfs.datanode.fsdataset.volume.choosing.policy = AvailableSpaceVolumeChoosingPolicy (HDFS-1804 hit drives with more space with higher probability for writes when free space differs by more than 10GB by default)
  • 14. Misc HDFS settings (just trust me on these) • dfs.block.size = 268435456 (note that WAL is rolled at 95% of this) • ipc.server.tcpnodelay = true • ipc.client.tcpnodelay = true
  • 15. Misc HDFS settings (just trust me on these, really) • dfs.datanode.max.xcievers = 8192 • dfs.namenode.handler.count = 64 • dfs.datanode.handler.count = 8 (match number of spindles)
  • 16.
  • 19. Compactions - Background • Writes are buffered in the memstore • Memstore contents flushed to disk as HFiles • Need to limit # HFiles by rewriting small HFiles into fewer larger ones • Remove deleted and expired Cells • Same data written multiple times => Write Amplification!
  • 20. Read vs. Write • Read requires merging HFiles => fewer is better • Write throughput better with fewer compactions => leads to more files • Optimize for Read or Write, not both
  • 22. Control the number of HFiles • hbase.hstore.blockingStoreFiles = 10 (do not allow more flushes when there more than <N> files) small for read, large for write, will stop flushes and writes • hbase.hstore.compactionThreshold = 3 (number of files that starts a compaction) small for read, large for write • hbase.hregion.memstore.flush.size = 128 (max memstore size, default is good) larger good for fewer compaction (watch Region Server heap)
  • 23. Time Based Compactions • HBase does time based major compactions • expensive, always at wrong time • hbase.hregion.majorcompaction = 604800000 (week, default) • hbase.hregion.majorcompaction.jitter = 0.5 (½ week, default)
  • 24. Memstore/Cache Sizing • hbase.hregion.memstore.flush.size = 128 • hbase.hregion.memstore.block.multiplier (allow single memstore to grow by this multiplier, good for heavy, bursty writes) • hbase.regionserver.global.memstore.upperLimit (0.98) hbase.regionserver.global.memstore.size (1.0+) (percent of heap, default 0.4, decrease for read heavy load) • hfile.block.cache.size (percent heap used for the block cache, default 0.4)
  • 25. Autotune BlockCache vs. Memstores (1.0+) HBASE-5349, not well tested, Must Experiment • hbase.regionserver.global.memstore.size.{max|min}.range • hfile.block.cache.size.{max|min}.range • hbase.regionserver.heapmemory.tuner.class • hbase.regionserver.heapmemory.tuner.period
  • 26. Data Locality • Essential for Short Circuit Reads • hbase.hstore.min.locality.to.skip.major.compact (compact even when unnecessary to restore locality) • hbase.master.wait.on.regionservers.timeout (allow master to wait a bit upon restart, so not all region go to the first servers who sign in 30-90s is good. Default it 4.5s) • Don’t use the HDFS balancer!
  • 28. Block Encoding • NONE, FAST_DIFF, PREFIX, etc • alter 'test', { NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF' } • Scan friendly, decodes as you scan • Not so Get friendly (might need to decode many previous Cells) • Currently produces a lot of extra garbage • Safe to enable, always
  • 29. Compression • NONE, GZIP, SNAPPY, etc • create ’test', {NAME => ’cf', COMPRESSION => 'SNAPPY’}} • Compresses entire blocks, not Scan or Get friendly • Typically does not achieve much over block encoding • Blocks cached decompressed, unless hbase.block.data.cachecompressed = true (more cache capacity, but every access needs decompressions) • Need to test with your data
  • 30. HFile Block Size • Don’t confuse with HDFS block size! • create ‘test′,{NAME => ‘cf′, BLOCKSIZE => ’4096'} • Default 64k good compromise between Scans and point Gets • Increase for large Scans • Decrease for many point gets • Rarely want to change this, likely never > 1mb
  • 31. RegionServer - Garbage Collection (source: http://www.everystockphoto.com)
  • 32. Weak Generational Hypothesis Most Allocated Objects Die Young
  • 33. Garbage Collection - Background HotSpot manages four generations (CMS collector): • Eden for all new objects • Survivor I and II where surviving objects are promoted when eden is collected • Tenured space. Objects surviving a few rounds (16 by default) of eden/survivor collection are promoted into the tenured space • Perm gen for classes, interned strings, and other more or less permanent objects. (gone, finally, in JDK8)
  • 34. Garbage Collection - HBase • Garbage from operations is shortlived (single RPC) • Memstore is relatively long-lived (allocated in 2mb chunks) • Blockcache is long-lived (allocation in 64k blocks) • Deal with the “operational” garbage efficiently
  • 35. Garbage Collection (CMS) -Xmn512m very small eden space -XX:+UseParNewGC collect eden in parallel -XX:+UseConcMarkSweepGC use the non-moving CMS collector -XX:CMSInitiatingOccupancyFraction=70 start collecting when 70% of tenured gen is full, avoid collection under pressure -XX:+UseCMSInitiatingOccupancyOnly do not try to adjust CMS setting
  • 37. RegionServer Machine Sizing • How much RAM/Heap? • How many disks? • What size of disk? • Network? • Number of cores?
  • 38. RegionServer Disk/Java Heap ratio • Disk/Heap ratio: RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstores * 2 (assuming memstores on average ½ filled) • 10gb/128mb * 3 * 0.4 * 2 = 192, with default settings
  • 39. RegionServer Disk/Java Heap ratio • Each 192 bytes on disk need 1 byte of Heap • With 32gb of heap, can barely fill 6T disk/machine (32gb * 192 = 6tb) 192?! W.T.F.
  • 40. How about 1gb regions? 1gb/128mb * 3 * 0.4 * 2 = 19
  • 42. RegionServer sizing configs • hbase.hregion.max.filesize (default 10g is good) • hbase.hregion.memstore.flush.size (default 128mb) (decrease for read heavy loads) • hbase.regionserver.maxlogs (HDFS blocksize * 0.95 * <this> should larger than 0.4*JavaHeap)
  • 43. RegionServer Hardware • <= 6T disk space per machine • Enough heap (~diskspace/200) • Many cores are good. HBase is CPU intensive. • Match network and disk throughput (1ge and 24 disks is not good 125mb/s vs 2.4gb/s) (10ge and 24 disks is OK, 1ge and 4 or 6 disks is OK) • But… For reads with filters more disks are still better.
  • 45. Client/Server RPC chunk size • No streaming RPC in HBase • Can only asymptotically approach the full network bandwidth • Typical intra datacenter latency: 0.1ms-1ms • Transmitting 2mb over 1ge: 150ms • Transmitting 2mb over 10ge: 15ms
  • 46. 2mb chunks between Client and Server are good But, how Should I do that?
  • 47. Client Chunk Size Settings Write: • hbase.client.write.buffer = 2mb (default write buffer, good) Read • Scan.setCaching(<n>) (default 100 rows) (but… how large are the rows? Must guess!) • hbase.client.scanner.max.result.size = 2mb (default scan buffer, 0.98.12+ only)
  • 48. Client Consider RPC size * hbase.regionserver.handler.count for server GC Need to be able to ride over splits and region moves: hbase.client.pause = 100 hbase.client.retries.number = 35 hbase.ipc.client.tcpnodelay = true
  • 49. Replication (trust me) • hbase.zookeeper.useMulti = true (needs ZK 3.4) this one is important for correctness Other defaults are good: • replication.sleep.before.failover = 30000 • replication.source.maxretriesmultiplier = 300 • replication.source.ratio = 0.10
  • 50. Linux • Turn THP (Transparent Huge Pages) OFF • Set Swappiness to 0 • Set vm.min_free_kbytes to AT LEAST 1GB (8GB on larger systems, server allocation immediately) • Set zone_reclaim_mode to 0 (one cache on NUMA) • dirsync mount option for EXT4, or use XFS
  • 51. Not Covered • Security/Kerberos • HA NameNode/QJM • ZK/Disk Layout • Obscure Configs • Offheap Caching, G1 GC
  • 53. TL;DR: • Enable HDFS Sync on close, Sync behind writes • Mount EXT4 with dirsync • Enabled Stale Datanode detection • Tune HBase read vs. write load • Set HFile block size for your load • Get RPC Client/Server chunk size right