Accordion: HBase Breathes with In-Memory Compaction

Eshcar Hillel, Anastasia Braginsky, Edward Bortnikov ⎪ HBaseCon, Jun 12, 2017
The Team
Edward Bortnikov
Anastasia Braginsky (committer)
Eshcar Hillel (committer)
Michael Stack
Anoop Sam John
Ramkrishna Vasudevan
Quest: The User’s Holy Grail
Reliable Persistent Storage
In-Memory Database Performance
What is Accordion?
Novel Write-Path Algorithm
Better Performance of Write-Intensive Workloads
Write Throughput ↑, Read Latency ↓
Better Disk Use
Write Amplification ↓
GA in HBase 2.0 (becomes default MemStore implementation)
In a Nutshell
Inspired by Log-Structured Merge (LSM) Tree Design
Transforms random I/O into sequential I/O (efficient!)
Governs the HBase storage organization
Accordion reapplies the LSM Tree design to RAM data
→ Efficient resource use: data lives in memory longer
→ Less disk I/O
→ Ultimately, higher speed
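How LSM Trees Work
[Figure: an HRegion with a MemStore in RAM and several HFiles on Disk; Put goes to the MemStore, Get/Scan reads from both, Flush writes the MemStore out as a new HFile, and Compaction merges HFiles]
Data updates are stored as versions
Compaction eliminates redundancies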
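To make the write path concrete, here is a minimal Python sketch of the LSM idea (a toy, not HBase code): writes land in a mutable in-memory table, a flush freezes it into an immutable sorted run standing in for an HFile, reads check the newest data first, and compaction merges runs while discarding superseded versions. The class `TinyLSM` and its methods are hypothetical names chosen for illustration.

```python
# Toy LSM tree (illustrative sketch, not HBase code).
from typing import Dict, List, Optional, Tuple

Run = List[Tuple[str, int, str]]  # one immutable "HFile": (key, seq, value), sorted by key


class TinyLSM:
    def __init__(self) -> None:
        self.memstore: Dict[str, Tuple[int, str]] = {}  # mutable, in RAM
        self.hfiles: List[Run] = []  # immutable runs "on disk", oldest first
        self.seq = 0  # global sequence number; higher means newer

    def put(self, key: str, value: str) -> None:
        # Simplification: the memstore keeps only the latest version per key;
        # older versions survive in earlier runs until compaction.
        self.seq += 1
        self.memstore[key] = (self.seq, value)

    def get(self, key: str) -> Optional[str]:
        if key in self.memstore:  # newest data first
            return self.memstore[key][1]
        for run in reversed(self.hfiles):  # then runs, newest to oldest
            for k, _, v in run:
                if k == key:
                    return v
        return None

    def flush(self) -> None:
        # Write the memstore out as one sorted run (a sequential write).
        run = sorted((k, s, v) for k, (s, v) in self.memstore.items())
        if run:
            self.hfiles.append(run)
        self.memstore = {}

    def compact(self) -> None:
        # Merge all runs into one, keeping only the newest version per key.
        newest: Dict[str, Tuple[int, str]] = {}
        for run in self.hfiles:
            for k, s, v in run:
                if k not in newest or s > newest[k][0]:
                    newest[k] = (s, v)
        merged = sorted((k, s, v) for k, (s, v) in newest.items())
        self.hfiles = [merged] if merged else []
```

For example, writing the same key before and after a flush leaves two runs on disk; after `compact()` only the newest version remains, in a single run.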
LSM Trees in Action
[Animation: each Flush! turns the MemStore contents into a new HFile; as HFiles accumulate, a Compaction! merges them]
Accordion: In-Memory LSM Tree
[Figure: the Compacting MemStore in RAM holds an Active Segment that receives Puts and several Immutable Segments; Get/Scan reads all of them; in-memory Compaction merges the Immutable Segments, and Flush writes data to HFiles on Disk]
Accordion in Action
[Animation: each In-Memory Flush! turns the Active Segment into an Immutable Segment in the Compaction Pipeline; an In-Memory Compaction! merges the pipeline's segments; a Disk Flush! writes a Snapshot to disk]
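The segment life cycle shown in the animation can be modeled in a few lines. This is a hedged Python sketch under simplifying assumptions: segments are plain dicts, the in-memory flush threshold `active_limit` is a hypothetical cell count, and the disk flush is assumed to take the oldest pipeline segment as its snapshot. `ToyCompactingMemStore` is an illustrative name, not HBase's `CompactingMemStore` class.

```python
# Toy compacting memstore (illustrative sketch; names are hypothetical,
# not HBase's CompactingMemStore API).
from typing import Dict, List, Optional, Tuple

Segment = Dict[str, Tuple[int, str]]  # key -> (seq, value)


class ToyCompactingMemStore:
    def __init__(self, active_limit: int = 2) -> None:
        self.active: Segment = {}  # mutable Active Segment, receives all puts
        self.pipeline: List[Segment] = []  # Immutable Segments, newest first
        self.active_limit = active_limit  # hypothetical in-memory flush threshold (cells)
        self.seq = 0

    def put(self, key: str, value: str) -> None:
        self.seq += 1
        self.active[key] = (self.seq, value)
        if len(self.active) >= self.active_limit:
            self.in_memory_flush()

    def in_memory_flush(self) -> None:
        # The Active Segment becomes immutable and joins the pipeline head.
        if self.active:
            self.pipeline.insert(0, self.active)
            self.active = {}

    def in_memory_compaction(self) -> None:
        # Merge all pipeline segments into one; newer values win, so this toy
        # keeps only the newest version of each key.
        merged: Segment = {}
        for seg in reversed(self.pipeline):  # oldest first
            merged.update(seg)
        self.pipeline = [merged] if merged else []

    def get(self, key: str) -> Optional[str]:
        for seg in [self.active, *self.pipeline]:  # newest to oldest
            if key in seg:
                return seg[key][1]
        return None

    def disk_flush(self) -> Segment:
        # Assumption: the snapshot written to disk is the oldest Immutable
        # Segment (pipeline tail); the Active Segment keeps absorbing puts.
        return self.pipeline.pop() if self.pipeline else {}
```

Reads stay correct throughout because `get` always walks from the Active Segment toward the oldest Immutable Segment, so the newest value for a key is found first.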
Flat Immutable Segment Index
[Figure: Flatten replaces an immutable segment's Skiplist Index with a CellArrayMap Index; both index KV-Objects whose data sits in Cell Storage]
Lean footprint: the smaller the cells, the better!
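Flattening works because an immutable segment no longer needs concurrent inserts: a sorted array searched by binary search is enough. A minimal Python sketch of that idea, assuming a dict stands in for the skiplist and `FlatIndex` is a hypothetical name (the deck's CellArrayMap is a Java `NavigableMap` implementation, which this does not reproduce):

```python
# Flattening an immutable segment's index into sorted arrays (illustrative).
from bisect import bisect_left
from typing import Dict, Iterator, List, Optional, Tuple


class FlatIndex:
    def __init__(self, entries: Dict[bytes, bytes]) -> None:
        # One-time sort when the segment becomes immutable.
        items = sorted(entries.items())
        self.keys: List[bytes] = [k for k, _ in items]
        self.cells: List[bytes] = [v for _, v in items]

    def get(self, key: bytes) -> Optional[bytes]:
        # Binary search replaces the skiplist descent.
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.cells[i]
        return None

    def scan(self, start: bytes, stop: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Half-open range [start, stop), like an HBase Scan's start/stop rows.
        i = bisect_left(self.keys, start)
        while i < len(self.keys) and self.keys[i] < stop:
            yield self.keys[i], self.cells[i]
            i += 1
```

The design choice: a skiplist pays per-entry node and pointer overhead, while the flat arrays cost roughly one reference per cell, so the relative saving grows as cells get smaller.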
Redundancy Elimination
In-memory compaction merges the pipelined segments
Keeps Get latency under control (fewer segments to scan)
BASIC compaction
Multiple indexes are merged into one; cell data remains in place
EAGER compaction
Redundant data versions are eliminated (via a ScanQueryMatcher (SQM) scan)
BASIC vs EAGER
BASIC: universal optimization; avoids physical data copy
EAGER: high value for highly redundant workloads
SQM scan is expensive
Data relocation cost may be high (think MSLAB!)
Configuration
BASIC is the default; EAGER may be configured
A future implementation may pick the right mode automatically
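The difference between the two modes can be sketched as two merge functions. Assumptions: cells are `(key, seq, value)` tuples sorted by key and then newest-first, and `max_versions` is a stand-in for the column family's version limit; the filtering loop plays roughly the role of the SQM scan and is not HBase's implementation.

```python
# BASIC vs EAGER in-memory compaction (illustrative sketch).
import heapq
from typing import List, Tuple

Cell = Tuple[str, int, str]  # (key, seq, value)


def _order(cell: Cell) -> Tuple[str, int]:
    return (cell[0], -cell[1])  # key ascending, newest version first


def basic_merge(segments: List[List[Cell]]) -> List[Cell]:
    # BASIC: merge the indexes into one; every version survives and the
    # cell objects are referenced, not copied.
    return list(heapq.merge(*segments, key=_order))


def eager_merge(segments: List[List[Cell]], max_versions: int = 1) -> List[Cell]:
    # EAGER: additionally drop versions beyond max_versions; in a real
    # store the survivors would then be copied to new storage.
    out: List[Cell] = []
    last_key, kept = None, 0
    for cell in basic_merge(segments):
        kept = kept + 1 if cell[0] == last_key else 1
        last_key = cell[0]
        if kept <= max_versions:
            out.append(cell)
    return out
```

`basic_merge` returns references to the original cell tuples, mirroring "cell data remains in place", while `eager_merge` only shrinks the result when the workload actually rewrites the same keys.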
Compaction Pipeline: Correctness & Performance
Shared Data Structure
Read access: Get, Scan, in-memory compaction
Write access: in-memory flush, in-memory compaction, disk flush
Design Choice: Non-Blocking Reads
Read-only pipeline clone: no synchronization upon read access
Copy-on-write upon modification
Versioning prevents a compaction from taking effect concurrently with other updates
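A minimal Python sketch of the non-blocking read design, assuming a single immutable `(segments, version)` tuple that writers replace under a lock and readers fetch without one. The class and method names are hypothetical, and segments are plain strings; this is not HBase's pipeline code.

```python
# Copy-on-write pipeline with versioning (illustrative sketch).
import threading
from typing import Optional, Tuple

State = Tuple[Tuple[str, ...], int]  # (segments newest-first, version)


class Pipeline:
    def __init__(self) -> None:
        self._state: State = ((), 0)  # replaced as a unit, never mutated
        self._lock = threading.Lock()  # serializes writers only

    def read_only_clone(self) -> State:
        # Readers (Get, Scan, compaction) take one reference; no locking.
        return self._state

    def push_head(self, segment: str) -> None:
        # In-memory flush: copy-on-write a new tuple with the segment in front.
        with self._lock:
            segments, version = self._state
            self._state = ((segment,) + segments, version + 1)

    def swap(self, expected_version: int, compacted: str) -> bool:
        # In-memory compaction: install the result only if nothing changed
        # since the compactor took its clone.
        with self._lock:
            _, version = self._state
            if version != expected_version:
                return False
            self._state = ((compacted,), version + 1)
            return True

    def pop_tail(self) -> Optional[str]:
        # Disk flush: remove the oldest segment.
        with self._lock:
            segments, version = self._state
            if not segments:
                return None
            self._state = (segments[:-1], version + 1)
            return segments[-1]
```

A compactor calls `read_only_clone()`, merges the segments it saw, then calls `swap(version, result)`; if a flush bumped the version in between, the swap is refused and the result is discarded.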
More Memory Efficiency: KV-Object Elimination
[Figure: a CellArrayMap Index over Cell Storage, next to a CellChunkMap Index over Cell Storage]
CellChunkMap: lean footprint (no KV-Objects), friendly to an off-heap implementation
The Software Side: What’s New?
CompactingMemStore: BASIC and EAGER configurations
DefaultMemStore: NONE configuration
Segment Class Hierarchy: Mutable, Immutable, Composite
NavigableMap Implementations: CellArrayMap, CellChunkMap
MemStoreCompactor: compaction algorithms implementation
CellChunkMap Support (Experimental)
Cell objects embedded directly into the CellChunkMap (CCM)
New cell type: references data by a unique ChunkID
ChunkCreator: chunk allocation + ChunkID management
Stores the mapping of ChunkIDs to Chunk references
Strong references to chunks managed by CCMs, weak references to the rest
The CCMs themselves are allocated via the same mechanism
Some exotic use cases
E.g., jumbo cells allocated in one-time chunks outside the chunk pools
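Referencing cells by ChunkID rather than by object pointer can be sketched as follows. This is a hedged Python illustration: `ChunkRegistry`, `allocate` and `read_cell` are hypothetical names standing in for the ChunkCreator role, and Python's `weakref.WeakValueDictionary` stands in for Java weak references.

```python
# Chunk-ID-based cell references (illustrative sketch; ChunkRegistry and
# its methods are hypothetical, not HBase's ChunkCreator API).
import itertools
import weakref
from typing import Dict, Optional


class Chunk:
    def __init__(self, chunk_id: int, size: int) -> None:
        self.id = chunk_id
        self.data = bytearray(size)
        self.used = 0  # bump-pointer allocation within the chunk

    def append(self, payload: bytes) -> int:
        # Copy payload into the chunk and return its offset.
        if self.used + len(payload) > len(self.data):
            raise ValueError("chunk full")
        offset = self.used
        self.data[offset:offset + len(payload)] = payload
        self.used += len(payload)
        return offset


class ChunkRegistry:
    def __init__(self) -> None:
        self._ids = itertools.count(1)  # unique ChunkIDs
        self._strong: Dict[int, Chunk] = {}  # chunks owned by a CellChunkMap
        self._weak = weakref.WeakValueDictionary()  # everything else

    def allocate(self, size: int, ccm_owned: bool) -> Chunk:
        chunk = Chunk(next(self._ids), size)
        if ccm_owned:
            self._strong[chunk.id] = chunk
        else:
            self._weak[chunk.id] = chunk
        return chunk

    def lookup(self, chunk_id: int) -> Optional[Chunk]:
        chunk = self._strong.get(chunk_id)
        return chunk if chunk is not None else self._weak.get(chunk_id)

    def read_cell(self, chunk_id: int, offset: int, length: int) -> bytes:
        # A cell is identified by (chunk_id, offset, length), not an object pointer.
        chunk = self.lookup(chunk_id)
        if chunk is None:
            raise KeyError(chunk_id)
        return bytes(chunk.data[offset:offset + length])
```

Strongly registered chunks stay reachable as long as the registry lives; weakly registered ones drop out of the map once nothing else holds them.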
Evaluation Setup
System
2-node HBase on top of 3-node HDFS, 1Gbps interconnect
Intel Xeon E5620 (12-core), 2.8TB SSD storage, 48GB RAM
RegionServer config: 16GB RAM (40% Cache/40% MemStore), on-heap, no MSLAB
Data
1 table (100 regions, 50 columns), 30GB-100GB
Workload Driver
YCSB (1 node, 12 threads)
Batched (async) writes (10KB buffer)
Experiments
Metrics
Write throughput, read latency (distribution), disk footprint/write amplification
Workloads (varied at client side)
Write-Only (100% Put) vs Mixed (50% Put/50% Get)
Uniform vs Zipfian Key Distributions
Small Values (100B) vs Big Values (1KB)
Configurations (varied at server side)
Most experiments exercise Async WAL
Write Throughput
100GB Dataset, 100% Writes, 100B Values
Every write updates a single column
[Chart: throughput (ops/sec) of NONE, BASIC and EAGER under Zipf and Uniform key distributions, annotated +25% and +44%]
Gains less pronounced with big values (1KB): +11% (why?)
Single-Key Write Latency
100GB Dataset, Zipf Distribution, 100% Writes, 100B Values
[Chart: write latency (ms) of NONE, BASIC and EAGER at the 50% (median), 75%, 95% and 99% (tail) percentiles]
Single-Key Read Latency
30GB Dataset, Zipf Distribution, 50% Writes/50% Reads, 100B Values
[Chart: read latency (ms) of NONE, BASIC and EAGER at the 50% (median), 75%, 95% and 99% (tail) percentiles, annotated +9% (why?) and -13%]
Disk Footprint/Write Amplification
100GB Dataset, Zipf Distribution, 100% Writes, 100B Values
[Chart: number of Flushes and Compactions, and Data Written (GB), for NONE, BASIC and EAGER, annotated -29%]
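For reading the disk-footprint chart, one common way to compute write amplification is the bytes written to disk by flushes and compactions divided by the bytes clients wrote. The sketch below assumes that definition (it ignores WAL writes) and adds a helper for relative changes; the function names are hypothetical and it does not reproduce the chart's numbers.

```python
# Write amplification and relative change (illustrative sketch; the
# definition used here is an assumption, not taken from the deck).


def write_amplification(flushed_gb: float, compacted_gb: float, ingested_gb: float) -> float:
    # Disk bytes written by flushes plus compactions, per byte clients stored.
    if ingested_gb <= 0:
        raise ValueError("ingested_gb must be positive")
    return (flushed_gb + compacted_gb) / ingested_gb


def relative_change(baseline: float, value: float) -> float:
    # Fractional change versus a baseline; -0.29 means 29% less.
    return (value - baseline) / baseline
```

Fewer flushes and fewer on-disk compactions both shrink the numerator, which is how an in-memory compaction policy can lower write amplification without changing the client workload.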
Status
In-Memory Compaction GA in HBase 2.0
Master JIRA HBASE-14918 complete (~20 subtasks)
Major refactoring/extension of the MemStore code
Many details in Apache HBase blog posts
CellChunkMap Index, Off-Heap support in progress
Master JIRA HBASE-16421
Summary
Accordion = a leaner and faster write path
Space-Efficient Index + Redundancy Elimination → less I/O
Less Frequent Flushes → increased write throughput
Less On-Disk Compaction → reduced write amplification
Data stays longer in RAM → reduced tail read latency
Edging Closer to In-Memory Database Performance
Thanks to Our Partners for Being Awesome

25

Contenu connexe

Tendances

Rigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceRigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceCloudera, Inc.
 
Microsoft azure for sql server professionals
Microsoft azure for sql server professionalsMicrosoft azure for sql server professionals
Microsoft azure for sql server professionalsArmando Lacerda
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
PostgreSQL Hangout Parameter Tuning
PostgreSQL Hangout Parameter TuningPostgreSQL Hangout Parameter Tuning
PostgreSQL Hangout Parameter TuningAshnikbiz
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashCloudera, Inc.
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
 
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL-Consulting
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
 
WiredTiger MongoDB Integration
WiredTiger MongoDB Integration WiredTiger MongoDB Integration
WiredTiger MongoDB Integration MongoDB
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.Jack Levin
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaCloudera, Inc.
 
MongoDB 101 & Beyond: Get Started in MongoDB 3.0, Preview 3.2 & Demo of Ops M...
MongoDB 101 & Beyond: Get Started in MongoDB 3.0, Preview 3.2 & Demo of Ops M...MongoDB 101 & Beyond: Get Started in MongoDB 3.0, Preview 3.2 & Demo of Ops M...
MongoDB 101 & Beyond: Get Started in MongoDB 3.0, Preview 3.2 & Demo of Ops M...MongoDB
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedEqunix Business Solutions
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 
Answering the Database Scale Out Problem with PCI SSDs
Answering the Database Scale Out Problem with PCI SSDsAnswering the Database Scale Out Problem with PCI SSDs
Answering the Database Scale Out Problem with PCI SSDsanswers
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 

Tendances (20)

Rigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceRigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase Performance
 
Microsoft azure for sql server professionals
Microsoft azure for sql server professionalsMicrosoft azure for sql server professionals
Microsoft azure for sql server professionals
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
PostgreSQL Hangout Parameter Tuning
PostgreSQL Hangout Parameter TuningPostgreSQL Hangout Parameter Tuning
PostgreSQL Hangout Parameter Tuning
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on Flash
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
 
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
 
WiredTiger MongoDB Integration
WiredTiger MongoDB Integration WiredTiger MongoDB Integration
WiredTiger MongoDB Integration
 
Rit 2011 ats
Rit 2011 atsRit 2011 ats
Rit 2011 ats
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
 
MongoDB 101 & Beyond: Get Started in MongoDB 3.0, Preview 3.2 & Demo of Ops M...
MongoDB 101 & Beyond: Get Started in MongoDB 3.0, Preview 3.2 & Demo of Ops M...MongoDB 101 & Beyond: Get Started in MongoDB 3.0, Preview 3.2 & Demo of Ops M...
MongoDB 101 & Beyond: Get Started in MongoDB 3.0, Preview 3.2 & Demo of Ops M...
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
Answering the Database Scale Out Problem with PCI SSDs
Answering the Database Scale Out Problem with PCI SSDsAnswering the Database Scale Out Problem with PCI SSDs
Answering the Database Scale Out Problem with PCI SSDs
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 

Similaire à Accordion HBaseCon 2017

02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013WANdisco Plc
 
Design Tradeoffs for SSD Performance
Design Tradeoffs for SSD PerformanceDesign Tradeoffs for SSD Performance
Design Tradeoffs for SSD Performancejimmytruong
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
 
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesHBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesMichael Stack
 
Ceph Day NYC: Ceph Performance & Benchmarking
Ceph Day NYC: Ceph Performance & BenchmarkingCeph Day NYC: Ceph Performance & Benchmarking
Ceph Day NYC: Ceph Performance & BenchmarkingCeph Community
 
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and AerospikeHow to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and AerospikeAerospike, Inc.
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...In-Memory Computing Summit
 
Ceph Day Santa Clara: Ceph Performance & Benchmarking
Ceph Day Santa Clara: Ceph Performance & Benchmarking Ceph Day Santa Clara: Ceph Performance & Benchmarking
Ceph Day Santa Clara: Ceph Performance & Benchmarking Ceph Community
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchInfluxData
 
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Kyle Hailey
 
MySQL Performance - SydPHP October 2011
MySQL Performance - SydPHP October 2011MySQL Performance - SydPHP October 2011
MySQL Performance - SydPHP October 2011Graham Weldon
 
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemKey Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemYahoo Developer Network
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)RichardWarburton
 
Performance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonPerformance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonJAXLondon2014
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...In-Memory Computing Summit
 

Similaire à Accordion HBaseCon 2017 (20)

02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
Design Tradeoffs for SSD Performance
Design Tradeoffs for SSD PerformanceDesign Tradeoffs for SSD Performance
Design Tradeoffs for SSD Performance
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
IO Dubi Lebel
IO Dubi LebelIO Dubi Lebel
IO Dubi Lebel
 
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesHBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
 
Ceph Day NYC: Ceph Performance & Benchmarking
Ceph Day NYC: Ceph Performance & BenchmarkingCeph Day NYC: Ceph Performance & Benchmarking
Ceph Day NYC: Ceph Performance & Benchmarking
 
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and AerospikeHow to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
 
Ceph Day Santa Clara: Ceph Performance & Benchmarking
Ceph Day Santa Clara: Ceph Performance & Benchmarking Ceph Day Santa Clara: Ceph Performance & Benchmarking
Ceph Day Santa Clara: Ceph Performance & Benchmarking
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
 
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
MySQL Performance - SydPHP October 2011
MySQL Performance - SydPHP October 2011MySQL Performance - SydPHP October 2011
MySQL Performance - SydPHP October 2011
 
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemKey Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
 
Performance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonPerformance and Predictability - Richard Warburton
Performance and Predictability - Richard Warburton
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 

Dernier

CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 

Dernier (20)

CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Natural Polymer Based Nanomaterials

Accordion HBaseCon 2017

  • 1. Accordion: HBase Breathes with In-Memory Compaction. Eshcar Hillel, Anastasia Braginsky, Edward Bortnikov | HBaseCon, Jun 12, 2017
  • 2. The Team: Edward Bortnikov; Anastasia Braginsky (committer); Eshcar Hillel (committer); Michael Stack; Anoop Sam John; Ramkrishna Vasudevan
  • 3. Quest: The User’s Holy Grail. Reliable Persistent Storage + In-Memory Database Performance
  • 4. What is Accordion? A novel write-path algorithm. Better performance for write-intensive workloads: write throughput ↑, read latency ↓. Better disk use: write amplification ↓. GA in HBase 2.0 (becomes the default MemStore implementation).
  • 5. In a Nutshell: Inspired by the Log-Structured Merge (LSM) tree design, which transforms random I/O into sequential I/O (efficient!) and governs the HBase storage organization. Accordion reapplies the LSM tree design to RAM data → efficient resource use, since data lives in memory longer → less disk I/O → ultimately, higher speed.
  • 6. How LSM Trees Work: [Diagram: an HRegion with a MemStore in RAM and HFiles on disk; operations: Put, Get/Scan, Flush, Compaction.] Data updates are stored as versions; compaction eliminates redundancies.
  • 7. LSM Trees in Action: [Animation: successive MemStore flushes (Flush! Flush! Flush!) add HFiles on disk, until a Compaction! merges them.]
  • 8. Accordion: In-Memory LSM Tree: [Diagram: a Compacting MemStore in RAM holding an Active Segment and several Immutable Segments, with HFiles on disk; operations: Put, Get/Scan, Flush, Compaction.]
  • 10. Flat Immutable Segment Index: [Diagram: Flatten turns the Skiplist Index into a CellArrayMap Index; both reference KV-Objects over the same Cell Storage.] Lean footprint: the smaller the cells, the better!
  • 11. Redundancy Elimination: In-memory compaction merges the pipelined segments, keeping Get access latency under control (fewer segments to scan). BASIC compaction: multiple indexes are merged into one; cell data remains in place. EAGER compaction: redundant data versions are eliminated (SQM, i.e. ScanQueryMatcher, scan).
  • 12. BASIC vs EAGER: BASIC is a universal optimization that avoids physical data copy. EAGER is of high value for highly redundant workloads, but the SQM scan is expensive and the data relocation cost may be high (think MSLAB!). Configuration: BASIC is the default; EAGER may be configured. A future implementation may figure out the right mode automatically.
  • 13. Compaction Pipeline: Correctness & Performance. The pipeline is a shared data structure. Read access: Get, Scan, in-memory compaction. Write access: in-memory flush, in-memory compaction, disk flush. Design choice: non-blocking reads. Readers use a read-only clone of the pipeline, with no synchronization upon read access; modifications are copy-on-write; versioning prevents a compaction from being applied concurrently with other updates.
  • 14. More Memory Efficiency: KV-Object Elimination. [Diagram: a CellArrayMap Index over Cell Storage, next to a CellChunkMap Index over Cell Storage.] CellChunkMap: lean footprint (no KV-Objects), friendly to an off-heap implementation.
  • 15. The Software Side: What’s New? CompactingMemStore: BASIC and EAGER configurations. DefaultMemStore: NONE configuration. Segment class hierarchy: Mutable, Immutable, Composite. NavigableMap implementations: CellArrayMap, CellChunkMap. MemStoreCompactor: implementation of the compaction algorithms.
  • 16. CellChunkMap Support (Experimental): Cell objects are embedded directly into the CellChunkMap (CCM). A new cell type references data by a unique ChunkID. ChunkCreator handles chunk allocation and ChunkID management: it stores the mapping of ChunkIDs to chunk references, holding strong references to chunks managed by CCMs and weak references to the rest. The CCMs themselves are allocated via the same mechanism. Some exotic use cases exist, e.g., jumbo cells are allocated in one-time chunks outside the chunk pools.
  • 17. Evaluation Setup. System: 2-node HBase on top of 3-node HDFS, 1 Gbps interconnect; Intel Xeon E5620 (12-core), 2.8 TB SSD storage, 48 GB RAM; RS config: 16 GB RAM (40% cache / 40% MemStore), on-heap, no MSLAB. Data: 1 table (100 regions, 50 columns), 30 GB to 100 GB. Workload driver: YCSB (1 node, 12 threads); batched (async) writes (10 KB buffer).
  • 18. Experiments. Metrics: write throughput, read latency (distribution), disk footprint/write amplification. Workloads (varied at the client side): write-only (100% Put) vs mixed (50% Put/50% Get); uniform vs Zipfian key distributions; small values (100 B) vs big values (1 KB). Configurations (varied at the server side). Most experiments exercise the async WAL.
  • 19. Write Throughput: +25%, +44%. [Chart: throughput (ops/sec) of NONE, BASIC, and EAGER under Zipf and Uniform key distributions.] 100 GB dataset, 100% writes, 100 B values; every write updates a single column. Gains are less pronounced with big values (1 KB): +11% (why?).
  • 20. Single-Key Write Latency. [Chart: latency (ms) at the 50% (median), 75%, 95%, and 99% (tail) percentiles for NONE, BASIC, and EAGER.] 100 GB dataset, Zipf distribution, 100% writes, 100 B values.
  • 21. Single-Key Read Latency. 30 GB dataset, Zipf distribution, 50% writes/50% reads, 100 B values. [Chart: latency (ms) at the 50% (median), 75%, 95%, and 99% (tail) percentiles for NONE, BASIC, and EAGER.] +9% (why?), -13%.
  • 22. Disk Footprint/Write Amplification. 100 GB dataset, Zipf distribution, 100% writes, 100 B values. [Chart: Flushes, Compactions, and Data Written (GB) for NONE, BASIC, and EAGER.] -29%.
  • 23. Status. In-memory compaction is GA in HBase 2.0: master JIRA HBASE-14918 complete (~20 subtasks), a major refactoring/extension of the MemStore code; many details in Apache HBase blog posts. CellChunkMap index and off-heap support in progress: master JIRA HBASE-16421.
  • 24. Summary. Accordion = a leaner and faster write path. Space-efficient index + redundancy elimination → less I/O. Less frequent flushes → increased write throughput. Less on-disk compaction → reduced write amplification. Data stays longer in RAM → reduced tail read latency. Edging closer to in-memory database performance.
  • 25. Thanks to Our Partners for Being Awesome
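Slide 6's write path can be illustrated with a minimal Java sketch. It assumes in-memory sorted maps stand in for on-disk HFiles and ignores deletes, the WAL, and caching; `ToyLsmStore` and its methods are hypothetical names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy LSM store: a mutable MemStore in RAM plus immutable sorted "HFiles" (simulated in memory). */
class ToyLsmStore {
    private final int flushThreshold;
    private ConcurrentSkipListMap<String, String> memStore = new ConcurrentSkipListMap<>();
    // Newest file first, so a lookup finds the most recent version of a key first.
    private final List<NavigableMap<String, String>> hFiles = new ArrayList<>();

    ToyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Put: random writes land in the sorted in-memory structure. */
    void put(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= flushThreshold) flush();
    }

    /** Get: check the MemStore, then the files from newest to oldest. */
    String get(String key) {
        String value = memStore.get(key);
        if (value != null) return value;
        for (NavigableMap<String, String> file : hFiles) {
            value = file.get(key);
            if (value != null) return value;
        }
        return null;
    }

    /** Flush: write the MemStore out, in key order, as a new immutable file (sequential I/O). */
    void flush() {
        if (memStore.isEmpty()) return;
        hFiles.add(0, Collections.unmodifiableNavigableMap(new TreeMap<>(memStore)));
        memStore = new ConcurrentSkipListMap<>();
    }

    /** Compaction: merge all files into one, keeping only the newest version of each key. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (int i = hFiles.size() - 1; i >= 0; i--) { // oldest first, so newer values overwrite
            merged.putAll(hFiles.get(i));
        }
        hFiles.clear();
        if (!merged.isEmpty()) hFiles.add(Collections.unmodifiableNavigableMap(merged));
    }

    int fileCount() {
        return hFiles.size();
    }
}
```

For example, with a flush threshold of 2, four puts produce two files; after `compact()` a single file remains and `get` still returns the newest value of each key.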
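Slide 8 applies the same structure inside RAM. The sketch below is a toy model under stated assumptions: `ToyCompactingMemStore` is hypothetical, segments are plain sorted maps rather than HBase segment objects, and a disk flush merges the whole pipeline into one file. That last point is a simplification; the actual HBase snapshot policy may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy compacting MemStore: one mutable active segment plus a pipeline of immutable segments. */
class ToyCompactingMemStore {
    private final int activeLimit;
    private ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    // Head of the deque holds the newest immutable segment.
    private final Deque<NavigableMap<String, String>> pipeline = new ArrayDeque<>();
    // Simulated on-disk files, newest first.
    private final List<NavigableMap<String, String>> hFiles = new ArrayList<>();

    ToyCompactingMemStore(int activeLimit) {
        this.activeLimit = activeLimit;
    }

    void put(String key, String value) {
        active.put(key, value);
        if (active.size() >= activeLimit) inMemoryFlush();
    }

    /** In-memory flush: the active segment becomes immutable and joins the pipeline; no disk I/O. */
    void inMemoryFlush() {
        if (active.isEmpty()) return;
        pipeline.addFirst(Collections.unmodifiableNavigableMap(new TreeMap<>(active)));
        active = new ConcurrentSkipListMap<>();
    }

    /** Get: active segment, then pipeline newest to oldest, then disk files. */
    String get(String key) {
        String value = active.get(key);
        if (value != null) return value;
        for (NavigableMap<String, String> segment : pipeline) { // ArrayDeque iterates head to tail
            value = segment.get(key);
            if (value != null) return value;
        }
        for (NavigableMap<String, String> file : hFiles) {
            value = file.get(key);
            if (value != null) return value;
        }
        return null;
    }

    /** Disk flush (simplified): merge the whole pipeline into one new file and release the RAM. */
    void diskFlush() {
        inMemoryFlush();
        TreeMap<String, String> merged = new TreeMap<>();
        Iterator<NavigableMap<String, String>> oldestFirst = pipeline.descendingIterator();
        while (oldestFirst.hasNext()) merged.putAll(oldestFirst.next()); // newer values overwrite
        pipeline.clear();
        if (!merged.isEmpty()) hFiles.add(0, merged);
    }

    int pipelineLength() { return pipeline.size(); }

    int fileCount() { return hFiles.size(); }
}
```

The design point it shows: an in-memory flush costs no disk I/O, so the pipeline can absorb several active segments before a single disk flush.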
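The flattening step on slide 10 can be sketched as a single pass over the skiplist, which already iterates in key order, into sorted arrays searched by binary search. `FlatSegmentIndex` is a hypothetical stand-in for CellArrayMap: it stores keys and values directly instead of Cell references, and it assumes the segment is immutable, so no writer can change it during the copy.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical flat index for an immutable segment: sorted parallel arrays instead of a skiplist. */
final class FlatSegmentIndex {
    private final String[] keys;
    private final String[] values;

    /** Flatten: one linear pass, since the skiplist already iterates in ascending key order. */
    FlatSegmentIndex(ConcurrentSkipListMap<String, String> immutableSegment) {
        // Safe only because the segment no longer receives writes.
        keys = new String[immutableSegment.size()];
        values = new String[keys.length];
        int i = 0;
        for (Map.Entry<String, String> entry : immutableSegment.entrySet()) {
            keys[i] = entry.getKey();
            values[i] = entry.getValue();
            i++;
        }
    }

    /** Lookup by binary search over the sorted key array. */
    String get(String key) {
        int idx = Arrays.binarySearch(keys, key);
        return idx >= 0 ? values[idx] : null;
    }

    int size() {
        return keys.length;
    }
}
```

The arrays keep O(log n) lookups while dropping the per-entry node objects a skiplist allocates, which is where much of the smaller footprint comes from.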
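The difference between BASIC and EAGER (slides 11 and 12) can be illustrated with a k-way merge over sorted segments. In this sketch, assuming cells are ordered by key and then by newest sequence ID first, `basic` only merges the indexes and returns references to the original cell objects, while `eager` also drops versions beyond `maxVersions`. `ToyCell` and `ToyInMemoryCompactor` are hypothetical; the version-trimming loop is a much simplified stand-in for the SQM scan, which also handles delete markers and other column-family settings, and unlike real EAGER the sketch does not relocate surviving cell data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical cell: row key, sequence ID (higher = newer), and value. */
final class ToyCell {
    final String key;
    final long seqId;
    final String value;

    ToyCell(String key, long seqId, String value) {
        this.key = key;
        this.seqId = seqId;
        this.value = value;
    }
}

final class ToyInMemoryCompactor {
    /** Key ascending, then newest version first. Every input segment must be sorted this way. */
    static final Comparator<ToyCell> ORDER = (a, b) -> {
        int byKey = a.key.compareTo(b.key);
        return byKey != 0 ? byKey : Long.compare(b.seqId, a.seqId);
    };

    /** BASIC-style: k-way merge of the segment indexes; cells are referenced, never copied. */
    static List<ToyCell> basic(List<List<ToyCell>> segments) {
        // Heap entries are {segment number, position in that segment}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> ORDER.compare(segments.get(x[0]).get(x[1]), segments.get(y[0]).get(y[1])));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) heap.add(new int[] {s, 0});
        }
        List<ToyCell> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<ToyCell> segment = segments.get(top[0]);
            merged.add(segment.get(top[1]));
            if (top[1] + 1 < segment.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return merged;
    }

    /** EAGER-style: merge, then keep only the newest maxVersions versions of each key. */
    static List<ToyCell> eager(List<List<ToyCell>> segments, int maxVersions) {
        List<ToyCell> result = new ArrayList<>();
        String currentKey = null;
        int kept = 0;
        for (ToyCell cell : basic(segments)) {
            if (!cell.key.equals(currentKey)) {
                currentKey = cell.key;
                kept = 0;
            }
            if (kept < maxVersions) {
                result.add(cell);
                kept++;
            }
        }
        return result;
    }
}
```

For instance, merging a segment holding `a@3, c@4` with one holding `a@1, b@2` yields four cells under `basic` and three under `eager` with `maxVersions = 1`.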
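Slide 13's concurrency scheme can be sketched with an immutable list published through a `volatile` field. Assumptions: one monitor lock serializes all writers, and a compaction replaces the entire pipeline with a single segment. `ToyPipeline`, `versionedSnapshot`, `swap`, and `drain` are hypothetical names for this sketch, not the HBase API. Readers never lock; a compaction records the version it started from, and its result is discarded if an in-memory flush or disk flush bumped the version in the meantime.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy compaction pipeline: lock-free reads, copy-on-write updates, version-checked swaps. */
final class ToyPipeline<S> {
    /** A pipeline snapshot tagged with the version at which it was taken. */
    static final class Versioned<T> {
        final List<T> segments;
        final long version;

        Versioned(List<T> segments, long version) {
            this.segments = segments;
            this.version = version;
        }
    }

    // Immutable list published to readers; replaced wholesale on every modification.
    private volatile List<S> readOnlyCopy = Collections.emptyList();
    private long version = 0; // only changed while holding the monitor of 'this'

    /** Get/Scan: no synchronization, just read the current immutable clone. */
    List<S> segments() {
        return readOnlyCopy;
    }

    /** In-memory flush: add a new segment at the head (copy-on-write). */
    synchronized void pushHead(S segment) {
        List<S> copy = new ArrayList<>(readOnlyCopy.size() + 1);
        copy.add(segment);
        copy.addAll(readOnlyCopy);
        publish(copy);
    }

    /** Disk flush: take every segment out of the pipeline. */
    synchronized List<S> drain() {
        List<S> drained = readOnlyCopy;
        publish(new ArrayList<>());
        return drained;
    }

    /** In-memory compaction, step 1: remember what is being compacted and at which version. */
    synchronized Versioned<S> versionedSnapshot() {
        return new Versioned<>(readOnlyCopy, version);
    }

    /** In-memory compaction, step 2: install the result only if nothing changed meanwhile. */
    synchronized boolean swap(Versioned<S> snapshot, S compacted) {
        if (snapshot.version != version) {
            return false; // a flush raced with the compaction: discard its result
        }
        publish(Collections.singletonList(compacted));
        return true;
    }

    private void publish(List<S> copy) {
        readOnlyCopy = Collections.unmodifiableList(copy);
        version++;
    }
}
```

Because segments are never modified in place, rejecting a raced swap loses only the compaction work, not data, and a reader holding an older clone keeps a consistent view.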
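Slide 16's chunk-ID indirection can be sketched with `java.nio.ByteBuffer`. Assumptions: each index entry is a (chunkId, offset, length) triple of ints pointing at a key stored in a data chunk, keys are ASCII so `String` order matches byte order, and values are omitted. `ToyChunk`, `ToyChunkRegistry`, and `ToyChunkIndex` are hypothetical; the actual CellChunkMap entry layout and ChunkCreator API may differ. As on the slide, chunks used by an index are held strongly, others weakly, and the index itself lives in a chunk from the same registry.

```java
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** A fixed-size memory chunk with a unique int ID. */
final class ToyChunk {
    final int id;
    final ByteBuffer data;

    ToyChunk(int id, int size) {
        this.id = id;
        this.data = ByteBuffer.allocate(size); // allocateDirect would place it off-heap
    }
}

/** Hypothetical ChunkCreator stand-in: allocates chunks and resolves IDs back to chunks. */
final class ToyChunkRegistry {
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, ToyChunk> strongRefs = new ConcurrentHashMap<>();
    private final Map<Integer, WeakReference<ToyChunk>> weakRefs = new ConcurrentHashMap<>();

    ToyChunk allocate(int size, boolean usedByIndex) {
        ToyChunk chunk = new ToyChunk(nextId.getAndIncrement(), size);
        if (usedByIndex) strongRefs.put(chunk.id, chunk);              // kept alive by the registry
        else weakRefs.put(chunk.id, new WeakReference<>(chunk));       // caller keeps it alive
        return chunk;
    }

    /** Returns null if a weakly held chunk has already been garbage-collected. */
    ToyChunk get(int id) {
        ToyChunk chunk = strongRefs.get(id);
        if (chunk != null) return chunk;
        WeakReference<ToyChunk> ref = weakRefs.get(id);
        return ref == null ? null : ref.get();
    }
}

/** Sorted index whose entries are (chunkId, offset, length) ints rather than Cell objects. */
final class ToyChunkIndex {
    private static final int ENTRY_BYTES = 3 * Integer.BYTES;
    private final ToyChunkRegistry registry;
    private final ByteBuffer entries;
    private int count = 0;

    ToyChunkIndex(ToyChunkRegistry registry, int capacity) {
        this.registry = registry;
        // The index is itself allocated as a chunk from the same registry.
        this.entries = registry.allocate(capacity * ENTRY_BYTES, true).data;
    }

    /** Appends a reference to a key; callers must append keys in ascending order. */
    void add(int chunkId, int offset, int length) {
        int base = count * ENTRY_BYTES;
        entries.putInt(base, chunkId)
               .putInt(base + Integer.BYTES, offset)
               .putInt(base + 2 * Integer.BYTES, length);
        count++;
    }

    /** Resolves entry i through the registry and decodes its key bytes. */
    String keyAt(int i) {
        int base = i * ENTRY_BYTES;
        ToyChunk chunk = registry.get(entries.getInt(base));
        if (chunk == null) throw new IllegalStateException("chunk was released");
        int offset = entries.getInt(base + Integer.BYTES);
        byte[] key = new byte[entries.getInt(base + 2 * Integer.BYTES)];
        for (int j = 0; j < key.length; j++) key[j] = chunk.data.get(offset + j);
        return new String(key, StandardCharsets.UTF_8);
    }

    /** Binary search over the serialized entries; returns the entry index or -1. */
    int find(String key) {
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = keyAt(mid).compareTo(key);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }
}
```

Because the index is only bytes, it holds no per-cell Java objects, which is also what makes this layout friendly to off-heap memory.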