SlideShare une entreprise Scribd logo
1  sur  45
Télécharger pour lire hors ligne
2007 JavaOne Conference

A Lock-Free Hash Table

A Lock-Free Hash Table

Dr. Cliff Click
Chief JVM Architect & Distinguished Engineer
blogs.azulsystems.com/cliff
Azul Systems
May 8, 2007
Hash Tables
www.azulsystems.com

•
•
•
•

Constant-Time Key-Value Mapping
Fast arbitrary function
Extendable, defined at runtime
Used for symbol tables, DB caching, network
access, url caching, web content, etc
• Crucial for Large Business Applications
─ > 1MLOC
• Used in Very heavily multi-threaded apps
─ > 1000 threads

|

2
©2003 Azul Systems, Inc.
Popular Java Implementations
www.azulsystems.com

• Java's HashTable
─ Single threaded; scaling bottleneck

• HashMap

─ Faster but NOT multi-thread safe

• java.util.concurrent.HashMap

─ Striped internal locks; 16-way the default

• Azul, IBM, Sun sell machines >100cpus
• Azul has customers using all cpus in same app
• Becomes a scaling bottleneck!
|

3
©2003 Azul Systems, Inc.
A Lock-Free Hash Table
www.azulsystems.com

• No locks, even during table resize
No spin-locks
No blocking while holding locks
All CAS spin-loops bounded
Make progress even if other threads die....
• Requires atomic update instruction:
CAS (Compare-And-Swap),
LL/SC (Load-Linked/Store-Conditional, PPC only)
or similar
• Uses sun.misc.Unsafe for CAS
─
─
─
─

|

4
©2003 Azul Systems, Inc.
A Faster Hash Table
www.azulsystems.com

• Slightly faster than j.u.c for 99% reads < 32 cpus
• Faster with more cpus (2x faster)
─ Even with 4096-way striping
─ 10x faster with default striping

• 3x Faster for 95% reads (30x vs default)
• 8x Faster for 75% reads (100x vs default)
• Scales well up to 768 cpus, 75% reads
─ Approaches hardware bandwidth limits

|

5
©2003 Azul Systems, Inc.
Agenda
www.azulsystems.com

•
•
•
•
•
•

|

Motivation
“Uninteresting” Hash Table Details
State-Based Reasoning
Resize
Performance
Q&A

6
©2003 Azul Systems, Inc.
Some “Uninteresting” Details
www.azulsystems.com

• Hashtable: A collection of Key/Value Pairs
• Works with any collection
• Scaling, locking, bottlenecks of the collection

management responsibility of that collection
• Must be fast or O(1) effects kill you
• Must be cache-aware
• I'll present a sample Java solution
─ But other solutions can work, make sense

|

7
©2003 Azul Systems, Inc.
“Uninteresting” Details
www.azulsystems.com

• Closed Power-of-2 Hash Table
─ Reprobe on collision
─ Stride-1 reprobe: better cache behavior

• Key & Value on same cache line
• Hash memoized

─ Should be same cache line as K + V
─ But hard to do in pure Java

• No allocation on get() or put()
• Auto-Resize
|

8
©2003 Azul Systems, Inc.
Example get() code
www.azulsystems.com

idx = hash = key.hashCode();
while( true ) {

// reprobing loop

idx &= (size-1);

// limit idx to table size

k = get_key(idx);

// start cache miss early

h = get_hash(idx);

// memoized hash

if( k == key || (h == hash && key.equals(k)) )
return get_val(idx);// return matching value
if( k == null ) return null;
idx++;
}

|

9
©2003 Azul Systems, Inc.

// reprobe
“Uninteresting” Details
www.azulsystems.com

• Could use prime table + MOD
─ Better hash spread, fewer reprobes
─ But MOD is 30x slower than AND

• Could use open table
put() requires allocation
Follow 'next' pointer instead of reprobe
Each 'next' is a cache miss
Lousy hash -> linked-list traversal
• Could put Key/Value/Hash on same cache line
• Other variants possible, interesting
─
─
─
─

| 10
©2003 Azul Systems, Inc.
Agenda
www.azulsystems.com

•
•
•
•
•
•

Motivation
“Uninteresting” Hash Table Details
State-Based Reasoning
Resize
Performance
Q&A

| 11
©2003 Azul Systems, Inc.
Ordering and Correctness
www.azulsystems.com

• How to show table mods correct?
─ put, putIfAbsent, change, delete, etc.

• Prove via: fencing, memory model, load/store
•
•
•
•

ordering, “happens-before”?
Instead prove* via state machine
Define all possible {Key,Value} states
Define Transitions, State Machine
Show all states “legal”
*Warning: hand-wavy proof follows

| 12
©2003 Azul Systems, Inc.
State-Based Reasoning
www.azulsystems.com

• Define all {Key,Value} states and transitions
• Don't Care about memory ordering:

─ get() can read Key, Value in any order
─ put() can change Key, Value in any order
─ put() must use CAS to change Key or Value
─ But not double-CAS

• No fencing required for correctness!
─ (sometimes stronger guarantees are wanted

and will need fencing)
• Proof is simple!
| 13
©2003 Azul Systems, Inc.
Valid States
www.azulsystems.com

• A Key slot is:

─ null – empty
─ K – some Key; can never change again

• A Value slot is:

─ null – empty
─ T – tombstone
─ V – some Values

• A state is a {Key,Value} pair
• A transition is a successful CAS
| 14
©2003 Azul Systems, Inc.
State Machine
www.azulsystems.com

{null,null}

{K,T}

Empty

deleted key
insert

delete

insert
{K,null}
Partially inserted K/V pair

{null,T/V}
Partially inserted K/V pair Reader-only state

| 15
©2003 Azul Systems, Inc.

change

{K,V}

Standard K/V pair
Example put(key,newval) code:
www.azulsystems.com

idx = hash = key.hashCode();
while( true ) {

// Key-Claim stanza

idx &= (size-1);
k = get_key(idx);

// State: {k,?}

if( k == null &&

// {null,?} -> {key,?}

CAS_key(idx,null,key) )
break;
h = get_hash(idx);

// State: {key,?}
// get memoized hash

if( k == key || (h == hash && key.equals(k)) )
break;
idx++;
}
| 16
©2003 Azul Systems, Inc.

// State: {key,?}
// reprobe
Example put(key,newval) code
www.azulsystems.com

// State: {key,?}
oldval = get_val(idx);

// State: {key,oldval}

// Transition: {key,oldval} -> {key,newval}
if( CAS_val(idx,oldval,newval) ) {
// Transition worked
...

// Adjust size

} else {
// Transition failed; oldval has changed
// We can act “as if” our put() worked but
// was immediately stomped over
}
return oldval;
| 17
©2003 Azul Systems, Inc.
Some Things to Notice
www.azulsystems.com

• Once a Key is set, it never changes

─ No chance of returning Value for wrong Key
─ Means Keys leak; table fills up with dead Keys
─ Fix in a few slides...

• No ordering guarantees provided!

─ Bring Your Own Ordering/Synchronization

• Weird {null,V} state meaningful but uninteresting

─ Means reader got an empty key and so missed
─ But possibly prefetched wrong Value

| 18
©2003 Azul Systems, Inc.
Some Things to Notice
www.azulsystems.com

• There is no machine-wide coherent State!
• Nobody guaranteed to read the same State

─ Except on the same CPU with no other writers

• No need for it either
• Consider degenerate case of a single Key
• Same guarantees as:

─ single shared global variable
─ many readers & writers, no synchronization
─ i.e., darned little

| 19
©2003 Azul Systems, Inc.
A Slightly Stronger Guarantee
www.azulsystems.com

• Probably want “happens-before” on Values
─ java.util.concurrent provides this

• Similar to declaring that shared global 'volatile'
• Things written into a Value before put()
─ Are guaranteed to be seen after a get()

• Requires st/st fence before CAS'ing Value
─ “free” on Sparc, X86

• Requires ld/ld fence after loading Value
─ “free” on Azul

| 20
©2003 Azul Systems, Inc.
Agenda
www.azulsystems.com

•
•
•
•
•
•

Motivation
“Uninteresting” Hash Table Details
State-Based Reasoning
Resize
Performance
Q&A

| 21
©2003 Azul Systems, Inc.
Resizing The Table
www.azulsystems.com

• Need to resize if table gets full
• Or just re-probing too often
• Resize copies live K/V pairs

─ Doubles as cleanup of dead Keys
─ Resize (“cleanse”) after any delete
─ Throttled, once per GC cycle is plenty often

• Alas, need fencing, 'happens before'
• Hard bit for concurrent resize & put():

─ Must not drop the last update to old table

| 22
©2003 Azul Systems, Inc.
Resizing
www.azulsystems.com

• Expand State Machine
• Side-effect: mid-resize is a valid State
• Means resize is:

─ Concurrent – readers can help, or just read&go
─ Parallel – all can help
─ Incremental – partial copy is OK

• Pay an extra indirection while resize in progress
─ So want to finish the job eventually

• Stacked partial resizes OK, expected
| 23
©2003 Azul Systems, Inc.
get/put during Resize
www.azulsystems.com

• get() works on the old table
─ Unless see a sentinel

• put() or other mod must use new table
• Must check for new table every time

─ Late writes to old table 'happens before' resize

• Copying K/V pairs is independent of get/put
• Copy has many heuristics to choose from:

─ All touching threads, only writers, unrelated

background thread(s), etc

| 24
©2003 Azul Systems, Inc.
New State: 'use new table' Sentinel
www.azulsystems.com

• X: sentinel used during table-copy

─ Means: not in old table, check new

• A Key slot is:

─ null, K
─ X – 'use new table', not any valid Key
─ null → K OR null → X

• A Value slot is:

─ null, T, V
─ X – 'use new table', not any valid Value
─ null → {T,V}* → X

| 25
©2003 Azul Systems, Inc.
State Machine – old table
www.azulsystems.com

{X,null}

kill

{null,null}

{K,T}

Empty
insert

check newer table

Deleted key

kill

delete

insert

{K,null}

{K,X}
copy {K,V} into newer table

{null,T/V/X}
Partially inserted
K/V pair
| 26
©2003 Azul Systems, Inc.

change

{K,V}

Standard K/V pair
States {X,T/V/X} not possible
State Machine: Copy One Pair
www.azulsystems.com

empty
{null,null}

| 27
©2003 Azul Systems, Inc.

{X,null}
State Machine: Copy One Pair
www.azulsystems.com

empty
{null,null}

{X,null}

dead or partially inserted
{K,T/null}

| 28
©2003 Azul Systems, Inc.

{K,X}
State Machine: Copy One Pair
www.azulsystems.com

empty
{null,null}

{X,null}

dead or partially inserted
{K,T/null}

{K,X}

alive, but old
{K,V1}

{K,X}
old table
new table

{K,V2}

| 29
©2003 Azul Systems, Inc.

{K,V2}
Copying Old To New
www.azulsystems.com

• New States V', T' – primed versions of V,T

─ Prime'd values in new table copied from old
─ Non-prime in new table is recent put()
─ “happens after” any prime'd value
─ Prime allows 2-phase commit
─ Engineering: wrapper class (Java), steal bit (C)

• Must be sure to copy late-arriving old-table write
• Attempt to copy atomically
─ May fail & copy does not make progress
─ But old, new tables not damaged

| 30
©2003 Azul Systems, Inc.
New States: Prime'd
www.azulsystems.com

• A Key slot is:
─ null, K, X

• A Value slot is:

─ null, T, V, X
─ T',V' – primed versions of T & V
─ Old things copied into the new table
─ “2-phase commit”
─ null → {T',V'}* → {T,V}* → X

• State Machine again...

| 31
©2003 Azul Systems, Inc.
State Machine – new table
www.azulsystems.com

kill

{null,null}
insert

{K,T}
{K,null}

Deleted key

Partially inserted
K/V pair

{K,X}
copy {K,V} into newer table

{K,T'/V'}
{null,T/T'/V/V'/X}

kill

insert

copy in from
older table

check newer table

delete

Empty

{X,null}

{K,V}
Standard K/V pair
States {X,T/T'/V/V'/X} not possible

| 32
©2003 Azul Systems, Inc.
State Machine – new table
www.azulsystems.com

kill

{null,null}
insert

{K,T}
{K,null}

Deleted key

Partially inserted
K/V pair

{K,X}
copy {K,V} into newer table

{K,T'/V'}
{null,T/T'/V/V'/X}

kill

insert

copy in from
older table

check newer table

delete

Empty

{X,null}

{K,V}
Standard K/V pair
States {X,T/T'/V/V'/X} not possible

| 33
©2003 Azul Systems, Inc.
State Machine – new table
www.azulsystems.com

kill

{null,null}
insert

{K,T}
{K,null}

Deleted key

Partially inserted
K/V pair

{K,X}
copy {K,V} into newer table

{K,T'/V'}
{null,T/T'/V/V'/X}

kill

insert

copy in from
older table

check newer table

delete

Empty

{X,null}

{K,V}
Standard K/V pair
States {X,T/T'/V/V'/X} not possible

| 34
©2003 Azul Systems, Inc.
State Machine: Copy One Pair
www.azulsystems.com

{K,V1}
read V1

K,V' in new table
X in old table

{K,X}

old
new
{K,V'x}

read V'x

{K,V'1}
partial copy

Fence

Fence
| 35
©2003 Azul Systems, Inc.

{K,V'1}
{K,V1}
copy
complete
Some Things to Notice
www.azulsystems.com

• Old value could be V or T

─ or V' or T' (if nested resize in progress)

• Skip copy if new Value is not prime'd

─ Means recent put() overwrote any old Value

• If CAS into new fails

─ Means either put() or other copy in progress
─ So this copy can quit

• Any thread can see any state at any time
─ And CAS to the next state

| 36
©2003 Azul Systems, Inc.
Agenda
www.azulsystems.com

•
•
•
•
•
•

Motivation
“Uninteresting” Hash Table Details
State-Based Reasoning
Resize
Performance
Q&A

| 37
©2003 Azul Systems, Inc.
Microbenchmark
www.azulsystems.com

• Measure insert/lookup/remove of Strings
• Tight loop: no work beyond HashTable itself and

test harness (mostly RNG)
• “Guaranteed not to exceed” numbers
• All fences; full ConcurrentHashMap semantics
• Variables:
─ 99% get, 1% put (typical cache) vs 75 / 25
─ Dual Athalon, Niagara, Azul Vega1, Vega2
─ Threads from 1 to 800
─ NonBlocking vs 4096-way ConcurrentHashMap
─ 1K entry table vs 1M entry table
| 38
©2003 Azul Systems, Inc.
AMD 2.4GHz – 2 (ht) cpus
www.azulsystems.com

1K Table

1M Table
30

30

NB-99

25

CHM-99

20

M-ops/sec

20

M-ops/sec

25

NB-75

15

CHM-75

10

15

10

5

0

0

NB

5

1

2

3

4

5

Threads

| 39
©2003 Azul Systems, Inc.

6

7

8

CHM
1

2

3

4

5

Threads

6

7

8
Niagara – 8x4 cpus
www.azulsystems.com

1K Table

1M Table
80

70

70

60

60

CHM-99, NB-99

50

M-ops/sec

M-ops/sec

80

40
30

CHM-75, NB-75

50
40

NB

30

20

20

10

10

0

CHM

0
0

8

16

24

32

40

Threads

| 40
©2003 Azul Systems, Inc.

48

56

64

0

8

16

24

32

Threads

40

48

56

64
Azul Vega1 – 384 cpus
www.azulsystems.com

1K Table

1M Table

500

500

NB-99
400

300

300

M-ops/sec

M-ops/sec

400

CHM-99

200

200

NB

NB-75
100

100

CHM-75
0

0

100

200

Threads

| 41
©2003 Azul Systems, Inc.

300

400

CHM

0
0

100

200

Threads

300

400
Azul Vega2 – 768 cpus
www.azulsystems.com

1K Table

1M Table

1200

1200

NB-99

1000

1000

800

CHM-99

600

M-ops/sec

M-ops/sec

800

600

NB

400

400

NB-75
200

200

CHM-75
0

CHM
0

0

100 200 300 400 500 600 700 800

Threads

| 42
©2003 Azul Systems, Inc.

0

100

200

300

400

500

Threads

600

700 800
Summary
www.azulsystems.com

•
•
•
•
●

●
●

A faster lock-free HashTable
Faster for more CPUs
Much faster for higher table modification rate
State-Based Reasoning:
─ No ordering, no JMM, no fencing
Any thread can see any state at any time
● Must assume values change at each step
State graphs really helped coding & debugging
Resulting code is small & fast

| 43
©2003 Azul Systems, Inc.
Summary
www.azulsystems.com

●

●

●

Obvious future work:
● Tools to check states
● Tools to write code
Seems applicable to other data structures as well
● Concurrent append j.u.Vector
● Scalable near-FIFO work queues
Code & Video available at:
http://blogs.azulsystems.com/cliff/

| 44
©2003 Azul Systems, Inc.
#1 Platform for
Business Critical Java™

WWW.AZULSYSTEMS.COM
Thank You

Contenu connexe

Tendances

Atividades Sujeitos e Predicados
Atividades Sujeitos e PredicadosAtividades Sujeitos e Predicados
Atividades Sujeitos e Predicados
Geo Honório
 
Atividades O Sistema Solar
Atividades O Sistema SolarAtividades O Sistema Solar
Atividades O Sistema Solar
Doug Caesar
 
Atividadesdeinterpretaoeproduodetexto 4ano-130210074619-phpapp01 (1)
Atividadesdeinterpretaoeproduodetexto 4ano-130210074619-phpapp01 (1)Atividadesdeinterpretaoeproduodetexto 4ano-130210074619-phpapp01 (1)
Atividadesdeinterpretaoeproduodetexto 4ano-130210074619-phpapp01 (1)
Jane Queirozj
 
Atividade do calendario do mês de maio 2013
Atividade do calendario do mês de maio 2013Atividade do calendario do mês de maio 2013
Atividade do calendario do mês de maio 2013
Claudinéia Pagnocelli
 
Avaliação bimestral de matemática 2º ano
Avaliação bimestral de matemática 2º anoAvaliação bimestral de matemática 2º ano
Avaliação bimestral de matemática 2º ano
Maria Terra
 
Classificar figuras e sólidos geométricos
Classificar figuras e sólidos geométricosClassificar figuras e sólidos geométricos
Classificar figuras e sólidos geométricos
Crescendo EAprendendo
 

Tendances (20)

Atividades Sujeitos e Predicados
Atividades Sujeitos e PredicadosAtividades Sujeitos e Predicados
Atividades Sujeitos e Predicados
 
Atividades O Sistema Solar
Atividades O Sistema SolarAtividades O Sistema Solar
Atividades O Sistema Solar
 
Atividadesdeinterpretaoeproduodetexto 4ano-130210074619-phpapp01 (1)
Atividadesdeinterpretaoeproduodetexto 4ano-130210074619-phpapp01 (1)Atividadesdeinterpretaoeproduodetexto 4ano-130210074619-phpapp01 (1)
Atividadesdeinterpretaoeproduodetexto 4ano-130210074619-phpapp01 (1)
 
Atividades para 2 e 4 ano 2016
Atividades para 2 e 4 ano 2016Atividades para 2 e 4 ano 2016
Atividades para 2 e 4 ano 2016
 
Caça números - divisão
Caça números - divisão Caça números - divisão
Caça números - divisão
 
atividade de geografia.docx
atividade de geografia.docxatividade de geografia.docx
atividade de geografia.docx
 
Caça palavras - 4º Ano
Caça palavras - 4º AnoCaça palavras - 4º Ano
Caça palavras - 4º Ano
 
Atividade do calendario do mês de maio 2013
Atividade do calendario do mês de maio 2013Atividade do calendario do mês de maio 2013
Atividade do calendario do mês de maio 2013
 
Projeto generos textuais
Projeto generos textuaisProjeto generos textuais
Projeto generos textuais
 
Avaliação bimestral de matemática 2º ano
Avaliação bimestral de matemática 2º anoAvaliação bimestral de matemática 2º ano
Avaliação bimestral de matemática 2º ano
 
51828698 atividades-de-matematica-3º-ano
51828698 atividades-de-matematica-3º-ano51828698 atividades-de-matematica-3º-ano
51828698 atividades-de-matematica-3º-ano
 
Fusos hora´rios
Fusos hora´riosFusos hora´rios
Fusos hora´rios
 
Apostila 3 ATIVIDADES REMOTAS 3 ANO
Apostila 3 ATIVIDADES REMOTAS 3 ANOApostila 3 ATIVIDADES REMOTAS 3 ANO
Apostila 3 ATIVIDADES REMOTAS 3 ANO
 
Classificar figuras e sólidos geométricos
Classificar figuras e sólidos geométricosClassificar figuras e sólidos geométricos
Classificar figuras e sólidos geométricos
 
Tempo de cálculo 2 cristina
Tempo de cálculo 2   cristinaTempo de cálculo 2   cristina
Tempo de cálculo 2 cristina
 
Seguindo a trilha - Multiplicação e divisão.
Seguindo a trilha - Multiplicação e divisão.Seguindo a trilha - Multiplicação e divisão.
Seguindo a trilha - Multiplicação e divisão.
 
Animais pré históricos para colorir
Animais pré históricos para colorirAnimais pré históricos para colorir
Animais pré históricos para colorir
 
Teste de tabuada
Teste de tabuadaTeste de tabuada
Teste de tabuada
 
Atividades com sólidos geométricos 4º ano
Atividades com sólidos geométricos 4º anoAtividades com sólidos geométricos 4º ano
Atividades com sólidos geométricos 4º ano
 
Gabarito: Interpretação de texto: O patê de coelho – 3º ou 4º ano – Com respo...
Gabarito: Interpretação de texto: O patê de coelho – 3º ou 4º ano – Com respo...Gabarito: Interpretação de texto: O patê de coelho – 3º ou 4º ano – Com respo...
Gabarito: Interpretação de texto: O patê de coelho – 3º ou 4º ano – Com respo...
 

En vedette

En vedette (18)

Wait-free data structures on embedded multi-core systems
Wait-free data structures on embedded multi-core systemsWait-free data structures on embedded multi-core systems
Wait-free data structures on embedded multi-core systems
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?
 
How NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkHow NOT to Write a Microbenchmark
How NOT to Write a Microbenchmark
 
Experiences with Debugging Data Races
Experiences with Debugging Data RacesExperiences with Debugging Data Races
Experiences with Debugging Data Races
 
Towards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleTowards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding Style
 
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
 
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
 
Priming Java for Speed at Market Open
Priming Java for Speed at Market OpenPriming Java for Speed at Market Open
Priming Java for Speed at Market Open
 
Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...
 
The Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMThe Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVM
 
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
 
DotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
DotCMS Bootcamp: Enabling Java in Latency Sensitivie EnvironmentsDotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
DotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
 
What's Inside a JVM?
What's Inside a JVM?What's Inside a JVM?
What's Inside a JVM?
 
Zulu Embedded Java Introduction
Zulu Embedded Java IntroductionZulu Embedded Java Introduction
Zulu Embedded Java Introduction
 
Java vs. C/C++
Java vs. C/C++Java vs. C/C++
Java vs. C/C++
 
C++vs java
C++vs javaC++vs java
C++vs java
 
Hashing
HashingHashing
Hashing
 
Hashing Technique In Data Structures
Hashing Technique In Data StructuresHashing Technique In Data Structures
Hashing Technique In Data Structures
 

Similaire à Lock-Free, Wait-Free Hash Table

February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
Yahoo Developer Network
 
Cassandra
CassandraCassandra
Cassandra
exsuns
 
Java Colombo: Developing Highly Scalable Apps
Java Colombo: Developing Highly Scalable AppsJava Colombo: Developing Highly Scalable Apps
Java Colombo: Developing Highly Scalable Apps
Afkham Azeez
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
DataStax
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Cloudera, Inc.
 

Similaire à Lock-Free, Wait-Free Hash Table (20)

Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
无锁编程
无锁编程无锁编程
无锁编程
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
 
Cassandra
CassandraCassandra
Cassandra
 
Java Colombo: Developing Highly Scalable Apps
Java Colombo: Developing Highly Scalable AppsJava Colombo: Developing Highly Scalable Apps
Java Colombo: Developing Highly Scalable Apps
 
Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
DataStax: Dockerizing Cassandra on Modern Linux
DataStax: Dockerizing Cassandra on Modern LinuxDataStax: Dockerizing Cassandra on Modern Linux
DataStax: Dockerizing Cassandra on Modern Linux
 
Cassandra on Docker
Cassandra on DockerCassandra on Docker
Cassandra on Docker
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_features
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
 
Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQL
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
 
NSBCon UK nservicebus on Azure by Yves Goeleven
NSBCon UK nservicebus on Azure by Yves GoelevenNSBCon UK nservicebus on Azure by Yves Goeleven
NSBCon UK nservicebus on Azure by Yves Goeleven
 
The Ethereum Geth Client
The Ethereum Geth ClientThe Ethereum Geth Client
The Ethereum Geth Client
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Tips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight DeploymentsTips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight Deployments
 

Plus de Azul Systems Inc.

Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and Zing
Azul Systems Inc.
 

Plus de Azul Systems Inc. (16)

Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017
 
Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017
 
Azul Systems open source guide
Azul Systems open source guideAzul Systems open source guide
Azul Systems open source guide
 
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and ToolsIntelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
 
Understanding Java Garbage Collection
Understanding Java Garbage CollectionUnderstanding Java Garbage Collection
Understanding Java Garbage Collection
 
The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014
 
Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and Zing
 
The Art of Java Benchmarking
The Art of Java BenchmarkingThe Art of Java Benchmarking
The Art of Java Benchmarking
 
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, PolandAzul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
 
Understanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About ThemUnderstanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About Them
 
JVM Memory Management Details
JVM Memory Management DetailsJVM Memory Management Details
JVM Memory Management Details
 
JVM Mechanics: A Peek Under the Hood
JVM Mechanics: A Peek Under the HoodJVM Mechanics: A Peek Under the Hood
JVM Mechanics: A Peek Under the Hood
 
Enterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up SearchEnterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up Search
 
Understanding Java Garbage Collection - And What You Can Do About It
Understanding Java Garbage Collection - And What You Can Do About ItUnderstanding Java Garbage Collection - And What You Can Do About It
Understanding Java Garbage Collection - And What You Can Do About It
 
Enabling Java in Latency-Sensitive Applications
Enabling Java in Latency-Sensitive ApplicationsEnabling Java in Latency-Sensitive Applications
Enabling Java in Latency-Sensitive Applications
 
Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013
 

Dernier

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Dernier (20)

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Lock-Free, Wait-Free Hash Table

  • 1. 2007 JavaOne Conference A Lock-Free Hash Table A Lock-Free Hash Table Dr. Cliff Click Chief JVM Architect & Distinguished Engineer blogs.azulsystems.com/cliff Azul Systems May 8, 2007
  • 2. Hash Tables www.azulsystems.com • • • • Constant-Time Key-Value Mapping Fast arbitrary function Extendable, defined at runtime Used for symbol tables, DB caching, network access, url caching, web content, etc • Crucial for Large Business Applications ─ > 1MLOC • Used in Very heavily multi-threaded apps ─ > 1000 threads | 2 ©2003 Azul Systems, Inc.
  • 3. Popular Java Implementations www.azulsystems.com • Java's HashTable ─ Single threaded; scaling bottleneck • HashMap ─ Faster but NOT multi-thread safe • java.util.concurrent.HashMap ─ Striped internal locks; 16-way the default • Azul, IBM, Sun sell machines >100cpus • Azul has customers using all cpus in same app • Becomes a scaling bottleneck! | 3 ©2003 Azul Systems, Inc.
  • 4. A Lock-Free Hash Table www.azulsystems.com • No locks, even during table resize No spin-locks No blocking while holding locks All CAS spin-loops bounded Make progress even if other threads die.... • Requires atomic update instruction: CAS (Compare-And-Swap), LL/SC (Load-Linked/Store-Conditional, PPC only) or similar • Uses sun.misc.Unsafe for CAS ─ ─ ─ ─ | 4 ©2003 Azul Systems, Inc.
  • 5. A Faster Hash Table www.azulsystems.com • Slightly faster than j.u.c for 99% reads < 32 cpus • Faster with more cpus (2x faster) ─ Even with 4096-way striping ─ 10x faster with default striping • 3x Faster for 95% reads (30x vs default) • 8x Faster for 75% reads (100x vs default) • Scales well up to 768 cpus, 75% reads ─ Approaches hardware bandwidth limits | 5 ©2003 Azul Systems, Inc.
  • 6. Agenda www.azulsystems.com • • • • • • | Motivation “Uninteresting” Hash Table Details State-Based Reasoning Resize Performance Q&A 6 ©2003 Azul Systems, Inc.
  • 7. Some “Uninteresting” Details www.azulsystems.com • Hashtable: A collection of Key/Value Pairs • Works with any collection • Scaling, locking, bottlenecks of the collection management responsibility of that collection • Must be fast or O(1) effects kill you • Must be cache-aware • I'll present a sample Java solution ─ But other solutions can work, make sense | 7 ©2003 Azul Systems, Inc.
  • 8. “Uninteresting” Details www.azulsystems.com • Closed Power-of-2 Hash Table ─ Reprobe on collision ─ Stride-1 reprobe: better cache behavior • Key & Value on same cache line • Hash memoized ─ Should be same cache line as K + V ─ But hard to do in pure Java • No allocation on get() or put() • Auto-Resize | 8 ©2003 Azul Systems, Inc.
  • 9. Example get() code www.azulsystems.com idx = hash = key.hashCode(); while( true ) { // reprobing loop idx &= (size-1); // limit idx to table size k = get_key(idx); // start cache miss early h = get_hash(idx); // memoized hash if( k == key || (h == hash && key.equals(k)) ) return get_val(idx);// return matching value if( k == null ) return null; idx++; } | 9 ©2003 Azul Systems, Inc. // reprobe
  • 10. “Uninteresting” Details www.azulsystems.com • Could use prime table + MOD ─ Better hash spread, fewer reprobes ─ But MOD is 30x slower than AND • Could use open table put() requires allocation Follow 'next' pointer instead of reprobe Each 'next' is a cache miss Lousy hash -> linked-list traversal • Could put Key/Value/Hash on same cache line • Other variants possible, interesting ─ ─ ─ ─ | 10 ©2003 Azul Systems, Inc.
  • 11. Agenda www.azulsystems.com • • • • • • Motivation “Uninteresting” Hash Table Details State-Based Reasoning Resize Performance Q&A | 11 ©2003 Azul Systems, Inc.
  • 12. Ordering and Correctness www.azulsystems.com • How to show table mods correct? ─ put, putIfAbsent, change, delete, etc. • Prove via: fencing, memory model, load/store • • • • ordering, “happens-before”? Instead prove* via state machine Define all possible {Key,Value} states Define Transitions, State Machine Show all states “legal” *Warning: hand-wavy proof follows | 12 ©2003 Azul Systems, Inc.
  • 13. State-Based Reasoning www.azulsystems.com • Define all {Key,Value} states and transitions • Don't Care about memory ordering: ─ get() can read Key, Value in any order ─ put() can change Key, Value in any order ─ put() must use CAS to change Key or Value ─ But not double-CAS • No fencing required for correctness! ─ (sometimes stronger guarantees are wanted and will need fencing) • Proof is simple! | 13 ©2003 Azul Systems, Inc.
  • 14. Valid States www.azulsystems.com • A Key slot is: ─ null – empty ─ K – some Key; can never change again • A Value slot is: ─ null – empty ─ T – tombstone ─ V – some Values • A state is a {Key,Value} pair • A transition is a successful CAS | 14 ©2003 Azul Systems, Inc.
  • 15. State Machine www.azulsystems.com {null,null} {K,T} Empty deleted key insert delete insert {K,null} Partially inserted K/V pair {null,T/V} Partially inserted K/V pair Reader-only state | 15 ©2003 Azul Systems, Inc. change {K,V} Standard K/V pair
  • 16. Example put(key,newval) code: www.azulsystems.com idx = hash = key.hashCode(); while( true ) { // Key-Claim stanza idx &= (size-1); k = get_key(idx); // State: {k,?} if( k == null && // {null,?} -> {key,?} CAS_key(idx,null,key) ) break; h = get_hash(idx); // State: {key,?} // get memoized hash if( k == key || (h == hash && key.equals(k)) ) break; idx++; } | 16 ©2003 Azul Systems, Inc. // State: {key,?} // reprobe
  • 17. Example put(key,newval) code www.azulsystems.com // State: {key,?} oldval = get_val(idx); // State: {key,oldval} // Transition: {key,oldval} -> {key,newval} if( CAS_val(idx,oldval,newval) ) { // Transition worked ... // Adjust size } else { // Transition failed; oldval has changed // We can act “as if” our put() worked but // was immediately stomped over } return oldval; | 17 ©2003 Azul Systems, Inc.
  • 18. Some Things to Notice www.azulsystems.com • Once a Key is set, it never changes ─ No chance of returning Value for wrong Key ─ Means Keys leak; table fills up with dead Keys ─ Fix in a few slides... • No ordering guarantees provided! ─ Bring Your Own Ordering/Synchronization • Weird {null,V} state meaningful but uninteresting ─ Means reader got an empty key and so missed ─ But possibly prefetched wrong Value | 18 ©2003 Azul Systems, Inc.
  • 19. Some Things to Notice www.azulsystems.com • There is no machine-wide coherent State! • Nobody guaranteed to read the same State ─ Except on the same CPU with no other writers • No need for it either • Consider degenerate case of a single Key • Same guarantees as: ─ single shared global variable ─ many readers & writers, no synchronization ─ i.e., darned little | 19 ©2003 Azul Systems, Inc.
  • 20. A Slightly Stronger Guarantee www.azulsystems.com • Probably want “happens-before” on Values ─ java.util.concurrent provides this • Similar to declaring that shared global 'volatile' • Things written into a Value before put() ─ Are guaranteed to be seen after a get() • Requires st/st fence before CAS'ing Value ─ “free” on Sparc, X86 • Requires ld/ld fence after loading Value ─ “free” on Azul | 20 ©2003 Azul Systems, Inc.
  • 21. Agenda www.azulsystems.com • • • • • • Motivation “Uninteresting” Hash Table Details State-Based Reasoning Resize Performance Q&A | 21 ©2003 Azul Systems, Inc.
  • 22. Resizing The Table www.azulsystems.com • Need to resize if table gets full • Or just re-probing too often • Resize copies live K/V pairs ─ Doubles as cleanup of dead Keys ─ Resize (“cleanse”) after any delete ─ Throttled, once per GC cycle is plenty often • Alas, need fencing, 'happens before' • Hard bit for concurrent resize & put(): ─ Must not drop the last update to old table | 22 ©2003 Azul Systems, Inc.
  • 23. Resizing www.azulsystems.com • Expand State Machine • Side-effect: mid-resize is a valid State • Means resize is: ─ Concurrent – readers can help, or just read&go ─ Parallel – all can help ─ Incremental – partial copy is OK • Pay an extra indirection while resize in progress ─ So want to finish the job eventually • Stacked partial resizes OK, expected | 23 ©2003 Azul Systems, Inc.
  • 24. get/put during Resize www.azulsystems.com • get() works on the old table ─ Unless see a sentinel • put() or other mod must use new table • Must check for new table every time ─ Late writes to old table 'happens before' resize • Copying K/V pairs is independent of get/put • Copy has many heuristics to choose from: ─ All touching threads, only writers, unrelated background thread(s), etc | 24 ©2003 Azul Systems, Inc.
  • 25. New State: 'use new table' Sentinel www.azulsystems.com • X: sentinel used during table-copy ─ Means: not in old table, check new • A Key slot is: ─ null, K ─ X – 'use new table', not any valid Key ─ null → K OR null → X • A Value slot is: ─ null, T, V ─ X – 'use new table', not any valid Value ─ null → {T,V}* → X | 25 ©2003 Azul Systems, Inc.
  • 26. State Machine – old table www.azulsystems.com {X,null} kill {null,null} {K,T} Empty insert check newer table Deleted key kill delete insert {K,null} {K,X} copy {K,V} into newer table {null,T/V/X} Partially inserted K/V pair | 26 ©2003 Azul Systems, Inc. change {K,V} Standard K/V pair States {X,T/V/X} not possible
  • 27. State Machine: Copy One Pair www.azulsystems.com empty {null,null} | 27 ©2003 Azul Systems, Inc. {X,null}
  • 28. State Machine: Copy One Pair www.azulsystems.com empty {null,null} {X,null} dead or partially inserted {K,T/null} | 28 ©2003 Azul Systems, Inc. {K,X}
  • 29. State Machine: Copy One Pair www.azulsystems.com empty {null,null} {X,null} dead or partially inserted {K,T/null} {K,X} alive, but old {K,V1} {K,X} old table new table {K,V2} | 29 ©2003 Azul Systems, Inc. {K,V2}
  • 30. Copying Old To New www.azulsystems.com • New States V', T' – primed versions of V,T ─ Prime'd values in new table copied from old ─ Non-prime in new table is recent put() ─ “happens after” any prime'd value ─ Prime allows 2-phase commit ─ Engineering: wrapper class (Java), steal bit (C) • Must be sure to copy late-arriving old-table write • Attempt to copy atomically ─ May fail & copy does not make progress ─ But old, new tables not damaged | 30 ©2003 Azul Systems, Inc.
  • 31. New States: Prime'd www.azulsystems.com • A Key slot is: ─ null, K, X • A Value slot is: ─ null, T, V, X ─ T',V' – primed versions of T & V ─ Old things copied into the new table ─ “2-phase commit” ─ null → {T',V'}* → {T,V}* → X • State Machine again... | 31 ©2003 Azul Systems, Inc.
  • 32. State Machine – new table www.azulsystems.com kill {null,null} insert {K,T} {K,null} Deleted key Partially inserted K/V pair {K,X} copy {K,V} into newer table {K,T'/V'} {null,T/T'/V/V'/X} kill insert copy in from older table check newer table delete Empty {X,null} {K,V} Standard K/V pair States {X,T/T'/V/V'/X} not possible | 32 ©2003 Azul Systems, Inc.
  • 33. State Machine – new table www.azulsystems.com kill {null,null} insert {K,T} {K,null} Deleted key Partially inserted K/V pair {K,X} copy {K,V} into newer table {K,T'/V'} {null,T/T'/V/V'/X} kill insert copy in from older table check newer table delete Empty {X,null} {K,V} Standard K/V pair States {X,T/T'/V/V'/X} not possible | 33 ©2003 Azul Systems, Inc.
  • 34. State Machine – new table www.azulsystems.com kill {null,null} insert {K,T} {K,null} Deleted key Partially inserted K/V pair {K,X} copy {K,V} into newer table {K,T'/V'} {null,T/T'/V/V'/X} kill insert copy in from older table check newer table delete Empty {X,null} {K,V} Standard K/V pair States {X,T/T'/V/V'/X} not possible | 34 ©2003 Azul Systems, Inc.
  • 35. State Machine: Copy One Pair www.azulsystems.com {K,V1} read V1 K,V' in new table X in old table {K,X} old new {K,V'x} read V'x {K,V'1} partial copy Fence Fence | 35 ©2003 Azul Systems, Inc. {K,V'1} {K,V1} copy complete
  • 36. Some Things to Notice www.azulsystems.com • Old value could be V or T ─ or V' or T' (if nested resize in progress) • Skip copy if new Value is not prime'd ─ Means recent put() overwrote any old Value • If CAS into new fails ─ Means either put() or other copy in progress ─ So this copy can quit • Any thread can see any state at any time ─ And CAS to the next state | 36 ©2003 Azul Systems, Inc.
  • 37. Agenda www.azulsystems.com • • • • • • Motivation “Uninteresting” Hash Table Details State-Based Reasoning Resize Performance Q&A | 37 ©2003 Azul Systems, Inc.
  • 38. Microbenchmark www.azulsystems.com • Measure insert/lookup/remove of Strings • Tight loop: no work beyond HashTable itself and test harness (mostly RNG) • “Guaranteed not to exceed” numbers • All fences; full ConcurrentHashMap semantics • Variables: ─ 99% get, 1% put (typical cache) vs 75 / 25 ─ Dual Athalon, Niagara, Azul Vega1, Vega2 ─ Threads from 1 to 800 ─ NonBlocking vs 4096-way ConcurrentHashMap ─ 1K entry table vs 1M entry table | 38 ©2003 Azul Systems, Inc.
  • 39. AMD 2.4GHz – 2 (ht) cpus www.azulsystems.com 1K Table 1M Table 30 30 NB-99 25 CHM-99 20 M-ops/sec 20 M-ops/sec 25 NB-75 15 CHM-75 10 15 10 5 0 0 NB 5 1 2 3 4 5 Threads | 39 ©2003 Azul Systems, Inc. 6 7 8 CHM 1 2 3 4 5 Threads 6 7 8
  • 40. Niagara – 8x4 cpus www.azulsystems.com 1K Table 1M Table 80 70 70 60 60 CHM-99, NB-99 50 M-ops/sec M-ops/sec 80 40 30 CHM-75, NB-75 50 40 NB 30 20 20 10 10 0 CHM 0 0 8 16 24 32 40 Threads | 40 ©2003 Azul Systems, Inc. 48 56 64 0 8 16 24 32 Threads 40 48 56 64
  • 41. Azul Vega1 – 384 cpus www.azulsystems.com 1K Table 1M Table 500 500 NB-99 400 300 300 M-ops/sec M-ops/sec 400 CHM-99 200 200 NB NB-75 100 100 CHM-75 0 0 100 200 Threads | 41 ©2003 Azul Systems, Inc. 300 400 CHM 0 0 100 200 Threads 300 400
  • 42. Azul Vega2 – 768 cpus www.azulsystems.com 1K Table 1M Table 1200 1200 NB-99 1000 1000 800 CHM-99 600 M-ops/sec M-ops/sec 800 600 NB 400 400 NB-75 200 200 CHM-75 0 CHM 0 0 100 200 300 400 500 600 700 800 Threads | 42 ©2003 Azul Systems, Inc. 0 100 200 300 400 500 Threads 600 700 800
  • 43. Summary www.azulsystems.com • • • • ● ● ● A faster lock-free HashTable Faster for more CPUs Much faster for higher table modification rate State-Based Reasoning: ─ No ordering, no JMM, no fencing Any thread can see any state at any time ● Must assume values change at each step State graphs really helped coding & debugging Resulting code is small & fast | 43 ©2003 Azul Systems, Inc.
  • 44. Summary www.azulsystems.com ● ● ● Obvious future work: ● Tools to check states ● Tools to write code Seems applicable to other data structures as well ● Concurrent append j.u.Vector ● Scalable near-FIFO work queues Code & Video available at: http://blogs.azulsystems.com/cliff/ | 44 ©2003 Azul Systems, Inc.
  • 45. #1 Platform for Business Critical Java™ WWW.AZULSYSTEMS.COM Thank You