The theme is building a data structure that is used as a pool: scalable under high loads, yet no less usable than existing implementations under low loads.
What is a pool? A collection of items, which may be objects or tasks. A resource pool holds objects that are used and then returned to the pool; a pool of jobs holds tasks to perform; etc. The pool is approached by producers and consumers, which perform Put/Get (Push/Pop, Enqueue/Dequeue) actions. These actions can implement different semantics, blocking or non-blocking, depending on how the pool was defined.
The data structure we present is called the ED-Tree: a highly scalable pool to be used in multithreaded applications. We reach high performance and scalability by combining two paradigms: elimination and diffraction. The ED-Tree is implemented in Java.
If we look in Java JDK for data structures that can be used as pool, we will find the following…
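For concreteness, here is a small sketch (method names and values are mine, only the `java.util.concurrent` classes themselves are standard) of the JDK structures usually pressed into service as pools, one per semantics:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

// The JDK data structures commonly used as pools, one per semantics.
public class JdkPools {
    /** Blocking pool: producer puts and leaves; consumer blocks until a value exists. */
    static int blockingDemo() throws InterruptedException {
        LinkedBlockingQueue<Integer> q = new LinkedBlockingQueue<>();
        q.put(1);
        return q.take();
    }

    /** Non-blocking pool: consumer gets null immediately when the pool is empty. */
    static Integer nonBlockingDemo() {
        ConcurrentLinkedQueue<Integer> q = new ConcurrentLinkedQueue<>();
        return q.poll();
    }

    /** Synchronous pool: no buffering; a put must rendezvous with a take. */
    static int synchronousDemo() throws InterruptedException {
        SynchronousQueue<Integer> q = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try { q.put(42); } catch (InterruptedException ignored) { }
        });
        producer.start();
        int v = q.take();   // blocks until the producer's put arrives
        producer.join();
        return v;
    }
}
```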
All the mentioned data structures are problematic: they are based on centralized structures, so the head or tail of the queue/stack becomes a hot spot, and with a large number of threads performance becomes worse instead of improving.
If we think about it, we don't care about the order in which items are inserted into or removed from the pool. All we want is to avoid starvation (if an item is inserted into the pool, eventually it will be removed). Therefore we can avoid using a centralized structure and distribute the pool in memory.
A single level of an elimination array was also used in implementing shared concurrent stacks. However, elimination trees and diffracting trees were never used to implement real-world structures. This is mostly due to the fact that there was no need for them: machines with a sufficient level of concurrency and low enough interconnect latency to benefit from them did not exist. Today, multi-core machines present the necessary combination of high levels of parallelism and low interconnection costs. Indeed, this paper is the first to show that ED-Tree based implementations of data structures from java.util.concurrent scale impressively on a real machine (a Sun Maramba multicore machine with 2x8 cores and 128 hardware threads), delivering throughput that at high concurrency levels is 10 times that of the newly proposed JDK 6.0 algorithms.
A balancer is usually implemented as a toggle bit: a bit that holds a binary value. Each thread changes the value to the opposite one and picks a direction to exit according to the bit value. For example: 0 – go left, 1 – go right.
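As a minimal sketch (class and method names are mine, not the paper's), the toggle bit can be realized with an atomic counter whose low bit plays the role of the toggle, so each arriving thread atomically flips it:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal toggle-bit balancer sketch: the low bit of an atomic counter
// serves as the toggle, so each arriving thread atomically flips it.
public class ToggleBalancer {
    private final AtomicInteger toggle = new AtomicInteger(0);

    /** Returns the exit wire for the calling thread: 0 = go left, 1 = go right. */
    public int traverse() {
        return toggle.getAndIncrement() & 1;
    }
}
```

Successive threads exit on alternating wires: the first goes left, the second right, and so on, which is what keeps the two output wires balanced.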
The diffracting tree is constructed from a set of balancers. You can say that the tree counts the elements, i.e. distributes them equally across the leaves.
If we connect a lock-free queue/stack to each leaf and use two toggle bits in each balancer, we get a data structure that obeys pool semantics.
We can see that we have just moved the contention from a single queue/stack to the balancers, starting at the entrance to the tree.
The problem is solved by diffraction. What we get eventually is that each thread that approaches the pool traverses the whole tree and eventually reaches one of the queues at the leaves.
Actually, if at some point during the tree traversal a producer and a consumer thread meet each other, they don't have to continue traversing the tree. The consumer can take the producer's value, and both can leave the tree.
Under high loads, according to our statistics, 50% of the threads are successfully eliminated at each level. I.e., if we use a 3-level tree, 50% are eliminated at the first level, another 25% at the second, and 12.5% at the third, meaning only about 12.5% of the requests survive to reach the leaves.
We also use two toggle bits at each balancer, one for producers and one for consumers, to ensure a fair distribution.
In the described implementation, another problem we can encounter is starvation…
Each balancer is composed of an EliminationArray, a pair of toggle bits, and two references, one to each of its child nodes.
The implementation of an EliminationArray is based on an array of Exchangers. Each Exchanger contains a single AtomicReference used as an atomic placeholder for exchanging an ExchangerPackage, an object that wraps the actual data and marks its state and type.
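A rough sketch of one exchanger cell along these lines (simplified and shown sequentially; the class layout and method names are illustrative, not the actual implementation):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a single exchanger cell: an AtomicReference used as an atomic
// placeholder for an ExchangerPackage wrapping the data, its state and type.
public class ExchangerCell {
    enum Type { PRODUCER, CONSUMER }
    enum State { WAITING, ELIMINATED, DIFFRACTED }

    static final class ExchangerPackage {
        final Object value;
        final Type type;
        volatile State state = State.WAITING;
        ExchangerPackage(Object value, Type type) { this.value = value; this.type = type; }
    }

    private final AtomicReference<ExchangerPackage> slot = new AtomicReference<>();

    /** A thread tries to park its package in an empty cell; true on success. */
    boolean tryPark(ExchangerPackage p) {
        return slot.compareAndSet(null, p);
    }

    /** A thread tries to collide with a waiting package of the opposite type. */
    ExchangerPackage tryCollide(Type myType) {
        ExchangerPackage waiting = slot.get();
        if (waiting != null && waiting.type != myType && slot.compareAndSet(waiting, null)) {
            waiting.state = State.ELIMINATED;   // producer-consumer pair eliminated
            return waiting;
        }
        return null;                            // empty cell or same-type partner
    }
}
```

The CAS on the single slot is what makes the handoff atomic: at most one consumer can claim a waiting producer's package.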
At its peak at 64 threads the ED-Tree delivers more than 10 times the performance of the JDK. Beyond 64 threads the threads are no longer bound to a single CPU, and traffic across the interconnect causes a moderate performance decline for the ED-Tree version (the performance of the JDK is already very low).
Building Scalable Producer-Consumer Pools based on Elimination-Diffraction Trees
Building Scalable Producer-Consumer Pools based on Elimination-Diffraction Trees
Yehuda Afek, Guy Korland, Maria Natanzon, and Nir Shavit
Producer-consumer pools, that is, collections of
unordered objects or tasks, are a fundamental
element of modern multiprocessor software and a
target of extensive research and development
We present the ED-Tree, a distributed pool
structure based on a combination of the
elimination-tree and diffracting-tree
paradigms, allowing high degrees of
parallelism with reduced contention
(Lea, Scott, and Shearer)
- Synchronous pool: pairs up producers and consumers without buffering; each waits for a partner to show up.
- Blocking pool: producers put their value and leave; consumers wait for a value to become available.
- Non-blocking pool: producers put their value and leave; consumers return null if the pool is empty.
All these structures are based on centralized
structures like a lock-free queue or a stack,
and thus are limited in their scalability: the
head of the stack or queue is a sequential
bottleneck and source of contention.
A pool does not have to obey either LIFO or FIFO order.
Therefore, no centralized structure is needed
to hold the items and to serve producers and consumers.
ED-Tree: a combined variant of
the diffracting-tree structure (Shavit and Zemach) and
the elimination-tree structure (Shavit and Touitou)
The basic idea:
Use randomization to distribute the concurrent
requests of threads onto many locations so that they
collide with one another and can exchange values,
thus avoiding a central place through which all threads must pass.
A pool that allows both parallelism and reduced contention.
A little history
diffraction and elimination were
presented years ago, and claimed to be
effective through simulation
However, elimination trees and diffracting
trees were never used to implement real-world structures.
Elimination and diffraction were never
combined in a single data structure
A binary tree of objects called balancers [Aspnes-Herlihy-Shavit] with
a single input wire and two output wires
Threads arrive at a balancer and it repeatedly sends them left and right,
so its top wire always has at most one more than the bottom one.
In any quiescent state (when there are no threads in the tree), the tree
preserves the step property: the outputs are balanced so that each
top leaf outputs at most one more element than the bottom ones, and
there are no gaps.
Connect each output wire to a lock free queue
To perform a push, threads traverse the balancers from the root to the leaves and
then push the item onto the appropriate queue.
To perform a pop, threads traverse the balancers from the root to the leaves and
then pop from the appropriate queue (or block if the queue is empty).
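A toy one-balancer version of this construction (a sketch with illustrative names; the real tree has multiple levels, and in this sketch an empty leaf falls back to the other leaf instead of blocking):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy diffracting "tree" with a single balancer and two lock-free queues
// as leaves. Separate toggles for pushes and pops keep the two operation
// streams aligned on the same leaves.
public class TinyDiffractingPool<T> {
    private final AtomicInteger pushToggle = new AtomicInteger(0);
    private final AtomicInteger popToggle = new AtomicInteger(0);
    private final ConcurrentLinkedQueue<T> left = new ConcurrentLinkedQueue<>();
    private final ConcurrentLinkedQueue<T> right = new ConcurrentLinkedQueue<>();

    public void push(T item) {
        // Traverse the balancer, then push onto the chosen leaf queue.
        if ((pushToggle.getAndIncrement() & 1) == 0) left.add(item);
        else right.add(item);
    }

    public T pop() {
        // Traverse the balancer, then pop from the chosen leaf queue;
        // fall back to the other leaf instead of blocking in this sketch.
        ConcurrentLinkedQueue<T> first =
            ((popToggle.getAndIncrement() & 1) == 0) ? left : right;
        T v = first.poll();
        return (v != null) ? v : (first == left ? right.poll() : left.poll());
    }
}
```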
Each toggle bit is a hot spot
If an even number of threads pass through a balancer, the
outputs are evenly balanced on the top and bottom wires, but
the balancer's state remains unchanged
Add a diffraction array in front of each toggle bit
At any point while traversing the tree, if a
producer and a consumer collide, there is no
need for them to diffract and continue
traversing the tree.
The producer can hand its item to the
consumer, and both can leave the tree.
Using elimination-diffraction balancers
Let the array at each balancer be
an elimination-diffraction array:
If two producer (or two consumer) threads meet in the
array, they leave on opposite wires, without needing to
touch the bit, as it would anyhow remain in its original state.
If a producer and a consumer meet, they eliminate and leave.
If a producer or consumer call does not manage to
meet another in the array, it toggles the respective bit of
the balancer and moves on.
What about low concurrency?
We show that elimination and diffraction
techniques can be combined to work well at
both high and low loads.
To ensure good performance under low loads we use
several techniques, making the algorithm adapt
to the current contention level.
Use backoff in space:
- Randomly choose a cell within a certain range of the array.
- If the cell is busy (already occupied by two threads), increase the range and retry.
- Else, spin and wait for a collision.
- If timed out (no collision), decrease the range and repeat.
- If a certain number of timeouts is reached, spin on the first cell of the array for a
period, and then move on to the toggle bit and the next level.
- If a further timeout threshold is reached, don't try to diffract on any of the
next levels; just go straight to the toggle bit.
- Each thread remembers the last range it used at the current balancer and next
time starts from this range.
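The range adaptation above can be sketched as follows (class, method names, and the step sizes are illustrative, not taken from the actual code):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of "backoff in space": a thread adapts the slice of the
// elimination array it tries, widening under contention and shrinking
// after timeouts. The last range is remembered for the next visit.
public class SpaceBackoff {
    private final int arraySize;
    private int range = 1;   // remembered between visits to this balancer

    public SpaceBackoff(int arraySize) { this.arraySize = arraySize; }

    /** Randomly choose a cell within the current range. */
    public int chooseCell() {
        return ThreadLocalRandom.current().nextInt(range);
    }

    /** Cell was busy (already occupied by two threads): widen the range. */
    public void onBusyCell() {
        if (range < arraySize) range++;
    }

    /** Spun without a collision: shrink the range and retry. */
    public void onTimeout() {
        if (range > 1) range--;
    }

    public int currentRange() { return range; }
}
```

Keeping the range small when there is little traffic concentrates threads on few cells, so collisions (and eliminations) remain likely even at low loads.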
Threads that failed to eliminate and propagated
all the way to the leaves can wait for a long time
for their requests to complete, while new threads
entering the tree and eliminating finish faster.
To avoid starvation, we limit the time a thread
can be blocked in the queues before it retries
the whole traversal again.
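A sketch of this bounded wait (the timeout value and names are illustrative): instead of blocking indefinitely on a leaf queue, the thread waits a bounded time and, on timeout, would retraverse the tree.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: bound the time a consumer blocks at a leaf queue; on timeout it
// retries (in the real structure, by retraversing the whole tree, possibly
// eliminating against a newly arrived producer on the way down).
public class BoundedLeafWait {
    static <T> T getFromLeaf(BlockingQueue<T> leaf) throws InterruptedException {
        while (true) {
            // Wait at most 1 ms at this leaf instead of blocking forever.
            T v = leaf.poll(1, TimeUnit.MILLISECONDS);
            if (v != null) return v;
            // Timed out: here the real algorithm restarts the tree traversal.
        }
    }

    /** Tiny demo: a leaf that already holds a value returns it immediately. */
    static int demo() throws InterruptedException {
        LinkedBlockingQueue<Integer> leaf = new LinkedBlockingQueue<>();
        leaf.add(7);
        return getFromLeaf(leaf);
    }
}
```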
Each balancer is composed of
an elimination array, a pair of toggle bits, and
two references, one to each of its child nodes.
public class Balancer {
    ToggleBit producerToggle, consumerToggle;
    Exchanger[] eliminationArray;
    Balancer leftChild, rightChild;
}
public class Exchanger {
    AtomicReference<ExchangerPackage> slot;
}
public class ExchangerPackage {
    Object value;
    State state; // WAITING/ELIMINATION/DIFFRACTION
    Type type;   // PRODUCER/CONSUMER
}
Starting from the root of the tree:
- Choose a cell in the array and try to collide with another thread,
using the backoff mechanism described earlier.
- If a collision with another thread occurred:
  - If both threads are of the same type, leave to the next-level balancer
(each in a separate direction).
  - If the threads are of different types, exchange values and leave.
- Else (no collision), use the appropriate toggle bit and move to the next-level balancer.
- If one of the leaves is reached, go to the appropriate queue and
insert/remove an item according to the thread type.
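The steps above can be condensed into a toy single-level sketch (one-cell elimination array, one balancer, two leaf queues; names are illustrative and, unlike the real algorithm, a producer parks its value and leaves instead of waiting in the cell):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Toy single-level ED structure: try to eliminate in a one-cell array,
// otherwise toggle and fall through to one of two leaf queues.
public class ToyEDPool {
    private final AtomicReference<Integer> cell = new AtomicReference<>(); // parked producer value
    private final AtomicInteger producerToggle = new AtomicInteger(0);
    private final AtomicInteger consumerToggle = new AtomicInteger(0);
    private final ConcurrentLinkedQueue<Integer> left = new ConcurrentLinkedQueue<>();
    private final ConcurrentLinkedQueue<Integer> right = new ConcurrentLinkedQueue<>();

    public void put(int v) {
        // Try the elimination cell first: park the value for a consumer.
        if (cell.compareAndSet(null, v)) return;
        // Cell busy: use the producer toggle and enqueue at the chosen leaf.
        if ((producerToggle.getAndIncrement() & 1) == 0) left.add(v); else right.add(v);
    }

    public Integer tryGet() {
        // Try to eliminate against a parked producer value first.
        Integer v = cell.getAndSet(null);
        if (v != null) return v;
        // No partner: use the consumer toggle and dequeue from the chosen
        // leaf, falling back to the other leaf instead of blocking.
        ConcurrentLinkedQueue<Integer> first =
            ((consumerToggle.getAndIncrement() & 1) == 0) ? left : right;
        v = first.poll();
        return (v != null) ? v : (first == left ? right.poll() : left.poll());
    }
}
```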
Sun UltraSPARC T2 Plus multi-core machine.
2 processors, each with 8 cores
each core with 8 hardware threads
64 way parallelism on a processor and 128 way
parallelism across the machine.
Most of the tests were done on one processor, i.e.
at most 64 hardware threads.
A tree with 3 levels and 8 queues
The queues at the leaves are chosen
according to the pool specification:
Synchronous stack of Lea et al. vs ED synchronous pool
Linked blocking queue vs ED blocking pool
Concurrent linked queue vs ED non blocking pool
Adding a delay between accesses
to the pool
32 consumers, 32 producers
Changing the percentage of consumers vs. the total number of threads