Building Scalable Producer-Consumer Pools based on Elimination-Diffraction Trees

We present the ED-Tree, a distributed pool structure based on a combination of the elimination-tree and diffracting-tree paradigms, allowing high degrees of parallelism with reduced contention.


  1. Building Scalable Producer-Consumer Pools based on Elimination-Diffraction Trees. Yehuda Afek, Guy Korland, Maria Natanzon, and Nir Shavit
  2. The Pool: Producer-consumer pools, that is, collections of unordered objects or tasks, are a fundamental element of modern multiprocessor software and a target of extensive research and development. [Figure: producers P1 to Pn call Put(x), Put(y), Put(z) on the pool while consumers C1 to Cn call Get().]
  3. ED-Tree Pool: We present the ED-Tree, a distributed pool structure based on a combination of the elimination-tree and diffracting-tree paradigms, allowing high degrees of parallelism with reduced contention.
  4. Java JDK6.0:
       • SynchronousQueue/Stack (Lea, Scott, and Shearer): a pairing-up function without buffering. Producers and consumers wait for one another.
       • LinkedBlockingQueue: producers put their value and leave; consumers wait for a value to become available.
       • ConcurrentLinkedQueue: producers put their value and leave; consumers return null if the pool is empty.
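
The three behaviors listed above can be observed directly with the standard `java.util.concurrent` classes. The following is a minimal sketch; the class `JdkPoolSemantics` and its method names are illustrative and do not come from the slides.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Illustrative helper (hypothetical name), contrasting the three JDK pool flavors.
class JdkPoolSemantics {
    // SynchronousQueue has no buffer: a non-blocking offer fails
    // when no consumer is already waiting to receive the item.
    static boolean synchronousOfferWithoutConsumer() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        return queue.offer("x");
    }

    // LinkedBlockingQueue buffers: the producer leaves at once and a
    // consumer can take the item later (take() would block if empty).
    static String bufferedHandoff() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        try {
            queue.put("y");
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // ConcurrentLinkedQueue never blocks: poll() on an empty queue returns null.
    static String pollEmpty() {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        return queue.poll();
    }
}
```

Here `synchronousOfferWithoutConsumer()` returns `false`, `bufferedHandoff()` returns `"y"`, and `pollEmpty()` returns `null`.
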
  5. Drawback: All these structures are based on a centralized structure such as a lock-free queue or stack, and are thus limited in their scalability: the head of the stack or queue is a sequential bottleneck and a source of contention.
  6. 6. Some Observations A pool does not have to obey neither LIFO or FIFO semantics.  Therefore, no centralized structure needed, to hold the items and to serve producers and consumers requests.
  7. New approach: ED-Tree, a combined variant of the diffracting-tree structure (Shavit and Zemach) and the elimination-tree structure (Shavit and Touitou).
       • The basic idea: use randomization to distribute the concurrent requests of threads onto many locations so that they collide with one another and can exchange values, thus avoiding a central place through which all threads pass.
       • The result: a pool that allows both parallelism and reduced contention.
  8. A little history
       • Both diffraction and elimination were presented years ago and claimed to be effective through simulation.
       • However, elimination trees and diffracting trees were never used to implement real-world structures.
       • Elimination and diffraction were never combined in a single data structure.
  9. Diffraction trees: A binary tree of objects called balancers [Aspnes-Herlihy-Shavit], each with a single input wire and two output wires. Threads arrive at a balancer, and it alternately sends them to its top and bottom output wires, so its top wire always carries at most one more thread than the bottom one. [Figure: threads 1 to 5 enter balancer b and leave alternately on its two output wires.]
  10. Diffraction trees [Shavit-Zemach]: In any quiescent state (when there are no threads in the tree), the tree preserves the step property: the output items are balanced so that the top leaves have output at most one more element than the bottom ones, and there are no gaps. [Figure: ten threads pass through a three-level tree of balancers b and spread over its eight output wires.]
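
The step property can be checked on a small model. The sketch below is an assumption-laden illustration, not the authors' code: it uses toggle bits only (no prism arrays), stores the balancers in a heap-style array, and uses the parity of an atomic counter in place of a toggle bit. The names `ToggleTreeSketch`, `traverse`, and `leafCounts` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: a binary tree of toggle balancers.
class ToggleTreeSketch {
    private final int depth;
    // Heap layout: balancer i has children 2i+1 (top wire) and 2i+2 (bottom wire).
    private final AtomicInteger[] toggles;

    ToggleTreeSketch(int depth) {
        this.depth = depth;
        this.toggles = new AtomicInteger[(1 << depth) - 1];
        for (int i = 0; i < toggles.length; i++) {
            toggles[i] = new AtomicInteger();
        }
    }

    // Routes one token from the root to a leaf and returns the output wire index.
    int traverse() {
        int node = 0;
        int leaf = 0;
        for (int level = 0; level < depth; level++) {
            // Assumption: counter parity stands in for the toggle bit (0 = top, 1 = bottom).
            int bit = toggles[node].getAndIncrement() & 1;
            leaf |= bit << level; // the root decides the lowest bit of the output index
            node = 2 * node + 1 + bit;
        }
        return leaf;
    }

    // Sends the given number of tokens through a fresh tree and counts arrivals per leaf.
    static int[] leafCounts(int depth, int tokens) {
        ToggleTreeSketch tree = new ToggleTreeSketch(depth);
        int[] counts = new int[1 << depth];
        for (int i = 0; i < tokens; i++) {
            counts[tree.traverse()]++;
        }
        return counts;
    }
}
```

For a three-level tree and ten tokens, `ToggleTreeSketch.leafCounts(3, 10)` yields `[2, 2, 1, 1, 1, 1, 1, 1]`: the upper outputs hold at most one more token than the lower ones, with no gaps.
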
  11. Diffraction trees: Connect each output wire to a lock-free queue. To perform a push, a thread traverses the balancers from the root to a leaf and then pushes the item onto the corresponding queue. To perform a pop, a thread traverses the balancers from the root to a leaf and then pops from the corresponding queue, blocking if the queue is empty. [Figure: a three-level tree of balancers b with a queue on each output wire.]
  12. Diffraction trees: Problem: each toggle bit is a hot spot. [Figure: threads contending on the 0/1 toggle bits of the balancers in the tree.]
  13. Diffraction trees: Observation: if an even number of threads pass through a balancer, the outputs are evenly balanced on the top and bottom wires, but the balancer's state remains unchanged. The approach: add a diffraction array (prism array) in front of each toggle bit. [Figure: a prism array placed in front of the 0/1 toggle bit.]
  14. Elimination
       • At any point while traversing the tree, if a producer and a consumer collide, there is no need for them to diffract and continue traversing the tree.
       • The producer can hand its item to the consumer, and both can leave the tree.
  15. Adding elimination. [Figure: Put(x) and Get() meet in a cell of a k-cell array placed in front of the 0/1 toggle bits; Get() returns x and Put(x) returns ok.]
  16. Using elimination-diffraction balancers: Let the array at each balancer be a diffraction-elimination array:
       • If two producer (or two consumer) threads meet in the array, they leave on opposite wires without touching the toggle bit, since it would remain in its original state anyway.
       • If a producer and a consumer meet, they eliminate, exchanging items.
       • If a producer or consumer call does not manage to meet another in the array, it toggles the respective bit of the balancer and moves on.
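
The three rules above amount to a small decision function at each balancer. The sketch below models only that decision, assuming the collision itself has already been detected; the class `EdBalancerRules`, the `arrivedFirst` flag, and the convention that the first arrival of a diffracting pair takes the top wire are all assumptions made for illustration, and counter parity again stands in for a toggle bit.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch (hypothetical names): where a request goes after its
// visit to the elimination-diffraction array of one balancer.
class EdBalancerRules {
    enum Type { PRODUCER, CONSUMER }

    static final int ELIMINATED = -1; // request completed, leaves the tree
    static final int TOP = 0;
    static final int BOTTOM = 1;

    // One toggle per request type, as in the slides' pair of toggle bits.
    private final AtomicInteger producerToggle = new AtomicInteger();
    private final AtomicInteger consumerToggle = new AtomicInteger();

    // partner is the type of the request met in the array, or null if none was met.
    // arrivedFirst only matters for splitting a same-type pair (assumed convention).
    int exit(Type self, Type partner, boolean arrivedFirst) {
        if (partner == null) {
            // No collision: fall back to the toggle bit of this request type.
            AtomicInteger toggle = (self == Type.PRODUCER) ? producerToggle : consumerToggle;
            return toggle.getAndIncrement() & 1;
        }
        if (partner != self) {
            // Producer met consumer: they exchange the item and both leave.
            return ELIMINATED;
        }
        // Same type: diffract onto opposite wires without touching the toggle.
        return arrivedFirst ? TOP : BOTTOM;
    }
}
```

Because a same-type pair leaves on opposite wires, the toggle would have returned to its original state after both passed, which is why skipping it preserves the balance.
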
  17. ED-tree
  18. What about low concurrency levels?
       • We show that elimination and diffraction techniques can be combined to work well at both high and low loads.
       • To ensure good performance at low loads we use several techniques that make the algorithm adapt to the current contention level.
  19. Adaptation mechanisms
       • Use backoff in space:
           • Randomly choose a cell in a certain range of the array.
           • If the cell is busy (already occupied by two threads), increase the range and repeat.
           • Otherwise, spin and wait for a collision.
           • If timed out (no collision), decrease the range and repeat.
       • If a certain number of timeouts is reached, spin on the first cell of the array for a period, and then move on to the toggle bit and the next level.
       • If a certain number of timeouts is reached, do not try to diffract on any of the next levels; go straight to the toggle bit.
       • Each thread remembers the last range it used at the current balancer and starts from this range next time.
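
The range adjustment in the backoff-in-space rule can be sketched as a small per-thread object. The slides only say "increase" and "decrease"; doubling and halving, the starting range of 1, and the names `SlotRangeBackoff`, `onBusySlot`, and `onTimeout` are assumptions made for this illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch (hypothetical names): the slot range one thread uses
// at one balancer's array. Not thread-safe by design, since each thread
// would keep its own instance (the slides store the range per thread).
class SlotRangeBackoff {
    private final int arrayLength;
    private int range = 1; // assumed starting range

    SlotRangeBackoff(int arrayLength) {
        if (arrayLength < 1) {
            throw new IllegalArgumentException("arrayLength must be positive");
        }
        this.arrayLength = arrayLength;
    }

    // Picks a random cell within the current range [0, range).
    int chooseSlot() {
        return ThreadLocalRandom.current().nextInt(range);
    }

    // The chosen cell was busy: spread out (assumed policy: double, capped at the array length).
    void onBusySlot() {
        range = Math.min(arrayLength, range * 2);
    }

    // Nobody showed up before the timeout: concentrate (assumed policy: halve, at least 1).
    void onTimeout() {
        range = Math.max(1, range / 2);
    }

    int range() {
        return range;
    }
}
```

Under high contention the range grows so threads spread over more cells; under low contention it shrinks so that the few threads present are more likely to meet.
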
  20. Starvation avoidance
       • Threads that failed to eliminate and propagated all the way to the leaves can wait for a long time for their requests to complete, while new threads entering the tree and eliminating finish faster.
       • To avoid starvation we limit the time a thread can be blocked in the queues before it retries the whole traversal.
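
One way to bound the time spent at a leaf is a timed `poll` on a `BlockingQueue`, followed by a fresh traversal. This is a sketch under assumptions: the class `BoundedLeafWait`, the `traverse` supplier that returns a leaf index, and the `maxAttempts` cap (added so the example terminates) are hypothetical and not taken from the slides.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Illustrative sketch (hypothetical names): a consumer that waits at a leaf
// queue for a bounded time and then retries the whole traversal.
class BoundedLeafWait {
    static <T> T get(List<? extends BlockingQueue<T>> leaves, IntSupplier traverse,
                     long timeoutMillis, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Each attempt re-runs the traversal and may land on a different leaf.
            BlockingQueue<T> leaf = leaves.get(traverse.getAsInt());
            try {
                T item = leaf.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (item != null) {
                    return item;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null; // gave up after maxAttempts (assumed policy for this sketch)
    }
}
```

In a full tree, the retried traversal would also pass through the elimination arrays again, giving the request another chance to meet a partner before reaching a leaf.
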
  21. Implementation: Each balancer is composed of an elimination array, a pair of toggle bits, and two references, one to each of its child nodes.

          public class Balancer {
              ToggleBit producerToggle, consumerToggle; // one toggle bit per request type
              Exchanger[] eliminationArray;             // cells in which requests collide
              Balancer leftChild, rightChild;
              ThreadLocal<Integer> lastSlotRange;       // last array range this thread used here
          }

  22. Implementation

          import java.util.concurrent.atomic.AtomicReference;

          public class Exchanger {
              AtomicReference<ExchangerPackage> slot;
          }

          public class ExchangerPackage {
              Object value;
              State state; // WAITING / ELIMINATION / DIFFRACTION
              Type type;   // PRODUCER / CONSUMER
          }

  23. Implementation: Starting from the root of the tree:
       • Enter the balancer.
       • Choose a cell in the array and try to collide with another thread, using the backoff mechanism described earlier.
       • If a collision with another thread occurred:
           • If both threads are of the same type, leave to the next-level balancer (each in a separate direction).
           • If the threads are of different types, exchange values and leave.
       • Else (no collision), use the appropriate toggle bit and move to the next level.
       • If one of the leaves is reached, go to the appropriate queue and insert or remove an item according to the thread type.
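
The "no collision" path of the steps above can be sketched end to end. The following deliberately omits the elimination array, so every request uses the toggle bits; it assumes `ConcurrentLinkedQueue` leaves (the non-blocking pool variant), a heap-style layout of balancers, and counter parity in place of toggle bits. The class name `ToggleOnlyPool` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch (hypothetical name): a tree pool without elimination arrays.
class ToggleOnlyPool<T> {
    private final int depth;
    // Separate toggles per request type, indexed by balancer (children of i: 2i+1, 2i+2).
    private final AtomicInteger[] producerToggle;
    private final AtomicInteger[] consumerToggle;
    private final List<Queue<T>> leaves = new ArrayList<>();

    ToggleOnlyPool(int depth) {
        this.depth = depth;
        int balancers = (1 << depth) - 1;
        producerToggle = new AtomicInteger[balancers];
        consumerToggle = new AtomicInteger[balancers];
        for (int i = 0; i < balancers; i++) {
            producerToggle[i] = new AtomicInteger();
            consumerToggle[i] = new AtomicInteger();
        }
        for (int i = 0; i < (1 << depth); i++) {
            leaves.add(new ConcurrentLinkedQueue<>());
        }
    }

    // Walks from the root to a leaf using the given set of toggles.
    private int route(AtomicInteger[] toggles) {
        int node = 0;
        int leaf = 0;
        for (int level = 0; level < depth; level++) {
            int bit = toggles[node].getAndIncrement() & 1; // parity models the toggle bit
            leaf |= bit << level;
            node = 2 * node + 1 + bit;
        }
        return leaf;
    }

    void put(T item) {
        leaves.get(route(producerToggle)).add(item);
    }

    // Returns null if the leaf reached is empty (non-blocking variant).
    T get() {
        return leaves.get(route(consumerToggle)).poll();
    }
}
```

Since producers and consumers follow separate but identically behaving toggles, the k-th `get` is routed to the same leaf as the k-th `put`. For example, after `put("a")`, `put("b")`, `put("c")` on a three-level pool, three sequential `get()` calls return `"a"`, `"b"`, `"c"`, and a fourth returns `null`.
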
  24. Performance evaluation: Sun UltraSPARC T2 Plus multi-core machine.
       • 2 processors, each with 8 cores.
       • Each core has 8 hardware threads.
       • 64-way parallelism on a processor and 128-way parallelism across the machine.
       • Most of the tests were done on one processor, i.e. with at most 64 hardware threads.
  25. Performance evaluation
       • A tree with 3 levels and 8 queues.
       • The queues are SynchronousBlocking/LinkedBlocking/ConcurrentLinked, according to the pool specification.
       • [Figure: a three-level tree of balancers b.]
  26. Performance evaluation: Synchronous stack of Lea et al. vs. ED synchronous pool
  27. Performance evaluation: Linked blocking queue vs. ED blocking pool
  28. Performance evaluation: Concurrent linked queue vs. ED non-blocking pool
  29. Adding a delay between accesses to the pool: 32 consumers, 32 producers
  30. Changing the percentage of consumers vs. total number of threads: 64 threads
  31. 25% producers, 75% consumers
  32. Elimination rate
  33. Elimination range