Deuce STM - CMP'09


  1. Noninvasive Java Concurrency with Deuce STM 1.0. Guy Korland, “Multi Core Tools”, CMP09
  2. Outline
     • Motivation
     • Deuce
     • Implementation
     • TL2
     • LSA
     • Benchmarks
     • Summary
     • References
  3. Motivation
  4. Problem I
         Process 1            Process 2
         a = acc.get()
         a = a + 100          b = acc.get()
                              b = b + 50
                              acc.set(b)
         acc.set(a)
         ... Lost Update! ...
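The interleaving on this slide can be replayed deterministically on a single thread. The sketch below is only an illustration: the `Account` class and the method names are assumptions made for the example, not code from the talk. It shows that the final balance misses the +50 update, and that doing each update as one indivisible read-modify-write step (here with `AtomicLong`) keeps both.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical account type, used only for this illustration.
class Account {
    private long balance;
    Account(long initial) { balance = initial; }
    long get() { return balance; }
    void set(long value) { balance = value; }
}

class LostUpdateDemo {
    // Replays the slide's interleaving step by step, so the outcome is deterministic.
    static long replayInterleaving(long initial) {
        Account acc = new Account(initial);
        long a = acc.get();   // Process 1 reads the balance
        a = a + 100;
        long b = acc.get();   // Process 2 reads the same, now stale, balance
        b = b + 50;
        acc.set(b);           // Process 2 writes
        acc.set(a);           // Process 1 overwrites it: the +50 is lost
        return acc.get();
    }

    // The same two updates, each performed as an atomic read-modify-write.
    static long atomicUpdates(long initial) {
        AtomicLong acc = new AtomicLong(initial);
        acc.addAndGet(100);
        acc.addAndGet(50);
        return acc.get();
    }
}
```

Starting from a balance of 0, the replayed interleaving ends at 100, while the atomic version ends at 150.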
  5. Problem II
         Process 1            Process 2
         lock(A)              lock(B)
         lock(B)              lock(A)
         ... Deadlock! ...
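One conventional way to avoid this deadlock with plain locks is to always acquire them in a fixed global order. The sketch below illustrates that convention. `RankedLock`, its `rank` field, and the method names are hypothetical, and ranks are assumed to be unique. It also shows why the talk calls locking convention-based: the guarantee holds only if every caller goes through `withBoth`.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock wrapper carrying a unique global rank (an assumption of this sketch).
class RankedLock {
    final int rank;
    final ReentrantLock lock = new ReentrantLock();
    RankedLock(int rank) { this.rank = rank; }
}

class LockOrdering {
    // Acquires both locks in ascending rank order, regardless of argument order.
    static void withBoth(RankedLock x, RankedLock y, Runnable body) {
        RankedLock first = (x.rank <= y.rank) ? x : y;
        RankedLock second = (first == x) ? y : x;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                body.run();
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Two threads request the locks in opposite orders, as on the slide.
    static int runOpposingThreads(int iterations) {
        RankedLock a = new RankedLock(1);
        RankedLock b = new RankedLock(2);
        int[] counter = {0};  // only touched while both locks are held
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) withBoth(a, b, () -> counter[0]++);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) withBoth(b, a, () -> counter[0]++);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }
}
```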
  6. The Problem
     • Cannot exploit cheap threads
     • Today’s software
         • Non-scalable methodologies
     • Today’s hardware
         • Poor support for scalable synchronization
         • Low-level support: CAS, TAS, MemBar…
  7. The Problem
  8. Why Doesn’t Locking Scale?
     • Not robust
     • Relies on conventions
     • Hard to use
         • Conservative
         • Deadlocks
         • Lost wake-ups
     • Not composable
  9. Outline
     • Motivation
     • Solutions
     • Deuce
     • Implementation
     • TL2
     • LSA
     • Benchmarks
     • Summary
     • References
  10. Solutions I – Domain specific
      • MATLAB – concurrency behind the scenes.
      • SQL/XQuery/XPath – the DB will handle it…
      • HTML, ASP, PHP, JSP … – (almost) stateless.
      • Fortress [Sun], X10 [IBM], Chapel [Cray] … – implicit concurrency.
      Remember Cobol! Domain too specific.
  11. Solutions II – Actor Model (share-nothing model)
      • Carl Hewitt, Peter Bishop and Richard Steiger, A Universal Modular Actor Formalism for Artificial Intelligence [IJCAI 1973].
      • An actor, on message:
          • no shared data
          • send messages to other actors
          • create new actors
      • Where can we find it?
          • Simula, Smalltalk, Scala, Haskell, F#, Erlang ...
      Functional languages
  12. Solutions II – Actor Model (share-nothing model)
      Actors in Erlang:
          -module(counter).
          -export([run/0, counter/1]).

          run() ->
              S = spawn(counter, counter, [0]),
              send_msgs(S, 100000),
              S.

          counter(Sum) ->
              receive
                  {inc, Amount} -> counter(Sum + Amount)
              end.

          send_msgs(_, 0) -> true;
          send_msgs(S, Count) ->
              S ! {inc, 1},
              send_msgs(S, Count - 1).
      • Is it really easier?
      • What about performance?
      • Will functional languages ever be functional?
      • Java/.NET/C++ rule! (maybe Ruby)
  13. Solutions III – STM
      Nir Shavit, Dan Touitou, Software Transactional Memory [PODC95]
          synchronized { <instructions> }
          atomic { <instructions> }
          l.lock(); <instructions> l.unlock();
  14. What is a transaction?
      • Atomicity – all or nothing
      • Consistency – consistent state (before and after)
      • Isolation – others can’t see intermediate state
      • Durability – persistent
      Or maybe we do want it?
  15. The Brief History of STM
      • 1995 STM (Shavit, Touitou)
      • 1997 Trans Support TM (Moir)
      • 2003 DSTM (Herlihy et al.)
      • 2003 WSTM (Fraser, Harris)
      • 2003 OSTM (Fraser, Harris)
      • 2004 ASTM (Marathe et al.)
      • 2004 T-Monitor (Jagannathan …)
      • 2004 HybridTM (Moir)
      • 2004 Meta Trans (Herlihy, Shavit)
      • 2004 Soft Trans (Ananian, Rinard)
      • 2005 Lock-OSTM (Ennals)
      • 2005 McTM (Saha et al.)
      • 2005 TL (Dice, Shavit)
      • 2006 AtomJava (Hindman …)
      • 2006 LSA (Riegel et al.)
      • 2006 TL2 (Dice, Shavit, Shalev)
      • 2006 DSTM2 (Herlihy, Luchangco)
      • 2007 Tanger
      • 2008 Rock (Sun)
      • 2009 Deuce (Korland et al.)
  16. DSTM2
      Maurice Herlihy et al., A flexible framework … [OOPSLA06]
          @atomic public interface INode {
              int getValue();
              void setValue(int value);
              INode getNext();
              void setNext(INode value);
          }

          Factory<INode> factory = Thread.makeFactory(INode.class);

          result = Thread.doIt(new Callable<Boolean>() {
              public Boolean call() {
                  return intSet.insert(value);
              }
          });
      • Limited to objects.
      • Very intrusive.
      • Doesn’t support libraries.
      • Bad performance (fork).
  17. JVSTM
      João Cachopo and António Rito-Silva, Versioned boxes as the basis for memory transactions [SCOOL05]
          public class Account {
              private VBox<Long> balance = new VBox<Long>();

              public @Atomic void withdraw(long amount) {
                  balance.put(balance.get() - amount);
              }
          }
      • Doesn’t support libraries.
      • Less intrusive.
      • Need to “announce” shared fields.
  18. Atom-Java
      B. Hindman and D. Grossman, Atomicity via source-to-source translation [MSPC06]
          public void update(double value) {
              Atomic {
                  commission += value;
              }
          }
      • Adds a reserved word.
      • Needs precompilation.
      • Doesn’t support libraries.
      • Even less intrusive.
  19. Multiverse
      Peter Veentjer, 2009
          @TmEntity
          public class Stack<E> {
              private Node<E> head;

              public void push(E item) {
                  head = new Node(item, head);
              }
          }

          @TmEntity
          public static class Node<E> {
              final E value;
              final Node parent;

              Node(E value, Node prev) {
                  this.value = value;
                  this.parent = prev;
              }
          }
      • Doesn’t support libraries.
      • Limited to objects.
  20. DATM-J
      Hany E. Ramadan et al., Dependence-aware transactional memory [MICRO08]
          Transaction tx = new Transaction(id);
          boolean done = false;
          while (!done) {
              try {
                  tx.BeginTransaction();
                  // transactional code
                  done = tx.CommitTransaction();
              } catch (AbortException e) {
                  tx.AbortTransaction();
                  done = false;
              }
          }
      • Explicit transaction.
      • Explicit retry.
  22. Deuce STM
      • Java STM framework
          • @Atomic methods
          • Field-based access
              • More scalable than object-based.
              • More efficient than word-based.
          • Supports external libraries
              • Can be part of a transaction
          • No reserved words
              • No need for new compilers (existing IDEs can be used)
      • Research tool
          • API for developing and testing new algorithms.
  23. Deuce - API
          public class Bank {
              final private static double MAXIMUM_TRANSACTION = 1000;
              private double commission = 0;

              @Atomic(retries = 64)
              public void transaction(Account ac1, Account ac2, double amount) {
                  ac1.balance -= (amount + commission);
                  ac2.balance += amount;
              }

              @Atomic
              public void update(double value) {
                  commission += value;
              }
          }
  24. Deuce - Overview
  25. Deuce - Running
      • -javaagent:deuceAgent.jar
          • Dynamic bytecode manipulation.
      • -Xbootclasspath/p:rt.jar
          • Offline instrumentation to support the boot classloader.
      • java -javaagent:deuceAgent.jar -cp "myjar.jar" MyMain
  27. Implementation
      • ASM – bytecode manipulation
          • Online & offline
      • Fields
          • private double commission;
          • final static public long commission__ADDRESS ...
              • Relative address (-1 if final).
          • final static public Object __CLASS_BASE__ ...
              • Marks the class base for static field access.
  28. Implementation
      • Methods
          • @Atomic methods
              • Replace the method with a transaction retry loop.
              • Add another, instrumented method.
          • Non-atomic methods
              • Duplicate each with an instrumented version.
  29. Implementation
          @Atomic
          public void update(double value) {
              commission += value;
          }
      In byte code:
          @Atomic
          public void update(double value) {
              double tmp = commission;
              commission = tmp + value;
          }
  30. Implementation
          public void update(double value, Context c) {
              double tmp;
              if (commission__ADDRESS < 0) { // final field
                  tmp = commission;
              } else {
                  c.beforeRead(this, commission__ADDRESS);
                  tmp = c.onRead(this, commission, commission__ADDRESS);
              }
              c.onWrite(this, tmp + value, commission__ADDRESS);
          }
      JIT removes it
  31. Implementation
          public void update(double value, Context c) {
              c.beforeRead(this, commission__ADDRESS);
              double tmp = c.onRead(this, commission, commission__ADDRESS);
              c.onWrite(this, tmp + value, commission__ADDRESS);
          }
  32. Implementation
          public void update(double value) {
              Context context = ContextDelegator.getContext();
              for (int i = retries; i > 0; --i) {
                  context.init();
                  try {
                      update(value, context);
                      if (context.commit())
                          return;
                  } catch (TransactionException e) {
                      context.rollback();
                      continue;
                  } catch (Throwable t) {
                      if (context.commit())
                          throw t;
                  }
              }
              throw new TransactionException();
          }
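The retry loop above can be tried out without Deuce using a stand-in context. The following is a minimal sketch under stated assumptions: `TxContext`, `FlakyContext`, `RetryLoop`, and the local `TransactionException` are hypothetical names defined here for illustration and are not Deuce's own classes. The loop returns the number of attempts it needed, which makes the behaviour easy to observe.

```java
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for the Context interface shown on the slides.
interface TxContext {
    void init();
    boolean commit();
    void rollback();
}

// Local stand-in exception; signals an aborted or exhausted transaction.
class TransactionException extends RuntimeException {}

class RetryLoop {
    // Runs body until a commit succeeds; returns the number of attempts used.
    static int runAtomically(TxContext ctx, int retries, Consumer<TxContext> body) {
        for (int attempt = 1; attempt <= retries; attempt++) {
            ctx.init();
            try {
                body.accept(ctx);
                if (ctx.commit()) {
                    return attempt;
                }
                // commit() reported a conflict: fall through and retry
            } catch (TransactionException e) {
                ctx.rollback();
            }
        }
        throw new TransactionException();  // retries exhausted
    }
}

// Test double whose first `failures` commits report a conflict.
class FlakyContext implements TxContext {
    private int failuresLeft;
    int inits = 0;

    FlakyContext(int failures) { failuresLeft = failures; }

    public void init() { inits++; }

    public boolean commit() {
        if (failuresLeft > 0) {
            failuresLeft--;
            return false;
        }
        return true;
    }

    public void rollback() { }
}
```

With two failing commits and 64 retries, `runAtomically` succeeds on the third attempt. With more failures than retries, it throws `TransactionException`, matching the final `throw` in the slide's loop.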
  33. Implementation
          public interface Context {
              void init(int atomicBlockId);
              boolean commit();
              void rollback();
              void beforeReadAccess(Object obj, long field);
              Object onReadAccess(Object obj, Object value, long field);
              int onReadAccess(Object obj, int value, long field);
              long onReadAccess(Object obj, long value, long field);
              …
              void onWriteAccess(Object obj, Object value, long field);
              void onWriteAccess(Object obj, int value, long field);
              void onWriteAccess(Object obj, long value, long field);
              …
          }
  35. TL2 (Transactional Locking II)
      Dave Dice, Ori Shalev and Nir Shavit [DISC06]
      • CTL – commit-time locking
          • Start
              • Sample global version-clock
          • Run through a speculative execution
              • Collect write-set & read-set
          • End
              • Lock the write-set
              • Increment global version-clock
              • Validate the read-set
              • Commit and release the locks
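The steps listed on this slide can be sketched over a handful of memory cells. This is a teaching sketch under simplifying assumptions, not Deuce's TL2 implementation: all names (`Tl2Sketch`, `Cell`, `Tx`, `AbortException`) are made up for the example. Each cell carries a versioned lock whose lowest bit is the lock flag. Read-only transactions are not optimized, and a failed commit simply returns `false` so that the caller can retry with a fresh transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: names and layout are assumptions, not Deuce's TL2 code.
class Tl2Sketch {
    static final AtomicLong GLOBAL_CLOCK = new AtomicLong();

    // A memory location guarded by a versioned lock:
    // bit 0 is the lock flag, the remaining bits hold the version.
    static final class Cell {
        volatile long value;
        final AtomicLong versionedLock = new AtomicLong();
        Cell(long initial) { value = initial; }
    }

    static final class AbortException extends RuntimeException {}

    static final class Tx {
        // Start: sample the global version clock.
        final long readVersion = GLOBAL_CLOCK.get();
        final Set<Cell> readSet = new HashSet<>();
        final Map<Cell, Long> writeSet = new HashMap<>();

        // Speculative read: record the cell, abort if it changed after our snapshot.
        long read(Cell c) {
            Long buffered = writeSet.get(c);
            if (buffered != null) return buffered;  // read our own pending write
            long before = c.versionedLock.get();
            long v = c.value;
            long after = c.versionedLock.get();
            boolean locked = (before & 1L) != 0;
            if (locked || before != after || (before >>> 1) > readVersion) {
                throw new AbortException();
            }
            readSet.add(c);
            return v;
        }

        // Speculative write: buffered until commit.
        void write(Cell c, long v) { writeSet.put(c, v); }

        boolean commit() {
            List<Cell> held = new ArrayList<>();
            try {
                // 1. Lock the write-set.
                for (Cell c : writeSet.keySet()) {
                    long cur = c.versionedLock.get();
                    if ((cur & 1L) != 0 || !c.versionedLock.compareAndSet(cur, cur | 1L)) {
                        return false;
                    }
                    held.add(c);
                }
                // 2. Increment the global version clock.
                long writeVersion = GLOBAL_CLOCK.incrementAndGet();
                // 3. Validate the read-set.
                for (Cell c : readSet) {
                    long cur = c.versionedLock.get();
                    boolean lockedByOther = (cur & 1L) != 0 && !writeSet.containsKey(c);
                    if (lockedByOther || (cur >>> 1) > readVersion) return false;
                }
                // 4. Commit: write back, then release each lock with the new version.
                for (Cell c : held) {
                    c.value = writeSet.get(c);
                    c.versionedLock.set(writeVersion << 1);
                }
                held.clear();
                return true;
            } finally {
                // On failure, drop the lock bit and keep the old version.
                for (Cell c : held) c.versionedLock.set(c.versionedLock.get() & ~1L);
            }
        }
    }
}
```

Replaying the lost-update scenario from the motivation slides with two `Tx` objects that both read the same cell, the first commit succeeds and the second fails read-set validation, so the losing transaction must re-run against the new value rather than silently overwrite it.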
  37. LSA (Lazy Snapshot Algorithm)
      Torvald Riegel, Pascal Felber and Christof Fetzer [DISC06]
      • ETL – encounter-time locking
          • Start
              • Sample global version-clock
          • Run through a speculative execution
              • Lock on write access
              • Collect read-set & write-set
          • On validation error try to extend snapshot
          • End
              • Increment global version-clock
              • Validate the read-set
              • Commit and release the locks
  39. Benchmarks (Azul – Vega2 – 2 x 46)
  40. Benchmarks (SuperMicro – 2 x Quad Intel)
  41. Benchmarks (Sun UltraSPARC T2 Plus – 2 x Quad x 8 HT)
  43. Summary
      • Simple API
          • @Atomic
      • No changes to Java
          • No reserved words
      • Open source
          • On Google Code
      • Shows nice scalability
          • Field based
  45. References
      • Homepage – http://www.deucestm.org
      • Project – http://code.google.com/p/deuce/
      • Wikipedia – http://en.wikipedia.org/wiki/Software_transactional_memory
      • TL2 – http://research.sun.com/scalable
      • LSA-STM – http://tmware.org/lsastm

Editor's notes

  • Maramba – 128 threads ($17,995). Vega 3 – 864 processors.
  • Domain-specific solutions are too specific to rule them all. New concurrent languages are too different from the widely used languages. X10 adds an async command; Chapel adds send/recv.
  • Working with messages can also lead to deadlock, and it is not intuitive. Everything is immutable; functional languages are very hard to work with in real applications. Imperative programming is too common.
  • No need for durability, since we’re only changing memory state. Remark: maybe we do want it…
