SlideShare une entreprise Scribd logo
1  sur  59
Multithreaded Programming in Cilk L ECTURE  1 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
Cilk A C language for programming dynamic multithreaded applications on shared-memory multiprocessors. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Example applications:
Shared-Memory Multiprocessor In particular, over the next decade, chip multiprocessors (CMP’s) will be an increasingly important platform! P P P Network … Memory I/O $ $ $
Cilk Is Simple ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Minicourse Outline ,[object Object],[object Object],[object Object],[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Fibonacci Cilk is a  faithful   extension of C.  A Cilk program’s  serial elision  is always a legal implementation of Cilk semantics.  Cilk provides  no   new data types. int fib (int n) { if (n<2) return (n); else { int x,y; x = fib(n-1); y = fib(n-2); return (x+y); } } C elision cilk  int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync; return (x+y); } } Cilk code
Basic Cilk Keywords cilk  int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync; return (x+y); } } Identifies a function as a  Cilk procedure , capable of being spawned in parallel. The named  child  Cilk procedure can execute in parallel with the  parent  caller. Control cannot pass this point until all spawned children have returned.
Dynamic Multithreading cilk   int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync ; return (x+y); } } The  computation dag  unfolds dynamically. Example:   fib(4) “ Processor   oblivious” 4 3 2 2 1 1 1 0 0
Multithreaded Computation ,[object Object],[object Object],[object Object],spawn edge return edge continue edge initial thread final thread
Cactus Stack Cilk supports C’s rule for pointers:   A pointer to stack space can be passed from parent to child, but not from child to parent.  (Cilk also supports  malloc .) Cilk’s  cactus stack  supports several views in parallel. B A C E D A A B A C A C D A C E Views of stack C B A D E
LECTURE 1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Algorithmic Complexity Measures T P  =  execution time on  P  processors
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work T 1  =  span * * Also called  critical-path length  or  computational depth .
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work ,[object Object],[object Object],[object Object],* Also called  critical-path length  or  computational depth . T 1  =  span *
Speedup Definition:   T 1 /T P  =  speedup   on  P  processors. If  T 1 /T P =   ( P )  ·   P ,  we have  linear speedup ; =  P , we have  perfect linear speedup ; >  P , we have  superlinear speedup , which is not possible in our model, because of the lower bound  T P   ¸   T 1 / P .
Parallelism ,[object Object],[object Object],[object Object]
Example:  fib(4) Span:   T 1  = ? Work:   T 1  =  ? Assume for simplicity that each Cilk thread in  fib()  takes unit time to execute. Span:   T 1  = 8 3 4 5 6 1 2 7 8 Work:   T 1  = 17
Example:  fib(4) Parallelism:   T 1 / T 1  = 2.125 Span:   T 1  = ? Work:   T 1  =  ? Assume for simplicity that each Cilk thread in  fib()  takes unit time to execute. Span:   T 1  = 8 Work:   T 1  = 17 Using many more than  2  processors makes little sense.
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parallelizing Vector Addition void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } C
Parallelizing Vector Addition C C if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { void vadd (real *A, real *B, int n){ vadd (A, B, n/2); vadd (A+n/2, B+n/2, n-n/2); ,[object Object],[object Object],void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } } }
Parallelizing Vector Addition if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { C ,[object Object],[object Object],[object Object],void vadd (real *A, real *B, int n){ cil k spawn vadd (A, B, n/2; vadd (A+n/2, B+n/2, n-n/2; spawn Side benefit:   D&C is generally good for caches! sync ; C ilk void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } } }
Vector Addition cil k   void vadd (real *A, real *B, int n){ if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { spawn  vadd (A, B, n/2); spawn  vadd (A+n/2, B+n/2, n-n/2); sync ; } }
Vector Addition Analysis ,[object Object],[object Object],[object Object],[object Object], ( n /lg  n )  ( n )  (lg  n ) BASE
Another Parallelization C void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { vadd1(A+j, B+j, min(BASE, n-j)); } } Cilk cilk  void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } cilk  void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { spawn  vadd1(A+j, B+j, min(BASE, n-j)); } sync; }
Analysis ,[object Object], (1)  ( n ) … …  ( n ) BASE ,[object Object],[object Object],[object Object],PUNY!
Optimal Choice of BASE ,[object Object],[object Object], ( √ n  ) … ,[object Object], ( n ) BASE … ,[object Object], (BASE +  n /BASE) ,[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Scheduling ,[object Object],[object Object],[object Object],P P P Network … Memory I/O $ $ $
Greedy Scheduling ,[object Object],Definition:   A thread is  ready  if all its predecessors have  executed .
Greedy Scheduling ,[object Object],[object Object],[object Object],[object Object],Definition:   A thread is  ready  if all its predecessors have  executed . P  = 3
Greedy Scheduling ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Definition:   A thread is  ready  if all its predecessors have  executed . P  = 3
Greedy-Scheduling Theorem Theorem  [Graham ’68 & Brent ’75]. Any greedy scheduler achieves T P      T 1 / P   + T  . ,[object Object],[object Object],[object Object],P  = 3
Optimality of Greedy ,[object Object],Proof .  Let  T P *  be the execution time produced by the optimal scheduler.  Since  T P *  ¸  max{ T 1 / P ,  T 1 }  (lower bounds), we have T P ·   T 1 / P  +  T 1  ·  2 ¢ max{ T 1 / P ,  T 1 } ·  2 T P *  .  ■
Linear Speedup ,[object Object],Proof.   Since  P  ¿   T 1 / T 1  is equivalent to  T 1   ¿   T 1 / P , the Greedy Scheduling Theorem gives us   T P ·   T 1 / P  +  T 1 ¼  T 1 / P  . Thus, the speedup is  T 1 / T P   ¼   P .  ■ Definition.  The quantity  ( T 1 / T 1  )/ P  is called the  parallel slackness .
Cilk Performance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cilk Chess Programs ,[object Object],[object Object],[object Object],[object Object]
 Socrates Normalized Speedup T P  =  T 1 / P  +  T  measured speedup 0.01 0.1 1 0.01 0.1 1 T P  =  T  T P  =  T 1 / P T 1 /T P T 1 /T  P T 1 /T 
Developing   Socrates  ,[object Object],[object Object],[object Object],[object Object]
 Socrates Speedup Paradox T P      T 1 / P  +  T  T 32 = 2048/32 + 1   = 65  seconds   = 40  seconds T  32 = 1024/32 + 8 Original program Proposed program T 32 = 65  seconds T  32 = 40  seconds T 1 = 2048  seconds T    = 1  second T  1 = 1024  seconds T     = 8  seconds T 512 = 2048/512 + 1   = 5  seconds T  512 = 1024/512 + 8   = 10  seconds
Lesson Work  and  span  can predict performance on large machines better than running times on small machines can.
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque   of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! Spawn!
Cilk’s Work-Stealing Scheduler Each processor maintains a   work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Performance of Work-Stealing Theorem : Cilk’s work-stealing scheduler achieves an expected running time of T P      T 1 / P   + O ( T 1 ) on  P  processors. Pseudoproof . A processor is either  working  or  stealing .  The total time all processors spend working is  T 1 .  Each steal has a  1/ P  chance of reducing the span by  1 .  Thus, the expected cost of all steals is  O ( PT 1 ) .  Since there are  P  processors, the expected time is  ( T 1  +  O ( PT 1 ))/ P  =   T 1 / P  +  O ( T 1 )  .  ■
Space Bounds Theorem.   Let  S 1  be the stack space required by a serial execution of a Cilk program.  Then, the space required by a  P -processor execution is at most  S P   ·   PS 1  . Proof  (by induction). The work-stealing algorithm maintains the  busy-leaves property :  every extant procedure frame with no extant descendents has a processor working on it.   ■   P  = 3 P P P S 1
Linguistic Implications ,[object Object],for (i=1; i<1000000000; i++) { spawn  foo(i); } sync; M ORAL Better to steal parents than children!
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Key Ideas ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Minicourse Outline ,[object Object],[object Object],[object Object],[object Object]

Contenu connexe

Tendances

Symbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesSymbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesQuoc-Sang Phan
 
14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic ProgrammingNeeldhara Misra
 
Lecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsLecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsVivek Bhargav
 
Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Dr. Pankaj Agarwal
 
Data Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesData Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesHaitham El-Ghareeb
 
02 order of growth
02 order of growth02 order of growth
02 order of growthHira Gul
 
Algorithm Analyzing
Algorithm AnalyzingAlgorithm Analyzing
Algorithm AnalyzingHaluan Irsad
 
Design and Analysis of algorithms
Design and Analysis of algorithmsDesign and Analysis of algorithms
Design and Analysis of algorithmsDr. Rupa Ch
 
Algorithm analysis
Algorithm analysisAlgorithm analysis
Algorithm analysissumitbardhan
 
asymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisasymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisAnindita Kundu
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to AlgorithmsVenkatesh Iyer
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysisDr. Rajdeep Chatterjee
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)swapnac12
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihmSajid Marwat
 
Data Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsData Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsAbdullah Al-hazmy
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Yusuke Izawa
 

Tendances (19)

Symbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesSymbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo Theories
 
14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming
 
Lecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsLecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithms
 
Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis
 
Data Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesData Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study Notes
 
02 order of growth
02 order of growth02 order of growth
02 order of growth
 
Algorithm Analyzing
Algorithm AnalyzingAlgorithm Analyzing
Algorithm Analyzing
 
Design and Analysis of algorithms
Design and Analysis of algorithmsDesign and Analysis of algorithms
Design and Analysis of algorithms
 
Algorithm analysis
Algorithm analysisAlgorithm analysis
Algorithm analysis
 
Analysis of Algorithm
Analysis of AlgorithmAnalysis of Algorithm
Analysis of Algorithm
 
asymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisasymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysis
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to Algorithms
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysis
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihm
 
Big o
Big oBig o
Big o
 
Data Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsData Structures- Part2 analysis tools
Data Structures- Part2 analysis tools
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
 
Algorithms Question bank
Algorithms Question bankAlgorithms Question bank
Algorithms Question bank
 

Similaire à Lecture 1

Data Structure & Algorithms - Mathematical
Data Structure & Algorithms - MathematicalData Structure & Algorithms - Mathematical
Data Structure & Algorithms - Mathematicalbabuk110
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.pptebinazer1
 
008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdfomchoubey297
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptxKokilaK25
 
Unit i basic concepts of algorithms
Unit i basic concepts of algorithmsUnit i basic concepts of algorithms
Unit i basic concepts of algorithmssangeetha s
 
Towards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowTowards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowQuoc-Sang Phan
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005Jules Krdenas
 
Basic_analysis.ppt
Basic_analysis.pptBasic_analysis.ppt
Basic_analysis.pptSoumyaJ3
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxPJS KUMAR
 
Data structures notes for college students btech.pptx
Data structures notes for college students btech.pptxData structures notes for college students btech.pptx
Data structures notes for college students btech.pptxKarthikVijay59
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.Tariq Khan
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexityAbbas Ali
 
Introduction to design and analysis of algorithm
Introduction to design and analysis of algorithmIntroduction to design and analysis of algorithm
Introduction to design and analysis of algorithmDevaKumari Vijay
 
Time and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxTime and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxdudelover
 

Similaire à Lecture 1 (20)

Data Structure & Algorithms - Mathematical
Data Structure & Algorithms - MathematicalData Structure & Algorithms - Mathematical
Data Structure & Algorithms - Mathematical
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
parallel
parallelparallel
parallel
 
008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf
 
Matlab Nn Intro
Matlab Nn IntroMatlab Nn Intro
Matlab Nn Intro
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptx
 
Cs2251 daa
Cs2251 daaCs2251 daa
Cs2251 daa
 
AA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptxAA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptx
 
Unit i basic concepts of algorithms
Unit i basic concepts of algorithmsUnit i basic concepts of algorithms
Unit i basic concepts of algorithms
 
Towards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowTowards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information Flow
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005
 
Basic_analysis.ppt
Basic_analysis.pptBasic_analysis.ppt
Basic_analysis.ppt
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptx
 
Data structures notes for college students btech.pptx
Data structures notes for college students btech.pptxData structures notes for college students btech.pptx
Data structures notes for college students btech.pptx
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexity
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
Introduction to design and analysis of algorithm
Introduction to design and analysis of algorithmIntroduction to design and analysis of algorithm
Introduction to design and analysis of algorithm
 
Time and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxTime and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptx
 

Dernier

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Dernier (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Lecture 1

  • 1. Multithreaded Programming in Cilk L ECTURE 1 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
  • 2.
  • 3. Shared-Memory Multiprocessor In particular, over the next decade, chip multiprocessors (CMP’s) will be an increasingly important platform! P P P Network … Memory I/O $ $ $
  • 4.
  • 5.
  • 6.
  • 7. Fibonacci Cilk is a faithful extension of C. A Cilk program’s serial elision is always a legal implementation of Cilk semantics. Cilk provides no new data types. int fib (int n) { if (n<2) return (n); else { int x,y; x = fib(n-1); y = fib(n-2); return (x+y); } } C elision cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync; return (x+y); } } Cilk code
  • 8. Basic Cilk Keywords cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync; return (x+y); } } Identifies a function as a Cilk procedure , capable of being spawned in parallel. The named child Cilk procedure can execute in parallel with the parent caller. Control cannot pass this point until all spawned children have returned.
  • 9. Dynamic Multithreading cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync ; return (x+y); } } The computation dag unfolds dynamically. Example: fib(4) “ Processor oblivious” 4 3 2 2 1 1 1 0 0
  • 10.
  • 11. Cactus Stack Cilk supports C’s rule for pointers: A pointer to stack space can be passed from parent to child, but not from child to parent. (Cilk also supports malloc .) Cilk’s cactus stack supports several views in parallel. B A C E D A A B A C A C D A C E Views of stack C B A D E
  • 12.
  • 13. Algorithmic Complexity Measures T P = execution time on P processors
  • 14. Algorithmic Complexity Measures T P = execution time on P processors T 1 = work
  • 15. Algorithmic Complexity Measures T P = execution time on P processors T 1 = work T 1 = span * * Also called critical-path length or computational depth .
  • 16.
  • 17. Speedup Definition: T 1 /T P = speedup on P processors. If T 1 /T P =  ( P ) · P , we have linear speedup ; = P , we have perfect linear speedup ; > P , we have superlinear speedup , which is not possible in our model, because of the lower bound T P ¸ T 1 / P .
  • 18.
  • 19. Example: fib(4) Span: T 1 = ? Work: T 1 = ? Assume for simplicity that each Cilk thread in fib() takes unit time to execute. Span: T 1 = 8 3 4 5 6 1 2 7 8 Work: T 1 = 17
  • 20. Example: fib(4) Parallelism: T 1 / T 1 = 2.125 Span: T 1 = ? Work: T 1 = ? Assume for simplicity that each Cilk thread in fib() takes unit time to execute. Span: T 1 = 8 Work: T 1 = 17 Using many more than 2 processors makes little sense.
  • 21.
  • 22. Parallelizing Vector Addition void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } C
  • 23.
  • 24.
  • 25. Vector Addition cil k void vadd (real *A, real *B, int n){ if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { spawn vadd (A, B, n/2); spawn vadd (A+n/2, B+n/2, n-n/2); sync ; } }
  • 26.
  • 27. Another Parallelization C void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { vadd1(A+j, B+j, min(BASE, n-j)); } } Cilk cilk void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } cilk void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { spawn vadd1(A+j, B+j, min(BASE, n-j)); } sync; }
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.  Socrates Normalized Speedup T P = T 1 / P + T  measured speedup 0.01 0.1 1 0.01 0.1 1 T P = T  T P = T 1 / P T 1 /T P T 1 /T  P T 1 /T 
  • 42.
  • 43.  Socrates Speedup Paradox T P  T 1 / P + T  T 32 = 2048/32 + 1 = 65 seconds = 40 seconds T  32 = 1024/32 + 8 Original program Proposed program T 32 = 65 seconds T  32 = 40 seconds T 1 = 2048 seconds T  = 1 second T  1 = 1024 seconds T   = 8 seconds T 512 = 2048/512 + 1 = 5 seconds T  512 = 1024/512 + 8 = 10 seconds
  • 44. Lesson Work and span can predict performance on large machines better than running times on small machines can.
  • 45.
  • 46. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn!
  • 47. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! Spawn!
  • 48. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
  • 49. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
  • 50. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 51. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 52. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 53. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 54. Performance of Work-Stealing Theorem : Cilk’s work-stealing scheduler achieves an expected running time of T P  T 1 / P + O ( T 1 ) on P processors. Pseudoproof . A processor is either working or stealing . The total time all processors spend working is T 1 . Each steal has a 1/ P chance of reducing the span by 1 . Thus, the expected cost of all steals is O ( PT 1 ) . Since there are P processors, the expected time is ( T 1 + O ( PT 1 ))/ P = T 1 / P + O ( T 1 ) . ■
  • 55. Space Bounds Theorem. Let S 1 be the stack space required by a serial execution of a Cilk program. Then, the space required by a P -processor execution is at most S P · PS 1 . Proof (by induction). The work-stealing algorithm maintains the busy-leaves property : every extant procedure frame with no extant descendents has a processor working on it. ■ P = 3 P P P S 1
  • 56.
  • 57.
  • 58.
  • 59.