Parallel Computing Platforms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. To accompany the text "Introduction to Parallel Computing", Addison Wesley, 2003.
Topic Overview
Scope of Parallelism
Implicit Parallelism: Trends in Microprocessor Architectures
Pipelining and Superscalar Execution
Superscalar Execution: An Example   Example of a two-way superscalar execution of instructions.
Superscalar Execution
Superscalar Execution: Issue Mechanisms
Superscalar Execution: Efficiency Considerations
Very Long Instruction Word (VLIW) Processors
Very Long Instruction Word (VLIW) Processors: Considerations
Limitations of Memory System Performance
Memory System Performance: Bandwidth and Latency
Memory Latency: An Example
Improving Effective Memory Latency Using Caches
Impact of Caches: Example
Impact of Caches: Example (continued)
Impact of Caches
Impact of Memory Bandwidth
Impact of Memory Bandwidth: Example
Impact of Memory Bandwidth
Impact of Memory Bandwidth: Example  Multiplying a matrix with a vector: (a) multiplying column-by-column, keeping a running sum; (b) computing each element of the result as a dot product of a row of the matrix with the vector.
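The two traversal orders named in the caption can be sketched as follows. This is a minimal illustration rather than the slide's own code: the function names and the list-of-rows layout are assumptions. With rows stored contiguously, variant (a) walks down a column and touches a different row on every inner step, while variant (b) reads each row sequentially.

```python
def matvec_by_columns(a, b):
    """(a) Column-by-column: keep a running sum for every result element."""
    n_rows, n_cols = len(a), len(b)
    y = [0.0] * n_rows
    for j in range(n_cols):          # outer loop over columns
        for i in range(n_rows):      # inner loop strides across rows
            y[i] += a[i][j] * b[j]
    return y


def matvec_by_rows(a, b):
    """(b) Row-by-row: each result element is one dot product."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in a]
```

Both functions return the same vector; only the order in which the matrix elements are visited differs, which is what matters for spatial locality.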
Memory System Performance: Summary
Alternate Approaches for Hiding Memory Latency
Multithreading for Latency Hiding
Multithreading for Latency Hiding: Example
Multithreading for Latency Hiding
Prefetching for Latency Hiding
Tradeoffs of Multithreading and Prefetching
Explicitly Parallel Platforms
Dichotomy of Parallel Computing Platforms
Control Structure of Parallel Programs
SIMD and MIMD Processors A typical SIMD architecture (a) and a typical MIMD architecture (b).
SIMD Processors
Conditional Execution in SIMD Processors  Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
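The two-step execution described in the caption can be imitated in software with an activity mask. The sketch below is an assumption-laden illustration: the particular conditional (copy `a` where `b` is zero, divide otherwise) and the name `simd_conditional` are hypothetical choices, since the slide's statement is not preserved in this transcript. Each list index stands for one processing element.

```python
def simd_conditional(a, b):
    """Emulate `if b == 0: c = a else: c = a / b` on SIMD lanes in two steps."""
    lanes = range(len(a))
    c = [None] * len(a)
    mask = [b[lane] == 0 for lane in lanes]   # every lane evaluates the condition

    # Step 1: only lanes whose condition is true execute; the others idle.
    for lane in lanes:
        if mask[lane]:
            c[lane] = a[lane]

    # Step 2: the remaining lanes execute the else branch; the first group idles.
    for lane in lanes:
        if not mask[lane]:
            c[lane] = a[lane] / b[lane]
    return c
```

Because the two branches run one after the other, the statement takes two steps even though each lane does useful work in only one of them.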
MIMD Processors
SIMD-MIMD Comparison
Communication Model of Parallel Platforms
Shared-Address-Space Platforms
NUMA and UMA Shared-Address-Space Platforms  Typical shared-address-space architectures: (a) Uniform-memory-access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.
Shared-Address-Space vs. Shared Memory Machines
Message-Passing Platforms
Message Passing vs. Shared Address Space Platforms
Physical Organization of Parallel Platforms
Architecture of an Ideal Parallel Computer
Physical Complexity of an Ideal Parallel Computer
Interconnection Networks for Parallel Computers
Static and Dynamic Interconnection Networks  Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
Interconnection Networks
Interconnection Networks: Network Interfaces
Network Topologies
Network Topologies: Buses  Bus-based interconnects (a) with no local caches; (b) with local memory/caches. Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.
Network Topologies: Crossbars A completely non-blocking crossbar network connecting p processors to b memory banks. In general, a crossbar network uses a p × m grid of switches to connect p inputs to m outputs in a non-blocking manner.
Network Topologies:  Multistage Networks The schematic of a typical multistage interconnection network.
Network Topologies:  Multistage Omega Network Each stage of the Omega network implements a perfect shuffle as follows: A perfect shuffle interconnection for eight inputs and outputs.
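A perfect shuffle on p inputs (p a power of two) sends input i to the position obtained by rotating the binary representation of i left by one bit. A small sketch, with the function name chosen here purely for illustration:

```python
def perfect_shuffle(i, p):
    """Return the output position of input i under a perfect shuffle.

    Assumes p is a power of two (p >= 2) and 0 <= i < p.
    """
    bits = p.bit_length() - 1                 # log2(p)
    rotated = (i << 1) | (i >> (bits - 1))    # move the top bit to the bottom
    return rotated & (p - 1)                  # keep only log2(p) bits


# For eight inputs: [0, 2, 4, 6, 1, 3, 5, 7]
shuffle_8 = [perfect_shuffle(i, 8) for i in range(8)]
```

Equivalently, position i maps to 2i for the first half of the inputs and to 2i + 1 - p for the second half.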
Network Topologies: Multistage Omega Network Two switching configurations of the 2 × 2 switch: (a) Pass-through; (b) Cross-over.
Network Topologies: Multistage Omega Network A complete Omega network with the perfect shuffle interconnects and switches can now be illustrated: a complete omega network connecting eight inputs and eight outputs. An omega network has p/2 × log p switching nodes, and the cost of such a network grows as Θ(p log p).
Network Topologies: Multistage Omega Network – Routing An example of blocking in an omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB.
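The blocking example can be reproduced with a short simulation. The sketch assumes the usual destination-tag rule: each stage applies a perfect shuffle, then the 2 × 2 switch sets the lowest bit of the position to the next bit of the destination, most significant bit first. The function names are illustrative, and identifying the shared link with "link AB" in the figure is an assumption, since the figure itself is not preserved here.

```python
def omega_route(src, dst, p):
    """Return the (stage, output position) links a message uses from src to dst."""
    bits = p.bit_length() - 1                 # log2(p) stages; p is a power of two
    pos = src
    links = []
    for stage in range(bits):
        pos = ((pos << 1) | (pos >> (bits - 1))) & (p - 1)   # perfect shuffle
        dst_bit = (dst >> (bits - 1 - stage)) & 1            # next destination bit
        pos = (pos & ~1) | dst_bit                           # pass-through or cross-over
        links.append((stage, pos))
    return links


def shared_links(msg1, msg2, p):
    """Links needed by both messages; a non-empty result means one is blocked."""
    route2 = set(omega_route(msg2[0], msg2[1], p))
    return [link for link in omega_route(msg1[0], msg1[1], p) if link in route2]


conflict = shared_links((0b010, 0b111), (0b110, 0b100), 8)   # [(0, 5)]
```

Both messages need the link leaving position 101 of the first stage, so only one of them can proceed at a time.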
Network Topologies: Completely Connected Network
Network Topologies: Completely Connected and Star Connected Networks Example of an 8-node completely connected network. (a) A completely-connected network of eight nodes;  (b) a star connected network of nine nodes.
Network Topologies: Star Connected Network
Network Topologies: Linear Arrays, Meshes, and k-d Meshes
Network Topologies: Linear Arrays Linear arrays: (a) with no wraparound links; (b) with wraparound link.
Network Topologies:  Two- and Three Dimensional Meshes Two and three dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.
Network Topologies:  Hypercubes and their Construction Construction of hypercubes from hypercubes of lower dimension.
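The recursive construction can be written down directly: a d-dimensional hypercube consists of two (d-1)-dimensional hypercubes, one with a leading 0 bit and one with a leading 1 bit, plus a link between each pair of corresponding nodes. A minimal sketch, with `hypercube_edges` as an illustrative name:

```python
def hypercube_edges(d):
    """Return the set of links (u, v), u < v, of a d-dimensional hypercube."""
    if d == 0:
        return set()                       # a single node, no links
    lower = hypercube_edges(d - 1)
    half = 1 << (d - 1)                    # value of the new leading bit
    edges = set(lower)                                         # copy with leading 0
    edges |= {(u | half, v | half) for u, v in lower}          # copy with leading 1
    edges |= {(u, u | half) for u in range(half)}              # connect the two copies
    return edges
```

Every link produced this way joins two labels that differ in exactly one bit, and a d-dimensional hypercube ends up with d * 2**(d - 1) links.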
Network Topologies: Properties of Hypercubes
Network Topologies: Tree-Based Networks Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.
Network Topologies: Tree Properties
Network Topologies: Fat Trees A fat tree network of 16 processing nodes.
Evaluating Static Interconnection Networks
Table (cell values not preserved). Columns: Network; Diameter; Bisection Width; Arc Connectivity; Cost (No. of links). Rows: Completely-connected; Star; Complete binary tree; Linear array; 2-D mesh, no wraparound; 2-D wraparound mesh; Hypercube; Wraparound k-ary d-cube.
Evaluating Dynamic Interconnection Networks
Table (cell values not preserved). Columns: Network; Diameter; Bisection Width; Arc Connectivity; Cost (No. of links). Rows: Crossbar; Omega Network; Dynamic Tree.
Cache Coherence in Multiprocessor Systems  Cache coherence in multiprocessor systems: (a) Invalidate protocol; (b) Update protocol for shared variables. When the value of a variable is changed, all its copies must either be invalidated or updated.
Cache Coherence: Update and Invalidate Protocols
Maintaining Coherence  Using Invalidate Protocols State diagram of a simple three-state coherence protocol.
Maintaining Coherence  Using Invalidate Protocols  Example of parallel program execution with the simple three-state coherence protocol.
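The state diagram and the execution trace are not preserved in this transcript, so the following is only a sketch of a conventional three-state invalidate protocol. It assumes the states are invalid, shared, and dirty, and it tracks a single variable; the names `INVALID`, `SHARED`, `DIRTY`, and `apply_access` are illustrative.

```python
INVALID, SHARED, DIRTY = "invalid", "shared", "dirty"


def apply_access(states, proc, op):
    """Return the per-processor cache states after `proc` performs `op`.

    `states` lists the state of one variable's copy in each processor's cache;
    `op` is "read" or "write".
    """
    new = list(states)
    if op == "read":
        if states[proc] == INVALID:
            # A dirty copy elsewhere is flushed and downgraded to shared.
            new = [SHARED if s == DIRTY else s for s in new]
            new[proc] = SHARED
        # Reads that hit a shared or dirty local copy change nothing.
    elif op == "write":
        # The writer obtains the only valid copy; all others are invalidated.
        new = [INVALID] * len(states)
        new[proc] = DIRTY
    else:
        raise ValueError(f"unknown operation: {op}")
    return new
```

Starting from two invalid copies, the sequence read by P0, read by P1, write by P0, read by P1 moves through (shared, invalid), (shared, shared), (dirty, invalid), and back to (shared, shared).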
Snoopy Cache Systems How are invalidates sent to the right processors? In snoopy cache systems, all invalidates and read requests are broadcast on a shared medium; each cache listens to it and performs the appropriate coherence operations locally. A simple snoopy bus-based cache coherence system.
Performance of Snoopy Caches
Directory Based Systems Architecture of typical directory based systems: (a) a centralized directory; and (b) a distributed directory.
Performance of Directory Based Schemes
Communication Costs in Parallel Machines
Message Passing Costs in Parallel Computers
Store-and-Forward Routing
Routing Techniques Passing a message from node P0 to P3 (a) through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing. The shaded regions represent the time that the message is in transit. The startup time associated with this message transfer is assumed to be zero.
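The difference between the two schemes shows up clearly in the standard cost expressions. The sketch below assumes the usual parameters, none of which are spelled out in this transcript: startup time `t_s`, per-hop time `t_h`, per-word transfer time `t_w`, message size `m` in words, and `l` links traversed.

```python
def store_and_forward_time(m, l, t_s, t_h, t_w):
    """Each intermediate node receives the whole message before forwarding it."""
    return t_s + (m * t_w + t_h) * l


def cut_through_time(m, l, t_s, t_h, t_w):
    """The message is pipelined across the links, so m * t_w is paid only once."""
    return t_s + l * t_h + m * t_w
```

With zero startup time, a 4-word message over 3 links, and unit `t_h` and `t_w`, store-and-forward takes 15 time units while cut-through takes 7.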
Packet Routing
Cut-Through Routing
Simplified Cost Model for Communicating Messages
Cost Models for Shared Address Space Machines
Routing Mechanisms for Interconnection Networks Routing a message from node P_s (010) to node P_d (111) in a three-dimensional hypercube using E-cube routing.
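E-cube routing can be sketched in a few lines: XOR the source and destination labels, then correct the differing bits one dimension at a time, starting from the least significant. The function name `e_cube_route` is an illustrative choice.

```python
def e_cube_route(src, dst):
    """Return the list of nodes visited from src to dst in a hypercube."""
    path = [src]
    current = src
    diff = src ^ dst          # bits in which the two labels differ
    dim = 0
    while diff:
        if diff & 1:          # this dimension still needs correcting
            current ^= 1 << dim
            path.append(current)
        diff >>= 1
        dim += 1
    return path


route = e_cube_route(0b010, 0b111)   # [0b010, 0b011, 0b111]
```

For the example in the caption, 010 XOR 111 = 101, so the message first crosses dimension 0 to reach 011 and then dimension 2 to reach 111.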
Mapping Techniques for Graphs
Mapping Techniques for Graphs: Metrics
Embedding a Linear Array into a Hypercube
Embedding a Linear Array into a Hypercube: Example (a) A three-bit reflected Gray code ring; and (b) its embedding into a three-dimensional hypercube.
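The reflected Gray code used for this embedding has a compact closed form: the i-th code word is i XOR (i shifted right by one). A minimal sketch, with illustrative function names:

```python
def gray(i):
    """Return the i-th word of the binary reflected Gray code."""
    return i ^ (i >> 1)


def ring_embedding(d):
    """Map node i of a 2**d-node ring to a d-dimensional hypercube label."""
    return [gray(i) for i in range(1 << d)]


ring_3 = ring_embedding(3)   # [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]
```

Consecutive labels, including the wraparound pair, differ in exactly one bit, so every ring link maps onto a single hypercube link.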
Embedding a Mesh into a Hypercube (a) A 4 × 4 mesh illustrating the mapping of mesh nodes to the nodes in a four-dimensional hypercube; and (b) a 2 × 4 mesh embedded into a three-dimensional hypercube. Once again, the congestion, dilation, and expansion of the mapping are 1.
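One way to obtain such a mapping is to concatenate the Gray codes of the row and column indices. The sketch assumes a 2^r × 2^s mesh and uses illustrative names; it is not taken from the slides themselves.

```python
def gray(i):
    """Return the i-th word of the binary reflected Gray code."""
    return i ^ (i >> 1)


def mesh_to_hypercube(i, j, s):
    """Map mesh node (i, j) to a hypercube label.

    The row's Gray code forms the high-order bits and the column's Gray code
    the low-order s bits, where the mesh has 2**s columns.
    """
    return (gray(i) << s) | gray(j)


# A 2 x 4 mesh (r = 1, s = 2) mapped into a three-dimensional hypercube.
labels_2x4 = [[mesh_to_hypercube(i, j, 2) for j in range(4)] for i in range(2)]
```

Moving one step along a row or a column changes exactly one Gray code by one position, so neighbouring mesh nodes land on neighbouring hypercube nodes.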
Embedding a Mesh into a Linear Array
Embedding a Mesh into a Linear Array: Example (a) Embedding a 16 node linear array into a 2-D mesh; and (b) the inverse of the mapping. Solid lines correspond to links in the linear array and normal lines to links in the mesh.
Embedding a Hypercube into a 2-D Mesh
Embedding a Hypercube into a 2-D Mesh: Example  Embedding a hypercube into a 2-D mesh.
Case Studies:  The IBM Blue-Gene Architecture  The hierarchical architecture of Blue Gene.
Case Studies:  The Cray T3E Architecture Interconnection network of the Cray T3E:  (a) node architecture; (b) network topology.
Case Studies:  The SGI Origin 3000 Architecture Architecture of the SGI Origin 3000 family of servers.
Case Studies:  The Sun HPC Server Architecture Architecture of the Sun Enterprise family of servers.

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Chap2 slides

  • 1. Parallel Computing Platforms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. To accompany the text "Introduction to Parallel Computing", Addison Wesley, 2003.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. Superscalar Execution: An Example. Example of a two-way superscalar execution of instructions.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 39.
  • 40.
  • 41.
  • 42. SIMD and MIMD Processors. A typical SIMD architecture (a) and a typical MIMD architecture (b).
  • 43.
  • 44. Conditional Execution in SIMD Processors. Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49. NUMA and UMA Shared-Address-Space Platforms. Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60. Static and Dynamic Interconnection Networks. Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65. Network Topologies: Buses. Bus-based interconnects (a) with no local caches; (b) with local memory/caches. Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.
  • 66. Network Topologies: Crossbars. A crossbar network uses a p×m grid of switches to connect p inputs to m outputs in a non-blocking manner. Figure: a completely non-blocking crossbar network connecting p processors to b memory banks.
  • 67.
  • 68.
  • 69. Network Topologies: Multistage Networks. The schematic of a typical multistage interconnection network.
  • 70.
  • 71. Network Topologies: Multistage Omega Network. Each stage of the Omega network implements a perfect shuffle. Figure: a perfect shuffle interconnection for eight inputs and outputs.
  • 72.
  • 73. Network Topologies: Multistage Omega Network. A complete Omega network with the perfect shuffle interconnects and switches can now be illustrated. Figure: a complete omega network connecting eight inputs and eight outputs. An omega network has p/2 × log p switching nodes, and the cost of such a network grows as Θ(p log p).
  • 74.
  • 75. Network Topologies: Multistage Omega Network – Routing. An example of blocking in an omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB.
  • 76.
  • 77. Network Topologies: Completely Connected and Star Connected Networks. (a) A completely-connected network of eight nodes; (b) a star connected network of nine nodes.
  • 78.
  • 79.
  • 80. Network Topologies: Linear Arrays. Linear arrays: (a) with no wraparound links; (b) with wraparound link.
  • 81. Network Topologies: Two- and Three-Dimensional Meshes. Two- and three-dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.
  • 82. Network Topologies: Hypercubes and their Construction. Construction of hypercubes from hypercubes of lower dimension.
  • 83.
  • 84. Network Topologies: Tree-Based Networks. Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.
  • 85.
  • 86. Network Topologies: Fat Trees. A fat tree network of 16 processing nodes.
  • 87.
  • 88. Evaluating Static Interconnection Networks. Table comparing networks by diameter, bisection width, arc connectivity, and cost (number of links). Networks listed: completely-connected; star; complete binary tree; linear array; 2-D mesh, no wraparound; 2-D wraparound mesh; hypercube; wraparound k-ary d-cube.
  • 89. Evaluating Dynamic Interconnection Networks. Table comparing networks by diameter, bisection width, arc connectivity, and cost (number of links). Networks listed: crossbar; omega network; dynamic tree.
  • 90.
  • 91. Cache Coherence in Multiprocessor Systems. Cache coherence in multiprocessor systems: (a) invalidate protocol; (b) update protocol for shared variables. When the value of a variable is changed, all its copies must either be invalidated or updated.
  • 92.
  • 93.
  • 94. Maintaining Coherence Using Invalidate Protocols. State diagram of a simple three-state coherence protocol.
  • 95. Maintaining Coherence Using Invalidate Protocols. Example of parallel program execution with the simple three-state coherence protocol.
  • 96. Snoopy Cache Systems. How are invalidates sent to the right processors? In snoopy caches, there is a broadcast medium on which each cache listens to all invalidates and read requests and performs the appropriate coherence operations locally. Figure: a simple snoopy bus-based cache coherence system.
  • 97.
  • 98.
  • 99. Directory Based Systems. Architecture of typical directory based systems: (a) a centralized directory; and (b) a distributed directory.
  • 100.
  • 101.
  • 102.
  • 103.
  • 104. Routing Techniques. Passing a message from node P0 to P3: (a) through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing. The shaded regions represent the time that the message is in transit. The startup time associated with this message transfer is assumed to be zero.
  • 105.
  • 106.
  • 107.
  • 108.
  • 109.
  • 110.
  • 111.
  • 112. Routing Mechanisms for Interconnection Networks. Routing a message from node Ps (010) to node Pd (111) in a three-dimensional hypercube using E-cube routing.
  • 113.
  • 114.
  • 115.
  • 116.
  • 117. Embedding a Linear Array into a Hypercube: Example. (a) A three-bit reflected Gray code ring; and (b) its embedding into a three-dimensional hypercube.
  • 118.
  • 119. Embedding a Mesh into a Hypercube. (a) A 4 × 4 mesh illustrating the mapping of mesh nodes to the nodes in a four-dimensional hypercube; and (b) a 2 × 4 mesh embedded into a three-dimensional hypercube. Once again, the congestion, dilation, and expansion of the mapping are all 1.
  • 120.
  • 121. Embedding a Mesh into a Linear Array: Example. (a) Embedding a 16-node linear array into a 2-D mesh; and (b) the inverse of the mapping. Solid lines correspond to links in the linear array and normal lines to links in the mesh.
  • 122.
  • 123. Embedding a Hypercube into a 2-D Mesh: Example. Embedding a hypercube into a 2-D mesh.
  • 124. Case Studies: The IBM Blue-Gene Architecture. The hierarchical architecture of Blue Gene.
  • 125. Case Studies: The Cray T3E Architecture. Interconnection network of the Cray T3E: (a) node architecture; (b) network topology.
  • 126. Case Studies: The SGI Origin 3000 Architecture. Architecture of the SGI Origin 3000 family of servers.
  • 127. Case Studies: The Sun HPC Server Architecture. Architecture of the Sun Enterprise family of servers.
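
The two-step execution of a conditional on an SIMD machine (slide 44) can be sketched in Python. This is only an illustration: the statement `if B == 0: C = A else: C = A / B`, the operand values, and the function name `simd_conditional` are assumptions, since the slide's figure is not part of this transcript. The point it shows is that all processors follow one instruction stream, so the two branches run one after the other with the non-matching processors idle in each step.

```python
def simd_conditional(A, B):
    """Emulate 'if B == 0: C = A else: C = A / B' on one lane per processor.

    Hypothetical sketch: each list index stands for one SIMD processor.
    """
    n = len(A)
    C = [None] * n
    # Every processor evaluates the condition on its own data.
    mask = [b == 0 for b in B]

    # Step 1: only processors whose condition holds are active; the rest idle.
    for p in range(n):
        if mask[p]:
            C[p] = A[p]

    # Step 2: the activity mask is inverted and the else-branch is executed.
    for p in range(n):
        if not mask[p]:
            C[p] = A[p] / B[p]

    return C, mask


if __name__ == "__main__":
    # Assumed operand values for four processors.
    C, mask = simd_conditional([5, 4, 1, 0], [0, 2, 1, 0])
    print("active in step 1:", mask)
    print("C =", C)
```

With these assumed inputs, processors 0 and 3 do useful work in step 1 and processors 1 and 2 in step 2, so each step uses only half of the machine.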
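
The perfect shuffle and the blocking example from the omega network slides (71 and 75) can be reproduced with a short Python sketch. It assumes the usual formulation: the shuffle is a one-bit left rotation of the input's binary label, and at stage s a switch sets the low bit of the current position to bit s (counted from the most significant bit) of the destination. The function names and the `(stage, position)` labelling of links are assumptions made for illustration; the slide's link name "AB" is not modelled.

```python
def perfect_shuffle(i, p):
    """Left-rotate the log2(p)-bit label of input i (p must be a power of two)."""
    k = p.bit_length() - 1
    return ((i << 1) | (i >> (k - 1))) & (p - 1)


def omega_route(src, dst, p):
    """Return the list of (stage, output position) links a message occupies."""
    k = p.bit_length() - 1
    pos = src
    links = []
    for stage in range(k):
        pos = perfect_shuffle(pos, p)           # wiring in front of the stage
        bit = (dst >> (k - 1 - stage)) & 1      # destination bit, MSB first
        pos = (pos & ~1) | bit                  # switch: pass-through or cross-over
        links.append((stage, pos))
    return links


def shared_links(route_a, route_b):
    """Links that both messages need; one of the messages must wait there."""
    return sorted(set(route_a) & set(route_b))


if __name__ == "__main__":
    print([perfect_shuffle(i, 8) for i in range(8)])
    a = omega_route(0b010, 0b111, 8)
    b = omega_route(0b110, 0b100, 8)
    print(a, b, shared_links(a, b))
```

For the two messages named on slide 75, both routes need the same output of the first stage, which is why one of them is blocked.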
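
The recursive construction of hypercubes (slide 82) and two of the metrics named on slide 88 can be checked with a small Python sketch. It assumes the standard construction: a d-dimensional hypercube is two (d-1)-dimensional hypercubes whose corresponding nodes are joined by a link. The helper names are hypothetical, and the diameter is simply measured by breadth-first search rather than taken from the slide's table.

```python
from collections import deque


def build_hypercube(d):
    """Adjacency sets of a d-dimensional hypercube, built from two (d-1)-cubes."""
    if d == 0:
        return {0: set()}
    lower = build_hypercube(d - 1)
    half = 1 << (d - 1)
    cube = {}
    for node, nbrs in lower.items():
        # Copy with leading bit 0, plus the link to its twin in the other copy.
        cube[node] = set(nbrs) | {node + half}
        # Copy with leading bit 1, plus the link back to the twin.
        cube[node + half] = {n + half for n in nbrs} | {node}
    return cube


def diameter(adj):
    """Largest shortest-path distance between any two nodes (BFS from each node)."""
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def link_count(adj):
    """Number of undirected links (each link appears in two adjacency sets)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


if __name__ == "__main__":
    cube = build_hypercube(3)
    print(len(cube), "nodes,", link_count(cube), "links, diameter", diameter(cube))
```

For d = 3 this yields 8 nodes, 12 links, and a diameter of 3, in line with the log p diameter and (p log p)/2 link count usually quoted for hypercubes.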
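
The E-cube route from Ps (010) to Pd (111) on slide 112 can be traced with a few lines of Python. The sketch assumes the common definition of E-cube routing: at every hop the message crosses the dimension of the least significant bit in which the current node and the destination differ. The function name `ecube_route` is hypothetical.

```python
def ecube_route(src, dst):
    """Return the list of node labels visited from src to dst in a hypercube."""
    path = [src]
    cur = src
    diff = cur ^ dst                 # bits in which we still differ from dst
    while diff:
        lowest = diff & -diff        # isolate the least significant set bit
        cur ^= lowest                # cross that dimension
        path.append(cur)
        diff = cur ^ dst
    return path


if __name__ == "__main__":
    route = ecube_route(0b010, 0b111)
    print(" -> ".join(format(n, "03b") for n in route))  # 010 -> 011 -> 111
```

The number of hops equals the Hamming distance between the two labels, so the route is always a shortest path.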
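
The Gray-code embeddings on slides 117 and 119 can be sketched in Python as well. The code assumes the binary reflected Gray code G(i) = i XOR (i >> 1); ring node i is mapped to hypercube node G(i), and mesh node (i, j) is mapped to the concatenation of G(i) and G(j). The function names are hypothetical, and only the dilation (the longest hypercube path that any single edge is stretched to) is measured here.

```python
def gray(i):
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)


def hamming(a, b):
    """Number of bit positions in which a and b differ (hypercube distance)."""
    return bin(a ^ b).count("1")


def embed_ring(d):
    """Map a ring of 2**d nodes onto a d-dimensional hypercube."""
    return [gray(i) for i in range(1 << d)]


def embed_mesh(r, s):
    """Map a 2**r x 2**s mesh onto an (r + s)-dimensional hypercube."""
    return {(i, j): (gray(i) << s) | gray(j)
            for i in range(1 << r) for j in range(1 << s)}


def ring_dilation(labels):
    """Largest hypercube distance between ring neighbours, wraparound included."""
    n = len(labels)
    return max(hamming(labels[i], labels[(i + 1) % n]) for i in range(n))


def mesh_dilation(mapping, rows, cols):
    """Largest hypercube distance between horizontally or vertically adjacent mesh nodes."""
    worst = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                worst = max(worst, hamming(mapping[(i, j)], mapping[(i + 1, j)]))
            if j + 1 < cols:
                worst = max(worst, hamming(mapping[(i, j)], mapping[(i, j + 1)]))
    return worst


if __name__ == "__main__":
    ring = embed_ring(3)
    print([format(x, "03b") for x in ring], ring_dilation(ring))
    mesh = embed_mesh(1, 2)  # the 2 x 4 mesh of slide 119
    print(mesh_dilation(mesh, 2, 4))
```

Both embeddings have dilation 1: every ring or mesh edge lands on a single hypercube link, which is the property the slides' figures illustrate.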