Processor: Superscalars Pipeline Organization 
Z. Jerry Shi 
Computer Science and Engineering 
University of Connecticut 
* Slides adapted from Blumrich&Gschwind/ELE475’03, Peh/ELE475’*
Targeting better performance 
•Factors that decide the execution time 
Execution Time = Path Length × CPI × Cycle Time 
•Exploit parallelism
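Here is a minimal Python sketch of the execution-time equation. The instruction count, CPI and clock period in the example are hypothetical. They only show that reducing one factor scales execution time proportionally.

```python
# Minimal sketch of: Execution Time = Path Length x CPI x Cycle Time
# The example numbers below are hypothetical.


def execution_time_ns(path_length: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution time in ns for `path_length` dynamic instructions."""
    return path_length * cpi * cycle_time_ns


if __name__ == "__main__":
    # Same program and clock; only CPI changes (e.g. by exploiting parallelism).
    before = execution_time_ns(1_000_000, cpi=2.0, cycle_time_ns=1.0)
    after = execution_time_ns(1_000_000, cpi=1.2, cycle_time_ns=1.0)
    print(f"speedup: {before / after:.2f}x")
```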
Abstract view of instruction execution unit for MIPS
Key components on datapath
Pipelining 
•An implementation technique whereby multiple instructions are overlapped in execution 
–The parallelism among instructions in a sequential stream 
–The parallelism among actions needed to execute an instruction 
•Divide the execution into multiple steps and perform one step per clock cycle
–Each step is called a pipe stage or a pipe segment
•Pipeline throughput: how often an instruction leaves the pipeline
•The lengths of the pipeline stages need to be balanced
–Processor cycle time is determined by the slowest stage
•Ideally, the speedup equals the number of pipe stages. However,…
–Ideal time per instruction on the pipelined machine = Time per instruction on unpipelined machine / Number of pipe stages
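The ideal speedup is only approached for long instruction streams, because the pipeline must fill first. The sketch below counts cycles for n instructions on a k-stage pipeline. It assumes no hazards, one stage per clock, and the same clock period for both machines.

```python
# Cycle counts for n instructions on an ideal k-stage pipeline: no hazards, one
# stage per clock, and the same clock period for the unpipelined machine.


def unpipelined_cycles(n: int, k: int) -> int:
    # Each instruction finishes all k steps before the next one starts.
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    # The first instruction needs k cycles to fill the pipeline; after that,
    # one instruction leaves the pipeline every cycle.
    return k + (n - 1)


def ideal_speedup(n: int, k: int) -> float:
    return unpipelined_cycles(n, k) / pipelined_cycles(n, k)


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(f"n={n:>6}: speedup {ideal_speedup(n, 5):.3f} (k=5)")
```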
Pipelined MIPS datapath
Pipelining in MIPS instruction execution
Two abstract representations of a 5-stage pipeline
Performance of Pipelines 
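$$\text{Speedup}_{\text{pipeline}} = \frac{\text{AVG Instr Time}_{\text{unpipelined}}}{\text{AVG Instr Time}_{\text{pipelined}}} = \frac{\text{CPI}_{\text{unpipelined}} \times \text{Cycle Time}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}} \times \text{Cycle Time}_{\text{pipelined}}} = \frac{\text{CPI}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$
Assume the cycle time is the same:
$$\text{Speedup}_{\text{pipeline}} = \frac{\text{CPI}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}}} = \frac{\text{Pipeline depth}}{1} = \text{Pipeline depth}$$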
Things preventing you from getting the ideal speedup 
•Hazards
•Cost of pipelining 
–Delay on pipeline registers 
–Unbalanced pipeline stages
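This sketch puts numbers on the cost of pipelining. It computes the steady-state speedup when the clock period equals the slowest stage delay plus a pipeline-register delay. The stage and register delays are hypothetical values, not measurements.

```python
# Steady-state speedup of a pipeline once its cost is included: the clock period
# is set by the slowest stage plus the pipeline-register delay. All delays below
# are hypothetical values in nanoseconds.
from typing import Sequence


def pipelined_cycle_time(stage_delays: Sequence[float], register_delay: float) -> float:
    return max(stage_delays) + register_delay


def steady_state_speedup(stage_delays: Sequence[float], register_delay: float) -> float:
    # Unpipelined: one long cycle covering all the logic, with no internal registers.
    unpipelined_time = sum(stage_delays)
    # Pipelined: one instruction completes per (stretched) clock cycle.
    return unpipelined_time / pipelined_cycle_time(stage_delays, register_delay)


if __name__ == "__main__":
    balanced = [1.0, 1.0, 1.0, 1.0, 1.0]
    unbalanced = [0.8, 1.0, 1.4, 1.0, 0.8]   # same total logic delay (5.0 ns)
    print(steady_state_speedup(balanced, 0.0))    # ideal case: 5.0
    print(steady_state_speedup(balanced, 0.2))    # register overhead only
    print(steady_state_speedup(unbalanced, 0.2))  # overhead plus imbalance
```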
A basic MIPS datapath
Registers added between stages
Towards Ideal Pipeline CPI 
Pipeline CPI = 
Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls 
–Ideal pipeline CPI: measure of the maximum performance attainable by the implementation 
–Structural hazards: HW cannot support this combination of instructions 
–Data hazards: Instruction depends on the result of prior instructions 
–Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps) 
•Stall the pipeline when there is a hazard 
–Any instructions issued earlier than the stalled instruction continue 
–Any instructions after the stalled instruction are also stalled 
•No new instructions are fetched
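Below, the pipeline CPI equation is written in code and combined with the equal-cycle-time speedup from the Performance of Pipelines slide. That speedup takes CPI_unpipelined to be the pipeline depth. The stall rates in the example are hypothetical.

```python
# Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls
#                + Control stalls   (each term in stall cycles per instruction).
# Speedup assumes equal cycle times and CPI_unpipelined == pipeline depth.
# The stall rates used in the example are hypothetical.


def pipeline_cpi(ideal: float, structural: float, data: float, control: float) -> float:
    return ideal + structural + data + control


def speedup_over_unpipelined(depth: int, cpi_pipelined: float) -> float:
    return depth / cpi_pipelined


if __name__ == "__main__":
    cpi = pipeline_cpi(ideal=1.0, structural=0.05, data=0.20, control=0.15)
    print(f"CPI = {cpi:.2f}, 5-stage speedup = {speedup_over_unpipelined(5, cpi):.2f}")
```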
Structural hazards in a simple RISC pipeline 
Accessing memory in the same cycle
Performance impact of structural hazards 
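Ideal processor: CPI = 1, no structural hazards, relative clock rate = 1
Processor with the structural hazard: 40% of the instructions cause a 1-cycle structural stall, relative clock rate = 1.05
Which one is faster?
The instruction count is the same, so only the average time per instruction needs to be compared.
The average time per instruction for the processor with the structural hazard is
$$\text{AVG Instr Time} = \text{CPI} \times \text{Cycle Time} = (1 + 0.4 \times 1) \times \frac{\text{Cycle Time}_{\text{ideal}}}{1.05} \approx 1.3 \times \text{Cycle Time}_{\text{ideal}}$$
so the processor without the structural hazard is about 1.3 times faster.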
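Here is the same comparison as a few lines of Python. It only restates the slide's numbers (CPI 1 versus 1 + 0.4 × 1, relative clock rate 1 versus 1.05) and normalizes the ideal cycle time to 1.

```python
# The structural-hazard example: the ideal machine has CPI 1; the other machine
# stalls 1 cycle on the 40% of instructions that hit the hazard, but its clock
# rate is 1.05 times higher (shorter cycle time).


def avg_instr_time(cpi: float, cycle_time: float) -> float:
    return cpi * cycle_time


IDEAL_CYCLE_TIME = 1.0  # normalized

ideal = avg_instr_time(cpi=1.0, cycle_time=IDEAL_CYCLE_TIME)
with_hazard = avg_instr_time(cpi=1.0 + 0.4 * 1, cycle_time=IDEAL_CYCLE_TIME / 1.05)

print(f"machine with the hazard takes {with_hazard / ideal:.2f}x as long per instruction")
```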
Data hazards
Bypassing can handle some data hazards 
Any other bypassing paths?
Forwarding required by stores
Some problems cannot be solved by bypassing
Data forwarding requires more inputs on multiplexers 
Any other paths?
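Each extra multiplexer input needs select logic. The following sketch shows one common way to pick the source of an ALU operand, in the style of the classic 5-stage MIPS pipeline. The latch field names are illustrative. The sketch also assumes a stall has already handled any load-use hazard.

```python
# Forwarding-mux selection for one ALU source operand of the instruction in EX.
# Field names are illustrative; load-use hazards are assumed to be handled by a stall.
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineLatch:
    reg_write: bool        # will this instruction write a register?
    rd: Optional[int]      # destination register number (None if none)


def forward_select(src: int, ex_mem: PipelineLatch, mem_wb: PipelineLatch) -> str:
    """Return which value should feed the ALU for source register `src`."""
    if src == 0:
        return "REGFILE"   # R0 is hardwired to zero and never forwarded
    # EX/MEM holds the most recent result, so it takes priority over MEM/WB.
    if ex_mem.reg_write and ex_mem.rd == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd == src:
        return "MEM/WB"
    return "REGFILE"


if __name__ == "__main__":
    # An instruction writing R1 is in MEM and an older one writing R1 is in WB:
    # the younger value (EX/MEM) must win.
    print(forward_select(1, PipelineLatch(True, 1), PipelineLatch(True, 1)))
```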
Data forwarding to the MEM stage
Examples of data forwarding 
("-" marks a stall cycle)
Cycle           1  2  3  4  5  6  7  8  9
LD R2, 0(R11)   IF ID EX ME WB
ADD R1, R2, R3     IF ID -  EX ME WB
ADD R4, R1, R4        IF -  ID EX ME WB
ADD R5, R1, R5              IF ID EX ME WB

Cycle           1  2  3  4  5  6  7  8  9
LD R2, 0(R11)   IF ID EX ME WB
ST R2, 0(R12)      IF ID EX ME WB
ADD R1, R3, R4        IF ID EX ME WB
ST R1, 0(R13)            IF ID EX ME WB
Try producing fast code for 
a = b + c; 
d = e - f;
assuming a, b, c, d, e, and f are in memory.
Slow code: 
LW Rb,b 
LW Rc,c 
ADD Ra,Rb,Rc 
SW a,Ra 
LW Re,e 
LW Rf,f 
SUB Rd,Re,Rf 
SW d,Rd 
Software scheduling to avoid load hazards 
Fast code: 
LW Rb,b 
LW Rc,c 
LW Re,e 
ADD Ra,Rb,Rc 
LW Rf,f 
SW a,Ra 
SUB Rd,Re,Rf 
SW d,Rd 
Compiler optimizes for performance. Hardware checks for safety.
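One way to check the benefit of the reordering is to count load-use stalls. This sketch assumes the forwarding paths from the earlier slides. Under that assumption, only a consumer that needs a just-loaded value in EX, immediately after the load, stalls (for one cycle). It also treats the memory operands a through f as labels that need no address register.

```python
# Count load-use stalls for straight-line code on the 5-stage pipeline, assuming
# full forwarding: only an instruction that needs a just-loaded value in its EX
# stage, immediately after the load, stalls (one cycle). Store data is needed in
# MEM, so a store right after a load of its data does not stall.
from typing import List, NamedTuple, Optional, Tuple

LOADS = {"LW", "LD"}


class Instr(NamedTuple):
    op: str
    dest: Optional[str]              # register written (None for stores)
    ex_srcs: Tuple[str, ...] = ()    # registers read in EX (ALU inputs, address base)
    mem_src: Optional[str] = None    # register read in MEM (store data)


def load_use_stalls(code: List[Instr]) -> int:
    return sum(1 for prev, cur in zip(code, code[1:])
               if prev.op in LOADS and prev.dest in cur.ex_srcs)


def total_cycles(code: List[Instr]) -> int:
    # The first instruction takes 5 cycles, each later one adds 1, plus stalls.
    return 5 + (len(code) - 1) + load_use_stalls(code)


SLOW = [Instr("LW", "Rb"), Instr("LW", "Rc"), Instr("ADD", "Ra", ("Rb", "Rc")),
        Instr("SW", None, (), "Ra"), Instr("LW", "Re"), Instr("LW", "Rf"),
        Instr("SUB", "Rd", ("Re", "Rf")), Instr("SW", None, (), "Rd")]
FAST = [Instr("LW", "Rb"), Instr("LW", "Rc"), Instr("LW", "Re"),
        Instr("ADD", "Ra", ("Rb", "Rc")), Instr("LW", "Rf"), Instr("SW", None, (), "Ra"),
        Instr("SUB", "Rd", ("Re", "Rf")), Instr("SW", None, (), "Rd")]

if __name__ == "__main__":
    for name, code in (("slow", SLOW), ("fast", FAST)):
        print(name, load_use_stalls(code), "stalls,", total_cycles(code), "cycles")
```

The same model also reproduces the 9-cycle length of the first data-forwarding example, where the ADD that follows LD R2 stalls once.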
Reducing branch hazards 
Forwarding from EX/MEM and MEM/WB
Handling control hazards 
Branch instruction    IF  ID  EX  MEM WB
Branch successor          IF  IF  ID  EX  MEM WB
Branch successor + 1              IF  ID  EX  MEM WB
Branch successor + 2                  IF  ID  EX  MEM WB
•Freeze/flush the pipeline: wait until the branch destination is known
–Penalty is fixed
•Treat every branch as not taken
•Treat every branch as taken
–Any advantages in our 5-stage pipeline?
•Delayed branch: instructions execute in the order
–Branch instruction
–Sequential successor 1 (branch delay slot, always executed)
–Branch target if taken
What if the condition is not resolved until the EX stage?
Predicted Not Taken 
Untaken branch instr. IF  ID  EX  MEM WB
Branch successor          IF  ID  EX  MEM WB
Branch successor + 1          IF  ID  EX  MEM WB
Branch successor + 2              IF  ID  EX  MEM WB
Taken Branch instruction 
IF 
ID 
EX 
MEM 
WB 
Branch successor
IF 
IF 
ID 
EX 
MEM 
WB 
Branch target
IF 
ID 
EX 
MEM 
WB 
Branch successor + 1
IF 
ID 
EX 
MEM 
WB 
Branch successor + 2
IF 
ID 
EX 
MEM 
WB
Scheduling the branch delay slot 
•a) is the best choice, fills delay slot & reduces instruction count (IC) 
•In b), the sub instruction may need to be copied, increasing IC 
•In b) and c), it must be safe to execute sub when the branch goes the other way
Delayed Branch 
•Compiler effectiveness for a single branch delay slot:
–Fills about 60% of branch delay slots
–About 80% of the instructions executed in branch delay slots are useful in computation
–About 50% (60% × 80%) of slots are usefully filled
•Delayed branch downside:
–As processors move to deeper pipelines and multiple issue, the branch delay grows and more than one delay slot is needed
–Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches
–Growth in available transistors has made dynamic approaches relatively cheaper
Performance of Branch Schemes 
Example: assume a deeper pipeline with
4% unconditional branches,
6% conditional branches that are untaken,
10% conditional branches that are taken.
$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$
Branch scheme        Penalty (unconditional)  Penalty (untaken)  Penalty (taken)
Flush                2                        3                  3
Predicted taken      2                        3                  2
Predicted untaken    2                        0                  3
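A scheme's stall contribution is the sum of frequency × penalty over the three branch kinds. This sketch uses the frequencies and penalties from the example. The slide does not give the depth of the "deeper pipeline", so `depth` is a parameter, and the value 8 below is hypothetical.

```python
# Branch stall cycles per instruction for each scheme, from the example's
# branch frequencies and per-scheme penalties.
from typing import Dict

FREQ: Dict[str, float] = {"unconditional": 0.04, "untaken": 0.06, "taken": 0.10}

PENALTY: Dict[str, Dict[str, int]] = {
    "Flush":             {"unconditional": 2, "untaken": 3, "taken": 3},
    "Predicted taken":   {"unconditional": 2, "untaken": 3, "taken": 2},
    "Predicted untaken": {"unconditional": 2, "untaken": 0, "taken": 3},
}


def branch_stall_cpi(scheme: str) -> float:
    # Sum of (branch frequency x branch penalty) over the three branch kinds.
    return sum(FREQ[kind] * PENALTY[scheme][kind] for kind in FREQ)


def pipeline_speedup(depth: int, scheme: str) -> float:
    # depth / (1 + stall CPI): ideal CPI of 1, branches the only source of stalls.
    return depth / (1 + branch_stall_cpi(scheme))


if __name__ == "__main__":
    DEPTH = 8  # hypothetical: the slide does not give the pipeline depth
    for scheme in PENALTY:
        print(f"{scheme:18s} branch stall CPI = {branch_stall_cpi(scheme):.2f}, "
              f"speedup = {pipeline_speedup(DEPTH, scheme):.2f}")
```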
Evaluating Branch Alternatives 
Branch scheme 
Speedup vs Flush 
Delayed branch 
Flush 
1 
Predicted taken 
1.06 
1.14 
Predicted untaken 
1.12 
1.19 
For delayed branch, 50% of the slots can be filled with useful instructions.
MIPS pipeline with three unpipelined FP functional units
•Multiple FP instructions can be executed simultaneously 
Pipelined functional units
Latency and initiation interval 
•Latency: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result
–Typically 1 cycle less than the depth of the execution pipeline
•Example: LD has a two-stage execution, so its latency is 1 cycle if the following instruction is not a ST
•Initiation interval: the number of cycles that must elapse between issuing two operations to the same functional unit
For example, a multiplier whose execution takes 7 cycles:
Unpipelined: the initiation interval is 7 cycles; operations can start in cycles 1, 8, 15, …
Pipelined: the initiation interval is 1 cycle; operations can start in cycles 1, 2, 3, …
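Both start-cycle sequences in the example follow from the initiation interval alone. This small sketch generates them, numbering cycles from 1 as on the slide.

```python
# Cycles in which back-to-back independent operations can start on one functional
# unit with a given initiation interval, counting from cycle 1.
from typing import List


def start_cycles(n_ops: int, initiation_interval: int, first: int = 1) -> List[int]:
    return [first + i * initiation_interval for i in range(n_ops)]


if __name__ == "__main__":
    print(start_cycles(3, 7))  # unpipelined multiplier: [1, 8, 15]
    print(start_cycles(3, 1))  # pipelined multiplier:   [1, 2, 3]
```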
Latencies and initiation intervals for functional units 
Functional unit   # of execution stages   Latency   Initiation interval
Integer ALU       1                       0         1
Data memory       2                       1         1
FP add            4                       3         1
FP multiply       7                       6         1
FP divide         25                      24        25
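The table can be used directly to estimate stalls. This sketch encodes it and applies the latency definition above. It assumes in-order issue of one instruction per cycle with full forwarding, and it does not model the exception for a store of just-loaded data.

```python
# The functional-unit table as data. Per the latency definition (intervening
# cycles between producer and consumer), a dependent instruction `distance`
# instructions after its producer stalls for latency - (distance - 1) cycles.
from typing import Dict, Tuple

# unit: (execution stages, latency, initiation interval)
FU: Dict[str, Tuple[int, int, int]] = {
    "Integer ALU": (1, 0, 1),
    "Data memory": (2, 1, 1),
    "FP add": (4, 3, 1),
    "FP multiply": (7, 6, 1),
    "FP divide": (25, 24, 25),
}


def dependent_stalls(producer_unit: str, distance: int = 1) -> int:
    latency = FU[producer_unit][1]
    return max(0, latency - (distance - 1))


def is_fully_pipelined(unit: str) -> bool:
    return FU[unit][2] == 1


if __name__ == "__main__":
    for unit, (stages, latency, ii) in FU.items():
        print(f"{unit:12s} latency == stages - 1: {latency == stages - 1}, "
              f"pipelined: {is_fully_pipelined(unit)}")
```

For example, `dependent_stalls("FP multiply")` returns 6 and `dependent_stalls("Data memory")` returns 1. Those values match the stalls of the dependent ADD and MUL in the FP code sequence later in this deck.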
Pipeline timing of a set of independent FP operations
•Instructions are fetched and sent to functional units in order
•Instructions complete out of order because their execution lengths differ
Cycle   1  2  3  4  5  6  7  8  9  10 11
MUL.D   IF ID M1 M2 M3 M4 M5 M6 M7 ME WB
ADD.D      IF ID A1 A2 A3 A4 ME WB
L.D           IF ID EX ME WB
S.D              IF ID EX ME WB
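The out-of-order completion in this table follows from the stage counts alone. The sketch below recomputes each write-back cycle. It assumes one instruction is fetched per cycle and no structural conflicts occur on ME or WB.

```python
# Write-back cycle of independent FP operations fetched one per cycle, in order,
# through IF, ID, their execution stages, ME and WB.
EXEC_STAGES = {"MUL.D": 7, "ADD.D": 4, "L.D": 1, "S.D": 1}  # M1-M7, A1-A4, EX


def wb_cycle(fetch_cycle: int, op: str) -> int:
    # IF and ID take two cycles, then the execution stages, then ME; WB follows.
    return fetch_cycle + 2 + EXEC_STAGES[op] + 1


PROGRAM = ["MUL.D", "ADD.D", "L.D", "S.D"]
WB = {op: wb_cycle(i + 1, op) for i, op in enumerate(PROGRAM)}
COMPLETION_ORDER = sorted(PROGRAM, key=WB.__getitem__)

if __name__ == "__main__":
    print(WB)                 # {'MUL.D': 11, 'ADD.D': 9, 'L.D': 7, 'S.D': 8}
    print(COMPLETION_ORDER)   # ['L.D', 'S.D', 'ADD.D', 'MUL.D']
```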
FP code sequence showing the stalls (from RAW) 
("-" marks a stall cycle)
Cycle          1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18
L.D F4, 0(R2)  IF ID EX ME WB
MUL F0,F4,F6      IF ID -  M1 M2 M3 M4 M5 M6 M7 ME WB
ADD F2,F0,F8         IF -  ID -  -  -  -  -  -  A1 A2 A3 A4 ME WB
S.D F2, 0(R2)           IF -  -  -  -  -  -  -  ID EX -  -  -  ME WB
Handling multiple writes to register file 
•Track the use of the write port in the ID stage and stall an instruction before it issues
–Stall the instruction if it would write the register file in the same cycle as an instruction already issued
–Use a shift register to track in which cycles already-issued instructions will use the write port
•Stall a conflicting instruction when it tries to enter either MEM or WB stage 
–May choose either instruction 
•May give priority to instructions with long latencies 
–Does not detect conflict until the entrance of the MEM or WB stage, where it is easy to see 
–Complicates pipeline control as stalls may arise from two places
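Here is a minimal sketch of the first approach, which tracks the write port in ID with a shift register. It assumes a single register-file write port. The cycle counts in the example follow the FP pipeline above: an FP multiply writes back 9 cycles after leaving ID, and an FP add 6 cycles after.

```python
# Tracking the register-file write port in ID with a shift register: bit i is set
# when an already-issued instruction will write the register file i cycles from
# now. A single write port is assumed.


class WritePortTracker:
    def __init__(self) -> None:
        self.reserved = 0  # bit i -> write port in use i cycles from now

    def can_issue(self, cycles_until_wb: int) -> bool:
        return not (self.reserved >> cycles_until_wb) & 1

    def issue(self, cycles_until_wb: int) -> None:
        if not self.can_issue(cycles_until_wb):
            raise RuntimeError("write-port conflict: stall this instruction in ID")
        self.reserved |= 1 << cycles_until_wb

    def tick(self) -> None:
        # Advance one clock: every reservation moves one cycle closer.
        self.reserved >>= 1


if __name__ == "__main__":
    port = WritePortTracker()
    port.issue(9)            # FP multiply: M1-M7, ME, then WB 9 cycles after ID
    for _ in range(3):
        port.tick()
    print(port.can_issue(6))  # an FP add issued now would also write in that cycle
```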
Problems with Pipelining 
•Exception: An unusual event happens to an instruction during its execution 
–Examples: divide by zero, undefined opcode 
•Interrupt: Hardware signal to switch the processor to a new instruction stream 
–Example: a sound card interrupts when it needs more audio output samples (an audio “click” happens if it is left waiting) 
•Problem: the exception or interrupt must appear to occur between two instructions (Ii and Ii+1)
–The effect of all instructions up to and including Ii is totally complete
–No effect of any instruction after Ii can take place
•The interrupt (exception) handler either aborts the program or restarts it at instruction Ii+1
Dealing with exceptions 
•Exceptions are harder to handle in a pipelined processor 
–An instruction is executed in several steps, making it more difficult to determine whether an instruction can safely change the state of the processor 
•Other instructions in the pipeline may also cause exceptions
•Examples of exceptions
–Invoking an operating system service 
–Breakpoint (programmer-requested interrupt) 
–Integer/FP arithmetic overflow or anomaly 
–Memory access (Page fault, protection, misalignment) 
–Unknown instructions 
–Hardware malfunctions 
–I/O request 
–Power failure
Classification of exceptions 
•Synchronous versus asynchronous 
–Occur at the same place every time the program is executed? 
•User requested versus coerced 
–User asks for it? 
•User maskable versus nonmaskable 
–Can be masked (disabled) by user? 
•Within versus between instructions 
–Occur in the middle of execution and prevent instruction completion? 
•Resume versus terminate 
–Can the program’s execution be resumed?
Stopping and restarting exceptions 
•Most difficult exceptions 
–Occur within instructions (e.g., in the EX and MEM stages)
–Must be restartable
•Possible solutions
–Force a trap instruction into the pipeline on the next IF
–Until the trap is taken, turn off all writes for the faulting instruction and all following instructions
–In the exception handler, save the PC of the faulting instruction
Precise exceptions: the pipeline can be stopped so that
the instructions before the faulting instruction complete, and
the instructions after the faulting instruction can be restarted
Precise Exceptions in Static Pipelines 
Key observation: architected state changes only in the memory and register write-back stages.
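Architected state is written only at the end of the pipeline, in program order. So recording an exception with its instruction and acting on it only at that point yields precise exceptions. This sketch illustrates the idea for a 5-stage in-order pipeline. The PCs are hypothetical, and one instruction is assumed to enter the pipeline per cycle.

```python
# Precise exceptions in an in-order pipeline: a detected exception is only
# recorded with its instruction; state is written at the end of the pipeline in
# program order, so the oldest faulting instruction traps even if a younger
# instruction's fault was detected earlier in time. PCs are hypothetical.
from typing import List, NamedTuple, Optional, Tuple

STAGE_INDEX = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}


class Instr(NamedTuple):
    pc: int
    fault_stage: Optional[str] = None  # stage that detects an exception, if any


def commit_in_order(program: List[Instr]) -> Tuple[List[int], Optional[int]]:
    """Return (PCs allowed to write state, PC saved for the handler or None)."""
    committed: List[int] = []
    for instr in program:
        if instr.fault_stage is not None:
            # Writes are turned off for this instruction and all later ones.
            return committed, instr.pc
        committed.append(instr.pc)
    return committed, None


def first_fault_in_time(program: List[Instr]) -> Optional[int]:
    """PC of the exception the hardware notices first (one fetch per cycle)."""
    events = [(i + STAGE_INDEX[ins.fault_stage], ins.pc)
              for i, ins in enumerate(program) if ins.fault_stage is not None]
    return min(events)[1] if events else None


if __name__ == "__main__":
    # A load that faults in MEM, followed by an instruction that faults in IF.
    prog = [Instr(0x100, "MEM"), Instr(0x104, "IF")]
    print(hex(first_fault_in_time(prog)))  # the younger fault is seen first
    print(commit_in_order(prog))           # but the older one is reported
```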
A more complicated pipeline 
•Fetch 
•Decode 
•Dispatch 
•Issue 
•Execute 
•Finish 
•Complete 
•Retire 
Branch Prediction 
Dynamic Scheduling 
Reorder buffer
Superscalar Pipeline 
Executing multiple instructions in parallel
Instruction Fetch 
•Fetch bandwidth limits the maximum throughput of the pipeline
•Fetch s instructions per cycle from the I-cache
•Problems with attaining this throughput:
–Control flow → branch prediction
–Alignment of cache line and PC
Interactions between Instruction Fetch and Instruction Cache Structure 
•In b), if a fetch group (s instructions) straddles two cache lines, the I-cache must be accessed twice
–If either cache line misses, the pipeline stalls
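Whether a fetch group straddles two lines depends only on the PC, the group size s and the line size. The sketch assumes fixed 4-byte instructions and a byte-addressed PC. The 32-byte line size in the example is hypothetical.

```python
# I-cache line accesses needed to fetch one group of s instructions, assuming
# fixed 4-byte instructions and a byte-addressed PC.

INSTR_BYTES = 4


def lines_touched(pc: int, s: int, line_bytes: int) -> int:
    first_line = pc // line_bytes
    last_line = (pc + s * INSTR_BYTES - 1) // line_bytes
    return last_line - first_line + 1


def instrs_from_first_line(pc: int, s: int, line_bytes: int) -> int:
    """How many of the s instructions one access to the first line can supply."""
    bytes_left = line_bytes - pc % line_bytes
    return min(s, bytes_left // INSTR_BYTES)


if __name__ == "__main__":
    # Hypothetical 32-byte lines (8 instructions) and fetch groups of s = 4.
    for pc in (0, 16, 24, 28):
        print(pc, lines_touched(pc, 4, 32), instrs_from_first_line(pc, 4, 32))
```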
Instruction Decode 
•Extract from each instruction:
–Instruction type (decoder)
–Dependencies (comparators)
–Operands (register files & buses)
•CISC → RISC:
–CISC instructions are converted to ROPs (RISC operations)
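The comparator-based dependence check inside a decode group can be sketched as follows. Hardware performs all the comparisons in parallel; the loops here are sequential only for readability. The instruction format is hypothetical.

```python
# Decode-stage dependence checking inside one group of s instructions: each source
# register of instruction j is compared with the destination of every earlier
# instruction in the group (one comparator per pair in hardware).
from typing import List, NamedTuple, Optional, Tuple


class Decoded(NamedTuple):
    dest: Optional[str]
    srcs: Tuple[str, ...]


def intra_group_raw(group: List[Decoded]) -> List[Tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for each RAW dependence."""
    deps: List[Tuple[int, int, str]] = []
    for j, consumer in enumerate(group):
        for src in consumer.srcs:
            # The closest earlier writer in the group supplies the value.
            for i in range(j - 1, -1, -1):
                if group[i].dest == src:
                    deps.append((i, j, src))
                    break
    return deps


def comparator_count(s: int, srcs_per_instr: int = 2) -> int:
    # Instruction j (0-based) has j earlier destinations to compare against.
    return srcs_per_instr * s * (s - 1) // 2


if __name__ == "__main__":
    group = [Decoded("R1", ("R2", "R3")), Decoded("R4", ("R1", "R5")),
             Decoded("R1", ("R6", "R7")), Decoded("R8", ("R1", "R4"))]
    print(intra_group_raw(group), comparator_count(len(group)))
```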
Instruction Decode - Predecoding
AMD’s K6 can decode two instructions per cycle
Instruction Dispatch 
•Dataflow: 
–Send an instruction to a functional unit as soon as its operands are available, regardless of original program order. 
–Tomasulo’s algorithm
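Here is a sketch of dataflow dispatch timing under deliberately simplified assumptions. Register renaming removes WAR/WAW hazards (as in Tomasulo's algorithm), functional units are unlimited, and every instruction is already waiting in a reservation station at cycle 0. Names and cycle counts are hypothetical.

```python
# Dataflow dispatch: each instruction starts as soon as its source values are
# available, regardless of program order. Renaming, unlimited units and
# all-instructions-waiting-at-cycle-0 are simplifying assumptions.
from typing import Dict, List, NamedTuple, Tuple


class Op(NamedTuple):
    name: str
    dest: str
    srcs: Tuple[str, ...]
    cycles_to_result: int  # cycles from start until the result can be consumed


def dataflow_start_cycles(ops: List[Op]) -> Dict[str, int]:
    ready_at: Dict[str, int] = {}  # latest producer's value -> cycle it is ready
    start: Dict[str, int] = {}
    for op in ops:  # program order only decides which producer each source names
        begin = max((ready_at.get(r, 0) for r in op.srcs), default=0)
        start[op.name] = begin
        ready_at[op.dest] = begin + op.cycles_to_result
    return start


if __name__ == "__main__":
    ops = [Op("DIV", "F0", ("F2", "F4"), 24),
           Op("ADD", "F6", ("F0", "F8"), 3),     # depends on DIV
           Op("SUB", "F10", ("F12", "F14"), 3)]  # independent, later in program order
    print(dataflow_start_cycles(ops))  # SUB starts long before ADD
```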
Instruction Dispatch 
•Centralized reservation station 
•Distributed reservation station
Instruction Execution 
•How many functional units? Why different types? 
–Constraints of area, power, interconnection, etc. 
•You cannot put as many as you want 
–Mix of functional units may not be ideal for some applications 
•Bypassing 
–Bypassing needed between functional units to minimize stalls
Instruction Completion & Retiring 
•Completion → registers
•Reorder/store buffer in between
–Registers in the buffer (not the register file) hold the new values
•Retiring → memory
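A minimal reorder-buffer sketch of the completion step follows. Results enter the buffer when instructions finish, possibly out of order. They reach the register file only from the head of the buffer, in program order. The sketch does not model stores retiring to memory through a store buffer.

```python
# Minimal reorder-buffer sketch: results are written into the buffer when
# instructions finish and copied to the architected register file only when
# they complete at the buffer head, in program order.
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Entry:
    tag: str
    dest: str
    value: int = 0
    finished: bool = False


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries: Deque[Entry] = deque()
        self.regfile: Dict[str, int] = {}

    def dispatch(self, tag: str, dest: str) -> None:
        self.entries.append(Entry(tag, dest))   # allocated in program order

    def finish(self, tag: str, value: int) -> None:
        for entry in self.entries:
            if entry.tag == tag:
                entry.value, entry.finished = value, True
                return
        raise KeyError(tag)

    def complete(self) -> List[str]:
        """Copy finished results from the head to the register file, in order."""
        done: List[str] = []
        while self.entries and self.entries[0].finished:
            entry = self.entries.popleft()
            self.regfile[entry.dest] = entry.value
            done.append(entry.tag)
        return done


if __name__ == "__main__":
    rob = ReorderBuffer()
    rob.dispatch("i1", "R1")
    rob.dispatch("i2", "R2")
    rob.finish("i2", 7)          # younger instruction finishes first
    print(rob.complete())        # nothing completes: i1 is still executing
    rob.finish("i1", 3)
    print(rob.complete(), rob.regfile)
```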
Limiting factors: Pipelining hazards 
•Structural hazards 
–Resource conflicts when hardware cannot support all possible combinations of instructions simultaneously 
•Data hazards 
–An instruction depends on the results of a previous instruction 
•Control hazards 
–Branch instructions that change the instruction flow

Contenu connexe

Tendances

Pipelining powerpoint presentation
Pipelining powerpoint presentationPipelining powerpoint presentation
Pipelining powerpoint presentationbhavanadonthi
 
Concept of Pipelining
Concept of PipeliningConcept of Pipelining
Concept of PipeliningSHAKOOR AB
 
Pipelining of Processors
Pipelining of ProcessorsPipelining of Processors
Pipelining of ProcessorsGaditek
 
Instruction pipelining
Instruction pipeliningInstruction pipelining
Instruction pipeliningTech_MX
 
pipeline and pipeline hazards
pipeline and pipeline hazards pipeline and pipeline hazards
pipeline and pipeline hazards Bharti Khemani
 
Pipelining , structural hazards
Pipelining , structural hazardsPipelining , structural hazards
Pipelining , structural hazardsMunaam Munawar
 
Chapter 04 the processor
Chapter 04   the processorChapter 04   the processor
Chapter 04 the processorBảo Hoang
 
Instruction pipeline
Instruction pipelineInstruction pipeline
Instruction pipelineajay_a
 
Dealing with exceptions Computer Architecture part 2
Dealing with exceptions Computer Architecture part 2Dealing with exceptions Computer Architecture part 2
Dealing with exceptions Computer Architecture part 2Gaditek
 
Instruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInstruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInteX Research Lab
 
Dealing with Exceptions Computer Architecture part 1
Dealing with Exceptions Computer Architecture part 1Dealing with Exceptions Computer Architecture part 1
Dealing with Exceptions Computer Architecture part 1Gaditek
 
Pipelining, processors, risc and cisc
Pipelining, processors, risc and ciscPipelining, processors, risc and cisc
Pipelining, processors, risc and ciscMark Gibbs
 

Tendances (19)

Pipeline hazard
Pipeline hazardPipeline hazard
Pipeline hazard
 
Pipelining powerpoint presentation
Pipelining powerpoint presentationPipelining powerpoint presentation
Pipelining powerpoint presentation
 
Concept of Pipelining
Concept of PipeliningConcept of Pipelining
Concept of Pipelining
 
Pipelining of Processors
Pipelining of ProcessorsPipelining of Processors
Pipelining of Processors
 
Instruction pipelining
Instruction pipeliningInstruction pipelining
Instruction pipelining
 
pipeline and pipeline hazards
pipeline and pipeline hazards pipeline and pipeline hazards
pipeline and pipeline hazards
 
Pipelining , structural hazards
Pipelining , structural hazardsPipelining , structural hazards
Pipelining , structural hazards
 
Instruction Pipelining
Instruction PipeliningInstruction Pipelining
Instruction Pipelining
 
Chapter 04 the processor
Chapter 04   the processorChapter 04   the processor
Chapter 04 the processor
 
Instruction pipeline
Instruction pipelineInstruction pipeline
Instruction pipeline
 
pipelining
pipeliningpipelining
pipelining
 
Dealing with exceptions Computer Architecture part 2
Dealing with exceptions Computer Architecture part 2Dealing with exceptions Computer Architecture part 2
Dealing with exceptions Computer Architecture part 2
 
Pipelining in computer architecture
Pipelining in computer architecturePipelining in computer architecture
Pipelining in computer architecture
 
Instruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInstruction pipeline: Computer Architecture
Instruction pipeline: Computer Architecture
 
Unit 3
Unit 3Unit 3
Unit 3
 
Chapter6 pipelining
Chapter6  pipeliningChapter6  pipelining
Chapter6 pipelining
 
Dealing with Exceptions Computer Architecture part 1
Dealing with Exceptions Computer Architecture part 1Dealing with Exceptions Computer Architecture part 1
Dealing with Exceptions Computer Architecture part 1
 
Pipelining, processors, risc and cisc
Pipelining, processors, risc and ciscPipelining, processors, risc and cisc
Pipelining, processors, risc and cisc
 
Pipelining1
Pipelining1Pipelining1
Pipelining1
 

Similaire à Topic2a ss pipelines

Pipeline & Nonpipeline Processor
Pipeline & Nonpipeline ProcessorPipeline & Nonpipeline Processor
Pipeline & Nonpipeline ProcessorSmit Shah
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with PipeliningAneesh Raveendran
 
Design pipeline architecture for various stage pipelines
Design pipeline architecture for various stage pipelinesDesign pipeline architecture for various stage pipelines
Design pipeline architecture for various stage pipelinesMahmudul Hasan
 
Pipelining And Vector Processing
Pipelining And Vector ProcessingPipelining And Vector Processing
Pipelining And Vector ProcessingTheInnocentTuber
 
Advanced Pipelining in ARM Processors.pptx
Advanced Pipelining  in ARM Processors.pptxAdvanced Pipelining  in ARM Processors.pptx
Advanced Pipelining in ARM Processors.pptxJoyChowdhury30
 
Pipelining of Processors Computer Architecture
Pipelining of  Processors Computer ArchitecturePipelining of  Processors Computer Architecture
Pipelining of Processors Computer ArchitectureHaris456
 
Computer architecture pipelining
Computer architecture pipeliningComputer architecture pipelining
Computer architecture pipeliningMazin Alwaaly
 
Computer Organozation
Computer OrganozationComputer Organozation
Computer OrganozationAabha Tiwari
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System AchitectureYashiUpadhyay3
 
CMPN301-Pipelining_V2.pptx
CMPN301-Pipelining_V2.pptxCMPN301-Pipelining_V2.pptx
CMPN301-Pipelining_V2.pptxNadaAAmin
 
pipelining-190913185902.pptx
pipelining-190913185902.pptxpipelining-190913185902.pptx
pipelining-190913185902.pptxAshokRachapalli1
 
UNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsUNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsButtaRajasekhar2
 
XPDDS18: Real Time in XEN on ARM - Andrii Anisov, EPAM Systems Inc.
XPDDS18: Real Time in XEN on ARM - Andrii Anisov, EPAM Systems Inc.XPDDS18: Real Time in XEN on ARM - Andrii Anisov, EPAM Systems Inc.
XPDDS18: Real Time in XEN on ARM - Andrii Anisov, EPAM Systems Inc.The Linux Foundation
 
Advanced Techniques for Exploiting ILP
Advanced Techniques for Exploiting ILPAdvanced Techniques for Exploiting ILP
Advanced Techniques for Exploiting ILPA B Shinde
 

Similaire à Topic2a ss pipelines (20)

Pipeline & Nonpipeline Processor
Pipeline & Nonpipeline ProcessorPipeline & Nonpipeline Processor
Pipeline & Nonpipeline Processor
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
 
Assembly p1
Assembly p1Assembly p1
Assembly p1
 
Design pipeline architecture for various stage pipelines
Design pipeline architecture for various stage pipelinesDesign pipeline architecture for various stage pipelines
Design pipeline architecture for various stage pipelines
 
Pipelining And Vector Processing
Pipelining And Vector ProcessingPipelining And Vector Processing
Pipelining And Vector Processing
 
CA UNIT III.pptx
CA UNIT III.pptxCA UNIT III.pptx
CA UNIT III.pptx
 
Advanced Pipelining in ARM Processors.pptx
Advanced Pipelining  in ARM Processors.pptxAdvanced Pipelining  in ARM Processors.pptx
Advanced Pipelining in ARM Processors.pptx
 
Coa.ppt2
Coa.ppt2Coa.ppt2
Coa.ppt2
 
Pipelining slides
Pipelining slides Pipelining slides
Pipelining slides
 
COA Unit-5.pptx
COA Unit-5.pptxCOA Unit-5.pptx
COA Unit-5.pptx
 
Pipelining of Processors Computer Architecture
Pipelining of  Processors Computer ArchitecturePipelining of  Processors Computer Architecture
Pipelining of Processors Computer Architecture
 
Computer architecture pipelining
Computer architecture pipeliningComputer architecture pipelining
Computer architecture pipelining
 
Computer Organozation
Computer OrganozationComputer Organozation
Computer Organozation
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System Achitecture
 
CMPN301-Pipelining_V2.pptx
CMPN301-Pipelining_V2.pptxCMPN301-Pipelining_V2.pptx
CMPN301-Pipelining_V2.pptx
 
Unit 4 COA.pptx
Unit 4 COA.pptxUnit 4 COA.pptx
Unit 4 COA.pptx
 
pipelining-190913185902.pptx
pipelining-190913185902.pptxpipelining-190913185902.pptx
pipelining-190913185902.pptx
 
UNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsUNIT 3 - General Purpose Processors
UNIT 3 - General Purpose Processors
 
XPDDS18: Real Time in XEN on ARM - Andrii Anisov, EPAM Systems Inc.
XPDDS18: Real Time in XEN on ARM - Andrii Anisov, EPAM Systems Inc.XPDDS18: Real Time in XEN on ARM - Andrii Anisov, EPAM Systems Inc.
XPDDS18: Real Time in XEN on ARM - Andrii Anisov, EPAM Systems Inc.
 
Advanced Techniques for Exploiting ILP
Advanced Techniques for Exploiting ILPAdvanced Techniques for Exploiting ILP
Advanced Techniques for Exploiting ILP
 

Dernier

Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 

Dernier (20)

Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 

Topic2a ss pipelines

  • 1. Processor: Superscalars Pipeline Organization Z. Jerry Shi Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475’03, Peh/ELE475’*
  • 2. Targeting better performance •Factors that decide the execution time Execution Time = Path Length × CPI × Cycle Time •Exploit parallelism
  • 3. Abstract view of instruction execution unit for MIPS
  • 4. Key components on datapath
  • 5. Pipelining •An implementation technique whereby multiple instructions are overlapped in execution –The parallelism among instructions in a sequential stream –The parallelism among actions needed to execute an instruction •Divide the execution into multiple steps and do one step each time –Each step is called a pipe stage or a pipe segment •Pipeline throughput: how often an instruction leaves the pipeline •Need to balance the length of each pipeline stage –Processor cycle time is determined by the slowest stage •Ideally, the speedup is the number of pipe stages. However,… –Time per instruction on unpipelined machine / Number of pipe stages
  • 7. Pipelining in MIPS instruction execution
  • 8. Two abstract representation of a 5-stage pipeline
  • 9. Performance of Pipelines pipelined unpipelined pipelined unpipelined pipelined pipelined unpipelined unpipelined pipelined unpipelined pipeline Cycle Time Cycle Time CPI CPI CPI Cycle Time CPI Cycle Time AVG Instr Time AVG Instr Time Speedup _ _ _ _ _ _ _ _       Assume the cycle time is the same: 1 Pipeline _ Depth CPI CPI Speedup pipelined unpipelined pipeline  
  • 10. Things preventing you from getting the ideal speedup •Hazard •Cost of pipelining –Delay on pipeline registers –Unbalanced pipeline stages
  • 11. A basic MIPS datapath
  • 13. Towards Ideal Pipeline CPI Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls –Ideal pipeline CPI: measure of the maximum performance attainable by the implementation –Structural hazards: HW cannot support this combination of instructions –Data hazards: Instruction depends on the result of prior instructions –Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps) •Stall the pipeline when there is a hazard –Any instructions issued earlier than the stalled instruction continue –Any instructions after the stalled instruction are also stalled •No new instrutions are fetched
  • 14. Structural hazards in a simple RISC pipeline Accessing memory in the same cycle
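  • 15. Performance impact of structural hazards •Ideal processor: CPI = 1, no structural hazards, clock rate = 1 •Processor with structural hazards: 40% of the instructions cause a structural hazard, clock rate = 1.05 •Which one is faster? The instruction count is the same, so only the time per instruction matters. The average time per instruction for the processor with the structural hazard is
    $$\text{AVG\_Instr\_Time} = \text{CPI} \times \text{Cycle\_Time} = (1 + 0.4 \times 1) \times \frac{\text{Cycle\_Time}_{\text{ideal}}}{1.05} \approx 1.3 \times \text{Cycle\_Time}_{\text{ideal}}$$
    so the processor without the structural hazard is about 1.3 times faster.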
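The arithmetic can be checked with a few lines of Python. This sketch makes the same assumptions as the equation:
- Each structural hazard costs exactly one stall cycle.
- A clock rate 1.05 times higher means a cycle time 1/1.05 as long.

```python
def avg_instr_time(cpi: float, cycle_time: float) -> float:
    return cpi * cycle_time


ideal_cycle = 1.0  # normalized cycle time of the hazard-free processor

t_ideal = avg_instr_time(1.0, ideal_cycle)
# 40% of instructions stall for 1 cycle; the clock is 1.05x faster.
t_hazard = avg_instr_time(1.0 + 0.4 * 1, ideal_cycle / 1.05)

print(f"ideal: {t_ideal:.3f}, with structural hazard: {t_hazard:.3f}")
print(f"hazard-free processor is {t_hazard / t_ideal:.2f}x faster")
```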
  • 17. Bypassing can handle some data hazards Any other bypassing paths?
  • 19. Some problems cannot be solved by bypassing
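The classic case that bypassing cannot hide is a load followed immediately by an instruction that uses the loaded value. Below is a minimal Python sketch of a hazard-detection check in the ID stage, modeled on the textbook 5-stage MIPS pipeline. The `IdEx` and `IfId` classes and their field names are hypothetical stand-ins for the pipeline registers.

```python
from dataclasses import dataclass


@dataclass
class IdEx:
    """Fields of the ID/EX pipeline register used by the check (hypothetical model)."""
    mem_read: bool  # True when the instruction in EX is a load
    rt: int         # register the load will write


@dataclass
class IfId:
    """Source registers of the instruction currently being decoded."""
    rs: int
    rt: int


def load_use_stall(id_ex: IdEx, if_id: IfId) -> bool:
    """Stall one cycle if the instruction in ID reads the register that a load in EX
    is loading. Register 0 is hardwired to zero, so a load into r0 never creates a
    dependence."""
    return id_ex.mem_read and id_ex.rt != 0 and id_ex.rt in (if_id.rs, if_id.rt)


# LD R2, 0(R11) followed by ADD R1, R2, R3
print(load_use_stall(IdEx(mem_read=True, rt=2), IfId(rs=2, rt=3)))
```

Comparing against the decoded instruction's `rt` is conservative. It may stall even when that field names a destination rather than a source.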
  • 20. Data forwarding requires more inputs on multiplexers Any other paths?
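Each extra multiplexer input needs a select signal. Below is a minimal Python sketch of the forwarding-unit logic for one ALU operand, in the style of the textbook 5-stage MIPS pipeline. The newest matching result (EX/MEM) takes priority over the older one (MEM/WB). The `PipeReg` type and the mux-input names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipeReg:
    """Result-producing fields of a later pipeline register (hypothetical model)."""
    reg_write: bool  # does the instruction write a register?
    rd: int          # destination register


def forward_select(src: int, ex_mem: Optional[PipeReg], mem_wb: Optional[PipeReg]) -> str:
    """Choose the ALU operand source for register `src`.

    Returns 'EX/MEM' or 'MEM/WB' when a bypass path is needed, or 'REG' when the
    value read from the register file in ID is already correct."""
    if src == 0:  # r0 is always zero; never forward
        return "REG"
    if ex_mem is not None and ex_mem.reg_write and ex_mem.rd == src:
        return "EX/MEM"  # most recent producer wins
    if mem_wb is not None and mem_wb.reg_write and mem_wb.rd == src:
        return "MEM/WB"
    return "REG"


# ADD R1, R2, R3 ; ADD R4, R1, R4 -> R1 comes from EX/MEM
print(forward_select(1, PipeReg(True, 1), None))
```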
  • 21. Data forwarding to the MEM stage
  • 22. Examples of data forwarding
    Cycle:            1   2   3   4   5   6   7   8   9
    LD R2, 0(R11)     IF  ID  EX  ME  WB
    ADD R1, R2, R3        IF  ID  -   EX  ME  WB
    ADD R4, R1, R4            IF  -   ID  EX  ME  WB
    ADD R5, R1, R5                    IF  ID  EX  ME  WB

    Cycle:            1   2   3   4   5   6   7   8   9
    LD R2, 0(R11)     IF  ID  EX  ME  WB
    ST R2, 0(R12)         IF  ID  EX  ME  WB
    ADD R1, R3, R4            IF  ID  EX  ME  WB
    ST R1, 0(R13)                 IF  ID  EX  ME  WB
  • 23. Software scheduling to avoid load hazards Try producing fast code for a = b + c; d = e - f; assuming a, b, c, d, e, and f are in memory.
    Slow code:          Fast code:
    LW  Rb,b            LW  Rb,b
    LW  Rc,c            LW  Rc,c
    ADD Ra,Rb,Rc        LW  Re,e
    SW  a,Ra            ADD Ra,Rb,Rc
    LW  Re,e            LW  Rf,f
    LW  Rf,f            SW  a,Ra
    SUB Rd,Re,Rf        SUB Rd,Re,Rf
    SW  d,Rd            SW  d,Rd
    Compiler optimizes for performance. Hardware checks for safety.
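The benefit of the reordering can be checked by counting load-use stalls. Below is a minimal Python sketch. It assumes the 5-stage pipeline with full forwarding, where the only stall is one cycle when the instruction right after a load needs the loaded register in EX.

Each instruction lists only the registers it reads in EX. The `SW` entries list none, because their address is an absolute label and their data is needed only in MEM. All names are hypothetical.

```python
from typing import List, Optional, Tuple

# (opcode, destination register or None, registers read in the EX stage)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

SLOW: List[Instr] = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ()),
    ("LW", "Re", ()), ("LW", "Rf", ()), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ()),
]
FAST: List[Instr] = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("LW", "Rf", ()), ("SW", None, ()), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ()),
]


def load_use_stalls(code: List[Instr]) -> int:
    """One stall whenever an instruction reads, in EX, the register loaded by the
    instruction immediately before it."""
    stalls = 0
    for (op, dest, _), (_, _, ex_srcs) in zip(code, code[1:]):
        if op == "LW" and dest in ex_srcs:
            stalls += 1
    return stalls


def total_cycles(code: List[Instr], depth: int = 5) -> int:
    """Pipeline fill time + one cycle per instruction + stall cycles."""
    return depth - 1 + len(code) + load_use_stalls(code)


print("slow:", load_use_stalls(SLOW), "stalls,", total_cycles(SLOW), "cycles")
print("fast:", load_use_stalls(FAST), "stalls,", total_cycles(FAST), "cycles")
```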
  • 24. Reducing branch hazards Forwarding from EX/MEM and MEM/WB
  • 25. Handling control hazards
    Cycle:                1    2    3    4    5    6    7    8    9
    Branch instruction    IF   ID   EX   MEM  WB
    Branch successor           IF   IF   ID   EX   MEM  WB
    Branch successor + 1                 IF   ID   EX   MEM  WB
    Branch successor + 2                      IF   ID   EX   MEM  WB
    •Freeze/flush the pipeline: wait until the branch destination is known –The penalty is fixed •Treat every branch as not taken •Treat every branch as taken –Any advantages in our 5-stage pipeline? •Delayed branch: branch instruction, sequential successor 1, branch target if taken. What if the condition is not resolved until the EX stage?
  • 26. Predicted Not Taken –Untaken branch: Branch instruction IF ID EX MEM WB; Branch successor IF ID EX MEM WB; Branch successor + 1 IF ID EX MEM WB; Branch successor + 2 IF ID EX MEM WB –Taken branch: Branch instruction IF ID EX MEM WB; Branch successor IF IF ID EX MEM WB; Branch target IF ID EX MEM WB; Branch successor + 1 IF ID EX MEM WB; Branch successor + 2 IF ID EX MEM WB
  • 27. Scheduling the branch delay slot •a) is the best choice: it fills the delay slot and reduces the instruction count (IC) •In b), the sub instruction may need to be copied, increasing IC •In b) and c), it must be okay to execute sub when the branch goes in the unexpected direction
  • 28. Delayed Branch •Compiler effectiveness for a single branch delay slot: –Fills about 60% of branch delay slots –About 80% of instructions executed in branch delay slots are useful in computation –About 50% (60% × 80%) of slots are usefully filled •Delayed branch downside: as processors move to deeper pipelines and multiple issue, the branch delay grows and needs more than one delay slot –Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches –Growth in available transistors has made dynamic approaches relatively cheaper
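The 50% figure is the product of the two rates above. Below is a small Python sketch of that arithmetic, assuming a single delay slot per branch. The function name is hypothetical.

```python
from typing import Tuple


def delay_slot_stats(fill_rate: float, useful_rate: float) -> Tuple[float, float]:
    """Return (fraction of slots doing useful work, wasted cycles per branch)
    for a single delay slot."""
    useful = fill_rate * useful_rate  # slot is filled AND the filler does useful work
    wasted_per_branch = 1.0 - useful  # otherwise the slot cycle is lost
    return useful, wasted_per_branch


useful, wasted = delay_slot_stats(0.60, 0.80)
print(f"useful slots: {useful:.0%}, wasted cycles per branch: {wasted:.2f}")
```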
  • 29. Performance of Branch Schemes Example: assume a deeper pipeline with 4% unconditional branches, 6% conditional branches untaken, and 10% conditional branches taken.
    $$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$
    Branch scheme        Penalty unconditional   Penalty untaken   Penalty taken
    Flush                2                       3                 3
    Predicted taken      2                       3                 2
    Predicted untaken    2                       0                 3
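The penalty table combines with the branch mix as a weighted sum. Below is a minimal Python sketch that applies the formula above with the stated frequencies. The pipeline depth is left as a parameter because the example does not fix it. The dictionary and function names are hypothetical.

```python
# Branch mix from the example, as a fraction of all instructions.
FREQ = {"unconditional": 0.04, "untaken": 0.06, "taken": 0.10}

# Penalty in cycles: (unconditional, untaken, taken)
PENALTY = {
    "Flush": (2, 3, 3),
    "Predicted taken": (2, 3, 2),
    "Predicted untaken": (2, 0, 3),
}


def branch_stall_cpi(scheme: str) -> float:
    """Branch frequency x branch penalty, summed over the three branch kinds."""
    uncond, untaken, taken = PENALTY[scheme]
    return (FREQ["unconditional"] * uncond
            + FREQ["untaken"] * untaken
            + FREQ["taken"] * taken)


def pipeline_speedup(depth: int, scheme: str) -> float:
    return depth / (1 + branch_stall_cpi(scheme))


for scheme in PENALTY:
    print(f"{scheme:18s} stall CPI = {branch_stall_cpi(scheme):.2f}")
```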
  • 30. Evaluating Branch Alternatives
    Branch scheme        Speedup vs Flush     Delayed branch
    Flush                1
    Predicted taken      1.06                 1.14
    Predicted untaken    1.12                 1.19
    For delayed branch, 50% of the slots can be filled with useful instructions.
  • 31. MIPS pipeline with three unpipelined FP functional units
  • 32. Pipelined functional units •Multiple FP instructions can be executed simultaneously
  • 33. Latency and initiation interval •Latency: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result –Typically 1 cycle less than the depth of the execution pipeline –Consider LD, which has a two-stage execution (EX and MEM): its latency is 1 cycle unless the following instruction is a ST, which needs the data only in MEM •Initiation interval: the number of cycles that must elapse between issuing two operations to the same functional unit –For example, a multiplier with 7 execution stages: unpipelined, the initiation interval is 7 cycles (operations start in cycles 1, 8, 15, …); pipelined, the initiation interval is 1 cycle (operations start in cycles 1, 2, 3, …)
  • 34. Latencies and initiation intervals for functional units
    Functional unit   # of execution stages   Latency   Initiation interval
    Integer ALU       1                       0         1
    Data memory       2                       1         1
    FP add            4                       3         1
    FP multiply       7                       6         1
    FP divide         25                      24        25
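The initiation interval determines when back-to-back independent operations can start on the same unit. Below is a minimal Python sketch that uses the table's values and assumes no other stalls. `FU_TABLE` and `issue_cycles` are hypothetical names.

```python
from typing import Dict, List, Tuple

# unit: (execution stages, latency, initiation interval), from the table above
FU_TABLE: Dict[str, Tuple[int, int, int]] = {
    "Integer ALU": (1, 0, 1),
    "Data memory": (2, 1, 1),
    "FP add": (4, 3, 1),
    "FP multiply": (7, 6, 1),
    "FP divide": (25, 24, 25),
}


def issue_cycles(n_ops: int, initiation_interval: int, first: int = 1) -> List[int]:
    """Cycles in which n independent operations can start on one functional unit."""
    return [first + k * initiation_interval for k in range(n_ops)]


for unit, (stages, latency, ii) in FU_TABLE.items():
    assert latency == stages - 1  # latency is one less than the execution depth here
    print(f"{unit:12s} starts: {issue_cycles(3, ii)}")

print("unpipelined 7-stage multiplier:", issue_cycles(3, 7))
```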
  • 35. Pipeline timing of a set of independent FP operations •Instructions are fetched and sent to functional units in order •Instructions do not complete in order because their execution lengths differ
    Cycle:  1   2   3   4   5   6   7   8   9   10  11
    MUL.D   IF  ID  M1  M2  M3  M4  M5  M6  M7  ME  WB
    ADD.D       IF  ID  A1  A2  A3  A4  ME  WB
    L.D             IF  ID  EX  ME  WB
    S.D                 IF  ID  EX  ME  WB
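The out-of-order completion in the diagram follows from the stage counts alone. Below is a minimal Python sketch under these assumptions:
- One instruction is fetched per cycle.
- There are no stalls.
- Every instruction goes through IF, ID, its execution stages, ME, and WB.

The names are hypothetical.

```python
from typing import Dict, List, Tuple

# (instruction, number of execution stages), in program (fetch) order
PROGRAM: List[Tuple[str, int]] = [("MUL.D", 7), ("ADD.D", 4), ("L.D", 1), ("S.D", 1)]


def wb_cycles(program: List[Tuple[str, int]]) -> Dict[str, int]:
    """Cycle in which each instruction reaches WB.

    Instruction i (0-based) is fetched in cycle i + 1 and decoded in the next cycle.
    It then spends `ex` cycles executing, followed by one cycle each in ME and WB."""
    return {name: (i + 1) + 1 + ex + 2 for i, (name, ex) in enumerate(program)}


done = wb_cycles(PROGRAM)
print(done)
print("completion order:", sorted(done, key=done.get))
```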
  • 36. FP code sequence showing the stalls (from RAW hazards)
    Cycle:          1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18
    L.D F4, 0(R2)   IF  ID  EX  ME  WB
    MUL F0,F4,F6        IF  ID  -   M1  M2  M3  M4  M5  M6  M7  ME  WB
    ADD F2,F0,F8            IF  -   ID  -   -   -   -   -   -   A1  A2  A3  A4  ME  WB
    S.D F2, 0(R2)                   IF  -   -   -   -   -   -   ID  EX  -   -   -   ME  WB
  • 37. Handling multiple writes to the register file •Track the use of the write port in the ID stage and stall an instruction before it issues –Stall the instruction if it would write in the same cycle as an instruction already issued –Use shift registers to track which instructions need the write port in which cycle •Stall a conflicting instruction when it tries to enter either the MEM or WB stage –Either instruction may be chosen to stall •Priority may be given to instructions with long latencies –The conflict is not detected until the entrance to the MEM or WB stage, where it is easy to see –Complicates pipeline control, as stalls may now arise from two places
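The shift-register idea in the first option can be sketched in Python as a bit vector in which bit k means the write port is already claimed k cycles from now. This is a minimal model under two assumptions:
- There is a single write port.
- The ID stage knows how many cycles remain until an instruction reaches WB.

`WritePortTracker` and its methods are hypothetical names.

```python
class WritePortTracker:
    """Reservation bit vector for one register-file write port (hypothetical model).

    Bit k set means the write port is already claimed k cycles from now."""

    def __init__(self, horizon: int = 32) -> None:
        self.horizon = horizon  # how many future cycles are tracked
        self.reserved = 0

    def try_issue(self, cycles_to_wb: int) -> bool:
        """Called in ID. Claim the port cycles_to_wb cycles ahead, or return False
        so the instruction stalls in ID and retries next cycle."""
        if not 0 <= cycles_to_wb < self.horizon:
            raise ValueError("cycles_to_wb outside the tracked horizon")
        mask = 1 << cycles_to_wb
        if self.reserved & mask:
            return False
        self.reserved |= mask
        return True

    def tick(self) -> None:
        """Advance one clock: every reservation moves one cycle closer (shift right)."""
        self.reserved >>= 1


port = WritePortTracker()
print(port.try_issue(8))  # long-latency op claims the port 8 cycles ahead
port.tick()
print(port.try_issue(7))  # would write in the same cycle, so it stalls in ID
print(port.try_issue(6))
```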
  • 38. Problems with Pipelining •Exception: an unusual event that happens to an instruction during its execution –Examples: divide by zero, undefined opcode •Interrupt: a hardware signal to switch the processor to a new instruction stream –Example: a sound card interrupts when it needs more audio output samples (an audio “click” happens if it is left waiting) •Problem: it must appear that the exception or interrupt occurred between two instructions (I_i and I_(i+1)) –The effects of all instructions up to and including I_i are totally complete –No effect of any instruction after I_i can take place •The interrupt (exception) handler either aborts the program or restarts at instruction I_(i+1)
  • 39. Dealing with exceptions •Exceptions are harder to handle in a pipelined processor –An instruction is executed in several steps, making it more difficult to determine whether an instruction can safely change the state of the processor •Other instructions in the pipeline may also cause exceptions •Examples of exceptions –Invoking an operating system service –Breakpoint (programmer-requested interrupt) –Integer/FP arithmetic overflow or anomaly –Memory access (page fault, protection violation, misalignment) –Unknown instructions –Hardware malfunctions –I/O request –Power failure
  • 40. Classification of exceptions •Synchronous versus asynchronous –Does it occur at the same place every time the program is executed? •User requested versus coerced –Does the user ask for it? •User maskable versus nonmaskable –Can it be masked (disabled) by the user? •Within versus between instructions –Does it occur in the middle of execution and prevent the instruction from completing? •Resume versus terminate –Can the program’s execution be resumed?
  • 41. Stopping and restarting exceptions •Most difficult exceptions –Occur within instructions (e.g., in the EX and MEM stages) –Must be restartable •Possible solutions –Force a trap instruction into the pipeline on the next IF –Until the trap is taken, turn off all writes for the faulting instruction and all following instructions –In the exception handler, save the PC of the faulting instruction •Precise exceptions: the pipeline can be stopped so that the instructions before the faulting instruction complete and the instructions after the faulting instruction can be restarted
  • 42. Precise Exceptions in Static Pipelines Key observation: architected state changes only in the memory and register-write stages.
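Because architected state changes only late in the pipeline, an exception can be recorded when it is detected and acted on later, in program order. Below is a minimal Python sketch of this idea. It assumes each in-flight instruction carries an exception status that is checked in program order before the instruction may write registers or memory. The function and type names are hypothetical.

```python
from typing import List, Optional, Tuple

# (PC, exception detected for this instruction or None), in program order
InFlight = List[Tuple[int, Optional[str]]]


def commit_until_exception(in_flight: InFlight) -> Tuple[List[int], Optional[int], Optional[str]]:
    """Let instructions update state in program order and stop at the first one that
    carries an exception. Returns (committed PCs, faulting PC, exception)."""
    committed: List[int] = []
    for pc, exc in in_flight:
        if exc is not None:
            # Writes for this instruction and all younger ones stay disabled.
            return committed, pc, exc
        committed.append(pc)
    return committed, None, None


# The younger instruction's IF page fault is detected earlier in time than the
# older instruction's MEM fault, but the older fault must be reported first.
pipe = [(0x100, None), (0x104, "page fault (MEM)"), (0x108, "page fault (IF)")]
print(commit_until_exception(pipe))
```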
  • 43. A more complicated pipeline •Fetch •Decode •Dispatch •Issue •Execute •Finish •Complete •Retire Branch Prediction Dynamic Scheduling Reorder buffer
  • 44. Superscalar Pipeline Executing multiple instructions in parallel
  • 45. Instruction Fetch •Limits the maximum throughput of the pipeline •Fetch s instructions per cycle from the I-cache •Problems with attaining this throughput: –Control flow → branch prediction –Alignment of the cache line and the PC
  • 46. Interactions between Instruction Fetch and Instruction Cache Structure •In b), if a fetch group (s instructions) straddles two cache lines, the I-cache must be accessed twice –If either cache line misses, the pipeline stalls
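Whether a fetch group straddles two cache lines depends only on the PC, the fetch width s, and the line size. Below is a minimal Python sketch, assuming fixed 4-byte instructions and byte addresses. The line sizes in the example are hypothetical.

```python
def icache_accesses(pc: int, s: int, line_bytes: int, instr_bytes: int = 4) -> int:
    """Number of I-cache lines touched by a fetch group of s sequential instructions."""
    first_line = pc // line_bytes
    last_line = (pc + s * instr_bytes - 1) // line_bytes
    return last_line - first_line + 1


# s = 4 instructions (16 bytes) from a cache with hypothetical 16-byte lines
print(icache_accesses(0x00, 4, 16))  # aligned group: one line
print(icache_accesses(0x08, 4, 16))  # group starts mid-line: straddles two lines
```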
  • 47. Instruction Decode •Extract from each instruction: –Instruction type (decoder) –Dependencies (comparators) –Operands (register files & buses) •CISC → RISC: –CISC instructions are converted to ROPs (RISC operations)
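In a superscalar decoder, the dependency comparators check every later instruction in the decode group against every earlier one. Below is a minimal Python sketch of the RAW check. It assumes two source registers per instruction and ignores WAR/WAW checks. The names are hypothetical.

```python
from typing import List, Tuple

# (destination register, (source registers)) in program order within the decode group
Group = List[Tuple[str, Tuple[str, str]]]


def intra_group_raw(group: Group) -> List[Tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for each RAW match."""
    deps = []
    for j, (_, srcs) in enumerate(group):
        for i in range(j):  # only earlier instructions can produce values for j
            dest_i = group[i][0]
            for reg in srcs:
                if reg == dest_i:
                    deps.append((i, j, reg))
    return deps


def comparator_count(s: int, n_srcs: int = 2) -> int:
    """Each source of instruction j is compared with the destination of every i < j."""
    return n_srcs * s * (s - 1) // 2


group = [("r1", ("r2", "r3")), ("r4", ("r1", "r5")), ("r6", ("r4", "r1"))]
print(intra_group_raw(group))
print(comparator_count(4))
```

The comparator count grows quadratically with the decode width. A real decoder must also pick the most recent producer when several earlier instructions in the group write the same register.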
  • 48. Instruction Decode - Predecoding
  • 49. AMD’s K6 can decode two instructions per cycle
  • 50. Instruction Dispatch •Dataflow: –Send an instruction to a functional unit as soon as its operands are available, regardless of the original program order –Tomasulo’s algorithm
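The dataflow rule can be sketched in a few lines of Python. This is a simplified model under these assumptions:
- Functional units and reservation-station entries are unlimited.
- Register renaming removes WAR/WAW hazards.
- An instruction dispatches in the cycle its last operand becomes available.

It is not a full implementation of Tomasulo's algorithm, and the names are hypothetical.

```python
from typing import Dict, List, Tuple

# (name, destination, sources, execution latency) in program order
Instr = Tuple[str, str, Tuple[str, ...], int]


def dataflow_dispatch(program: List[Instr]) -> Dict[str, int]:
    """Cycle in which each instruction is sent to a functional unit."""
    ready_at: Dict[str, int] = {}  # register -> cycle its value becomes available
    dispatch: Dict[str, int] = {}
    for name, dest, srcs, latency in program:
        start = max((ready_at.get(r, 0) for r in srcs), default=0)
        dispatch[name] = start
        ready_at[dest] = start + latency
    return dispatch


prog = [
    ("I1 MUL r1,r2,r3", "r1", ("r2", "r3"), 4),
    ("I2 ADD r4,r1,r5", "r4", ("r1", "r5"), 1),
    ("I3 SUB r6,r7,r8", "r6", ("r7", "r8"), 1),
]
times = dataflow_dispatch(prog)
print(sorted(times, key=times.get))  # I3 is dispatched before I2
```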
  • 51. Instruction Dispatch •Centralized reservation station •Distributed reservation station
  • 52. Instruction Execution •How many functional units? Why different types? –Constraints of area, power, interconnection, etc. •You cannot add as many as you want –The mix of functional units may not be ideal for some applications •Bypassing –Bypass paths are needed between functional units to minimize stalls
  • 53. Instruction Completion & Retiring •Completion → registers •Reorder/store buffer in between –Registers in the buffer (not the register file) hold the new values •Retiring → memory
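The role of the reorder buffer between finishing and completion can be sketched in Python. This is a minimal model: instructions finish in any order, but they update the register file only from the head of the buffer, in program order. `ReorderBuffer` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List


class ReorderBuffer:
    def __init__(self) -> None:
        # Each entry: {"tag", "dest", "value", "done"}, oldest at the left.
        self.entries: Deque[dict] = deque()

    def allocate(self, tag: str, dest: str) -> None:
        """Reserve an entry in program order at dispatch."""
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})

    def finish(self, tag: str, value: int) -> None:
        """Execution finished (possibly out of order): the value waits in the buffer."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"], entry["done"] = value, True
                return
        raise KeyError(tag)

    def complete_in_order(self, regfile: Dict[str, int]) -> List[str]:
        """Copy finished results from the head of the buffer to the register file."""
        completed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            regfile[entry["dest"]] = entry["value"]
            completed.append(entry["tag"])
        return completed


rob, regs = ReorderBuffer(), {}
rob.allocate("I1", "r1")
rob.allocate("I2", "r2")
rob.finish("I2", 20)               # younger instruction finishes first
print(rob.complete_in_order(regs))  # nothing completes: I1 is still at the head
rob.finish("I1", 10)
print(rob.complete_in_order(regs), regs)
```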
  • 54. Limiting factors: Pipelining hazards •Structural hazards –Resource conflicts when hardware cannot support all possible combinations of instructions simultaneously •Data hazards –An instruction depends on the results of a previous instruction •Control hazards –Branch instructions that change the instruction flow