Advanced Pipelining

• Superpipelining: increase the depth of the pipeline (a deeper pipeline)
   – to overlap more instructions
• Multiple issue: start more than one instruction each cycle
   – to achieve CPI < 1
• Loop unrolling: a technique for better instruction scheduling
   – to expose more instruction-level parallelism (ILP)


• “Superscalar” processors
   – DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
   – dynamic multiple issue: the processor dynamically chooses which
     instructions to execute in a given cycle while trying to avoid hazards
• VLIW: very long instruction word, static multiple issue
   – relies more on compiler technology for packing instructions
     and handling hazards

                                                               ©2004 Morgan Kaufmann Publishers   40
Advanced Pipelining

 • Static multiple issue
    – the compiler decides which instructions issue together, before execution
 • Dynamic multiple issue
    – the processor decides which instructions issue together, during execution
 • Problems of multiple issue
    – How to package instructions into issue slots
    – How to deal with data and control hazards

 • Speculation – the compiler or processor guesses the outcome
   of an instruction to remove it as a dependence in executing
   other instructions




Static Multiple Issue

 • Issue packet
    – the set of instructions that issue together in a clock cycle
 • Static multiple issue (SMI) concept
    – regard an issue packet as one large instruction with multiple operations
    – known as Very Long Instruction Word (VLIW), or Explicitly Parallel Instruction
      Computing (EPIC) in Intel's IA-64
 • Assume two instructions may be issued per clock cycle:
    – one slot for an integer ALU operation or branch
    – one slot for a load or store
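
The two-slot issue packet assumed above can be sketched in a few lines of Python. This is only an illustration: the opcode sets and the names `can_issue_together` and `pack` are hypothetical, and the sketch checks slot types only. It ignores data dependences, which a real compiler must also respect when forming packets.

```python
# Hypothetical opcode classes for the two issue slots assumed on the slide.
ALU_OR_BRANCH = {"add", "addu", "addi", "sub", "and", "or", "slt", "beq", "bne"}
DATA_TRANSFER = {"lw", "sw"}


def can_issue_together(first_op, second_op):
    """True if the pair fills the ALU/branch slot and then the load/store slot."""
    return first_op in ALU_OR_BRANCH and second_op in DATA_TRANSFER


def pack(ops):
    """Greedily pack an in-order opcode list into (ALU/branch, load/store) packets.

    Unused slots are filled with "nop". Only slot types are checked here;
    data and control dependences are deliberately left out of this sketch.
    """
    packets = []
    i = 0
    while i < len(ops):
        op = ops[i]
        if op not in ALU_OR_BRANCH and op not in DATA_TRANSFER:
            raise ValueError(f"unknown opcode: {op}")
        if i + 1 < len(ops) and can_issue_together(op, ops[i + 1]):
            packets.append((op, ops[i + 1]))
            i += 2
        elif op in ALU_OR_BRANCH:
            packets.append((op, "nop"))
            i += 1
        else:
            packets.append(("nop", op))
            i += 1
    return packets


packets = pack(["addi", "lw", "sw", "bne"])
```

Each tuple stands for one issue packet, which the hardware treats as a single wide instruction. The `nop` entries show why a static scheduler tries to reorder code so that both slots are filled.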




A Static Two-issue Datapath




Static Multiple Issue

• Extra resources (for issuing 2 instructions per cycle)
   –   another 32 bits fetched from instruction memory each cycle
   –   extra ports in the register file
   –   another ALU to handle address calculation for data transfers
   •   Without these extra resources ⇒ structural hazards


• More ambitious compiler or hardware scheduling techniques are needed
   – Loads have a use latency of 1 clock cycle in the simple five-stage pipeline
       • In the two-issue pipeline, the next two instructions cannot use the load result
         without stalling.
   – ALU instructions have no use latency in the simple five-stage pipeline
       • In the two-issue pipeline this becomes a one-instruction use latency
         (the result cannot be used by the paired instruction)
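
The use-latency rules above can be written as a small check. This is a minimal sketch under an assumed model: cycles are counted per issue packet, an ALU result is usable by the following packet, and a load result is usable one packet after that. The names `USE_LATENCY`, `earliest_consumer_cycle` and `stalls_needed` are hypothetical.

```python
# Assumed model of the two-issue pipeline: number of cycles that must pass
# between issuing a producer and issuing an instruction that uses its result.
USE_LATENCY = {"alu": 1, "load": 2}


def earliest_consumer_cycle(producer_kind, issue_cycle):
    """First cycle in which a dependent instruction can issue without stalling."""
    return issue_cycle + USE_LATENCY[producer_kind]


def stalls_needed(producer_kind, producer_cycle, consumer_cycle):
    """Stall cycles required if the consumer is scheduled at consumer_cycle."""
    return max(0, earliest_consumer_cycle(producer_kind, producer_cycle) - consumer_cycle)


# A load issued in cycle 1 feeding an instruction scheduled in cycle 2 stalls once;
# moving the consumer to cycle 3 removes the stall.
load_then_use_next_cycle = stalls_needed("load", 1, 2)
load_then_use_two_later = stalls_needed("load", 1, 3)
```

A scheduler can use a check like this to decide how far apart a producer and its consumer must be placed.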




Example: Multiple-issue Code Scheduling

•   How would this loop be scheduled on a two-issue pipeline for MIPS?
    Reorder the instructions to avoid as many pipeline stalls as possible.

                    Loop:   lw        $t0, 0($s1)
                            addu      $t0, $t0, $s2
                            sw        $t0, 0($s1)
                            addi      $s1, $s1, -4
                            bne       $s1, $zero, Loop

• Ans:    4 clocks per loop iteration
          CPI = 4/5 = 0.8
          (the sw offset becomes 4 because addi has already decremented $s1)

                     ALU or branch inst.     Data transfer inst.   Clock cycle
            Loop:                            lw $t0, 0($s1)             1
                     addi $s1, $s1, -4                                  2
                     addu $t0, $t0, $s2                                 3
                     bne $s1, $zero, Loop    sw $t0, 4($s1)             4
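
To see that the reordered loop still computes the same result, the two versions can be run through a tiny interpreter. This is a sketch, not a pipeline model: `run` is a hypothetical helper that executes a small MIPS subset one instruction at a time, the branch target is a list index rather than a label, and the memory contents and register values are made-up test data.

```python
def run(program, s1, s2, memory):
    """Sequentially interpret a tiny MIPS subset; return (memory, instructions executed)."""
    regs = {"$zero": 0, "$t0": 0, "$s1": s1, "$s2": s2}
    mem = dict(memory)
    pc = 0
    executed = 0
    while pc < len(program):
        op, a, b, c = program[pc]
        pc += 1
        executed += 1
        if op == "lw":        # lw a, b(c)
            regs[a] = mem[regs[c] + b]
        elif op == "sw":      # sw a, b(c)
            mem[regs[c] + b] = regs[a]
        elif op == "addu":    # addu a, b, c
            regs[a] = regs[b] + regs[c]
        elif op == "addi":    # addi a, b, immediate
            regs[a] = regs[b] + c
        elif op == "bne":     # bne a, b, target (index into program)
            if regs[a] != regs[b]:
                pc = c
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return mem, executed


ORIGINAL = [
    ("lw", "$t0", 0, "$s1"),
    ("addu", "$t0", "$t0", "$s2"),
    ("sw", "$t0", 0, "$s1"),
    ("addi", "$s1", "$s1", -4),
    ("bne", "$s1", "$zero", 0),
]

# Same instructions in the scheduled order; sw uses offset 4 because
# addi has already subtracted 4 from $s1 by the time the store executes.
SCHEDULED = [
    ("lw", "$t0", 0, "$s1"),
    ("addi", "$s1", "$s1", -4),
    ("addu", "$t0", "$t0", "$s2"),
    ("sw", "$t0", 4, "$s1"),
    ("bne", "$s1", "$zero", 0),
]

memory = {4: 1, 8: 2, 12: 3, 16: 4}   # made-up array of four words
mem_a, n_a = run(ORIGINAL, 16, 10, memory)
mem_b, n_b = run(SCHEDULED, 16, 10, memory)

iterations = 4
cpi = (4 * iterations) / n_b          # 4 clock cycles per iteration on the two-issue pipeline
```

Both orders leave memory in the same state, and 20 instructions in 16 cycles gives the CPI of 0.8 quoted in the answer.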

Example: Loop Unrolling for Multiple-issue Pipelines
 Loop unrolling:
 • multiple copies of the loop body are made, and
   instructions from different iterations are scheduled together
 • Register renaming removes antidependences (name dependences)
 Ex. Assume the loop index is a multiple of four

 Original loop:

 Loop: lw   $t0, 0($s1)
       addu $t0, $t0, $s2
       sw   $t0, 0($s1)
       addi $s1, $s1, -4
       bne  $s1, $zero, Loop

 Unrolled and scheduled:

                     ALU or branch inst.     Data transfer inst.   Clock cycle
            Loop:    addi $s1, $s1, -16      lw $t0, 0($s1)             1
                                             lw $t1, 12($s1)            2
                     addu $t0, $t0, $s2      lw $t2, 8($s1)             3
                     addu $t1, $t1, $s2      lw $t3, 4($s1)             4
                     addu $t2, $t2, $s2      sw $t0, 16($s1)            5
                     addu $t3, $t3, $s2      sw $t1, 12($s1)            6
                                             sw $t2, 8($s1)             7
                     bne $s1, $zero, Loop    sw $t3, 4($s1)             8

 • Ans:
    – 8/4 = 2 clocks per iteration
    – CPI = 8/14 = 0.57
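
The same transformation can be shown at a higher level. The sketch below is an assumed Python analogue of the MIPS loop (add a scalar to every array element, walking from the end), with hypothetical function names. The unrolled version uses four distinct temporaries, mirroring the renaming of $t0 into $t0..$t3, and the constants at the end restate the cycle accounting from the schedule.

```python
def add_scalar_rolled(a, s):
    """One element per pass, like the original five-instruction loop."""
    i = len(a)
    while i != 0:
        i -= 1
        a[i] = a[i] + s
    return a


def add_scalar_unrolled4(a, s):
    """Four elements per pass; the length must be a multiple of four."""
    if len(a) % 4 != 0:
        raise ValueError("length must be a multiple of four")
    i = len(a)
    while i != 0:
        i -= 4
        # Four loads into distinct temporaries (the renamed registers).
        t0, t1, t2, t3 = a[i + 3], a[i + 2], a[i + 1], a[i]
        # Four independent additions that a scheduler is free to interleave.
        t0, t1, t2, t3 = t0 + s, t1 + s, t2 + s, t3 + s
        # Four stores.
        a[i + 3], a[i + 2], a[i + 1], a[i] = t0, t1, t2, t3
    return a


# Cycle accounting taken from the unrolled schedule.
UNROLLED_CYCLES = 8
UNROLLED_INSTRUCTIONS = 14      # 1 addi + 4 lw + 4 addu + 4 sw + 1 bne
ELEMENTS_PER_PASS = 4

cycles_per_element = UNROLLED_CYCLES / ELEMENTS_PER_PASS
cpi = UNROLLED_CYCLES / UNROLLED_INSTRUCTIONS
```

Unrolling removes three of every four addi/bne pairs, which is why 14 instructions now do the work that took 20 in the rolled loop.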

The BIG Picture

 • Both pipelining and multiple-issue execution
   increase peak instruction throughput.
 • Longer pipelines and wider multiple issue put even
   more pressure on the compiler to deliver on the
   performance potential of the hardware.
 • Hardware designers must ensure correct execution
   of all instruction sequences.
 • Compiler writers must understand the pipeline to
   generate appropriate code and achieve the
   best performance.




Dynamic Pipeline Scheduling

• Superscalar processor – the pipeline is divided into three
  major units
   1. an instruction fetch and decode unit:
        • fetches instructions, decodes them, and sends each instruction to the
          corresponding functional unit
   2. functional units (FUs):
        • Reservation stations: each FU has buffers that hold the operation and its operands
        • Once a buffer contains all its operands and the functional
          unit is ready to execute, the result is calculated.
   3. a commit unit:
        • decides when to put the result into the register file or memory
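
The three units can be illustrated with a toy timing model. This is a rough sketch under loud assumptions, not the scheme of any real processor: one instruction is issued per cycle in program order, functional units are unlimited, each instruction waits in its reservation station until its source registers are ready, and the commit unit retires at most one instruction per cycle in program order. The function name `simulate`, the latencies and the instruction sequence are all hypothetical.

```python
def simulate(instrs):
    """Toy model of in-order issue, out-of-order execute, in-order commit.

    instrs: list of (name, dest, srcs, latency) in program order.
    Returns a list of (name, finish_cycle, commit_cycle).
    """
    ready_at = {}        # register -> cycle at which its newest value is available
    last_commit = 0
    timeline = []
    for issue_cycle, (name, dest, srcs, latency) in enumerate(instrs, start=1):
        # Wait in the reservation station until every source operand is ready.
        start = max([issue_cycle] + [ready_at.get(reg, 0) for reg in srcs])
        finish = start + latency
        if dest is not None:
            ready_at[dest] = finish
        # Commit in program order, no earlier than the result is finished.
        commit = max(finish, last_commit + 1)
        last_commit = commit
        timeline.append((name, finish, commit))
    return timeline


# Assumed sequence: a load that misses in the cache (10-cycle latency),
# a dependent add, and an independent subtract.
timeline = simulate([
    ("lw   $t0, 20($s2)", "$t0", ["$s2"], 10),
    ("addu $t1, $t0, $t2", "$t1", ["$t0", "$t2"], 1),
    ("sub  $s4, $s4, $t3", "$s4", ["$s4", "$t3"], 1),
])
```

In this run the independent sub finishes at cycle 4, long before the load completes at cycle 11, yet it commits last at cycle 13. That is the out-of-order execution with in-order commit that the commit unit enforces.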




The Dynamically scheduled Pipeline

[Figure: the dynamically scheduled pipeline. The instruction fetch and decode unit issues instructions in order to a set of reservation stations, one per functional unit (Integer, Integer, …, Floating point, Load/Store). The functional units execute out of order and pass their results to the commit unit, which commits in order.]

The Dynamically scheduled Pipeline

 • Motivations for dynamic scheduling:
   – Not all stalls are predictable (e.g., cache misses). (Ch. 7)
   – With dynamic branch prediction, the execution order of
     instructions cannot be known at compile time.
   – Pipeline latency and issue width change from one
     implementation to another.
        • Dynamic scheduling hides the differences between
          hardware implementations of the same
          instruction set.
        • Old code benefits from a new implementation
          without the need for recompilation.



