SlideShare une entreprise Scribd logo
1  sur  10
Branch prediction

      Titov Alexander
                MDSP

        November, 2009
Agenda
     Reasons to use
     Static (software) prediction
     Saturating counter
     Two-level adaptive predictor
     Local/global predictors
     Branch target predictor
     Competitive approaches

    A branch predictor is a digital circuit that tries to guess which way a
    branching (e.g. an if then-else structure) will go before this is known
    for sure.



                                MDSP project, Intel Lab,
                         Moscow Institute of Physics and Technology
2
The 1st reason: the gap in pipeline
                                                                Condition is
                                                            calculated only here


                 Conditional branch                FE    DEC      EXE       MEM    WB
                                       not taken

                     instruction                          FE      DEC        EXE   MEM   WB


         taken       instruction                                    FE      DEC    EXE   MEM   WB


                        ...
                                                            the gap if
                                                              taken
                     instruction                                              FE   DEC   EXE   MEM   WB

                     instruction                                                   FE    DEC   EXE   MEM


     The time that is wasted in case of a branch misprediction is equal to the number of
      stages in the pipeline from the fetch stage to the execute stage. Modern
      microprocessors tend to have quite long pipelines so that the misprediction delay is
      between 10 and 20 cycles (if not consider misses in caches).


                                              MDSP project, Intel Lab,
                                      Moscow Institute of Physics and Technology
3
The 2nd reason: parallel execution
     A processor use special technique to identify instruction can be executed in
      parallel, but the linear part (instruction between two branches) is to small to
      select enough instruction (every 5-8th instruction is a branch).
     The branch prediction can increase the linear part:

           Control flow graph                                                          Prediction
                                                       The present
                       1
                                                         is here
                                                                                 Predicted linear part
                                      loop,
                                      2 iter                                                                     exit
                                                 Branch
                                                predictor         1      3      6       1    3      6    1   3
              2               3



                           exit
       4           5              6                    The linear part increases significantly, which means
                                                       that a processor can find and execute more
                                                       instructions in parallel.
       critical path


                                                 MDSP project, Intel Lab,
                                          Moscow Institute of Physics and Technology
4
Static (software) prediction
     There are situations as the branch direction can be statically predicted with a
      good probability:
       – Backwards branches
       – Exception and error handling
       – Some branches meet with specific heuristics
     The backwards branch is one that has a target address that is lower than its
      own address. Therefore SBP can help with prediction accuracy of loops, which
      are usually backward-pointing branches, and are taken more often than not
      taken.
     Some processors allow branch prediction hints to be inserted into the code to
      tell whether the static prediction should be taken or not taken. In this case
      also it is possible to use the profile information (the history of execution of a
      program).
     Static prediction is used as a fall-back technique in some processors with
      dynamic branch prediction when there isn't any information for dynamic
      predictors to use.

                                     MDSP project, Intel Lab,
                              Moscow Institute of Physics and Technology
5
Saturating counter
     A saturating counter or bimodal predictor is a state machine with four states:




       When a branch is evaluated, the corresponding state machine is updated.
       Branches evaluated as not taken (NT) decrement the state towards strongly
       not taken, and branches evaluated as taken (T) increment the state towards
       strongly taken

     The advantage of the two-bit counter over a one-bit scheme is that a
      conditional jump has to deviate twice from what it has done most in the past
      before the prediction changes. For example, a loop-closing conditional jump
      is mispredicted once rather than twice.


                                   MDSP project, Intel Lab,
                            Moscow Institute of Physics and Technology
6
Two-level adaptive predictor
     Conditional jumps that are taken every second time or have some other
      regularly recurring pattern are not predicted well by the saturating counter,
      but a adaptive predictor can easy solve this problem

                   present
                                                           Pattern history table
        future                 past

                                                                            strongly   weakly   weakly   strongly
Branch                                                        00               NT       NT        T          T
sequence:

1 0 0 1 0 0
                                                                            strongly   weakly   weakly   strongly
                        History                               01               NT       NT        T          T
                        buffer

                                                                            strongly   weakly   weakly   strongly
                                                              10               NT       NT        T          T

                 Prediction:          0 0 0 0 0 1 0 0 0
                                                                            strongly   weakly   weakly   strongly
                 Result:   TRUE
                           FALSE                              11               NT       NT        T          T




                                      MDSP project, Intel Lab,
                               Moscow Institute of Physics and Technology
7
Local/global predictors
     Local prediction
       – Local branch predictor has a separate history buffer for each conditional jump
         instruction.
       – Pattern history table may be separate as well or it may be shared between all
         conditional jumps.
       – Accuracy of the local predictor is equal about 97.1%.
     Global prediction
       – Global branch keeps a shared history of all conditional jumps: in this case any
         correlation between different conditional jumps is utilized in the prediction making,
         but when there is no correlation the prediction can decrease.
       – This predictor must have a long history buffer in order to make a good prediction.
         The size of the pattern history table grows exponentially with the size of the history
         buffer. Therefore, the big pattern history table is shared between all conditional
         jumps.
       – Accuracy of the global branch predictor is equal about 96.6%, which is just a little
         worse than large local predictors.
     To do the best prediction real branch predictors are made as a
      combination of many predictors including global, local and other
      more specific ones.

                                     MDSP project, Intel Lab,
                              Moscow Institute of Physics and Technology
8
Branch target prediction
     The address of a dynamic branch isn’t known statically in the
      moment of compilation. It is calculated in the moment of
      execution of the program.

     A branch target predictor is the part of a processor that
      predicts the target address of a taken dynamic conditional
      branch or an dynamic unconditional branch instruction before
      the target address of the branch instruction is computed by
      the execution unit of the processor.

     Note! Branch target prediction is not the same as branch
      prediction. Branch prediction attempts to guess whether a
      conditional branch will be taken or not-taken, but branch
      target one predicts the value of target address.


                              MDSP project, Intel Lab,
                       Moscow Institute of Physics and Technology
9
Competitive approaches
      Almost all x86 processors use the branch prediction approach,
       but branch prediction is not the just one way to reach the
       wide parallel execution and to avoid gaps in pipeline.
      Branch predictor is a large and very sophisticated circuit. It
       consumes a lot of energy. Some mobile processors use only a
       very simple branch predictor to decrease their power and
       area.
      There are architectures provide another approaches. For
       example, in VLIW (Very Long Instruction Word) architecture
       independent instructions are combined into a long instruction
       by compiler and executed in parallel in the processor. And to
       avoid the gaps delay branches are used.




                               MDSP project, Intel Lab,
                        Moscow Institute of Physics and Technology
10

Contenu connexe

En vedette

Comp architecture : branch prediction
Comp architecture : branch predictionComp architecture : branch prediction
Comp architecture : branch predictionrinnocente
 
Evaluation of Branch Predictors
Evaluation of Branch PredictorsEvaluation of Branch Predictors
Evaluation of Branch PredictorsBharat Biyani
 
Conformational analysis
Conformational analysisConformational analysis
Conformational analysisPinky Vincent
 
Instruction Level Parallelism (ILP) Limitations
Instruction Level Parallelism (ILP) LimitationsInstruction Level Parallelism (ILP) Limitations
Instruction Level Parallelism (ILP) LimitationsJose Pinilla
 
Intel CPU Manufacturing Process
Intel CPU Manufacturing ProcessIntel CPU Manufacturing Process
Intel CPU Manufacturing ProcessA B Shinde
 
Pipelining and ILP (Instruction Level Parallelism)
Pipelining and ILP (Instruction Level Parallelism) Pipelining and ILP (Instruction Level Parallelism)
Pipelining and ILP (Instruction Level Parallelism) A B Shinde
 
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...Hsien-Hsin Sean Lee, Ph.D.
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsLinkedIn
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 

En vedette (9)

Comp architecture : branch prediction
Comp architecture : branch predictionComp architecture : branch prediction
Comp architecture : branch prediction
 
Evaluation of Branch Predictors
Evaluation of Branch PredictorsEvaluation of Branch Predictors
Evaluation of Branch Predictors
 
Conformational analysis
Conformational analysisConformational analysis
Conformational analysis
 
Instruction Level Parallelism (ILP) Limitations
Instruction Level Parallelism (ILP) LimitationsInstruction Level Parallelism (ILP) Limitations
Instruction Level Parallelism (ILP) Limitations
 
Intel CPU Manufacturing Process
Intel CPU Manufacturing ProcessIntel CPU Manufacturing Process
Intel CPU Manufacturing Process
 
Pipelining and ILP (Instruction Level Parallelism)
Pipelining and ILP (Instruction Level Parallelism) Pipelining and ILP (Instruction Level Parallelism)
Pipelining and ILP (Instruction Level Parallelism)
 
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

Similaire à [2009 11-09] branch prediction

Reading_0413_var_Transformers.pptx
Reading_0413_var_Transformers.pptxReading_0413_var_Transformers.pptx
Reading_0413_var_Transformers.pptxcongtran88
 
MARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicMARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicEric Verhulst
 
Interferece Management in LTE Femtocell
Interferece Management in LTE FemtocellInterferece Management in LTE Femtocell
Interferece Management in LTE Femtocelltoha ardi nugraha
 
The operational amplifier (part 1)
The operational amplifier (part 1)The operational amplifier (part 1)
The operational amplifier (part 1)Jamil Ahmed
 
Detecting aspect-specific code smells using Ekeko for AspectJ
Detecting aspect-specific code smells using Ekeko for AspectJDetecting aspect-specific code smells using Ekeko for AspectJ
Detecting aspect-specific code smells using Ekeko for AspectJCoen De Roover
 
Graph processing
Graph processingGraph processing
Graph processingyeahjs
 
Spectre and meltdown
Spectre and meltdownSpectre and meltdown
Spectre and meltdownAditya Kamat
 
LDBP: localized boundary detection and parametrization for 3 d sensor networks
LDBP: localized boundary detection and parametrization for 3 d sensor networksLDBP: localized boundary detection and parametrization for 3 d sensor networks
LDBP: localized boundary detection and parametrization for 3 d sensor networksPapitha Velumani
 
Facial expression recognition
Facial expression recognitionFacial expression recognition
Facial expression recognitionMitul Rawat
 
Pushing the limits of CAN - Scheduling frames with offsets provides a major p...
Pushing the limits of CAN - Scheduling frames with offsets provides a major p...Pushing the limits of CAN - Scheduling frames with offsets provides a major p...
Pushing the limits of CAN - Scheduling frames with offsets provides a major p...Nicolas Navet
 
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal StructuresLarge Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structuresayubimoak
 
Ee395 lab 2 - loren - victor - taylor
Ee395   lab 2 - loren - victor - taylorEe395   lab 2 - loren - victor - taylor
Ee395 lab 2 - loren - victor - taylorLoren Schwappach
 

Similaire à [2009 11-09] branch prediction (20)

Instruction pipelining (ii)
Instruction pipelining (ii)Instruction pipelining (ii)
Instruction pipelining (ii)
 
Reading_0413_var_Transformers.pptx
Reading_0413_var_Transformers.pptxReading_0413_var_Transformers.pptx
Reading_0413_var_Transformers.pptx
 
EC561
EC561EC561
EC561
 
Embedded-Project
Embedded-ProjectEmbedded-Project
Embedded-Project
 
MARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicMARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 Altreonic
 
Interferece Management in LTE Femtocell
Interferece Management in LTE FemtocellInterferece Management in LTE Femtocell
Interferece Management in LTE Femtocell
 
The operational amplifier (part 1)
The operational amplifier (part 1)The operational amplifier (part 1)
The operational amplifier (part 1)
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
 
Detecting aspect-specific code smells using Ekeko for AspectJ
Detecting aspect-specific code smells using Ekeko for AspectJDetecting aspect-specific code smells using Ekeko for AspectJ
Detecting aspect-specific code smells using Ekeko for AspectJ
 
Graph processing
Graph processingGraph processing
Graph processing
 
Spectre and meltdown
Spectre and meltdownSpectre and meltdown
Spectre and meltdown
 
Vsync track c
Vsync   track cVsync   track c
Vsync track c
 
Recurrent neural network
Recurrent neural networkRecurrent neural network
Recurrent neural network
 
LDBP: localized boundary detection and parametrization for 3 d sensor networks
LDBP: localized boundary detection and parametrization for 3 d sensor networksLDBP: localized boundary detection and parametrization for 3 d sensor networks
LDBP: localized boundary detection and parametrization for 3 d sensor networks
 
Facial expression recognition
Facial expression recognitionFacial expression recognition
Facial expression recognition
 
Pushing the limits of CAN - Scheduling frames with offsets provides a major p...
Pushing the limits of CAN - Scheduling frames with offsets provides a major p...Pushing the limits of CAN - Scheduling frames with offsets provides a major p...
Pushing the limits of CAN - Scheduling frames with offsets provides a major p...
 
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal StructuresLarge Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures
 
ZERO WIRE LOAD MODEL.pptx
ZERO WIRE LOAD MODEL.pptxZERO WIRE LOAD MODEL.pptx
ZERO WIRE LOAD MODEL.pptx
 
Ee395 lab 2 - loren - victor - taylor
Ee395   lab 2 - loren - victor - taylorEe395   lab 2 - loren - victor - taylor
Ee395 lab 2 - loren - victor - taylor
 
Shuronr
ShuronrShuronr
Shuronr
 

Dernier

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 

Dernier (20)

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 

[2009 11-09] branch prediction

  • 1. Branch prediction Titov Alexander MDSP November, 2009
  • 2. Agenda  Reasons to use  Static (software) prediction  Saturating counter  Two-level adaptive predictor  Local/global predictors  Branch target predictor  Competitive approaches A branch predictor is a digital circuit that tries to guess which way a branching (e.g. an if then-else structure) will go before this is known for sure. MDSP project, Intel Lab, Moscow Institute of Physics and Technology 2
  • 3. The 1st reason: the gap in pipeline Condition is calculated only here Conditional branch FE DEC EXE MEM WB not taken instruction FE DEC EXE MEM WB taken instruction FE DEC EXE MEM WB ... the gap if taken instruction FE DEC EXE MEM WB instruction FE DEC EXE MEM  The time that is wasted in case of a branch misprediction is equal to the number of stages in the pipeline from the fetch stage to the execute stage. Modern microprocessors tend to have quite long pipelines so that the misprediction delay is between 10 and 20 cycles (if not consider misses in caches). MDSP project, Intel Lab, Moscow Institute of Physics and Technology 3
  • 4. The 2nd reason: parallel execution  A processor use special technique to identify instruction can be executed in parallel, but the linear part (instruction between two branches) is to small to select enough instruction (every 5-8th instruction is a branch).  The branch prediction can increase the linear part: Control flow graph Prediction The present 1 is here Predicted linear part loop, 2 iter exit Branch predictor 1 3 6 1 3 6 1 3 2 3 exit 4 5 6 The linear part increases significantly, which means that a processor can find and execute more instructions in parallel. critical path MDSP project, Intel Lab, Moscow Institute of Physics and Technology 4
  • 5. Static (software) prediction  There are situations as the branch direction can be statically predicted with a good probability: – Backwards branches – Exception and error handling – Some branches meet with specific heuristics  The backwards branch is one that has a target address that is lower than its own address. Therefore SBP can help with prediction accuracy of loops, which are usually backward-pointing branches, and are taken more often than not taken.  Some processors allow branch prediction hints to be inserted into the code to tell whether the static prediction should be taken or not taken. In this case also it is possible to use the profile information (the history of execution of a program).  Static prediction is used as a fall-back technique in some processors with dynamic branch prediction when there isn't any information for dynamic predictors to use. MDSP project, Intel Lab, Moscow Institute of Physics and Technology 5
  • 6. Saturating counter  A saturating counter or bimodal predictor is a state machine with four states: When a branch is evaluated, the corresponding state machine is updated. Branches evaluated as not taken (NT) decrement the state towards strongly not taken, and branches evaluated as taken (T) increment the state towards strongly taken  The advantage of the two-bit counter over a one-bit scheme is that a conditional jump has to deviate twice from what it has done most in the past before the prediction changes. For example, a loop-closing conditional jump is mispredicted once rather than twice. MDSP project, Intel Lab, Moscow Institute of Physics and Technology 6
  • 7. Two-level adaptive predictor  Conditional jumps that are taken every second time or have some other regularly recurring pattern are not predicted well by the saturating counter, but a adaptive predictor can easy solve this problem present Pattern history table future past strongly weakly weakly strongly Branch 00 NT NT T T sequence: 1 0 0 1 0 0 strongly weakly weakly strongly History 01 NT NT T T buffer strongly weakly weakly strongly 10 NT NT T T Prediction: 0 0 0 0 0 1 0 0 0 strongly weakly weakly strongly Result: TRUE FALSE 11 NT NT T T MDSP project, Intel Lab, Moscow Institute of Physics and Technology 7
  • 8. Local/global predictors  Local prediction – Local branch predictor has a separate history buffer for each conditional jump instruction. – Pattern history table may be separate as well or it may be shared between all conditional jumps. – Accuracy of the local predictor is equal about 97.1%.  Global prediction – Global branch keeps a shared history of all conditional jumps: in this case any correlation between different conditional jumps is utilized in the prediction making, but when there is no correlation the prediction can decrease. – This predictor must have a long history buffer in order to make a good prediction. The size of the pattern history table grows exponentially with the size of the history buffer. Therefore, the big pattern history table is shared between all conditional jumps. – Accuracy of the global branch predictor is equal about 96.6%, which is just a little worse than large local predictors.  To do the best prediction real branch predictors are made as a combination of many predictors including global, local and other more specific ones. MDSP project, Intel Lab, Moscow Institute of Physics and Technology 8
  • 9. Branch target prediction  The address of a dynamic branch isn’t known statically in the moment of compilation. It is calculated in the moment of execution of the program.  A branch target predictor is the part of a processor that predicts the target address of a taken dynamic conditional branch or an dynamic unconditional branch instruction before the target address of the branch instruction is computed by the execution unit of the processor.  Note! Branch target prediction is not the same as branch prediction. Branch prediction attempts to guess whether a conditional branch will be taken or not-taken, but branch target one predicts the value of target address. MDSP project, Intel Lab, Moscow Institute of Physics and Technology 9
  • 10. Competitive approaches  Almost all x86 processors use the branch prediction approach, but branch prediction is not the just one way to reach the wide parallel execution and to avoid gaps in pipeline.  Branch predictor is a large and very sophisticated circuit. It consumes a lot of energy. Some mobile processors use only a very simple branch predictor to decrease their power and area.  There are architectures provide another approaches. For example, in VLIW (Very Long Instruction Word) architecture independent instructions are combined into a long instruction by compiler and executed in parallel in the processor. And to avoid the gaps delay branches are used. MDSP project, Intel Lab, Moscow Institute of Physics and Technology 10

Notes de l'éditeur

  1. The branch sequence is the sequence of the branch behaviour. Zero means “not taken”, one means “taken”. Default state for all counters in the pattern history table is “weakly NT”. The counters doesn’t change their state while the history buffer isn’t full.