CSL718 : Pipelined Processors

Pipeline Timings
15th Jan, 2009

Anshul Kumar, CSE IITD

Pipelined Processors

Parallel architectures
  Function-parallel
    Instr level (ILP)
      Pipelined processors
      VLIWs
      Superscalar processors
    Thread level
    Process level
  Data-parallel

Intel’s terminology:
  • intra ILP
  • inter ILP

Ideal Pipelining

[Figure: an instruction of duration Tinst divided into S pipeline stages]

Determining Clock Period

[Figure: combinational logic (Comb) with propagation delay P between two registers (Reg), both driven by a clock of period Δt]

Δt ≥ P       where P = propagation delay
Δt = Pmax    where Pmax = max propagation delay

Ideal Pipelining

[Figure: an instruction of duration Tinst divided into S stages]

Pmax = Tinst / S
Δt = Tinst / S
Effective CPI = 1
Effective time per inst Teff = CPI * Δt
                             = 1 * Tinst / S

Pipelining with hazards

[Figure: an instruction of duration Tinst divided into S stages]

Frequency of interruptions - b

Δt = Tinst / S
CPI = 1 + (S - 1) * b
Teff = (1 + (S - 1) * b) * Tinst / S

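Rearranging the hazard model makes its limiting behaviour explicit. This is only a restatement of the formula on the slide, not an extra result from the lecture:

```latex
\begin{align*}
T_{\mathrm{eff}} &= \bigl(1 + (S-1)\,b\bigr)\,\frac{T_{\mathrm{inst}}}{S}
  = T_{\mathrm{inst}}\left(b + \frac{1-b}{S}\right), \\
\lim_{S \to \infty} T_{\mathrm{eff}} &= b\,T_{\mathrm{inst}} .
\end{align*}
```

In this model, with no clocking overhead, Teff keeps falling as S grows but never drops below b * Tinst. For Tinst = 10 and b = .2 that floor is 2.
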
Teff vs. S (Tinst = 10)

[Plot: Teff (y-axis, 0 to 12) against S (x-axis, 1 to 10), one curve each for b = .2, b = .1 and b = .05]

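The curves in this plot can be regenerated from the hazard formula Teff = (1 + (S - 1) * b) * Tinst / S. The following is a minimal sketch; the function name `teff_hazards` is a hypothetical helper, not something from the lecture.

```python
def teff_hazards(tinst, stages, b):
    """Effective time per instruction: (1 + (S - 1) * b) * Tinst / S."""
    cpi = 1 + (stages - 1) * b
    return cpi * tinst / stages


TINST = 10  # value given in the plot title

# One row per curve, S running from 1 to 10 as on the x-axis.
for b in (0.2, 0.1, 0.05):
    row = [round(teff_hazards(TINST, s, b), 2) for s in range(1, 11)]
    print(f"b = {b}: {row}")
```

Every row starts at Teff = Tinst for S = 1, since a single stage has nothing to interrupt.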
A more realistic view

[Figure: combinational logic (Comb) with propagation delay P between two registers (Reg); the clock waveform is annotated with register output delay, register setup time and clock skew]

Clocking Overhead
• Fixed overhead c
   – Setup time
   – Output delay
• Variable overhead (stretching factor) k
   – Clock skew
Δt = Pmax + k * Pmax + c
   = (1 + k) * Tinst / S + c
Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]

Teff vs. S (Tinst = 10, c = 1, k = .1)

[Plot: Teff (y-axis, 0 to 14) against S (x-axis, 1 to 15), one curve each for b = .2, b = .1 and b = .05]

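With clocking overhead included, each curve has a minimum rather than falling indefinitely. The sketch below evaluates Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c] at integer S and reports the best stage count. The helper names and the search limit of 30 stages are assumptions made for illustration.

```python
def teff_overhead(tinst, stages, b, k, c):
    """Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]."""
    cpi = 1 + (stages - 1) * b
    dt = (1 + k) * tinst / stages + c
    return cpi * dt


def best_integer_stages(tinst, b, k, c, max_stages=30):
    """Integer S in 1..max_stages that gives the smallest Teff."""
    return min(range(1, max_stages + 1),
               key=lambda s: teff_overhead(tinst, s, b, k, c))


TINST, C, K = 10, 1, 0.1  # values given in the plot title

for b in (0.2, 0.1, 0.05):
    s = best_integer_stages(TINST, b, K, C)
    print(f"b = {b}: best integer S = {s}, "
          f"Teff = {teff_overhead(TINST, s, b, K, C):.3f}")
```

The more frequent the interruptions, the shallower the best pipeline: the minimum moves to smaller S as b grows.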
Pipelining with Clocking Overhead
Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]

Sopt = √ [(1 - b) * (1 + k) * Tinst / (b * c)]

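The slide states Sopt without the intermediate steps. Treating S as a continuous variable (an assumption, since the real S is an integer), it follows from setting the derivative of Teff to zero:

```latex
\begin{align*}
T_{\mathrm{eff}}(S) &= \bigl[(1-b) + bS\bigr]\left[\frac{(1+k)\,T_{\mathrm{inst}}}{S} + c\right] \\
 &= \frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{S} + (1-b)\,c + b\,(1+k)\,T_{\mathrm{inst}} + b\,c\,S, \\
\frac{dT_{\mathrm{eff}}}{dS} &= -\frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{S^{2}} + b\,c = 0
 \;\Longrightarrow\;
 S_{\mathrm{opt}} = \sqrt{\frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{b\,c}} .
\end{align*}
```

The second derivative, 2 (1 - b) (1 + k) Tinst / S^3, is positive for S > 0, so this stationary point is a minimum.
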
Partitioning instruction into cycles with non-uniform stage times

One action - one pipeline stage
=> large quantization overhead

Multiple actions per stage?
Multiple stages per action?

Example
Actions of one instruction and their delays, in execution order:

  PC - MAR       4 ns
  Cache Dir      6 ns
  Cache Data    10 ns
  Data - IR      3 ns
  Decode         6+6 ns
  Gen Addr       9 ns
  Addr - MAR     3 ns
  Cache Dir      6 ns
  Cache Data    10 ns
  Data - ALU     3 ns
  Execute        7+7+8 ns
  Put Away       2 ns

Optimal Pipelining
Tinst = 4+6+10+3+12+9+3+6+10+3+22+2
      = 90 ns
b = 0.2     c = 4 ns     k = 5%

Sopt = √ [(1 - b) * (1 + k) * Tinst / (b * c)]
     = 9.7 ⇒ 9
Pmax = 10 ns

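The value 9.7 can be checked by plugging the slide's numbers into the Sopt formula. A minimal sketch follows; `s_opt` is a hypothetical helper name.

```python
import math


def s_opt(tinst, b, k, c):
    """Sopt = sqrt((1 - b) * (1 + k) * Tinst / (b * c))."""
    return math.sqrt((1 - b) * (1 + k) * tinst / (b * c))


# Tinst = 90 ns, b = 0.2, k = 5%, c = 4 ns, as on the slide.
print(round(s_opt(90, 0.2, 0.05, 4), 2))  # prints 9.72
```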
Example (S = 10)
Pmax = 10 ns
Δt = 14.5 ns
S * Δt = 145 ns
[Figure: the action list of the example partitioned into 10 stages]

Example (S = 9)
Pmax = 13 ns
Δt = 17.65 ns
S * Δt = 159 ns
[Figure: the action list of the example partitioned into 9 stages]

Example (S = 5)
Pmax = 20 ns
Δt = 25 ns
S * Δt = 125 ns
[Figure: the action list of the example partitioned into 5 stages]

Comparison

  S     Pmax (ns)   Δt (ns)   S * Δt (ns)   Teff (ns)
  9     13          17.65     159           45.89
  10    10          14.50     145           40.60
  5     20          25.00     125           45.00

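The figures in this comparison can be reproduced from the example's action delays. The sketch below packs consecutive delays greedily into stages of at most Pmax, then applies Δt = (1 + k) * Pmax + c and Teff = [1 + (S - 1) * b] * Δt. Two things here are assumptions, not the lecture's stated method: the greedy packing, and the reading of Decode (6+6) and Execute (7+7+8) as separable sub-steps. The delays are taken in the order of the Tinst sum, from PC - MAR to Put Away.

```python
def partition(delays, pmax):
    """Greedily pack consecutive delays into stages of at most pmax ns."""
    stages, current = [], []
    for d in delays:
        if d > pmax:
            raise ValueError(f"a {d} ns step cannot fit in a {pmax} ns stage")
        if sum(current) + d > pmax:
            stages.append(current)  # close the stage before it overflows
            current = []
        current.append(d)
    if current:
        stages.append(current)
    return stages


# Assumed reading: "6+6" and "7+7+8" are split into their sub-steps.
DELAYS = [4, 6, 10, 3, 6, 6, 9, 3, 6, 10, 3, 7, 7, 8, 2]
B, K, C = 0.2, 0.05, 4  # b, k and c (ns) from the Optimal Pipelining slide


def row(pmax):
    """Return (S, Pmax, dt, S * dt, Teff) for a given stage limit."""
    s = len(partition(DELAYS, pmax))
    dt = (1 + K) * pmax + C
    teff = (1 + (s - 1) * B) * dt
    return s, pmax, dt, s * dt, teff


for pmax in (13, 10, 20):
    s, p, dt, total, teff = row(pmax)
    print(f"S = {s:2d}  Pmax = {p:2d}  dt = {dt:5.2f}  "
          f"S*dt = {total:6.2f}  Teff = {teff:5.2f}")
```

Under these assumptions, S = 10 comes out ahead of the rounded-down optimum S = 9 because its stages are more evenly filled (Pmax = 10 ns instead of 13 ns).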
Cycle Quantization
Delays are not integral multiples of the clock period
Total overhead = clocking overhead
                   + quantization overhead
Δt ≥ Tinst / S + c            (ignoring k)
∴ S * Δt ≥ Tinst + S * c
Quantization overhead = S * (Δt - c) - Tinst
This reduces as the clock period becomes smaller

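As a worked instance, the quantization overhead formula can be applied to the three partitionings of the earlier example (Tinst = 90 ns, c = 4 ns). This sketch sets k = 0, as the slide does, so that Δt = Pmax + c. The function name is a hypothetical helper.

```python
def quantization_overhead(stages, dt, c, tinst):
    """S * (dt - c) - Tinst, the slide's formula with k ignored."""
    return stages * (dt - c) - tinst


TINST, C = 90, 4  # ns, from the earlier example

for stages, pmax in ((10, 10), (9, 13), (5, 20)):
    dt = pmax + C  # clock period with k = 0
    print(f"S = {stages:2d}, Pmax = {pmax} ns: "
          f"{quantization_overhead(stages, dt, C, TINST)} ns")
```

With k = 0 the expression reduces to S * Pmax - Tinst, the time lost to stages that are not completely filled.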
Other Timing Approaches
• Self Timed Circuits
   – No centralized free running clock
   – An operation begins as soon as its inputs are
     available, that is, all its predecessors have
     completed
   – Higher speed, lower power consumption
• Wave Pipelining
   – Omit inter-stage registers
   – Reduced clocking overhead
Conventional vs Wave Pipelining

Conventional Pipeline
• Registers separate adjoining stages
• Clock period > max prop delay
• Inter-stage data stored in registers

Wave Pipeline
• No registers between adjoining stages
• Clock period less than max prop delay
• Waves of data propagate through combinational network (effectively, data is stored in the combinational circuit delay!)

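No pipelining

[Figure: a single combinational block between two registers, with register output X, combinational output X’ and final register output Y; timing diagram of X, X’ and Y against Clock]
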
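Conventional pipelining

[Figure: three combinational stages separated by registers, with signals X, X’, Y, Y’, Z, Z’ and final register output W; timing diagram of the signals, each shifted later than the previous one, against Clock]
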
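Wave pipelining

[Figure: the same combinational logic with no intermediate registers, with input register output X, combinational output Z’ and final register output W; timing diagram of X, Z’ and W against Clock]
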
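Timing

[Figure: combinational circuit between two registers, input X and output Y; timing diagram marking clock period T, propagation delay p and set-up time s]

T ≥ p + s
T = clock period, p = propagation delay, s = set-up time
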
Timing with clock skew

[Figure: combinational circuit between two registers, input X and output Y; timing diagram marking T, p and s, with each clock edge uncertain by δ]

Clock skew = ±δ
T ≥ p + s + 2δ

Variation in propagation delay
• Different delays in different paths
• Delay variation due to process / temperature / power variations
• Data-dependent delay variations

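Timing for wave pipelining

[Figure: combinational circuit between two registers, input X and output Y; timing diagram marking clock period T, clock skew ±δ, minimum and maximum propagation delays pmin and pmax, and their spread Δp]

T ≥ Δp + s + 4δ
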
Timing for wave pipelining (expanded view)

[Figure: timing diagram of X and Y over several clock periods, marking (n-1) T, pmin, pmax, nT and Δp]

pmin ≥ (n-1) T + 2δ
nT ≥ pmax + s + 2δ
⇒ T ≥ Δp + s + 4δ

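The final bound follows from subtracting the first constraint from the second. Written out, taking Δp to be the spread between pmax and pmin, as the timing diagram indicates:

```latex
\begin{align*}
(n-1)\,T &\le p_{\min} - 2\delta
  && \text{(from } p_{\min} \ge (n-1)T + 2\delta\text{)} \\
n\,T &\ge p_{\max} + s + 2\delta \\
T = nT - (n-1)T &\ge (p_{\max} + s + 2\delta) - (p_{\min} - 2\delta)
  = \Delta p + s + 4\delta,
  \qquad \Delta p = p_{\max} - p_{\min}.
\end{align*}
```
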
Comparison

Conventional Pipeline
   T ≥ pmax/n + s + 2δ   (plus cycle quantization overhead)
   nT ≥ pmax + ns + 2nδ

Wave Pipeline
   T ≥ Δp + s + 4δ
   nT ≥ pmax + s + 2δ

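To get a feel for the two bounds, the sketch below evaluates them for one set of made-up delays. All numeric values and function names are hypothetical and chosen only for illustration. It also lists which wave counts n satisfy both wave-pipelining constraints, pmin ≥ (n-1) T + 2δ and nT ≥ pmax + s + 2δ, for a chosen clock period.

```python
import math


def t_conventional(pmax, n, s, delta):
    """Lower bound on T with n register-separated stages."""
    return pmax / n + s + 2 * delta


def t_wave(pmax, pmin, s, delta):
    """Lower bound on T for wave pipelining: (pmax - pmin) + s + 4*delta."""
    return (pmax - pmin) + s + 4 * delta


def wave_counts(pmax, pmin, s, delta, period):
    """All n with n*T >= pmax + s + 2*delta and pmin >= (n-1)*T + 2*delta."""
    lo = math.ceil((pmax + s + 2 * delta) / period)
    hi = math.floor((pmin - 2 * delta) / period) + 1
    return list(range(lo, hi + 1))


# Hypothetical delays in ns, not taken from the lecture.
PMAX, PMIN, S_SETUP, DELTA = 40, 34, 1, 0.5

print("conventional, n = 4:", t_conventional(PMAX, 4, S_SETUP, DELTA))
print("wave bound:         ", t_wave(PMAX, PMIN, S_SETUP, DELTA))
for period in (9, 10.5, 11):
    print(f"T = {period}: feasible n =",
          wave_counts(PMAX, PMIN, S_SETUP, DELTA, period))
```

For these numbers the wave bound (9 ns) is lower than the conventional bound (12 ns), but no integer n satisfies both constraints at T = 9 ns. The bound on T is therefore necessary but not sufficient on its own.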
Problems with wave pipelining
• Need to balance delays
• Narrow range of clock frequencies
• Control difficult
• Not very suitable for non-linear pipelines


Contenu connexe

Tendances

Ration-by-Weight of Efficiency and Equity
Ration-by-Weight of Efficiency and Equity Ration-by-Weight of Efficiency and Equity
Ration-by-Weight of Efficiency and Equity Rong (Carina) Wang
 
R&D on CsI(Tl) crystals + LAAPD
R&D on CsI(Tl) crystals + LAAPD  R&D on CsI(Tl) crystals + LAAPD
R&D on CsI(Tl) crystals + LAAPD Martin Gascon
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 
Use Ruby GC in full..
Use Ruby GC in full..Use Ruby GC in full..
Use Ruby GC in full..Alex Mercer
 
Centennial Talk Hydrates
Centennial Talk HydratesCentennial Talk Hydrates
Centennial Talk Hydratesstalnaker
 
Building Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF StoresBuilding Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF StoresKostis Kyzirakos
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceHansol Kang
 
Online Optimization Problem-1 (Online machine learning)
Online Optimization Problem-1 (Online machine learning)Online Optimization Problem-1 (Online machine learning)
Online Optimization Problem-1 (Online machine learning)Shao-Yen Hung
 

Tendances (9)

Ration-by-Weight of Efficiency and Equity
Ration-by-Weight of Efficiency and Equity Ration-by-Weight of Efficiency and Equity
Ration-by-Weight of Efficiency and Equity
 
DaWaK'07
DaWaK'07DaWaK'07
DaWaK'07
 
R&D on CsI(Tl) crystals + LAAPD
R&D on CsI(Tl) crystals + LAAPD  R&D on CsI(Tl) crystals + LAAPD
R&D on CsI(Tl) crystals + LAAPD
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduce
 
Use Ruby GC in full..
Use Ruby GC in full..Use Ruby GC in full..
Use Ruby GC in full..
 
Centennial Talk Hydrates
Centennial Talk HydratesCentennial Talk Hydrates
Centennial Talk Hydrates
 
Building Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF StoresBuilding Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF Stores
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent space
 
Online Optimization Problem-1 (Online machine learning)
Online Optimization Problem-1 (Online machine learning)Online Optimization Problem-1 (Online machine learning)
Online Optimization Problem-1 (Online machine learning)
 

En vedette

The Other Social, Collaboration Days 2014
The Other Social, Collaboration Days 2014The Other Social, Collaboration Days 2014
The Other Social, Collaboration Days 2014Stefan Heinz
 
Lec Jan22 2009
Lec Jan22 2009Lec Jan22 2009
Lec Jan22 2009Ravi Soni
 
Cs718min1 2008soln View
Cs718min1 2008soln ViewCs718min1 2008soln View
Cs718min1 2008soln ViewRavi Soni
 
마케팅전쟁 Sp
마케팅전쟁 Sp마케팅전쟁 Sp
마케팅전쟁 Spytkim
 
Lec 2 Multidisciplinary 183
Lec 2  Multidisciplinary 183Lec 2  Multidisciplinary 183
Lec 2 Multidisciplinary 183Ravi Soni
 
Lec Feb02 2009
Lec Feb02 2009Lec Feb02 2009
Lec Feb02 2009Ravi Soni
 
Lec Jan12 2009
Lec Jan12 2009Lec Jan12 2009
Lec Jan12 2009Ravi Soni
 

En vedette (8)

The Other Social, Collaboration Days 2014
The Other Social, Collaboration Days 2014The Other Social, Collaboration Days 2014
The Other Social, Collaboration Days 2014
 
MOINC Server
MOINC ServerMOINC Server
MOINC Server
 
Lec Jan22 2009
Lec Jan22 2009Lec Jan22 2009
Lec Jan22 2009
 
Cs718min1 2008soln View
Cs718min1 2008soln ViewCs718min1 2008soln View
Cs718min1 2008soln View
 
마케팅전쟁 Sp
마케팅전쟁 Sp마케팅전쟁 Sp
마케팅전쟁 Sp
 
Lec 2 Multidisciplinary 183
Lec 2  Multidisciplinary 183Lec 2  Multidisciplinary 183
Lec 2 Multidisciplinary 183
 
Lec Feb02 2009
Lec Feb02 2009Lec Feb02 2009
Lec Feb02 2009
 
Lec Jan12 2009
Lec Jan12 2009Lec Jan12 2009
Lec Jan12 2009
 

Similaire à Lec Jan15 2009

Lec Jan19 2009
Lec Jan19 2009Lec Jan19 2009
Lec Jan19 2009Ravi Soni
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...MLconf
 
UNIT- 2&3 based on gear life.pdf
UNIT- 2&3 based on gear life.pdfUNIT- 2&3 based on gear life.pdf
UNIT- 2&3 based on gear life.pdfdinesh babu
 
ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...
ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...
ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...Deepak Malani
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdfFrangoCamila
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCIgor Sfiligoi
 
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Provectus
 
Design of Flexible Pavement Using AASHTO.pptx
Design of Flexible Pavement Using AASHTO.pptxDesign of Flexible Pavement Using AASHTO.pptx
Design of Flexible Pavement Using AASHTO.pptxmohammeed3
 
Logistic Regression in R-An Exmple.
Logistic Regression in R-An Exmple. Logistic Regression in R-An Exmple.
Logistic Regression in R-An Exmple. Dr. Volkan OBAN
 
CELEBRATION 2000 (3rd Deployment) & ALP 2002 Generation of a 3D-Model of the ...
CELEBRATION 2000 (3rd Deployment) & ALP 2002Generation of a 3D-Model of the ...CELEBRATION 2000 (3rd Deployment) & ALP 2002Generation of a 3D-Model of the ...
CELEBRATION 2000 (3rd Deployment) & ALP 2002 Generation of a 3D-Model of the ...gigax2
 
Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER a...
Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER a...Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER a...
Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER a...Takahiro Katagiri
 
Temperature Control Fan Using 8051 Microcontroller
Temperature Control Fan Using 8051 MicrocontrollerTemperature Control Fan Using 8051 Microcontroller
Temperature Control Fan Using 8051 MicrocontrollerMafaz Ahmed
 
beard1.ppt
beard1.pptbeard1.ppt
beard1.pptessmikke
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...Andrey Karpov
 

Similaire à Lec Jan15 2009 (18)

Lec Jan19 2009
Lec Jan19 2009Lec Jan19 2009
Lec Jan19 2009
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
 
UNIT- 2&3 based on gear life.pdf
UNIT- 2&3 based on gear life.pdfUNIT- 2&3 based on gear life.pdf
UNIT- 2&3 based on gear life.pdf
 
ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...
ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...
ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdf
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
 
Design of Flexible Pavement Using AASHTO.pptx
Design of Flexible Pavement Using AASHTO.pptxDesign of Flexible Pavement Using AASHTO.pptx
Design of Flexible Pavement Using AASHTO.pptx
 
Logistic Regression in R-An Exmple.
Logistic Regression in R-An Exmple. Logistic Regression in R-An Exmple.
Logistic Regression in R-An Exmple.
 
CELEBRATION 2000 (3rd Deployment) & ALP 2002 Generation of a 3D-Model of the ...
CELEBRATION 2000 (3rd Deployment) & ALP 2002Generation of a 3D-Model of the ...CELEBRATION 2000 (3rd Deployment) & ALP 2002Generation of a 3D-Model of the ...
CELEBRATION 2000 (3rd Deployment) & ALP 2002 Generation of a 3D-Model of the ...
 
Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER a...
Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER a...Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER a...
Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER a...
 
PROJECT REPORT.pptx
PROJECT REPORT.pptxPROJECT REPORT.pptx
PROJECT REPORT.pptx
 
Temperature Control Fan Using 8051 Microcontroller
Temperature Control Fan Using 8051 MicrocontrollerTemperature Control Fan Using 8051 Microcontroller
Temperature Control Fan Using 8051 Microcontroller
 
beard1.ppt
beard1.pptbeard1.ppt
beard1.ppt
 
kmaps
 kmaps kmaps
kmaps
 
1 5
1 51 5
1 5
 
TPM.pptx
TPM.pptxTPM.pptx
TPM.pptx
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...
 

Plus de Ravi Soni

Google Never Dies Meetup ( Obbserv + SEMrush ) the vision of digital you
Google Never Dies Meetup ( Obbserv + SEMrush ) the vision of digital you Google Never Dies Meetup ( Obbserv + SEMrush ) the vision of digital you
Google Never Dies Meetup ( Obbserv + SEMrush ) the vision of digital you Ravi Soni
 
Stakeholder Theory, Ethics 209
Stakeholder Theory, Ethics 209Stakeholder Theory, Ethics 209
Stakeholder Theory, Ethics 209Ravi Soni
 
Lec 6 Structure (Types) 196
Lec 6  Structure (Types) 196Lec 6  Structure (Types) 196
Lec 6 Structure (Types) 196Ravi Soni
 
Lec 3 Organizational Effectiveness 184
Lec 3  Organizational Effectiveness 184Lec 3  Organizational Effectiveness 184
Lec 3 Organizational Effectiveness 184Ravi Soni
 
Lec 5 Structure (Basics) 186
Lec 5  Structure (Basics) 186Lec 5  Structure (Basics) 186
Lec 5 Structure (Basics) 186Ravi Soni
 
Lec Jan29 2009
Lec Jan29 2009Lec Jan29 2009
Lec Jan29 2009Ravi Soni
 
Lec Feb05 2009
Lec Feb05 2009Lec Feb05 2009
Lec Feb05 2009Ravi Soni
 
Lec Feb09 2009
Lec Feb09 2009Lec Feb09 2009
Lec Feb09 2009Ravi Soni
 

Plus de Ravi Soni (9)

Google Never Dies Meetup ( Obbserv + SEMrush ) the vision of digital you
Google Never Dies Meetup ( Obbserv + SEMrush ) the vision of digital you Google Never Dies Meetup ( Obbserv + SEMrush ) the vision of digital you
Google Never Dies Meetup ( Obbserv + SEMrush ) the vision of digital you
 
Stakeholder Theory, Ethics 209
Stakeholder Theory, Ethics 209Stakeholder Theory, Ethics 209
Stakeholder Theory, Ethics 209
 
Lec 6 Structure (Types) 196
Lec 6  Structure (Types) 196Lec 6  Structure (Types) 196
Lec 6 Structure (Types) 196
 
Lec 3 Organizational Effectiveness 184
Lec 3  Organizational Effectiveness 184Lec 3  Organizational Effectiveness 184
Lec 3 Organizational Effectiveness 184
 
Lec 1 182
Lec 1 182Lec 1 182
Lec 1 182
 
Lec 5 Structure (Basics) 186
Lec 5  Structure (Basics) 186Lec 5  Structure (Basics) 186
Lec 5 Structure (Basics) 186
 
Lec Jan29 2009
Lec Jan29 2009Lec Jan29 2009
Lec Jan29 2009
 
Lec Feb05 2009
Lec Feb05 2009Lec Feb05 2009
Lec Feb05 2009
 
Lec Feb09 2009
Lec Feb09 2009Lec Feb09 2009
Lec Feb09 2009
 

Dernier

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Dernier (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024

Lec Jan15 2009

  • 1. CSL718: Pipelined Processors. Pipeline Timings. 15th Jan, 2009. Anshul Kumar, CSE IITD
  • 2. Pipelined Processors. Parallel architectures: function-parallel and data-parallel. Function-parallel levels: instruction level (ILP), thread level, process level. ILP processors: pipelined processors, VLIWs, superscalar processors. Intel’s terminology: intra ILP, inter ILP.
  • 3. Ideal Pipelining. [Figure: an instruction of total time Tinst divided into S stages.]
  • 4. Determining Clock Period. [Figure: Reg, Comb, Reg driven by Clock; P spans the combinational logic and Δt is the clock period.] Δt ≥ P for each stage, so Δt = Pmax, where P = propagation delay and Pmax = max propagation delay.
  • 5. Ideal Pipelining. Tinst split over S stages: Pmax = Tinst / S, so Δt = Tinst / S. Effective CPI = 1. Effective time per inst: Teff = CPI * Δt = 1 * Tinst / S.
  • 6. Pipelining with hazards. Tinst split over S stages; frequency of interruptions = b. Δt = Tinst / S. CPI = 1 + (S - 1) * b. Teff = (1 + (S - 1) * b) * Tinst / S.
  • 7. [Plot: Teff vs. S for Tinst = 10, with curves for b = .2, b = .1 and b = .05; S runs from 1 to 10.]
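
The values behind a Teff vs. S plot of this kind can be tabulated directly from the hazard formula on slide 6. The following is a minimal sketch, assuming Tinst = 10 and the three values of b named on the plot; the function name `teff_with_hazards` is made up for illustration.

```python
def teff_with_hazards(t_inst, stages, b):
    """Effective time per instruction: (1 + (S - 1) * b) * Tinst / S."""
    cpi = 1 + (stages - 1) * b          # each interruption costs S - 1 cycles
    clock_period = t_inst / stages      # ideal clock: no clocking overhead
    return cpi * clock_period


if __name__ == "__main__":
    T_INST = 10  # value taken from the plot title
    for b in (0.2, 0.1, 0.05):
        row = [round(teff_with_hazards(T_INST, s, b), 2) for s in range(1, 11)]
        print(f"b = {b}: {row}")
```

Without clocking overhead, Teff keeps falling as S grows and approaches b * Tinst, so this model alone gives no finite optimum for S.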
  • 8. A more realistic view. [Figure: Reg, Comb, Reg driven by Clock, annotated with register output delay, register setup time and clock skew.]
  • 9. Clocking Overhead. Fixed overhead c: setup time, output delay. Variable overhead (stretching factor) k: clock skew. Δt = Pmax + k * Pmax + c = (1 + k) * Tinst / S + c. Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c].
  • 10. [Plot: Teff vs. S for Tinst = 10, c = 1, k = .1, with curves for b = .2, b = .1 and b = .05; S runs from 1 to 15.]
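
Once clocking overhead is included, Teff has a minimum at a finite S. The sketch below sweeps integer stage counts using the formula from slide 9 and the parameters from the plot title (Tinst = 10, c = 1, k = .1). The helper names and the sweep limit of 15 stages are assumptions for illustration.

```python
def teff_with_overhead(t_inst, stages, b, k, c):
    """Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]."""
    cpi = 1 + (stages - 1) * b
    clock_period = (1 + k) * t_inst / stages + c
    return cpi * clock_period


def best_integer_stages(t_inst, b, k, c, max_stages=15):
    """Return the integer S in 1..max_stages with the smallest Teff."""
    return min(
        range(1, max_stages + 1),
        key=lambda s: teff_with_overhead(t_inst, s, b, k, c),
    )


if __name__ == "__main__":
    for b in (0.2, 0.1, 0.05):
        s = best_integer_stages(t_inst=10, b=b, k=0.1, c=1)
        print(f"b = {b}: best S = {s}, "
              f"Teff = {teff_with_overhead(10, s, b, 0.1, 1):.3f}")
```

A brute-force sweep is used here rather than a closed form so that the result is always an integer stage count.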
  • 11. Pipelining with Clocking Overhead. Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]. Sopt = √[(1 - b) * (1 + k) * Tinst / (b * c)].
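
The slide states Sopt without the intermediate steps. One way to obtain it, treating S as a continuous variable (an assumption made only for the derivation), is to expand Teff and set its derivative with respect to S to zero:

```latex
\begin{align*}
T_{\mathit{eff}}(S)
  &= \bigl[1 + (S-1)\,b\bigr]\left[\frac{(1+k)\,T_{\mathit{inst}}}{S} + c\right] \\
  &= \frac{(1-b)(1+k)\,T_{\mathit{inst}}}{S}
     + b\,(1+k)\,T_{\mathit{inst}} + (1-b)\,c + b\,c\,S \\[4pt]
\frac{dT_{\mathit{eff}}}{dS}
  &= -\frac{(1-b)(1+k)\,T_{\mathit{inst}}}{S^{2}} + b\,c = 0 \\[4pt]
S_{\mathit{opt}}
  &= \sqrt{\frac{(1-b)(1+k)\,T_{\mathit{inst}}}{b\,c}}
\end{align*}
```

The second derivative, 2(1 - b)(1 + k) Tinst / S³, is positive for b < 1, so this stationary point is a minimum.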
  • 12. Partitioning instruction into cycles with non-uniform stage times. One action per pipeline stage => large quantization overhead. Multiple actions per stage? Multiple stages per action?
  • 13. Example (actions listed from last to first): Put Away 2 ns; Execute 7+7+8 ns; Data - ALU 3 ns; Cache Data 10 ns; Cache Dir 6 ns; Addr - MAR 3 ns; Gen Addr 9 ns; Decode 6+6 ns; Data - IR 3 ns; Cache Data 10 ns; Cache Dir 6 ns; PC - MAR 4 ns.
  • 14. Optimal Pipelining. Tinst = 4+6+10+3+12+9+3+6+10+3+22+2 = 90 ns; b = 0.2; c = 4 ns; k = 5%. Sopt = √[(1 - b) * (1 + k) * Tinst / (b * c)] = 9.7 ⇒ 9. Pmax = 10 ns.
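
The arithmetic on this slide can be checked with a few lines. This is a small sketch using only the numbers given on the slide; the function name `optimal_stages` is illustrative.

```python
import math

# Action delays in ns, in the order they are summed on the slide.
ACTION_DELAYS_NS = [4, 6, 10, 3, 12, 9, 3, 6, 10, 3, 22, 2]


def optimal_stages(t_inst, b, k, c):
    """Sopt = sqrt((1 - b) * (1 + k) * Tinst / (b * c))."""
    return math.sqrt((1 - b) * (1 + k) * t_inst / (b * c))


if __name__ == "__main__":
    t_inst = sum(ACTION_DELAYS_NS)
    s_opt = optimal_stages(t_inst, b=0.2, k=0.05, c=4)
    print(f"Tinst = {t_inst} ns, Sopt = {s_opt:.1f}")
```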
  • 15. Example with Pmax = 10 ns (same actions as slide 13): S = 10, Δt = 14.5 ns, S * Δt = 145 ns.
  • 16. Example with S = 9 (same actions as slide 13): Pmax = 13 ns, Δt = 17.65 ns, S * Δt = 159 ns.
  • 17. Example with Pmax = 20 ns (same actions as slide 13): S = 5, Δt = 25 ns, S * Δt = 125 ns.
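
The three example partitions pair a stage count S with a maximum stage delay Pmax. One simple way to arrive at such pairs is to pack consecutive actions into a stage until the next action would exceed a delay cap. The sketch below is an illustration under two assumptions that the slides do not state: that Decode (6+6) and Execute (7+7+8) can be split at the "+" boundaries, and that a greedy in-order packing is acceptable. The stage boundaries drawn on the slides may differ.

```python
# Action delays in ns, first action to last, with Decode and Execute split
# into their sub-actions (assumed splittable at the "+" signs).
ACTIONS_NS = [4, 6, 10, 3, 6, 6, 9, 3, 6, 10, 3, 7, 7, 8, 2]


def pack_stages(actions, max_stage_delay):
    """Greedily group consecutive actions so no stage exceeds the cap."""
    stages = []
    current = []
    for delay in actions:
        if delay > max_stage_delay:
            raise ValueError(f"action of {delay} ns exceeds the stage cap")
        if current and sum(current) + delay > max_stage_delay:
            stages.append(current)   # close the stage before it overflows
            current = []
        current.append(delay)
    if current:
        stages.append(current)
    return stages


if __name__ == "__main__":
    for cap in (10, 13, 20):
        stages = pack_stages(ACTIONS_NS, cap)
        p_max = max(sum(stage) for stage in stages)
        print(f"cap = {cap} ns: S = {len(stages)}, Pmax = {p_max} ns")
```

With caps of 10, 13 and 20 ns this packing gives the same (S, Pmax) pairs as the three example slides.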
  • 18. Comparison
        S    Pmax   Δt      S * Δt   Teff
        9    13     17.65   159      45.89
        10   10     14.50   145      40.60
        5    20     25.00   125      45.00
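
Each row of the comparison follows from Δt = (1 + k) * Pmax + c and Teff = [1 + (S - 1) * b] * Δt, with b = 0.2, c = 4 ns and k = 5% from slide 14. A short sketch that recomputes the rows (function names are illustrative):

```python
B, C_NS, K = 0.2, 4, 0.05  # parameters given on slide 14


def clock_period(p_max, k=K, c=C_NS):
    """Clock period: (1 + k) * Pmax + c."""
    return (1 + k) * p_max + c


def teff(stages, p_max, b=B, k=K, c=C_NS):
    """Effective time per instruction: [1 + (S - 1) * b] * clock period."""
    return (1 + (stages - 1) * b) * clock_period(p_max, k, c)


if __name__ == "__main__":
    print("S   Pmax  dt     S*dt    Teff")
    for stages, p_max in [(9, 13), (10, 10), (5, 20)]:
        dt = clock_period(p_max)
        print(f"{stages:<3} {p_max:<5} {dt:<6.2f} {stages * dt:<7.2f} "
              f"{teff(stages, p_max):.2f}")
```

Of the three partitions, S = 10 gives the lowest Teff because its stage delays are the most evenly balanced.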
  • 19. Cycle Quantization. Delays are not integral multiples of the clock period. Total overhead = clocking overhead + quantization overhead. Δt ≥ Tinst / S + c (ignoring k), ∴ S * Δt ≥ Tinst + S * c. Quantization overhead = S * (Δt - c) - Tinst. This reduces as the clock period becomes small.
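
To make the quantization overhead concrete, the sketch below applies the formula to the three example partitions. It follows this slide in ignoring k, so Δt is taken as Pmax + c; these values of Δt therefore differ from those on the example slides, which include k.

```python
T_INST_NS = 90
C_NS = 4


def quantization_overhead(stages, clock_period, c, t_inst):
    """Quantization overhead = S * (clock period - c) - Tinst."""
    return stages * (clock_period - c) - t_inst


if __name__ == "__main__":
    for stages, p_max in [(9, 13), (10, 10), (5, 20)]:
        dt = p_max + C_NS  # clock period with k ignored
        overhead = quantization_overhead(stages, dt, C_NS, T_INST_NS)
        print(f"S = {stages}, Pmax = {p_max} ns: "
              f"quantization overhead = {overhead} ns")
```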
  • 20. Other Timing Approaches. Self-timed circuits: no centralized free-running clock; an operation begins as soon as its inputs are available, that is, when all its predecessors have completed; higher speed, lower power consumption. Wave pipelining: omit inter-stage registers; reduced clocking overhead.
  • 21. Conventional vs Wave Pipelining. Conventional pipeline: registers separate adjoining stages; clock period > max prop delay; inter-stage data stored in registers. Wave pipeline: no registers between adjoining stages; clock period less than max prop delay; waves of data propagate through the combinational network (effectively, data is stored in the combinational circuit delay!).
  • 22. No pipelining. [Figure: circuit with Reg, signals X and X', Reg and Y under Clock, with a timing diagram for X, X' and Y.]
  • 23. Conventional pipelining. [Figure: circuit with Reg, signals X, X', Y, Y', Z, Z', Reg and W under Clock, with a timing diagram for the same signals.]
  • 24. Wave pipelining. [Figure: circuit with Reg, signals X and Z', Reg and W under Clock, with a timing diagram for X, Z' and W.]
  • 25. Timing. [Figure: Reg, Comb ckt, Reg with signals X and Y under Clock.] T ≥ p + s, where T = clock period, p = propagation delay, s = set-up time.
  • 26. Timing with clock skew. [Figure: Reg, Comb ckt, Reg with signals X and Y under Clock of period T.] Clock skew = ±δ. T ≥ p + s + 2δ.
  • 27. Variation in propagation delay. Different delays in different paths. Delay variation due to process / temperature / power variations. Data-dependent delay variations.
  • 28. Timing for wave pipelining. [Figure: Reg, Comb ckt, Reg with signals X and Y under Clock of period T and skew ±δ; the timing diagram marks pmin, pmax and Δp.] T ≥ Δp + s + 4δ.
  • 29. Timing for wave pipelining (expanded view). [Figure: X and Y waveforms over clock period T, marking (n-1)T, nT, pmin, pmax and Δp.] pmin ≥ (n-1) T + 2δ and nT ≥ pmax + s + 2δ ⇒ T ≥ Δp + s + 4δ.
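
The final bound follows from the two constraints by subtraction. The step is written out below, taking Δp = pmax - pmin, which is what the derivation implies:

```latex
\begin{align*}
n\,T     &\ge p_{\max} + s + 2\delta \\
(n-1)\,T &\le p_{\min} - 2\delta \\[4pt]
T = n\,T - (n-1)\,T
         &\ge (p_{\max} + s + 2\delta) - (p_{\min} - 2\delta) \\
         &= \Delta p + s + 4\delta,
\qquad \Delta p = p_{\max} - p_{\min}
\end{align*}
```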
  • 30. Comparison. Conventional pipeline: T ≥ pmax/n + s + 2δ (plus cycle quantization overhead), i.e. nT ≥ pmax + ns + 2nδ. Wave pipeline: T ≥ Δp + s + 4δ, with nT ≥ pmax + s + 2δ.
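
The bounds on this slide can be compared numerically. The sketch below uses made-up delay values (pmax = 20, pmin = 18, s = 1, δ = 0.5, n = 4, all hypothetical and in arbitrary time units) and also reports the window of clock periods allowed by the two wave-pipelining constraints from slide 29.

```python
def conventional_min_period(p_max, n, s, delta):
    """Conventional pipeline: T >= pmax / n + s + 2 * delta."""
    return p_max / n + s + 2 * delta


def wave_min_period(p_max, p_min, s, delta):
    """Wave pipeline: T >= (pmax - pmin) + s + 4 * delta."""
    return (p_max - p_min) + s + 4 * delta


def wave_period_window(p_max, p_min, n, s, delta):
    """Clock periods satisfying both wave constraints for n waves (n >= 2).

    n * T >= pmax + s + 2 * delta        gives the lower limit
    pmin >= (n - 1) * T + 2 * delta      gives the upper limit
    Returns (low, high), or None when no period satisfies both.
    """
    low = (p_max + s + 2 * delta) / n
    high = (p_min - 2 * delta) / (n - 1)
    return (low, high) if low <= high else None


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the formulas.
    p_max, p_min, s, delta, n = 20, 18, 1, 0.5, 4
    print("conventional T >=", conventional_min_period(p_max, n, s, delta))
    print("wave T >=", wave_min_period(p_max, p_min, s, delta))
    print("wave window for n = 4:",
          wave_period_window(p_max, p_min, n, s, delta))
```

With these numbers the feasible window for n = 4 is narrow, which shows how tightly the clock period is constrained when the delay spread is small relative to the skew and set-up terms.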
  • 31. Problems with wave pipelining. Need to balance delays. Narrow range of clock frequencies. Control difficult. Not very suitable for non-linear pipelines.
  • 32. References. 1. M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996. 2. Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, "Wave-Pipelining: A Tutorial and Research Survey", IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464-474.