CSE 8383 - Advanced
Computer Architecture

            Week-3
     Week of Jan 26, 2004
   engr.smu.edu/~rewini/8383
Contents
   Linear Pipelines
   Nonlinear Pipelines
   Instruction Pipelines
   Arithmetic Operations
   Design of Multifunction Pipelines
Linear Pipeline
   Processing stages are connected linearly
   Each stage performs a fixed function
   Synchronous pipeline
       Clocked latches between stage i and stage i+1
       Equal delays in all stages
   Asynchronous pipeline (handshaking between adjacent stages)
Latches

S1 → L1 → S2 → L2 → S3

The slowest stage determines the clock period
Equal stage delays → one clock period per stage
Reservation Table
      Time
      1   2   3   4
S1    X
S2        X
S3            X
S4                X
5 tasks on 4 stages
      Time
      1   2   3   4   5   6   7   8
S1    X   X   X   X   X
S2        X   X   X   X   X
S3            X   X   X   X   X
S4                X   X   X   X   X
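The space-time pattern above can be reproduced with a short Python sketch. It assumes every stage takes exactly one clock cycle and a new task enters S1 each cycle, so n tasks on k stages finish in k + n - 1 cycles. The speedup compares against a hypothetical non-pipelined unit that spends k cycles per task and ignores latch overhead; the function names are illustrative only.

```python
def pipeline_cycles(k: int, n: int) -> int:
    """Clock cycles for n tasks on a k-stage linear pipeline, one new task per cycle."""
    return k + n - 1


def speedup(k: int, n: int) -> float:
    """Speedup over a (hypothetical) non-pipelined unit that spends k cycles per task."""
    return (n * k) / pipeline_cycles(k, n)


def space_time(k: int, n: int) -> list[str]:
    """Reservation table rows: stage s works on task t during cycle t + s - 1."""
    cycles = pipeline_cycles(k, n)
    rows = []
    for s in range(1, k + 1):
        # Stage s is busy from cycle s (task 1) through cycle s + n - 1 (task n).
        cells = ["X" if s <= c <= s + n - 1 else " " for c in range(1, cycles + 1)]
        rows.append(f"S{s}  " + "   ".join(cells).rstrip())
    return rows


for row in space_time(4, 5):
    print(row)
print(pipeline_cycles(4, 5), speedup(4, 5))  # prints: 8 2.5
```

With k = 4 and n = 5 this gives the 8 columns of the table; as n grows, the speedup approaches k.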
Nonlinear Pipelines
   Variable functions
   Feed-forward connections
   Feedback connections
3 stages & 2 functions
[Figure: pipeline stages S1, S2, S3 evaluating functions X and Y]
Reservation Tables for X & Y
      1   2   3   4   5   6   7   8
S1    X                   X       X
S2        X       X
S3            X       X       X

      1   2   3   4   5   6
S1    Y               Y
S2            Y
S3        Y       Y       Y
Linear Instruction Pipelines
   Assume the following instruction
    execution phases:
       Fetch (F)
       Decode (D)
       Operand Fetch (O)
       Execute (E)
       Write results (W)
Pipeline Instruction Execution
     1    2    3    4    5    6    7
F    I1   I2   I3
D         I1   I2   I3
O              I1   I2   I3
E                   I1   I2   I3
W                        I1   I2   I3
Dependencies
   Data dependency
    (an operand is not ready yet)

   Instruction dependency
    (branching)

    Will these cause a problem?
Data Dependency
I1 -- Add R1, R2, R3
I2 -- Sub R4, R1, R5
     1    2    3    4    5    6
F    I1   I2
D         I1   I2
O              I1   I2
E                   I1   I2
W                        I1   I2
I2 fetches operand R1 in cycle 4 (O), but I1 does not write R1 until cycle 5 (W).
Solutions
   STALL
   Forwarding
   Write and Read in one cycle
   ….
Instruction Dependency
I1 – Branch o
I2 –
     1    2    3    4    5    6
F    I1   I2
D         I1   I2
O              I1   I2
E                   I1   I2
W                        I1   I2
Solutions
   STALL
   Predict Branch taken
   Predict Branch not taken
   ….
Floating Point Multiplication
   Inputs (Mantissa1, Exponent1), (Mantissa2, Exponent2)
   Add the two exponents → Exponent-out
   Multiply the 2 mantissas
   Normalize the mantissa and adjust the exponent
   Round the product mantissa to a single-length mantissa (the exponent may need adjusting again)
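The four steps can be traced on small integers. This Python sketch assumes a hypothetical 8-bit mantissa whose value is m / 2^7 (so 128 ≤ m < 256 means 1 ≤ value < 2), positive operands only (sign handling omitted), and round-half-up instead of IEEE round-to-nearest-even. `fp_mul` and `to_float` are illustrative names.

```python
P = 8  # hypothetical mantissa width, including the leading 1 bit


def to_float(m: int, e: int, p: int = P) -> float:
    """Value represented by mantissa m (read as m / 2**(p-1)) and exponent e."""
    return m / 2 ** (p - 1) * 2.0 ** e


def fp_mul(m1: int, e1: int, m2: int, e2: int, p: int = P) -> tuple[int, int]:
    """Multiply two positive normalized values (2**(p-1) <= m < 2**p)."""
    # Step 1: add the two exponents.
    e = e1 + e2
    # Step 2: multiply the mantissas; prod / 2**(2p-2) lies in [1, 4).
    prod = m1 * m2
    drop = p - 1  # low bits to discard to get back to a p-bit mantissa
    # Step 3: normalize; a product value >= 2 needs one extra right shift and exponent + 1.
    if prod >= 1 << (2 * p - 1):
        drop += 1
        e += 1
    # Step 4: round to p bits (round-half-up, a simplification of IEEE rounding).
    m = (prod + (1 << (drop - 1))) >> drop
    # Rounding can carry out to 2**p, which forces a renormalization.
    if m == 1 << p:
        m >>= 1
        e += 1
    return m, e


print(fp_mul(192, 0, 192, 0), to_float(*fp_mul(192, 0, 192, 0)))  # 1.5 * 1.5
```

For example, 1.5 × 1.5 gives mantissa 144 and exponent 1 (1.125 × 2 = 2.25), and 181 × 181 shows the final renormalization: rounding carries the mantissa to 256, so it becomes 128 with the exponent raised by one.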
Linear Pipeline for Floating-Point Multiplication

Add Exponents → Multiply Mantissa → Normalize → Round

Add Exponents → Partial Products → Accumulator → Normalize → Round → Renormalize
Linear Pipeline for Floating-Point Addition

Subtract Exponents → Partial Shift → Add Mantissa → Find Leading 1 → Partial Shift → Round → Renormalize
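The addition stages can be sketched the same way, using the same hypothetical 8-bit mantissa format but with signed mantissas. One deliberate simplification: instead of shifting the smaller mantissa right and keeping guard bits, as the hardware's partial shift does, this sketch aligns exactly by shifting the larger-exponent mantissa left, so the sum is exact before the single rounding step. Rounding is round-half-up on the magnitude; `fp_add` is an illustrative name.

```python
P = 8  # hypothetical mantissa width, including the leading 1 bit


def to_float(m: int, e: int, p: int = P) -> float:
    """Value represented by signed mantissa m (read as m / 2**(p-1)) and exponent e."""
    return m / 2 ** (p - 1) * 2.0 ** e


def fp_add(m1: int, e1: int, m2: int, e2: int, p: int = P) -> tuple[int, int]:
    """Add two values whose signed mantissas have |m| normalized to p bits (or are 0)."""
    if m1 == 0:
        return m2, e2
    if m2 == 0:
        return m1, e1
    # Subtract exponents: make operand 1 the one with the larger exponent.
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    d = e1 - e2
    # Align and add the mantissas exactly (larger-exponent operand shifted left by d).
    s = (m1 << d) + m2  # value = s / 2**(p-1) * 2**e2
    if s == 0:
        return 0, 0
    sign, mag = (-1 if s < 0 else 1), abs(s)
    # Find the leading 1: k is how far the sum is from a p-bit mantissa.
    k = mag.bit_length() - p
    # Normalizing shift; when shifting right, round half up on the discarded bits.
    if k > 0:
        mag = (mag + (1 << (k - 1))) >> k
    else:
        mag <<= -k
    e = e2 + k
    # Renormalize if rounding carried out to p + 1 bits.
    if mag == 1 << p:
        mag >>= 1
        e += 1
    return sign * mag, e


print(fp_add(128, 0, -192, -1), to_float(*fp_add(128, 0, -192, -1)))  # 1.0 - 0.75
```

Subtraction (1.0 - 0.75) exercises the find-leading-1 step, which must shift left after cancellation, while `fp_add(255, 0, 128, -8)` exercises the final renormalize after rounding.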
Combined Adder and Multiplier
A  Subtract / Add Exponents
B  Partial Products
C  Add Mantissa
D  Renormalize
E  Round
F  Partial Shift
G  Find Leading 1
H  Partial Shift
Reservation Table for Multiply
    1   2   3   4   5   6   7
A   X
B       X   X
C           X   X
D                   X       X
E                       X
F
G
H
Reservation Table for Addition
    1   2   3   4   5   6   7   8   9
A   Y
B
C               Y
D                                   Y
E                               Y
F       Y   Y
G                   Y
H                       Y   Y
Nonlinear Pipeline Design
   Latency
      The number of clock cycles between two successive initiations of a pipeline
   Collision
      Resource conflict: two initiations need the same stage in the same cycle
   Forbidden Latencies
      Latencies that cause collisions
Nonlinear Pipeline Design (cont.)
   Latency Sequence
      A sequence of permissible latencies between successive task initiations
   Latency Cycle
      A latency sequence that repeats the same subsequence indefinitely
   Collision vector
    C = (Cm, Cm-1, …, C2, C1), m <= n-1
    n = number of columns in the reservation table
    Ci = 1 if latency i causes a collision, 0 otherwise
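The definition translates directly into code: latency p is forbidden when some row of the reservation table has marks at two times that differ by p. This Python sketch encodes each table as a dict from stage name to the set of marked time steps, taken from the Multiply, X and Y tables in this section, and uses m = the largest forbidden latency, as the Multiply example does. The function names are illustrative.

```python
def forbidden_latencies(table: dict[str, set[int]]) -> set[int]:
    """Latencies p such that some stage is marked at both time t and time t + p."""
    forbidden = set()
    for times in table.values():
        for a in times:
            for b in times:
                if b > a:
                    forbidden.add(b - a)
    return forbidden


def collision_vector(table: dict[str, set[int]]) -> str:
    """C = (Cm, ..., C1) as a bit string, with m = the largest forbidden latency."""
    forbidden = forbidden_latencies(table)
    if not forbidden:
        return ""  # no forbidden latency: every latency is permissible
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


# Reservation tables from this section: stage -> set of (1-based) time steps marked.
MULTIPLY = {"A": {1}, "B": {2, 3}, "C": {3, 4}, "D": {5, 7}, "E": {6}}
X = {"S1": {1, 6, 8}, "S2": {2, 4}, "S3": {3, 5, 7}}
Y = {"S1": {1, 5}, "S2": {3}, "S3": {2, 4, 6}}

for name, table in (("Multiply", MULTIPLY), ("X", X), ("Y", Y)):
    print(name, sorted(forbidden_latencies(table)), collision_vector(table))
```

The results agree with the collision vectors worked out by hand for these tables: 11 for Multiply, 1011010 for X and 1010 for Y.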
Mul-Mul Collision (launch after 1 cycle)
X = first multiply, Z = second multiply, XZ = collision
    1   2   3   4   5   6   7
A   X   Z
B       X   XZ  Z
C           X   XZ  Z
D                   X   Z   X
E                       X   Z
F
G
H
Mul-Mul Collision (launch after 2 cycles)
    1   2   3   4   5   6   7
A   X       Z
B       X   X   Z   Z
C           X   X   Z   Z
D                   X       XZ
E                       X
F
G
H
Mul-Mul Collision (launch after 3 cycles)
    1   2   3   4   5   6   7
A   X           Z
B       X   X       Z   Z
C           X   X       Z   Z
D                   X       X
E                       X
F
G
H
Collision Vector for Multiply after Multiply
Forbidden Latencies: 1, 2

Collision vector = 0 0 0 0 1 1 → 11

Maximum forbidden latency = 2 → m = 2
Example
[Figure: pipeline stages S1, S2, S3 evaluating functions X and Y]
Reservation Tables for X & Y
      1   2   3   4   5   6   7   8
S1    X                   X       X
S2        X       X
S3            X       X       X

      1   2   3   4   5   6
S1    Y               Y
S2            Y
S3        Y       Y       Y
Forbidden Latencies
   X after X
   X after Y
   Y after X
   Y after Y
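For the mixed pairs the test compares two different tables: if the first task uses a stage at time a and the second task, started p cycles later, uses the same stage at its own time b, they collide when b + p = a. This Python sketch applies that rule to the X and Y tables above; `cross_forbidden` is an illustrative name, and "X after Y" means X is initiated after Y.

```python
def cross_forbidden(first: dict[str, set[int]], second: dict[str, set[int]]) -> set[int]:
    """Latencies p > 0 at which `second`, started p cycles after `first`, collides.

    A collision happens when some stage is used by `first` at time a and by
    `second` at time b with b + p == a, i.e. p = a - b.
    """
    forbidden = set()
    for stage, first_times in first.items():
        for a in first_times:
            for b in second.get(stage, set()):
                if a - b > 0:
                    forbidden.add(a - b)
    return forbidden


X = {"S1": {1, 6, 8}, "S2": {2, 4}, "S3": {3, 5, 7}}
Y = {"S1": {1, 5}, "S2": {3}, "S3": {2, 4, 6}}

pairs = {
    "X after X": (X, X),
    "X after Y": (Y, X),
    "Y after X": (X, Y),
    "Y after Y": (Y, Y),
}
for label, (first, second) in pairs.items():
    print(label, sorted(cross_forbidden(first, second)))
```

When both arguments are the same table this reduces to the ordinary forbidden-latency test, giving {2, 4, 5, 7} for X after X and {2, 4} for Y after Y; the mixed pairs give {1, 3, 4} for X after Y and {1, 3, 5, 7} for Y after X.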
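X after X
Latency 2 (collisions: S1 at 8, S2 at 4, S3 at 5 and 7)
      1     2     3     4     5     6     7     8
S1    X1          X2                X1          X1X2
S2          X1          X1X2        X2
S3                X1          X1X2        X1X2

Latency 5 (collision: S1 at 6)
      1     2     3     4     5     6     7     8
S1    X1                            X1X2        X1
S2          X1          X1                X2
S3                X1          X1          X1    X2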
X after X
Latency 4 (collision: S3 at 7)
      1     2     3     4     5     6     7     8
S1    X1                      X2    X1          X1
S2          X1          X1          X2          X2
S3                X1          X1          X1X2

Latency 7 (collision: S1 at 8)
      1     2     3     4     5     6     7     8
S1    X1                            X1          X1X2
S2          X1          X1
S3                X1          X1          X1
Collision Vector
 Forbidden Latencies: 2, 4, 5, 7
 Collision Vector = 1011010
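Y after Y
Latency 2 (collisions: S3 at 4 and 6)
      1     2     3     4     5     6     7     8
S1    Y1          Y2          Y1          Y2
S2                Y1          Y2
S3          Y1          Y1Y2        Y1Y2        Y2

Latency 4 (collisions: S1 at 5, S3 at 6)
      1     2     3     4     5     6     7     8
S1    Y1                      Y1Y2
S2                Y1                      Y2
S3          Y1          Y1          Y1Y2        Y2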
Collision Vector
   Forbidden Latencies: 2, 4
   Collision Vector = 1010
Exercise: Find the collision vector
    1   2   3   4   5   6   7
A   X       X   X
B       X               X
C                   X       X
D               X
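State Diagram for X
Initial state: 1011010 (8+ = any latency of 8 or more)

From       Latency   To
1011010    1*        1111111
1011010    3         1011011
1011010    6         1011011
1011010    8+        1011010
1011011    3*        1011011
1011011    6         1011011
1011011    8+        1011010
1111111    8+        1011010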
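The transitions can be generated mechanically. In the usual construction, a state is a collision vector (bit i-1 holds Ci), the permissible latencies out of a state are the positions of its 0 bits plus m + 1 (standing for "m + 1 or more"), and taking latency p shifts the current vector right by p bits and ORs in the initial collision vector. This Python sketch applies that rule to X's vector 1011010; the function names are illustrative.

```python
def next_state(state: int, latency: int, initial: int) -> int:
    """Shift the collision vector right by `latency`, then OR in the initial vector."""
    return (state >> latency) | initial


def permissible(state: int, m: int) -> list[int]:
    """Latencies 1..m whose bit Ci is 0, plus m + 1, which stands for 'm + 1 or more'."""
    return [p for p in range(1, m + 1) if not (state >> (p - 1)) & 1] + [m + 1]


def state_diagram(initial: int) -> dict[int, dict[int, int]]:
    """Map each reachable state to {latency: next state}; bit i-1 of a state is Ci."""
    m = initial.bit_length()
    edges: dict[int, dict[int, int]] = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in edges:
            continue
        edges[state] = {p: next_state(state, p, initial) for p in permissible(state, m)}
        pending.extend(edges[state].values())
    return edges


initial = 0b1011010  # collision vector of X
width = initial.bit_length()
for state, out in state_diagram(initial).items():
    for p, nxt in out.items():
        label = f"{p}+" if p == width + 1 else str(p)
        print(f"{state:0{width}b} --{label}--> {nxt:0{width}b}")
```

The three reachable states and eight transitions match the diagram for X.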
Cycles
 Simple cycles: each state appears only once
  (3), (6), (8), (1, 8), (3, 8), and (6, 8)
 Greedy cycles: simple cycles whose edges are all made with the minimum latencies from their respective starting states
  (1, 8), (3): one of them achieves the MAL (minimal average latency)
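The cycle lists and the MAL can be checked by enumeration. Any closed walk in the state diagram splits into simple cycles, so its average latency is a weighted average of theirs; it is therefore enough to examine simple cycles. This Python sketch rebuilds X's state diagram (with 8 standing for 8+), enumerates the simple cycles, flags the greedy ones, and takes the minimum average latency. The helper names are illustrative, and the conclusion applies to this example only.

```python
from fractions import Fraction


def state_diagram(initial: int) -> dict[int, dict[int, int]]:
    """{state: {latency: next state}}; latency m + 1 stands for 'm + 1 or more'."""
    m = initial.bit_length()
    edges: dict[int, dict[int, int]] = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in edges:
            continue
        lats = [p for p in range(1, m + 1) if not (state >> (p - 1)) & 1] + [m + 1]
        edges[state] = {p: (state >> p) | initial for p in lats}
        pending.extend(edges[state].values())
    return edges


def simple_cycles(edges, initial):
    """Each simple cycle as a list of (state, latency) steps; no state repeats."""
    order = [initial] + sorted(s for s in edges if s != initial)
    rank = {s: i for i, s in enumerate(order)}
    found = []

    def walk(start, state, seen, steps):
        for p, nxt in edges[state].items():
            step = steps + [(state, p)]
            if nxt == start:
                found.append(step)
            elif rank[nxt] > rank[start] and nxt not in seen:
                walk(start, nxt, seen | {nxt}, step)

    # Only visit states ranked after the start, so each cycle is reported once.
    for s in order:
        walk(s, s, {s}, [])
    return found


def is_greedy(cycle, edges) -> bool:
    """True if every step uses the smallest permissible latency out of its state."""
    return all(p == min(edges[state]) for state, p in cycle)


def average_latency(cycle) -> Fraction:
    return Fraction(sum(p for _, p in cycle), len(cycle))


initial = 0b1011010
edges = state_diagram(initial)
cycles = simple_cycles(edges, initial)
for c in cycles:
    tag = "greedy" if is_greedy(c, edges) else ""
    print(tuple(p for _, p in c), average_latency(c), tag)
print("MAL =", min(average_latency(c) for c in cycles))
```

The enumeration finds the same six simple cycles and the same two greedy cycles; their averages are 9/2 for (1, 8) and 3 for (3), so here the greedy cycle (3) attains the MAL of 3.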
