2. Pipelining – It’s Natural!
Laundry example
Amal, Bimal, Chamal, & Dinal each have one load of clothes to wash, dry, & fold
Washer takes 30 minutes
Dryer takes 40 minutes
Folder takes 20 minutes
3. Sequential Laundry
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
[Figure: sequential laundry timeline, 6 PM to midnight; loads A to D in task order, each running washer 30, dryer 40, folder 20 minutes back to back]
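The 6-hour figure comes from running each load through all three stages before the next load starts, so 4 loads × (30 + 40 + 20) minutes. A minimal Python sketch of that arithmetic (the names `STAGES` and `sequential_minutes` are illustrative, not from the slides):

```python
# Stage durations in minutes, taken from the laundry example
STAGES = {"wash": 30, "dry": 40, "fold": 20}
LOADS = ["A", "B", "C", "D"]

def sequential_minutes(stages, n_loads):
    """Each load finishes every stage before the next load begins."""
    return n_loads * sum(stages.values())

total = sequential_minutes(STAGES, len(LOADS))
print(total, "minutes =", total / 60, "hours")  # 360 minutes = 6.0 hours
```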
4. Pipelined Laundry – Start Work ASAP
Pipelined laundry takes 3.5 hours for 4 loads
[Figure: pipelined laundry timeline, 6 PM to 9:30 PM; loads A to D in task order overlap, with successive intervals of 30, 40, 40, 40, 40, 20 minutes]
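A small simulation reproduces the 3.5-hour figure. This sketch assumes a finished load can wait (for example, in a basket) for the next machine, so each machine is free again as soon as its current load ends; `pipeline_schedule` is a hypothetical helper and times are minutes after 6 PM.

```python
STAGE_MINUTES = [30, 40, 20]  # washer, dryer, folder

def pipeline_schedule(stage_minutes, n_loads):
    """Return finish[load][stage] in minutes after 6 PM.

    A load enters a stage as soon as (a) it has left the previous stage
    and (b) the previous load has freed this stage ("start work ASAP").
    """
    finish = []
    for load in range(n_loads):
        row = []
        for s, dur in enumerate(stage_minutes):
            ready = row[s - 1] if s > 0 else 0             # this load left stage s-1
            free = finish[load - 1][s] if load > 0 else 0  # previous load left stage s
            row.append(max(ready, free) + dur)
        finish.append(row)
    return finish

sched = pipeline_schedule(STAGE_MINUTES, 4)
end = sched[-1][-1]
print(end, "minutes =", end / 60, "hours")  # 210 minutes = 3.5 hours
```

After load A, one load leaves the dryer every 40 minutes, which is why the middle intervals in the timeline are all 40.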
5. Pipelining Lessons
Pipelining doesn’t reduce latency of a single task
It improves throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline & time to drain/flush it reduce speedup
[Figure: pipelined laundry timeline from 6 PM; loads A to D in task order, intervals of 30, 40, 40, 40, 40, 20 minutes]
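Several of these lessons can be checked with two closed-form times. For n identical tasks with stage times t₁…t_k, sequential time is n × Σt, and pipelined time is Σt + (n − 1) × max(t), assuming tasks can wait between stages. In this sketch the `balanced` stage split is a hypothetical comparison case, not from the slides.

```python
def sequential_time(stages, n):
    """n tasks, each running every stage before the next task starts."""
    return n * sum(stages)

def pipelined_time(stages, n):
    """The first task fills the pipe in sum(stages); after that one task
    finishes every max(stages), because the slowest stage sets the rate."""
    return sum(stages) + (n - 1) * max(stages)

def speedup(stages, n):
    return sequential_time(stages, n) / pipelined_time(stages, n)

laundry = [30, 40, 20]   # washer, dryer, folder (minutes)
balanced = [30, 30, 30]  # hypothetical: same 90 minutes, split evenly

print(round(speedup(laundry, 4), 3))  # 1.714
for n in (4, 100, 10_000):
    print(n, round(speedup(laundry, n), 2), round(speedup(balanced, n), 2))
```

As n grows, the balanced pipeline approaches the ideal speedup of 3 (the number of stages), while the laundry pipeline approaches only 90 / 40 = 2.25 because of its unbalanced 40-minute dryer stage; for small n, fill and drain time pull both further down.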
9. Pipeline With a Branch Penalty Due to a Taken Branch
Source: http://mail.humber.ca/~paul.michaud/Pipeline.htm
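A rough cycle-count model of the branch penalty, offered as a sketch: it assumes a k-stage pipeline that otherwise completes one instruction per cycle, and that every taken branch inserts a fixed number of bubble cycles (`penalty`, for example 2 if the branch is resolved in the third stage and the two wrong-path instructions are flushed). The function names and example numbers are hypothetical.

```python
def pipeline_cycles(n_instructions, n_stages, taken_branches, penalty):
    """Total cycles: pipeline fill, then one instruction per cycle,
    plus `penalty` bubble cycles for each taken branch (no other hazards)."""
    fill = n_stages - 1
    return fill + n_instructions + taken_branches * penalty

def effective_cpi(branch_fraction, taken_fraction, penalty):
    """Ignoring fill time: CPI = 1 + (taken branches per instruction) * penalty."""
    return 1 + branch_fraction * taken_fraction * penalty

print(pipeline_cycles(1000, 5, 0, 2))    # 1004: no taken branches
print(pipeline_cycles(1000, 5, 100, 2))  # 1204: 100 taken branches, 2 bubbles each
print(effective_cpi(0.2, 0.6, 2))        # roughly 1.24
```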
10. Superscalar Architectures
Executes more than 1 instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units
Source: http://mail.humber.ca/~paul.michaud/Pipeline.htm
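To make "dispatching multiple instructions to redundant functional units" concrete, here is a toy in-order issue model. It is a sketch under strong assumptions: instructions are independent, every unit finishes in 1 cycle, and dispatch stops in a cycle when the issue width is used up or the next instruction's unit type is exhausted. `issue_cycles` and the unit mix are hypothetical.

```python
from collections import Counter

def issue_cycles(program, width, units):
    """Cycles needed to issue `program` in order.

    program: sequence of unit names, e.g. "int" or "fp"
    width:   max instructions dispatched per clock cycle
    units:   functional units of each kind, e.g. {"int": 2, "fp": 1}
    """
    cycles, i = 0, 0
    while i < len(program):
        used = Counter()  # functional units claimed in this cycle
        issued = 0
        while (i < len(program) and issued < width
               and used[program[i]] < units.get(program[i], 0)):
            used[program[i]] += 1
            issued += 1
            i += 1
        if issued == 0:
            raise ValueError(f"no functional unit for {program[i]!r}")
        cycles += 1
    return cycles

program = ["int", "fp", "int", "int", "fp", "fp"]
print(issue_cycles(program, width=1, units={"int": 1, "fp": 1}))  # 6 (scalar)
print(issue_cycles(program, width=2, units={"int": 2, "fp": 1}))  # 4
print(issue_cycles(program, width=3, units={"int": 2, "fp": 1}))  # 3
```

The gain is limited by both the issue width and how many redundant units of each type exist: the two back-to-back `"fp"` instructions at the end cannot share a single FP unit in one cycle.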
11. Intel Hyper-Threading (HT)
Introduced with Intel Pentium 4
Allows 2 threads to use different execution resources of the same CPU at the same time
While the 1st thread is working with integers (ALU’s integer unit), the 2nd thread can work on floating-point numbers (floating-point unit)
OS sees 2 logical CPUs
Achieved through a mix of shared, replicated, & partitioned chip resources, such as:
Registers
Arithmetic units
Cache memory
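A toy model of the integer/floating-point overlap described above. It is only a sketch: it assumes one core with a single integer unit and a single FP unit, 1-cycle operations, and that each thread can issue at most one operation per cycle. It ignores the shared, replicated, and partitioned structures real HT relies on. `run_cycles` is a hypothetical helper.

```python
def run_cycles(threads, smt):
    """threads: list of instruction lists, each op "int" or "fp".

    smt=False: threads run one after another, one op per cycle.
    smt=True:  every cycle, each thread (thread 0 picks first) issues its
               next op if that op's unit has not been taken this cycle.
    """
    if not smt:
        return sum(len(t) for t in threads)
    pcs = [0] * len(threads)  # next op index for each thread
    cycles = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        busy = set()  # functional units already used this cycle
        for k, t in enumerate(threads):
            if pcs[k] < len(t) and t[pcs[k]] not in busy:
                busy.add(t[pcs[k]])
                pcs[k] += 1
        cycles += 1
    return cycles

int_thread = ["int"] * 4
fp_thread = ["fp"] * 4
print(run_cycles([int_thread, fp_thread], smt=False))  # 8
print(run_cycles([int_thread, fp_thread], smt=True))   # 4
print(run_cycles([int_thread, int_thread], smt=True))  # 8: both need the integer unit
```

The last line shows why HT helps only when the two threads need different resources; two integer-heavy threads simply take turns on the same unit.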
12. Amdahl’s Law
What’s the maximum expected improvement to an overall system when only part of it is improved?
Amdahl said this relationship is not linear
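For reference, Amdahl’s Law is commonly written as below, where Fraction_enhanced is the fraction of the original execution time that benefits from the improvement and Speedup_enhanced is how much faster that fraction runs. With Fraction_enhanced = 0.1 and Speedup_enhanced = 2 it reduces to the FP example; the limit shows why the relationship is not linear: however large Speedup_enhanced gets, the unimproved part caps the overall gain.

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

$$\lim_{\text{Speedup}_{\text{enhanced}} \to \infty} \text{Speedup}_{\text{overall}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$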
14. Amdahl’s Law – Example
Floating-point instructions improved to run 2X faster, but only 10% of actual instructions are FP (treated as 10% of execution time)
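$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left(0.9 + \frac{0.1}{2}\right) = 0.95 \times \text{ExTime}_{\text{old}}$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{0.95} \approx 1.053$$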
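The same arithmetic as a small Python function, a sketch that takes the 10% as the fraction of old execution time spent on FP work (the function name is illustrative). The second call shows the ceiling: even an infinitely fast FP unit gives only about 1.11X overall.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the old execution time
    runs `speedup_enhanced` times faster (Amdahl's Law)."""
    new_time = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1 / new_time

print(round(amdahl_speedup(0.1, 2), 3))             # 1.053
print(round(amdahl_speedup(0.1, float("inf")), 3))  # 1.111
```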
15. Moore’s Law – Today’s Status
Moore’s Law – Number of transistors on a chip tends to double about every 2 years
[Figure: CPU scaling trends; transistor count still rising, clock speed flattening sharply]
Source: www.extremetech.com/wp-content/uploads/2012/02/CPU-Scaling.jpg
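Written as an exponential, doubling every 2 years means a growth factor of 2^(t/2) after t years. A tiny sketch (the function name and its optional period parameter are illustrative assumptions); note it models transistor count only, not clock speed, which the figure shows flattening.

```python
def transistor_growth(years, doubling_period_years=2.0):
    """Growth factor in transistor count after `years`, assuming one
    doubling every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

print(transistor_growth(10))  # 32.0
print(transistor_growth(20))  # 1024.0
```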
16. Dual Core
Introduced by IBM POWER4
However, AMD brought it to the consumer market
Combines 2 independent CPUs & their respective caches onto a single silicon chip
Provides a bigger performance improvement than HT
True parallelism
28. Power Consumption
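Dynamic energy (transistor switches from 0 → 1 or 1 → 0):
$$E_{\text{dynamic}} \propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$$
Dynamic power:
$$P_{\text{dynamic}} \propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$
Static power consumption:
$$P_{\text{static}} \propto \text{Current}_{\text{static}} \times \text{Voltage}$$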
Scales with number of transistors
Reducing voltage reduces energy
Reducing clock rate reduces power, not energy
Power gating: cut off the power supply to idle units rather than only taking out their clock signal
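The voltage and clock-rate bullets follow from the dynamic formulas. Dynamic power scales with frequency, but a task that needs a fixed number of clock cycles runs proportionally longer at a lower clock rate, so its dynamic energy (power × time) stays the same. Lowering voltage cuts both power and energy by V². A minimal sketch in normalized, hypothetical units that takes the proportionality constant as 1 and ignores static power (function names are illustrative):

```python
def dynamic_power(cap_load, voltage, freq_switched):
    """Dynamic power: 1/2 x C x V^2 x f."""
    return 0.5 * cap_load * voltage ** 2 * freq_switched

def task_energy(cap_load, voltage, freq_switched, cycles):
    """Dynamic energy for a task needing `cycles` clock cycles:
    power x time, where time = cycles / frequency."""
    return dynamic_power(cap_load, voltage, freq_switched) * (cycles / freq_switched)

C, V, F, CYCLES = 1.0, 1.0, 2.0, 1e9  # normalized, hypothetical values

# Halving the clock rate halves power, but the task takes twice as long
print(dynamic_power(C, V, F / 2) / dynamic_power(C, V, F))              # 0.5
print(task_energy(C, V, F / 2, CYCLES) / task_energy(C, V, F, CYCLES))  # 1.0

# Reducing voltage and frequency by 15% cuts dynamic power to about 0.85^3
print(round(dynamic_power(C, 0.85 * V, 0.85 * F) / dynamic_power(C, V, F), 3))  # 0.614
```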