2. Early proposals/prototypes
Timeline of early superscalar proposals and prototypes, 1982-1989:
• Cheetah, America project (4): IBM
• Multititan project (2): DEC
• Match (2), Torch (4): Stanford U
• SIMP (4), DSNS (4): Kyushu U
• Appearance of the term "superscalar"
slide 2
Anshul Kumar, CSE IITD
5. Tasks of superscalar processing
• Parallel decoding and issue
• Parallel instruction execution
• Preserving the sequential consistency of instruction execution and exception processing
6. Superscalar decode and issue
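• Scalar: I-cache → instruction buffer → decode & issue; pipeline stages IF, D/I (decode and issue in one stage)
• Superscalar: I-cache → instruction buffer → decode & issue; pipeline stages IF, D, I (decode and issue in separate stages)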
7. Parallel Decoding
• Fetch multiple instructions into the instruction buffer
• Decode multiple instructions in parallel (the instruction window)
• Possibly check dependencies among these instructions, as well as with the instructions already under execution
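The dependency check in the last bullet can be sketched as follows. This is a minimal Python illustration, not a hardware design: it assumes each instruction is a hypothetical (name, destination register, source registers) record, detects only true (read-after-write) dependencies, and models the instructions already under execution as a dict mapping a register to the executing instruction that will write it.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    # Hypothetical instruction record used only for this sketch.
    name: str
    dest: str
    srcs: tuple


def raw_dependencies(window, in_flight_dests):
    """Map each instruction in the decode window to the producers it must wait for.

    window: instructions in program order (oldest first).
    in_flight_dests: {register: name of an executing instruction that writes it}.
    Only read-after-write dependencies are detected.
    """
    deps = {}
    for i, ins in enumerate(window):
        producers = []
        for src in ins.srcs:
            # The nearest earlier writer inside the window supplies the value.
            for prev in reversed(window[:i]):
                if prev.dest == src:
                    producers.append(prev.name)
                    break
            else:
                # No writer in the window: check instructions already executing.
                if src in in_flight_dests:
                    producers.append(in_flight_dests[src])
        deps[ins.name] = producers
    return deps


window = [
    Instr("i1", "r1", ("r2", "r3")),
    Instr("i2", "r4", ("r1", "r5")),
    Instr("i3", "r6", ("r7", "r8")),
]
print(raw_dependencies(window, {"r5": "i0"}))
```

With i0 still executing and writing r5, the sketch reports that i2 depends on i1 (inside the window) and on i0 (already under execution), while i1 and i3 are independent.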
8. Reducing decoding time
Pre-decoding
• Do partial decoding while instructions are being loaded into the I-cache
• The decoded information is appended to the instruction
• This includes the instruction class, the resources required, etc.
[Figure: second-level cache or main memory → (N bits/cycle) → pre-decode unit → (N + n bits/cycle) → I-cache]
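A minimal sketch of pre-decoding, assuming 32-bit instructions (N = 32) and n = 3 appended bits: a 2-bit instruction class and a 1-bit "writes a register" flag. The opcode table is hypothetical, loosely modeled on MIPS major opcodes; each real processor defines its own predecode fields and widths.

```python
# Hypothetical predecode table: 6-bit major opcode -> (2-bit class, writes a register).
PREDECODE_TABLE = {
    0x00: (0b00, True),   # integer ALU
    0x23: (0b01, True),   # load
    0x2B: (0b01, False),  # store
    0x04: (0b10, False),  # branch
    0x11: (0b11, True),   # floating point
}
N_INSTR_BITS = 32      # N: bits per instruction arriving from L2 cache / memory
N_PREDECODE_BITS = 3   # n: predecode bits appended per instruction


def predecode(word: int) -> int:
    """Return the N + n bit I-cache entry: the instruction followed by predecode bits."""
    assert 0 <= word < (1 << N_INSTR_BITS)
    opcode = (word >> 26) & 0x3F
    # Unknown opcodes get default fields (a simplification for this sketch).
    cls, writes_reg = PREDECODE_TABLE.get(opcode, (0b00, False))
    bits = (cls << 1) | int(writes_reg)
    return (word << N_PREDECODE_BITS) | bits


def unpack(entry: int) -> tuple:
    """Split an I-cache entry into (word, class, writes_reg) for the decode stage."""
    bits = entry & ((1 << N_PREDECODE_BITS) - 1)
    return entry >> N_PREDECODE_BITS, bits >> 1, bool(bits & 1)


lw = 0x8C000000  # major opcode 0x23 in bits 31..26
print(unpack(predecode(lw)))
```

The decode stage can then read the class and register-write flag directly from the low bits instead of re-examining the opcode, which is the time saving pre-decoding aims for.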
9. Pre-decoding examples
Processor (year)       No. of predecode bits
PA 7200 (1995)         5
PA 8000 (1996)         5
PowerPC 620 (1996)     7
UltraSparc (1995)      4
HAL PM1 (1995)         4
AMD K5 (1995)          5 (per byte)
R10000 (1996)          4
10. Blocking during issue
Decode and issue
• Instructions are decoded and issued directly to the EUs
• Instructions in the issue window are checked, and may be blocked due to data dependencies
[Figure: instruction buffer → decode → check & issue (issue window) → three EUs]
11. Non-blocking Issue
Non-blocking
• Instructions are decoded and issued to buffers (reservation stations)
• From the buffers, each RS performs dependency checking and dispatches instructions to its EU
[Figure: instruction buffer → decode & issue → three RSs, each with dependency checking/dispatch → three EUs]
12. Handling of Issue Blockages
• Preserving issue order: in-order / out-of-order
• Alignment of instruction issue: aligned / unaligned
13. Issue Order
• Issue in strict program order: the issue window holds d c b a (e is still to be issued); since b is dependent, only a is issued
• Out-of-order issue: from the same issue window, a and c are issued while the dependent b waits
Example: MC 88110, PowerPC 601
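A minimal sketch of the two issue orders, assuming (as a hypothetical pattern) that b and d in the window d c b a are dependent and that the issue rate does not limit how many independent instructions can issue in one cycle.

```python
def issue_in_order(window, dependent):
    """Strict program order: issue stops at the first dependent instruction."""
    issued = []
    for instr in window:  # window is in program order, oldest first
        if instr in dependent:
            break
        issued.append(instr)
    return issued


def issue_out_of_order(window, dependent):
    """Out-of-order issue: every independent instruction in the window may issue."""
    return [instr for instr in window if instr not in dependent]


window = ["a", "b", "c", "d"]  # e is still outside the issue window
dependent = {"b", "d"}         # assumed dependency pattern
print(issue_in_order(window, dependent))      # ['a']
print(issue_out_of_order(window, dependent))  # ['a', 'c']
```

The only difference is the `break`: in strict program order a single blocked instruction stops everything behind it, which is exactly what out-of-order issue avoids.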
14. Alignment
• Aligned issue (fixed window): the next window is checked only after all instructions of the current window have been issued
• Unaligned issue (gliding window): the window advances as instructions are issued

Cycle   Not yet issued     Issued (aligned)   Issued (unaligned)
1       h g f e d c b a    a                  a
2       h g f e d c b      c b                c b
3       h g f e d          d                  f e d
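A small simulation of aligned versus unaligned issue, assuming a window of 4 instructions and in-order issue within the window. The per-instruction ready cycles are hypothetical values chosen so that the first three cycles reproduce the example above.

```python
def simulate_issue(ready_cycle, width=4, aligned=True, max_cycles=20):
    """Return the instructions issued in each cycle (in-order issue within the window).

    ready_cycle: {instruction: first cycle it can issue}, in program order.
    aligned=True  -> fixed window: the next window is examined only after every
                     instruction of the current one has issued.
    aligned=False -> gliding window: the window always starts at the oldest
                     unissued instruction.
    """
    program = list(ready_cycle)
    next_idx = 0  # oldest unissued instruction
    per_cycle = []
    cycle = 1
    while next_idx < len(program) and cycle <= max_cycles:
        if aligned:
            end = (next_idx // width + 1) * width  # end of the current fixed block
        else:
            end = next_idx + width                 # window glides with next_idx
        issued = []
        for instr in program[next_idx:end]:
            if ready_cycle[instr] > cycle:
                break  # a blocked instruction stops all later ones
            issued.append(instr)
        next_idx += len(issued)
        per_cycle.append(issued)
        cycle += 1
    return per_cycle


ready = {"a": 1, "b": 2, "c": 2, "d": 3, "e": 3, "f": 3, "g": 4, "h": 4}
print(simulate_issue(ready, aligned=True))   # [['a'], ['b', 'c'], ['d'], ['e', 'f', 'g', 'h']]
print(simulate_issue(ready, aligned=False))  # [['a'], ['b', 'c'], ['d', 'e', 'f'], ['g', 'h']]
```

In cycle 3 the aligned version can only look at d, the last instruction of its fixed window, while the gliding window already covers d, e, f and g, so it issues three instructions.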
15. Design space in instruction issue
• Coping with false data dependencies: no renaming / register renaming
• Coping with unresolved control dependencies: wait / speculative execution
• Use of RSs: blocking / non-blocking issue
• Handling of issue blockages
• Issue rate (2-6)
16. Frequently used issue policies in scalar processors
• Traditional scalar issue: i386, MC68030, R3000, Sparc
• Traditional scalar issue with RSs: CDC 6600
• Traditional scalar issue with RSs and renaming: IBM 360/91
• Traditional scalar issue with speculative execution: i486, MC68040, R4000, MicroSparc
17. Frequently used issue policies in superscalar processors
(speculative execution in all)
• Straightforward superscalar issue: aligned / unaligned
• Straightforward superscalar issue with RSs
• Straightforward superscalar issue with renaming
• Advanced superscalar issue (renaming + RSs): R10000, PentiumPro, PA8000, Sparc64, Am29000, K5
Processors using the straightforward variants: MC68060, MC88110, Pentium, PowerPC602, PowerPC601, PA7100, PA7200, R8000, UltraSparc, SuperSparc, Alpha21164
18. Design Space of Reservation Stations
• Scope: partial / full
• Layout of reservation stations
• Operand fetch policy
• Instruction dispatch scheme
19. Layout of Reservation Stations
Type                                    Number of buffer entries
Stand-alone RS, individual              2-4
Stand-alone RS, group                   6-16
Stand-alone RS, central                 20
Combined with renaming and reordering   15-40 in total
The number of read and write ports depends on the number of EUs connected.
22. Issue bound operand fetch
(with single register file)
[Figure: the decode/issue unit receives instructions and reads operand data from the RF at issue; instructions and their operands go to four RSs, each feeding an EU]
23. Dispatch bound operand fetch
(with single register file)
[Figure: the decode/issue unit forwards instructions to four RSs; operand data is read from the RF when instructions are dispatched from the RSs to the four EUs]
24. Issue bound operand fetch
(with multiple register files)
[Figure: the decode/issue unit reads operand data at issue from two RFs, each serving part of the four RSs, which feed four EUs]
25. Dispatch bound operand fetch
(with multiple register files)
[Figure: the decode/issue unit forwards instructions to four RSs; at dispatch, operands are read from two RFs, each serving part of the RS/EU pairs]
26. Updating RFs and RSs
[Figure: decode/issue sends instructions and data to four RSs feeding four EUs; results produced by the EUs update both the two RFs and the RSs]
27. Instruction dispatch scheme
• Dispatch policy
• Dispatch rate: single instruction/cycle (individual RS) or multiple instructions/cycle (group or central RS)
• Checking operand availability
• Treatment of an empty RS
28. Dispatch policy
• Selection rule: the rule for identifying instructions that are ready for execution (data dependency check)
• Arbitration rule: the rule for choosing one out of several ready instructions (the earlier instruction has priority)
• Dispatch order
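A minimal sketch of a dispatch policy combining the selection rule and the arbitration rule, assuming each RS entry carries a hypothetical program-order sequence number and a flag holding the outcome of the data dependency check. `dispatch_rate` models a single (1) or multiple (> 1) instructions/cycle dispatch rate.

```python
from dataclasses import dataclass


@dataclass
class RSInstr:
    seq: int              # program-order sequence number (smaller = earlier)
    name: str
    operands_ready: bool  # outcome of the data dependency check


def dispatch(rs_entries, dispatch_rate=1):
    """Remove and return up to dispatch_rate instructions from a reservation station."""
    # Selection rule: only instructions whose operands are available are candidates.
    ready = [e for e in rs_entries if e.operands_ready]
    # Arbitration rule: earlier instructions (smaller seq) have priority.
    ready.sort(key=lambda e: e.seq)
    chosen = ready[:dispatch_rate]
    for e in chosen:
        rs_entries.remove(e)
    return [e.name for e in chosen]


rs = [RSInstr(3, "c", True), RSInstr(1, "a", False),
      RSInstr(2, "b", True), RSInstr(4, "d", True)]
print(dispatch(rs, dispatch_rate=2))  # ['b', 'c']
print([e.name for e in rs])           # ['a', 'd']
```

Although a is the oldest entry, it fails the selection rule, so arbitration picks among b, c and d and the two oldest of those are dispatched.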
29. Dispatch order
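• In-order, partially out-of-order, or out-of-order dispatch
[Figure: an RS under in-order dispatch, where only the oldest instruction is checked, and an RS under out-of-order dispatch, where all of its instructions are checked]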
30. Checking availability of operands
• Direct check of score-board bits (usual for dispatch bound operand fetch): control flow approach
• Check of explicit status bits in the RS (usual for issue bound operand fetch): data flow approach
(Flynn's terminology)
31. Score-board
• Introduced with the CDC 6600
[Figure: register file in which each register (0, 1, 2, ...) has a data field and a status (score-board) bit set to 0 or 1]
32. Checking in dispatch bound fetch
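• The decoded instruction (OC, Rs1, Rs2, Rd) is placed in the reservation station, and the V bit of Rd in the register file is reset.
• The V bits of the sources Rs1 and Rs2 are checked in the register file.
• When both are set, the instruction is dispatched: the opcode OC and the operand values Os1, Os2, read from the register file, are sent to the EU.
• When the EU delivers the result together with Rd, the register Rd is updated and its V bit is set.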
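A simplified Python model of score-board checking with dispatch bound operand fetch. All names are hypothetical. To keep the sketch small it assumes a single RS with in-order dispatch, and it resets the V bit of Rd at dispatch rather than at issue so that an instruction whose destination is also a source (such as r1 <- r1 + r2) does not wait on itself; WAR and WAW hazards are not modeled.

```python
from dataclasses import dataclass


@dataclass
class RSEntry:
    oc: str   # opcode
    rs1: int  # source register numbers: operands are NOT read at issue
    rs2: int
    rd: int


class DispatchBoundFetch:
    """Score-board checking with dispatch bound operand fetch (simplified)."""

    def __init__(self, reg_values):
        self.value = list(reg_values)
        self.v = [True] * len(reg_values)  # V bit per register: True = value valid
        self.rs = []                       # a single reservation station, oldest first

    def issue(self, entry):
        self.rs.append(entry)  # only register numbers are shelved

    def dispatch(self):
        """In-order dispatch: check the V bits of the oldest entry's sources."""
        if not self.rs:
            return None
        e = self.rs[0]
        if not (self.v[e.rs1] and self.v[e.rs2]):
            return None  # operands not yet available: wait
        self.rs.pop(0)
        self.v[e.rd] = False  # Rd is now pending
        # Operands are fetched from the register file only now, at dispatch.
        return e.oc, self.value[e.rs1], self.value[e.rs2], e.rd

    def writeback(self, rd, result):
        self.value[rd] = result
        self.v[rd] = True  # set the V bit of Rd


core = DispatchBoundFetch([0, 5, 7, 0])
core.issue(RSEntry("add", 1, 2, 3))  # r3 <- r1 + r2
core.issue(RSEntry("mul", 3, 1, 0))  # r0 <- r3 * r1, needs r3
print(core.dispatch())  # ('add', 5, 7, 3)
print(core.dispatch())  # None: V bit of r3 is reset
core.writeback(3, 12)
print(core.dispatch())  # ('mul', 12, 5, 0)
```

The check consults only the register file's V bits (the score-board), which is why this approach needs no status bits inside the RS entries.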
33. Checking in issue bound fetch
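• At issue, the decoded instruction (OC, Rs1, Rs2, Rd) reads its source operands Os1, Os2 from the register file, and the V bit of Rd is reset.
• RS entry format: OC | Os1/Is1 | Vs1 | Os2/Is2 | Vs2 | Rd. Each source field holds either the operand value (Os) or, if the operand is not yet available, the identifier of the register that will produce it (Is); the Vs bit tells which.
• An instruction is dispatched when Vs1 and Vs2 are both set; OC, Os1, Os2 and Rd go to the EU.
• When the EU returns the result with Rd, the register file is updated (Rd written, V bit set), and the RS entries are associatively updated: every Is1, Is2 matching Rd receives the result and its Vs bit is set.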
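A simplified Python model of status-bit checking with issue bound operand fetch. All names are hypothetical. As in the slide, the register number Rd serves as the tag for the associative update; since there is no renaming, the sketch assumes at most one pending write per register at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    valid: bool                # Vs bit
    value: int = 0             # Os: operand value, meaningful when valid
    tag: Optional[int] = None  # Is: register awaited when not valid


@dataclass
class RSEntry:
    oc: str
    src1: Source
    src2: Source
    rd: int


class IssueBoundFetch:
    """Status-bit checking with issue bound operand fetch (no renaming)."""

    def __init__(self, reg_values):
        self.value = list(reg_values)
        self.v = [True] * len(reg_values)  # V bit per register
        self.rs = []                       # reservation station entries, oldest first

    def _read(self, reg):
        # Operand fetch at issue: copy the value if valid, else remember the register.
        if self.v[reg]:
            return Source(True, self.value[reg])
        return Source(False, tag=reg)

    def issue(self, oc, rs1, rs2, rd):
        entry = RSEntry(oc, self._read(rs1), self._read(rs2), rd)
        self.v[rd] = False  # reset the V bit of Rd (after reading the sources)
        self.rs.append(entry)

    def dispatch(self):
        """Dispatch the oldest entry whose Vs1 and Vs2 bits are both set."""
        for i, e in enumerate(self.rs):
            if e.src1.valid and e.src2.valid:
                return self.rs.pop(i)
        return None

    def writeback(self, rd, result):
        self.value[rd] = result
        self.v[rd] = True
        # Associative update: every waiting source whose Is matches Rd gets the result.
        for e in self.rs:
            for src in (e.src1, e.src2):
                if not src.valid and src.tag == rd:
                    src.value, src.valid, src.tag = result, True, None


core = IssueBoundFetch([0, 5, 7, 0])
core.issue("add", 1, 2, 3)  # r3 <- r1 + r2: both operands copied at issue
core.issue("mul", 3, 1, 0)  # r0 <- r3 * r1: waits for r3 (Is1 = 3)
print(core.dispatch())      # the add entry
print(core.dispatch())      # None: the mul still waits
core.writeback(3, 12)       # result of the add arrives
print(core.dispatch())      # the mul entry, now with Os1 = 12
```

Unlike the score-board approach, the dispatch check here looks only at the Vs bits stored in the RS entry; the register file is consulted once, at issue.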
34. Treatment of an empty RS
• Straightforward approach: an instruction stays in the RS for at least one cycle
• Bypassing the RS if empty: the instruction goes directly to the EU
Examples: Sparc64, Nx586, PowerPC 604
35. Approaches in dispatching
Approach          Dispatch order           Dispatch rate          RSs                 Examples
Straightforward   in order                 single instr/cycle     individual RSs      Power1, PPC603, Nx586, Am29000
Enhanced          partially out of order   single instr/cycle     individual RSs      Power2, PPC604, PPC620
Advanced          out of order             multiple instr/cycle   group/central RSs   PM1, PentiumPro, PA8000, R10000
36. Reference
1. D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.