3D Microprocessor Design: Stacking at different granularities
1. 3D Microprocessor Design
Stacking at different granularities
Alberto Villegas Erce
Seminar on Computer Systems
Turku University
April 2010
Alberto Villegas Erce (Seminar on Computer Systems Turku University ) Design
3D Microprocessor April 2010 1 / 29
2. Introduction
Concepts review
Previously on 3D world...
Industry trends
Make it faster, smaller and cuter, but do not forget the price
3D Design
Benefits: shorter wire length, speed increase, lower power consumption.
Challenges: risk of defects, heat problems, design complexity.
Through Silicon Vias (TSVs)
Vertical electrical connection passing completely through a silicon die.
Low power consumption
Low latency
Increasing integration level (10k-100k per cm²)
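The wire-length benefit listed above can be illustrated with a rough back-of-the-envelope model. This is only a sketch: it assumes a square die whose area is divided evenly across the layers, measures the corner-to-corner Manhattan distance on one layer, and ignores the vertical TSV hop. The function name and the 100 mm² example area are hypothetical.

```python
import math

def worst_case_wire_mm(area_mm2: float, layers: int = 1) -> float:
    """Corner-to-corner Manhattan distance on one square layer.

    Assumption: the total die area is split evenly over `layers`
    square layers, and the TSV hop between layers is negligible.
    """
    side = math.sqrt(area_mm2 / layers)
    return 2.0 * side

flat = worst_case_wire_mm(100.0, layers=1)     # single 2D die
stacked = worst_case_wire_mm(100.0, layers=2)  # same area folded onto two layers
ratio = stacked / flat                         # equals 1/sqrt(2) under this model
```

Under these assumptions, folding a design onto two layers shortens the longest planar wire by a factor of about 0.71; real designs will differ depending on floorplan and TSV placement.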
3. Introduction
Today
Three dimensional Puzzle
How to face 3D design?
2D design decomposition at different
granularities.
1 Entire cores, cache: add functionality with high 2D reuse.
2 Functional unit blocks: performance improvement and power reduction.
Must re-floorplan and retime paths.
3 Logic gates (block splitting): reduce latency and power on routes at every level.
Needs new 3D circuit designs, methodologies and layout tools.
4. Introduction
Index
1 Stacking Complete Modules
2 Stacking Functional Unit Blocks
3 Splitting Functional Unit Blocks
4 Conclusions
5. Stacking Complete Modules
6. Stacking Complete Modules Idea
Three-Dimensional Stacked Caches
Idea
Break & stack existing modules.
Conventional dual-core processor
featuring a 4MB L2 cache.
Design options for 3D stacking
Reduce space.
Increase storage.
7. Stacking Complete Modules Increasing storage
L2 cache controller in 3D
Objective
Add more storage to the L2
cache.
Stacking a second silicon
layer
Additional 8MB of cache
Nearly no impact in L2
access latency
Traditional 2D solution
Double silicon area.
Latency increased.
8. Stacking Complete Modules Increasing storage
L2 cache controller in 3D (cont.)
DRAM Solution
Much greater storage density.
Greater latency (50-150 cycles).
Reduces silicon area by half.
Hybrid solution
SRAM to store only the tags.
DRAM to store the actual data.
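A minimal sketch of the hybrid idea follows: the tag check happens in fast SRAM, and the slow DRAM is touched only on a hit. The direct-mapped organization, the class name `HybridCache`, the 64-byte block size and the 4-cycle SRAM latency are assumptions made for illustration; the 50-cycle DRAM latency is the lower end of the range quoted on the slide.

```python
class HybridCache:
    """Direct-mapped cache with tags in SRAM and data blocks in stacked DRAM.

    All sizes and latencies are illustrative assumptions.
    """

    def __init__(self, num_sets, block_bytes=64, sram_cycles=4, dram_cycles=50):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        self.sram_cycles = sram_cycles
        self.dram_cycles = dram_cycles
        self.tags = [None] * num_sets  # held in SRAM
        self.data = [None] * num_sets  # held in DRAM

    def access(self, addr):
        """Return (hit, cycles spent in this cache level)."""
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        cycles = self.sram_cycles  # the tag check always costs an SRAM access
        if self.tags[index] == tag:
            # Hit: only now pay the DRAM latency to read the data.
            return True, cycles + self.dram_cycles
        # Miss: detected after the SRAM tag check alone; install the block.
        self.tags[index] = tag
        self.data[index] = f"block {block}"
        return False, cycles

cache = HybridCache(num_sets=4)
first = cache.access(0)         # cold miss, SRAM latency only
second = cache.access(0)        # hit, SRAM + DRAM latency
conflict = cache.access(4 * 64) # maps to the same set, evicts block 0
```

The point of the split is that a miss is discovered at SRAM speed, so the DRAM latency is paid only when the data is actually present.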
9. Stacking Complete Modules Increasing storage
L2 cache controller in 3D (testing)
Three test programs:
Program A : small working set that fits in the 4MB SRAM cache.
Program B : larger working set that does not fit in the 4MB SRAM cache but does fit
within the 32MB DRAM cache.
Program C : streaming memory access patterns. Poor cache hit rate for
both configurations.
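The qualitative behaviour of the three programs can be reproduced with a toy simulation. This is a sketch under loud assumptions: the caches are fully associative with LRU replacement, capacities are scaled down to 4 and 32 blocks (standing in for 4MB and 32MB), and the traces are synthetic loops rather than the benchmarks used in the original study.

```python
from collections import OrderedDict

def hit_rate(trace, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(trace)

def loop_trace(working_set, passes):
    """Repeatedly sweep over `working_set` distinct blocks."""
    return [b for _ in range(passes) for b in range(working_set)]

SMALL, LARGE = 4, 32              # scaled-down stand-ins for 4MB and 32MB
program_a = loop_trace(3, 100)    # working set fits in the small cache
program_b = loop_trace(16, 100)   # fits only in the large cache
program_c = list(range(1600))     # streaming: no block is ever reused

rates = {
    name: (hit_rate(trace, SMALL), hit_rate(trace, LARGE))
    for name, trace in [("A", program_a), ("B", program_b), ("C", program_c)]
}
```

In this model Program A hits almost always in both caches, Program B thrashes the small cache but hits in the large one, and Program C misses everywhere, so only Program B gains from the extra capacity.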
10. Stacking Complete Modules 3D optionality
3D Integration
... for everyone?
3D Integration:
Increase silicon required for the chip (layers)
=⇒ Increase manufacturing cost
Extra manufacturing steps for bonding.
Impact on yield rates.
3D is not the general answer!
A better use of 3D stacking is as a means to optionally augment the processor
with some additional functionality.
11. Stacking Complete Modules 3D optionality
Introspective 3D Processors
Objective
Access to more dynamic information about the internal state of a
microprocessor.
12. Stacking Complete Modules 3D optionality
Reliable 3D Processors
Problem
Small feature sizes in modern processors make them vulnerable to data corruption.
Solutions
Redundancy: two or three copies of the
processor operating in lock-step =⇒
multiple pipelines increase cost.
Leading execution/trailing checking
cores: the trailing core re-executes
instructions (not in lock-step) =⇒ the
additional pipeline still increases area.
Extra wires eliminated.
Stack it! Optional checker core.
Unutilized silicon area.
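The leading/trailing scheme can be sketched in a few lines. This is a toy model, not the mechanism of any specific processor: the three-operation instruction set, the function names and the single-bit fault injection are all hypothetical, chosen only to show the trailing core re-executing the leader's instructions later and flagging mismatches.

```python
import operator

# Hypothetical three-operation instruction set for illustration only.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def leading_core(program, fault_at=None):
    """Execute the program; optionally flip one result bit to mimic a soft error."""
    results = []
    for i, (op, a, b) in enumerate(program):
        value = OPS[op](a, b)
        if i == fault_at:
            value ^= 1  # injected single-bit corruption
        results.append(value)
    return results

def trailing_checker(program, leading_results):
    """Re-execute each instruction (not in lock-step) and report mismatching indices."""
    return [
        i
        for i, ((op, a, b), seen) in enumerate(zip(program, leading_results))
        if OPS[op](a, b) != seen
    ]

program = [("add", 1, 2), ("mul", 3, 4), ("sub", 9, 5)]
clean = trailing_checker(program, leading_core(program))
faulty = trailing_checker(program, leading_core(program, fault_at=1))
```

Because the checker only needs the leader's committed results, it can run behind the leader, which is what makes it a natural candidate for an optional stacked layer.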
13. Stacking Functional Unit Blocks
14. Stacking Functional Unit Blocks Introduction
Stacking Functional Unit Blocks
Nowadays
Early stage of development for these
technologies.
3D integration will require
Design automation tools.
Layout support.
Verification and validation
methodologies.
Future
Reorganize the processor pipeline in new
ways.
15. Stacking Functional Unit Blocks Removing wires
Removing Wires
Pentium III & IV branch misprediction
Problem
Wire delays have not improved as fast as transistor speed.
PIII branch misprediction
PIV branch misprediction
Solution
3D implementation so distant blocks are now vertically stacked on top of
each other.
16. Stacking Functional Unit Blocks Removing wires
Removing Wires
Alpha 21264
Problem
Superscalar processor with multiple execution units (EU) requires a bypass
network to forward results between all of the EU =⇒ wiring.
2D Solution
Divide the EUs into two groups or
clusters, each with its own bypass
network, with communication between the clusters.
3D Solution
Stack the clusters.
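Why clustering reduces wiring can be seen by counting forwarding paths. The sketch below assumes a full bypass network in which every execution unit can forward its result to every unit in the same cluster, itself included, and it ignores the inter-cluster links. The function name and the unit counts are hypothetical.

```python
def bypass_paths(units: int, clusters: int = 1) -> int:
    """Number of result-forwarding paths inside the clusters.

    Assumption: each unit forwards to every unit of its own cluster
    (including itself); inter-cluster links are not counted.
    """
    if units % clusters != 0:
        raise ValueError("units must divide evenly into clusters")
    per_cluster = units // clusters
    return clusters * per_cluster * per_cluster

single = bypass_paths(4)              # one flat bypass network
clustered = bypass_paths(4, clusters=2)
```

Under this model two clusters halve the number of local forwarding paths. Stacking the clusters on top of each other then shortens the remaining inter-cluster connection.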
17. Stacking Functional Unit Blocks Trade-offs
Removing Wires
Trade-offs
Pros
Opportunities to optimize the processor pipeline.
Physical reduction in the amount of wiring.
Cons
Non-trivial engineering effort.
Modify pipeline.
Verify and validate new design.
Additional costs.
18. Stacking Functional Unit Blocks TSV Reality
Removing Wires
TSV Reality
Problem
After stacking two blocks there is not enough room for placing TSVs.
Solution
Different layouts of the TSVs.
Wire overhead reintroduction
Reintroduced wires do not completely cancel the 3D wire reduction benefits.
19. Splitting Functional Unit Blocks
20. Splitting Functional Unit Blocks Introduction
Splitting Functional Unit Blocks
Last level
Logic gates
Split individual functional units
across multiple layers.
Reorganize the functional unit
block =⇒ more compact 3D
arrangement.
Benefits
Reduce length of intra-block
wiring.
Improve operating frequencies.
We will introduce a starting point for thinking about it.
21. Splitting Functional Unit Blocks 3D Cache Organizations
3D Cache Organizations
First view
Problem
L2 cache consumes about half of the overall
die area.
Worst case routing distance: 2x+4y
Two stack possibilities.
Banks on cores
Half space.
Accessing equal.
Banks on banks
Half space.
Accessing reduced.
22. Splitting Functional Unit Blocks Splitting the cache
3D Splitting the cache
Problem
Wires within each bank also impact overall
latency.
Split individual cache banks across multiple layers.
Columns on columns
Best latency.
Rows on rows
Energy reduction.
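A simple geometric model shows which wire each split shortens. It is only a sketch: it assumes a rectangular cell array where a word line spans the columns of one row and a bit line spans the rows of one column, and that a split divides the array evenly across layers. The function name, the unit cell dimensions and the 256x256 array are hypothetical, and the model says nothing about the resulting latency or energy figures.

```python
def array_wires(rows, cols, layers=1, split="none", cell_w=1.0, cell_h=1.0):
    """Return (wordline_length, bitline_length) for one layer of the array.

    split="columns": columns stacked on columns, so each layer keeps cols/layers.
    split="rows":    rows stacked on rows, so each layer keeps rows/layers.
    """
    if split == "columns":
        cols = cols / layers
    elif split == "rows":
        rows = rows / layers
    elif split != "none":
        raise ValueError("split must be 'none', 'columns' or 'rows'")
    wordline = cols * cell_w  # a word line crosses every column of its row
    bitline = rows * cell_h   # a bit line crosses every row of its column
    return wordline, bitline

planar = array_wires(256, 256)
cols_on_cols = array_wires(256, 256, layers=2, split="columns")
rows_on_rows = array_wires(256, 256, layers=2, split="rows")
```

In this model stacking columns on columns halves the word line, while stacking rows on rows halves the bit line.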
23. Splitting Functional Unit Blocks Splitting the cache
3D Splitting cache
Testing
Experimental results
SPICE simulation.
Column on column organization.
SRAM implementations in 65-nm process.
24. Splitting Functional Unit Blocks 3D Adders
3D Adders
Classic Look-ahead Carry Adder
Look-ahead Carry Adder
n = 16 bits
Critical path along bit[0]-bit[n-1]
Several ways to split the adder
Based on inputs
x on the bottom layer; y on the top layer.
First level of the propagate layer is split.
Half wire length.
By significance
Least significant bits on the bottom layer; most significant bits on the top layer.
TSV between root nodes.
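The split by significance can be sketched as follows. The low half of the operands lives on the bottom layer, the high half on the top layer, and the only signal crossing between layers is the carry derived from the low half's group generate/propagate pair, which stands in for the TSV between root nodes. This is an illustrative model, not the circuit from the cited work: the function names are hypothetical, and the per-layer sums use Python's `+` for brevity instead of modelling every lookahead level.

```python
def group_gp(x, y, width):
    """Group generate G and propagate P over `width` bits, scanning from the LSB."""
    G, P = 0, 1
    for i in range(width):
        g = (x >> i) & (y >> i) & 1    # bit i generates a carry
        p = ((x >> i) ^ (y >> i)) & 1  # bit i propagates an incoming carry
        G = g | (p & G)
        P = p & P
    return G, P

def add_split_by_significance(x, y, n=16):
    """Add two n-bit values with the low half on one layer and the high half on another."""
    half = n // 2
    mask = (1 << half) - 1
    x_lo, y_lo = x & mask, y & mask                      # bottom layer
    x_hi, y_hi = (x >> half) & mask, (y >> half) & mask  # top layer

    G_lo, _ = group_gp(x_lo, y_lo, half)
    carry_mid = G_lo  # carry-in is 0, so only the generate term matters; crosses the TSV

    sum_lo = (x_lo + y_lo) & mask
    sum_hi = (x_hi + y_hi + carry_mid) & mask

    G_hi, P_hi = group_gp(x_hi, y_hi, half)
    carry_out = G_hi | (P_hi & carry_mid)
    return (sum_hi << half) | sum_lo, carry_out

total, carry = add_split_by_significance(0x00FF, 0x0001)
```

For the example operands the low half overflows, so the carry crossing the layer boundary is 1 and the result is 0x0100 with no carry out.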
25. Conclusions
26. Conclusions
Conclusions
Benefits of organizing
components in 3D
Can significantly reduce
wire lengths.
Devices from different
technologies can be
tightly integrated and
combined.
3D organizations may be
required depending on the
exact design constraints and
objectives.
27. Conclusions
Conclusions
Cons
More granularity ⇒
more redesign.
Stacking can increase
heat.
Low level of
technological
development.
Every redesign process leads
to a cost increase.
28. References
References
Three-Dimensional Microprocessor Design
Gabriel H. Loh
Springer Science 2010
A Modular 3D Processor for Flexible Product Design and Technology
Migration
Gabriel H. Loh
ACM 2008
Die-stacking (3D) microarchitecture
B. Black.
International Symposium on Microarchitecture, pp. 469-479, 2006
29. The end Questions
Thank you.
Questions?
Please be nice