The document discusses the multicore midlife crisis as processors move to multiple cores to cope with Moore's Law. As core counts increase, the memory bandwidth does not scale accordingly, creating a memory wall problem. Solutions proposed include increasing cache sizes, improving memory speeds, and better caching techniques. Future multicore designs may focus more on heterogeneous cores tailored for different workloads rather than increasing core counts uniformly. Research challenges include coping with heterogeneity, improving data locality given slow memory speeds, and software techniques to help address issues like cache coherence.
6. So What?
Yep, they increased the cache size. Do I care?
The interesting part is why they did it.
5/4/11 6
7. The Memory Problem
• Moore’s Law: the number of transistors doubles every 18 months
– Single-core: new transistors = faster speed
– Multicore: new transistors = more cores
• Memory speed increase does not obey Moore’s Law!
[Figure: a processor with four cores and a cache, connected to memory]
8. The Memory Problem
• Problem: More cores compete for same slow
memory!
• Implications:
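[Figure: five-stage pipeline (IF, ID, X, M, W); the M stage is stalled on an access to cache or RAM while later instructions queue up in IF and ID]
– Access to cache: 5 cycles ☺
– Access to RAM: > 100 cycles ☹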
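To get a feel for what the stall costs, here is a minimal sketch of the expected cycles per memory access. It uses the 5-cycle and 100-cycle figures from this slide; the function name, the simple hit-or-miss model, and the hit rates in the loop are assumptions made for illustration. Since the slide says "> 100 cycles", the results are lower bounds.

```python
def average_access_cycles(hit_rate, hit_cycles=5, miss_cycles=100):
    """Expected cycles per memory access for a given cache hit rate.

    Assumed model: every access either hits in the cache (hit_cycles)
    or goes to RAM (miss_cycles). Defaults are the slide's figures.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


if __name__ == "__main__":
    # Illustrative hit rates, not measurements.
    for rate in (0.99, 0.95, 0.90):
        cycles = average_access_cycles(rate)
        print(f"hit rate {rate:.0%}: about {cycles:.2f} cycles per access")
```

Under this model, even a small drop in hit rate moves the average noticeably, because each miss costs at least 20 times as much as a hit.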
9. The Memory Problem
• Problem: More cores compete for same slow
memory!
• Solution: Increase cache size ☺
– Maintain cache hit rate
• Halving the miss rate requires 4x the cache size
• Exponential increase in #transistors needed
– Cache coherence overhead
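The "4x cache size" figure on this slide is consistent with a common rule of thumb in which the miss rate falls with the square root of the cache size. The sketch below assumes that power law, miss_rate ∝ cache_size^(-alpha) with alpha = 0.5. The exponent and the function name are assumptions for illustration, and real workloads vary.

```python
def cache_growth_for_miss_reduction(miss_reduction, alpha=0.5):
    """Factor by which the cache must grow to divide the miss rate
    by `miss_reduction`, assuming miss_rate ~ cache_size ** -alpha.

    alpha = 0.5 is the assumed square-root rule of thumb.
    """
    if miss_reduction <= 0 or alpha <= 0:
        raise ValueError("miss_reduction and alpha must be positive")
    return miss_reduction ** (1.0 / alpha)


if __name__ == "__main__":
    for reduction in (2, 4, 8):
        growth = cache_growth_for_miss_reduction(reduction)
        print(f"miss rate / {reduction} needs a cache about {growth:.0f}x larger")
```

Under this assumption the required cache grows with the square of the desired miss-rate reduction, so the transistor budget runs out quickly.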
10. Increasing Cache Size
Not practical!
B. M. Rogers et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling. ISCA 2009
12. Do We Need All These Cores?
• Average utilization: < 20%
• We don’t have too many parallel apps
• We just have enough compute power
• Until you try to encode an HD video
– Star Trek holodecks: not there yet
• CPU vendors still have to make a living
14. Tomorrow’s Multicore
• Intel Core i3, i5, i7
– Video is integrated into CPU
– Must balance sequential and parallel performance
– Lower energy requirements than prev. generations
• Heterogeneous cores
– Many slow cores, good at floating point
– Some general purpose cores
– “Combine” cores into super-cores
• Must live with the memory problems
17. Tomorrow’s Multicore
• The number of cores is becoming less
important
– They can’t keep increasing them
– i3, i5, i7: how many cores each?
• Important is what the system provides
– FLOP intensive: GPU-style cores
– I/O intensive: FAWN (CMU)
– Memory intensive: Opteron/Xeon NUMA servers
18. A Research Perspective
• Coping with heterogeneity is hard
– Different degrees of parallelism have different
sequential execution speeds
– Many tradeoffs: Speed vs. Energy vs. Memory
intensity vs. I/O intensity
• Need models for heterogeneity
– Understand the cost of the applications in terms
of FLOPS, INTOPS, memory, I/O etc.
• Silver lining: stick to sequential apps (?)
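One very simple shape such a model could take is a bottleneck estimate: describe the application by how much of each resource it needs, describe each core type by how much it delivers per second, and let the slowest resource set the run time. The sketch below is only an illustration of that idea. The `Profile` class, the core names, and every number in it are hypothetical, not measurements of real hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Total work for an application, or per-second capacity for a core type."""
    flops: float
    intops: float
    mem_bytes: float
    io_bytes: float


def estimated_seconds(app, core):
    """Bottleneck estimate: the slowest resource determines the run time."""
    return max(
        app.flops / core.flops,
        app.intops / core.intops,
        app.mem_bytes / core.mem_bytes,
        app.io_bytes / core.io_bytes,
    )


def best_core(app, cores):
    """Name of the core type with the lowest estimated run time."""
    return min(cores, key=lambda name: estimated_seconds(app, cores[name]))


# Hypothetical capacities per second, invented for this example.
CORES = {
    "gpu_style": Profile(flops=1e12, intops=1e10, mem_bytes=1e11, io_bytes=1e8),
    "general": Profile(flops=5e10, intops=5e10, mem_bytes=2e10, io_bytes=1e9),
}

# Hypothetical applications, also invented.
FLOP_HEAVY = Profile(flops=1e12, intops=1e9, mem_bytes=1e10, io_bytes=1e6)
IO_HEAVY = Profile(flops=1e9, intops=1e9, mem_bytes=1e9, io_bytes=1e9)

if __name__ == "__main__":
    print("FLOP-heavy app runs best on:", best_core(FLOP_HEAVY, CORES))
    print("I/O-heavy app runs best on:", best_core(IO_HEAVY, CORES))
```

A model this crude ignores overlap between resources, energy, and the sequential fraction of the program, which is part of why the slide calls the problem hard.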
19. A Research Perspective
• Coping with slow memory
• Need to improve data locality by orders of
magnitude
• Compiler support, auto-tuners etc.
• Space-efficient data types:
• HOT area in algo & systems
• Bloom filters: NSDI’10: 3 papers!
• Succinct data structures: STOC’08-STOC’10
• Cache oblivious algorithms
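As a reminder of why Bloom filters count as space-efficient, here is a minimal sketch of one. It answers "possibly present" or "definitely absent" using a fixed number of bits, regardless of how large the stored items are. The class layout, the use of SHA-256 with a seed prefix to derive hash positions, and the sizes in the usage example are choices made for illustration, not taken from the cited papers.

```python
import hashlib


class BloomFilter:
    """Set membership with false positives but no false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # one bit per slot

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with
        # different seed prefixes (an assumed, simple scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter(num_bits=1024, num_hashes=3)
    for word in ("cache", "core", "memory"):
        seen.add(word)
    print("cache" in seen)  # True: added items are always reported
```

The whole filter above occupies 128 bytes, which fits in two 64-byte cache lines; that compactness is the connection to data locality.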
20. A Research Perspective
• Software-helped cache coherence
– Or go without it ☺
• Renounce some programming patterns
• Java initializes all objects to some value…
• Rethink those hash tables
• Go for approximate solutions
– It’s better if you can provide error bounds
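One way to provide an error bound with an approximate answer is to sample instead of scanning everything, then attach a Hoeffding bound to the estimate. The sketch below estimates the fraction of items that satisfy a predicate. The function names, the 95% default confidence, and the example data are assumptions for illustration. The bound relies on sampling uniformly with replacement.

```python
import math
import random


def hoeffding_margin(num_samples, confidence=0.95):
    """Half-width eps such that |estimate - truth| <= eps holds with at
    least `confidence` probability, for sampled values in [0, 1].

    From Hoeffding's inequality: P(|error| >= eps) <= 2 * exp(-2 * n * eps**2).
    """
    delta = 1.0 - confidence
    return math.sqrt(math.log(2.0 / delta) / (2.0 * num_samples))


def approximate_fraction(items, predicate, num_samples, confidence=0.95, seed=0):
    """Estimate the fraction of `items` satisfying `predicate` from a sample.

    `items` must be an indexable sequence. Returns (estimate, margin).
    """
    rng = random.Random(seed)  # fixed seed only to make runs repeatable
    hits = sum(bool(predicate(rng.choice(items))) for _ in range(num_samples))
    return hits / num_samples, hoeffding_margin(num_samples, confidence)


if __name__ == "__main__":
    data = list(range(1_000_000))
    estimate, margin = approximate_fraction(data, lambda x: x % 2 == 0, 10_000)
    print(f"even fraction is about {estimate:.3f} +/- {margin:.3f}")
```

The margin depends only on the sample size, not on the size of the data, so the memory traffic stays bounded no matter how large the input grows.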
21. Discussion
Thank you for your attention