The Multicore Midlife Crisis
       Bogdan Marius Tudor

            CSTalks
         30 March 2011
Outline
•    The Memory Problem
•    Do We Need All These Cores?
•    Tomorrow’s Multicore
•    Research Perspective




5/4/11                             2
Remember Single Core?




[Figure (source: Wikipedia)]

My Next Processors
[Chart: cache size in kB (axis 0 to 4000) of successive processors, labeled by date and clock rate: Apr-94, 66 MHz; Apr-98, 200 MHz; Nov-01, 1000 MHz; May-04, 2250 MHz; Jul-06, 1600 MHz; Jul-08, 2400 MHz; Mar-11, 2400 MHz]

So What?

Yep, they increased the cache size. Do I care?



The interesting part is why they did it.




The Memory Problem
•  Moore’s Law: the number of transistors doubles
   every 18 months
         –  Single core: new transistors = faster speed
         –  Multicore: new transistors = more cores
•  Memory speed does not increase at the rate
   of Moore’s Law!
[Diagram: a processor with four cores sharing one cache, connected to memory]

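To put the two growth curves side by side, here is a small compound-growth calculation. The 18-month doubling period is the slide's; the 7% yearly improvement in memory speed is an assumed, illustrative rate, not a figure from the talk, and the function names are hypothetical.

```python
# Sketch: compound growth of transistor count vs. memory speed.
# The 18-month doubling period comes from the slide; the 7%/year memory
# improvement rate is an ASSUMED, illustrative figure.

TRANSISTOR_DOUBLING_MONTHS = 18
MEMORY_IMPROVEMENT_PER_YEAR = 0.07  # assumption, not from the talk


def transistor_growth(years: float) -> float:
    """Factor by which the transistor count grows after `years` years."""
    return 2.0 ** (years * 12 / TRANSISTOR_DOUBLING_MONTHS)


def memory_growth(years: float) -> float:
    """Factor by which memory speed grows after `years` years (assumed rate)."""
    return (1.0 + MEMORY_IMPROVEMENT_PER_YEAR) ** years


if __name__ == "__main__":
    for years in (3, 6, 9, 12):
        t, m = transistor_growth(years), memory_growth(years)
        print(f"{years:2d} years: transistors x{t:6.1f}  memory x{m:4.2f}  gap x{t / m:6.1f}")
```

Under these assumptions the gap itself compounds: after 12 years transistors have grown 256x while memory speed has not even tripled.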
The Memory Problem
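•  Problem: more cores compete for the same slow
   memory!
•  Implications:
         –  Normal case: an instruction passes through the
            pipeline stages IF, ID, X, M, W in 5 cycles ☺
         –  Stalled: after IF and ID, the instruction waits on
            an access to the cache or RAM while later
            instructions wait in the ID queue: > 100 cycles ☹
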
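The cost of these stalls can be estimated with the standard average memory access time (AMAT) formula: hit time plus miss rate times miss penalty. This sketch treats the slide's 5 cycles as the hit case and 100 cycles as the miss penalty, which is a simplifying assumption; the function name is hypothetical.

```python
def average_access_cycles(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time (AMAT) in cycles.

    AMAT = hit time + miss rate * miss penalty.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Hypothetical figures loosely based on the slide: 5 cycles when the
    # access hits, 100 extra cycles when it has to go to RAM.
    for miss_rate in (0.0, 0.01, 0.05, 0.10, 0.25):
        amat = average_access_cycles(5, miss_rate, 100)
        print(f"miss rate {miss_rate:4.0%}: {amat:5.1f} cycles per access")
```

Even a 5% miss rate doubles the average access time in this model, and contention from more cores tends to raise both the miss rate and the penalty.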
The Memory Problem
•  Problem: more cores compete for the same slow
   memory!
•  Solution: increase the cache size ☺
         –  To maintain the cache hit rate:
            •  halving the miss rate requires 4x the cache size
               (rule of thumb: miss rate ∝ 1/√cache size)
            •  exponential increase in the number of transistors needed
         –  Cache coherence overhead

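The 4x figure follows if the miss rate falls with the square root of cache size. The sketch below assumes that power law with exponent 0.5 and a hypothetical baseline of a 1024 kB cache with a 10% miss rate; real workloads have different exponents, and the function names are illustrative.

```python
ALPHA = 0.5  # assumed exponent of the power-law ("square-root") rule


def miss_rate(cache_kb: float, base_kb: float, base_miss_rate: float, alpha: float = ALPHA) -> float:
    """Miss rate predicted for `cache_kb`, given a measured baseline point."""
    return base_miss_rate * (cache_kb / base_kb) ** (-alpha)


def cache_needed(target_miss_rate: float, base_kb: float, base_miss_rate: float, alpha: float = ALPHA) -> float:
    """Cache size (kB) needed to reach `target_miss_rate` under the power law."""
    return base_kb * (base_miss_rate / target_miss_rate) ** (1.0 / alpha)


if __name__ == "__main__":
    # Hypothetical baseline: 1024 kB cache with a 10% miss rate.
    base_kb, base_mr = 1024, 0.10
    for halvings in range(5):
        target = base_mr / 2 ** halvings
        print(f"miss rate {target:.4f} needs {cache_needed(target, base_kb, base_mr):8.0f} kB")
```

Each halving of the miss rate multiplies the required capacity by 4, so n halvings cost 4^n times the cache, which is the exponential growth in transistors the slide warns about.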
Increasing Cache Size



[Figure] Not practical!

Source: B. M. Rogers et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling. ISCA 2009

Other Approaches
•  Improve memory speed
         –  Slow, power-hungry and error-prone
•  Better caching
•  Improve memory bandwidth
         –  Latency tradeoff
•  Prefetch
         –  A mixed blessing
•  Allow more in-flight requests

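Why more in-flight requests help can be seen from Little's law: sustained bandwidth equals the number of outstanding requests times the request size, divided by latency. The 64-byte line size and 100 ns latency below are assumed values, and the function names are hypothetical.

```python
import math


def sustained_bandwidth_gbs(in_flight: int, line_bytes: int, latency_ns: float) -> float:
    """Little's law: throughput = requests in flight / latency.

    Bytes per nanosecond equal GB/s (1 GB = 1e9 bytes).
    """
    return in_flight * line_bytes / latency_ns


def in_flight_needed(target_gbs: float, line_bytes: int, latency_ns: float) -> int:
    """Outstanding requests needed to sustain `target_gbs` at a given latency."""
    return math.ceil(target_gbs * latency_ns / line_bytes)


if __name__ == "__main__":
    # Hypothetical parameters: 64-byte cache lines, 100 ns memory latency.
    for n in (1, 4, 10, 50):
        print(f"{n:3d} in flight -> {sustained_bandwidth_gbs(n, 64, 100.0):5.2f} GB/s")
    print("requests needed for 32 GB/s:", in_flight_needed(32.0, 64, 100.0))
```

With latency fixed, raising bandwidth requires more concurrency, which is why the number of outstanding misses a core can track matters as much as raw memory speed.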
Do We Need All These Cores?
•  Average utilization: < 20%
•  We don’t have many parallel apps
•  We simply have enough compute power

•  Until you try to encode an HD video
         –  Star Trek holodecks: not there yet

•  CPU vendors still have to make a living

Tomorrow’s Multicore




[Figure (source: Intel)]

Tomorrow’s Multicore
•  Intel Core i3, i5, i7
         –  Video integrated into the CPU
         –  Must balance sequential and parallel performance
         –  Lower energy requirements than previous generations
•  Heterogeneous cores
         –  Many slow cores, good at floating point
         –  Some general-purpose cores
         –  “Combine” cores into super-cores
•  Must live with the memory problems

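The need to balance sequential and parallel performance is usually framed with Amdahl's law. A minimal sketch, assuming perfectly parallel work on identical cores; the `seq_speedup` parameter, which models running the serial part on a faster core, is an illustrative simplification rather than a published heterogeneous-multicore model.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int, seq_speedup: float = 1.0) -> float:
    """Speedup relative to running everything on one baseline core.

    parallel_fraction: share of the work that parallelizes perfectly.
    n_cores: number of baseline cores that run the parallel share.
    seq_speedup: how much faster the serial share runs (e.g. on a bigger
        core); an illustrative assumption, 1.0 gives classic Amdahl's law.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1 or seq_speedup <= 0:
        raise ValueError("need at least one core and a positive seq_speedup")
    serial_time = (1.0 - parallel_fraction) / seq_speedup
    parallel_time = parallel_fraction / n_cores
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(f"{n}: {amdahl_speedup(p, n):.1f}x" for n in (2, 8, 64, 1024))
        print(f"parallel fraction {p}: {row}")
```

Even with 1,024 cores, a 10% serial share caps the speedup below 10x, which is why fast sequential cores still matter alongside many slow ones.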
Tomorrow’s Multicore
•  The number of cores is becoming less
   important
         –  Vendors can’t keep increasing them
         –  i3, i5, i7: how many cores each?

Tomorrow’s Multicore




[Figure (source: Wikipedia)]

Tomorrow’s Multicore
•  The number of cores is becoming less
   important
         –  Vendors can’t keep increasing them
         –  i3, i5, i7: how many cores each?
•  What matters is what the system provides
         –  FLOP-intensive: GPU-style cores
         –  I/O-intensive: FAWN (Fast Array of Wimpy Nodes, CMU)
         –  Memory-intensive: Opteron/Xeon NUMA servers

A Research Perspective
•  Coping with heterogeneity is hard
         –  Cores with different degrees of parallelism have
            different sequential execution speeds
         –  Many tradeoffs: speed vs. energy vs. memory
            intensity vs. I/O intensity
•  Need models for heterogeneity
         –  Understand the cost of applications in terms
            of FLOPS, INTOPS, memory, I/O, etc.
•  Silver lining: stick to sequential apps (?)

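One widely used model of this kind is the roofline model, which bounds a kernel's performance by either the peak FLOP rate or memory bandwidth times arithmetic intensity (FLOPs per byte moved). The machine parameters below are hypothetical, the matrix-multiply intensity is an assumed value, and the triad's 2 FLOPs per 24 bytes ignores write-allocate traffic.

```python
def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic.

    intensity: arithmetic intensity in FLOPs per byte moved to/from memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def classify(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> str:
    """Label a kernel by which roof limits it."""
    ridge = peak_gflops / bandwidth_gbs  # intensity where the two roofs meet
    return "compute-bound" if intensity >= ridge else "memory-bound"


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 20 GB/s memory bandwidth.
    peak, bw = 100.0, 20.0
    kernels = {
        # a[i] = b[i] + s * c[i] on doubles: 2 FLOPs per 24 bytes
        "triad": 2 / 24,
        # assumed intensity for a well-blocked dense matrix multiply
        "blocked matmul": 10.0,
    }
    for name, ai in kernels.items():
        gflops = attainable_gflops(ai, peak, bw)
        print(f"{name:15s} AI={ai:6.3f} -> {gflops:6.1f} GFLOP/s ({classify(ai, peak, bw)})")
```

A model like this places each application on the FLOP-intensive vs. memory-intensive axis, which is the kind of classification needed to match it to GPU-style or NUMA-server hardware.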
A Research Perspective
•  Coping with slow memory
•  Need to improve data locality by orders of
   magnitude
         •  Compiler support, auto-tuners, etc.
•  Space-efficient data structures:
         •  Hot area in algorithms & systems
         •  Bloom filters: NSDI’10: 3 papers!
         •  Succinct data structures: STOC’08-STOC’10
         •  Cache-oblivious algorithms

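A Bloom filter answers set-membership queries using a few bits per element, at the cost of occasional false positives but never false negatives. The sketch below is one straightforward implementation, assuming SHA-256 with double hashing to derive the k bit positions; the class and helper names are hypothetical. `false_positive_rate` uses the standard approximation (1 - e^(-kn/m))^k.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set: no false negatives, tunable false-positive rate."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        if n_bits < 1 or n_hashes < 1:
            raise ValueError("need at least one bit and one hash")
        self.m = n_bits
        self.k = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)  # m bits packed into bytes

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit values
        # taken from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # forced odd so h2 != 0
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation (1 - e^(-kn/m))^k for n inserted items."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_parameters(n: int, p: float) -> tuple[int, int]:
    """Bits m and hash count k for n items at target false-positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


if __name__ == "__main__":
    m, k = optimal_parameters(1000, 0.01)
    bf = BloomFilter(m, k)
    for i in range(1000):
        bf.add(f"key-{i}")
    print(m, k, "key-42" in bf, false_positive_rate(m, 1000, k))
```

For 1,000 keys at a 1% target, `optimal_parameters` picks 9,586 bits (about 1.2 kB) and 7 hashes, far less memory than storing the keys themselves.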
A Research Perspective
•  Software-assisted cache coherence
         –  Or go without it ☺
•  Give up some programming patterns
            •  Java initializes all objects to some value…
            •  Rethink those hash tables
•  Go for approximate solutions
         –  Better still if you can provide error bounds

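As one example of an approximate answer that comes with an error bound, a sampled mean carries a Hoeffding guarantee: for i.i.d. values in [0, 1], n ≥ ln(2/δ) / (2ε²) samples keep the estimate within ε of the true mean with probability at least 1 - δ. This sampling approach is an illustration chosen here, not something the talk prescribes; the function names are hypothetical.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Samples needed so that the mean of i.i.d. values in [0, 1] lies
    within epsilon of the true mean with probability at least 1 - delta."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def approximate_mean(data: list[float], epsilon: float, delta: float,
                     rng: random.Random) -> tuple[float, float]:
    """Estimate the mean of `data` (values in [0, 1]) by sampling with
    replacement; returns (estimate, error bound epsilon)."""
    n = hoeffding_sample_size(epsilon, delta)
    total = sum(rng.choice(data) for _ in range(n))
    return total / n, epsilon


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(100_000)]
    estimate, bound = approximate_mean(data, epsilon=0.01, delta=0.001, rng=rng)
    print(f"estimate {estimate:.4f} +/- {bound} (exact {sum(data) / len(data):.4f})")
```

The sample size depends only on ε and δ, not on the size of the data, so the work (and memory traffic) stays fixed as the data set grows.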
Discussion


         Thank you for your attention




