SlideShare une entreprise Scribd logo
1  sur  29
William Stallings
Computer Organization
and Architecture
8th Edition


Chapter 18
Multicore Computers
Hardware Performance Issues
• Microprocessors have seen an exponential
  increase in performance
  —Improved organization
  —Increased clock frequency
• Increase in Parallelism
  —Pipelining
  —Superscalar
  —Simultaneous multithreading (SMT)
• Diminishing returns
  —More complexity requires more logic
  —Increasing chip area for coordinating and
   signal transfer logic
     – Harder to design, make and debug
Alternative Chip
Organizations
Intel Hardware
Trends
Increased Complexity
• Power requirements grow exponentially with chip
  density and clock frequency
   — Can use more chip area for cache
      – Smaller
      – Order of magnitude lower power requirements
• By 2015
   — 100 billion transistors on 300mm2 die
      – Cache of 100MB
      – 1 billion transistors for logic
• Pollack’s rule:
   — Performance is roughly proportional to square root of
     increase in complexity
      – Double complexity gives 40% more performance
• Multicore has potential for near-linear
  improvement
• Unlikely that one core can use all cache
  effectively
Power and Memory Considerations
Chip Utilization of Transistors
Software Performance Issues
• Performance benefits dependent on
  effective exploitation of parallel resources
• Even small amounts of serial code impact
  performance
  —10% inherently serial on 8 processor system
   gives only 4.7 times performance
• Communication, distribution of work and
  cache coherence overheads
• Some applications effectively exploit
  multicore processors
Effective Applications for Multicore Processors
• Database
• Servers handling independent transactions
• Multi-threaded native applications
   — Lotus Domino, Siebel CRM
• Multi-process applications
   — Oracle, SAP, PeopleSoft
• Java applications
   — Java VM is multi-thread with scheduling and memory
     management
   — Sun’s Java Application Server, BEA’s Weblogic, IBM
     Websphere, Tomcat
• Multi-instance applications
   — One application running multiple times
• E.g. Value Game Software
Multicore Organization
•   Number of core processors on chip
•   Number of levels of cache on chip
•   Amount of shared cache
•   Next slide examples of each organization:
•   (a) ARM11 MPCore
•   (b) AMD Opteron
•   (c) Intel Core Duo
•   (d) Intel Core i7
Multicore Organization Alternatives
Advantages of shared L2 Cache
• Constructive interference reduces overall miss
  rate
• Data shared by multiple cores not replicated at
  cache level
• With proper frame replacement algorithms mean
  amount of shared cache dedicated to each core is
  dynamic
  — Threads with less locality can have more cache
• Easy inter-process communication through
  shared memory
• Cache coherency confined to L1
• Dedicated L2 cache gives each core more rapid
  access
  — Good for threads with strong locality
• Shared L3 cache may also improve performance
Individual Core Architecture
• Intel Core Duo uses superscalar cores
• Intel Core i7 uses simultaneous multi-
  threading (SMT)
  —Scales up number of threads supported
     – 4 SMT cores, each supporting 4 threads appears as
       16 core
Intel x86 Multicore Organization -
Core Duo (1)
• 2006
• Two x86 superscalar, shared L2 cache
• Dedicated L1 cache per core
   —32KB instruction and 32KB data
• Thermal control unit per core
   —Manages chip heat dissipation
   —Maximize performance within constraints
   —Improved ergonomics
• Advanced Programmable Interrupt
  Controlled (APIC)
   —Inter-process interrupts between cores
   —Routes interrupts to appropriate core
   —Includes timer so OS can interrupt core
Intel x86 Multicore Organization -
Core Duo (2)
• Power Management Logic
   —Monitors thermal conditions and CPU activity
   —Adjusts voltage and power consumption
   —Can switch individual logic subsystems
• 2MB shared L2 cache
   —Dynamic allocation
   —MESI support for L1 caches
   —Extended to support multiple Core Duo in SMP
     – L2 data shared between local cores or external
• Bus interface
Intel x86 Multicore Organization -
Core i7
•   November 2008
•   Four x86 SMT processors
•   Dedicated L2, shared L3 cache
•   Speculative pre-fetch for caches
•   On chip DDR3 memory controller
    — Three 8 byte channels (192 bits) giving 32GB/s
    — No front side bus
• QuickPath Interconnection
    — Cache coherent point-to-point link
    — High speed communications between processor chips
    — 6.4G transfers per second, 16 bits per transfer
    — Dedicated bi-directional pairs
    — Total bandwidth 25.6GB/s
ARM11 MPCore
• Up to 4 processors each with own L1 instruction and data
  cache
• Distributed interrupt controller
• Timer per CPU
• Watchdog
   — Warning alerts for software failures
   — Counts down from predetermined values
   — Issues warning at zero
• CPU interface
   — Interrupt acknowledgement, masking and completion
     acknowledgement
• CPU
   — Single ARM11 called MP11
• Vector floating-point unit
   — FP co-processor
• L1 cache
• Snoop control unit
   — L1 cache coherency
ARM11
MPCore
Block
Diagram
ARM11 MPCore Interrupt Handling
• Distributed Interrupt Controller (DIC) collates
  from many sources
• Masking
• Prioritization
• Distribution to target MP11 CPUs
• Status tracking
• Software interrupt generation
• Number of interrupts independent of MP11 CPU
  design
• Memory mapped
• Accessed by CPUs via private interface through
  SCU
• Can route interrupts to single or multiple CPUs
• Provides inter-process communication
  — Thread on one CPU can cause activity by thread on
    another CPU
DIC Routing
•   Direct to specific CPU
•   To defined group of CPUs
•   To all CPUs
•   OS can generate interrupt to:
    —All but self
    —Self
    —Other specific CPU
• Typically combined with shared memory
  for inter-process communication
• 16 interrupt ids available for inter-process
  communication
Interrupt States
• Inactive
  —Non-asserted
  —Completed by that CPU but pending or active
   in others
• Pending
  —Asserted
  —Processing not started on that CPU
• Active
  —Started on that CPU but not complete
  —Can be pre-empted by higher priority interrupt
Interrupt Sources
• Inter-process Interrupts (IPI)
   — Private to CPU
   — ID0-ID15
   — Software triggered
   — Priority depends on target CPU not source
• Private timer and/or watchdog interrupt
   — ID29 and ID30
• Legacy FIQ line
   — Legacy FIQ pin, per CPU, bypasses interrupt distributor
   — Directly drives interrupts to CPU
• Hardware
   — Triggered by programmable events on associated
     interrupt lines
   — Up to 224 lines
   — Start at ID32
ARM11 MPCore Interrupt Distributor
Cache Coherency
• Snoop Control Unit (SCU) resolves most shared
  data bottleneck issues
• L1 cache coherency based on MESI
• Direct data Intervention
  — Copying clean entries between L1 caches without
    accessing external memory
  — Reduces read after write from L1 to L2
  — Can resolve local L1 miss from rmote L1 rather than L2
• Duplicated tag RAMs
  — Cache tags implemented as separate block of RAM
  — Same length as number of lines in cache
  — Duplicates used by SCU to check data availability before
    sending coherency commands
  — Only send to CPUs that must update coherent data
    cache
• Migratory lines
  — Allows moving dirty data between CPUs without writing
    to L2 and reading back from external memory
Recommended Reading
• Stallings chapter 18
• ARM web site
Intel Core i& Block Diagram
Intel Core Duo Block Diagram
Performance Effect of Multiple Cores
Recommended Reading
• Multicore Association web site
• ARM web site

Contenu connexe

Tendances

Processor Organization and Architecture
Processor Organization and ArchitectureProcessor Organization and Architecture
Processor Organization and ArchitectureDhaval Bagal
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processingPage Maker
 
Multi core processors
Multi core processorsMulti core processors
Multi core processorsAdithya Bhat
 
M.c.a. (sem ii) operating systems
M.c.a. (sem   ii) operating systemsM.c.a. (sem   ii) operating systems
M.c.a. (sem ii) operating systemsTushar Rajput
 
Central processing unit and stack organization r013
Central processing unit and stack organization   r013Central processing unit and stack organization   r013
Central processing unit and stack organization r013arunachalamr16
 
Register Organization and Instruction cycle
Register Organization and Instruction cycleRegister Organization and Instruction cycle
Register Organization and Instruction cycleMuhammad Ameer Mohavia
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureBalaji Vignesh
 
Operating System Lecture Notes
Operating System Lecture NotesOperating System Lecture Notes
Operating System Lecture NotesFellowBuddy.com
 
Memory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer ArchitechtureMemory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer ArchitechtureShweta Ghate
 
How to Measure RTOS Performance
How to Measure RTOS Performance How to Measure RTOS Performance
How to Measure RTOS Performance mentoresd
 
Processor Organization and Architecture
Processor Organization and ArchitectureProcessor Organization and Architecture
Processor Organization and ArchitectureVinit Raut
 

Tendances (20)

Memory Organization
Memory OrganizationMemory Organization
Memory Organization
 
Processor Organization and Architecture
Processor Organization and ArchitectureProcessor Organization and Architecture
Processor Organization and Architecture
 
Multiprocessor system
Multiprocessor system Multiprocessor system
Multiprocessor system
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
 
Embedded C - Optimization techniques
Embedded C - Optimization techniquesEmbedded C - Optimization techniques
Embedded C - Optimization techniques
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
 
M.c.a. (sem ii) operating systems
M.c.a. (sem   ii) operating systemsM.c.a. (sem   ii) operating systems
M.c.a. (sem ii) operating systems
 
Central processing unit and stack organization r013
Central processing unit and stack organization   r013Central processing unit and stack organization   r013
Central processing unit and stack organization r013
 
Register Organization and Instruction cycle
Register Organization and Instruction cycleRegister Organization and Instruction cycle
Register Organization and Instruction cycle
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
 
Operating System Lecture Notes
Operating System Lecture NotesOperating System Lecture Notes
Operating System Lecture Notes
 
Memory management
Memory managementMemory management
Memory management
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
 
Memory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer ArchitechtureMemory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer Architechture
 
Cache memory
Cache memoryCache memory
Cache memory
 
Virtual Memory
Virtual MemoryVirtual Memory
Virtual Memory
 
How to Measure RTOS Performance
How to Measure RTOS Performance How to Measure RTOS Performance
How to Measure RTOS Performance
 
Memory Addressing
Memory AddressingMemory Addressing
Memory Addressing
 
Processor Organization and Architecture
Processor Organization and ArchitectureProcessor Organization and Architecture
Processor Organization and Architecture
 

Similaire à chap 18 multicore computers

Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptxMuhammad54342
 
BIL406-Chapter-2-Classifications of Parallel Systems.ppt
BIL406-Chapter-2-Classifications of Parallel Systems.pptBIL406-Chapter-2-Classifications of Parallel Systems.ppt
BIL406-Chapter-2-Classifications of Parallel Systems.pptKadri20
 
Mces MOD 1.pptx
Mces MOD 1.pptxMces MOD 1.pptx
Mces MOD 1.pptxRadhaC10
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technologyAmirali Sharifian
 
Introduction to embedded System.pptx
Introduction to embedded System.pptxIntroduction to embedded System.pptx
Introduction to embedded System.pptxPratik Gohel
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architecturesYoung Alista
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Fazli Amin
 
18 parallel processing
18 parallel processing18 parallel processing
18 parallel processingdilip kumar
 
5_Embedded Systems مختصر.pdf
5_Embedded Systems  مختصر.pdf5_Embedded Systems  مختصر.pdf
5_Embedded Systems مختصر.pdfaliamjd
 
Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.pptAberaZeleke1
 
Computer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerComputer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerAmrutaMehata
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD) Ali Raza
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD) Ali Raza
 
Embedded systems 101 final
Embedded systems 101 finalEmbedded systems 101 final
Embedded systems 101 finalKhalid Elmeadawy
 

Similaire à chap 18 multicore computers (20)

Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptx
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
BIL406-Chapter-2-Classifications of Parallel Systems.ppt
BIL406-Chapter-2-Classifications of Parallel Systems.pptBIL406-Chapter-2-Classifications of Parallel Systems.ppt
BIL406-Chapter-2-Classifications of Parallel Systems.ppt
 
Mces MOD 1.pptx
Mces MOD 1.pptxMces MOD 1.pptx
Mces MOD 1.pptx
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technology
 
Introduction to embedded System.pptx
Introduction to embedded System.pptxIntroduction to embedded System.pptx
Introduction to embedded System.pptx
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
 
parallel-processing.ppt
parallel-processing.pptparallel-processing.ppt
parallel-processing.ppt
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)
 
18 parallel processing
18 parallel processing18 parallel processing
18 parallel processing
 
5_Embedded Systems مختصر.pdf
5_Embedded Systems  مختصر.pdf5_Embedded Systems  مختصر.pdf
5_Embedded Systems مختصر.pdf
 
Architecture of high end processors
Architecture of high end processorsArchitecture of high end processors
Architecture of high end processors
 
Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.ppt
 
Computer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerComputer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and Microcontroller
 
Dsp ajal
Dsp  ajalDsp  ajal
Dsp ajal
 
Cpu
CpuCpu
Cpu
 
OS-Part-01.pdf
OS-Part-01.pdfOS-Part-01.pdf
OS-Part-01.pdf
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
Embedded systems 101 final
Embedded systems 101 finalEmbedded systems 101 final
Embedded systems 101 final
 

Plus de Sher Shah Merkhel

Plus de Sher Shah Merkhel (13)

13 risc
13 risc13 risc
13 risc
 
12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and function
 
11 instruction sets addressing modes
11  instruction sets addressing modes 11  instruction sets addressing modes
11 instruction sets addressing modes
 
10 instruction sets characteristics
10 instruction sets characteristics10 instruction sets characteristics
10 instruction sets characteristics
 
09 arithmetic 2
09 arithmetic 209 arithmetic 2
09 arithmetic 2
 
09 arithmetic
09 arithmetic09 arithmetic
09 arithmetic
 
08 operating system support
08 operating system support08 operating system support
08 operating system support
 
07 input output
07 input output07 input output
07 input output
 
06 external memory
06 external memory06 external memory
06 external memory
 
04 cache memory
04 cache memory04 cache memory
04 cache memory
 
03 top level view of computer function and interconnection
03 top level view of computer function and interconnection03 top level view of computer function and interconnection
03 top level view of computer function and interconnection
 
02 computer evolution and performance
02 computer evolution and performance02 computer evolution and performance
02 computer evolution and performance
 
01 introduction
01 introduction01 introduction
01 introduction
 

Dernier

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 

Dernier (20)

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 

chap 18 multicore computers

  • 1. William Stallings Computer Organization and Architecture 8th Edition Chapter 18 Multicore Computers
  • 2. Hardware Performance Issues • Microprocessors have seen an exponential increase in performance —Improved organization —Increased clock frequency • Increase in Parallelism —Pipelining —Superscalar —Simultaneous multithreading (SMT) • Diminishing returns —More complexity requires more logic —Increasing chip area for coordinating and signal transfer logic – Harder to design, make and debug
  • 5. Increased Complexity • Power requirements grow exponentially with chip density and clock frequency — Can use more chip area for cache – Smaller – Order of magnitude lower power requirements • By 2015 — 100 billion transistors on 300mm2 die – Cache of 100MB – 1 billion transistors for logic • Pollack’s rule: — Performance is roughly proportional to square root of increase in complexity – Double complexity gives 40% more performance • Multicore has potential for near-linear improvement • Unlikely that one core can use all cache effectively
  • 6. Power and Memory Considerations
  • 7. Chip Utilization of Transistors
  • 8. Software Performance Issues • Performance benefits dependent on effective exploitation of parallel resources • Even small amounts of serial code impact performance —10% inherently serial on 8 processor system gives only 4.7 times performance • Communication, distribution of work and cache coherence overheads • Some applications effectively exploit multicore processors
  • 9. Effective Applications for Multicore Processors • Database • Servers handling independent transactions • Multi-threaded native applications — Lotus Domino, Siebel CRM • Multi-process applications — Oracle, SAP, PeopleSoft • Java applications — Java VM is multi-thread with scheduling and memory management — Sun’s Java Application Server, BEA’s Weblogic, IBM Websphere, Tomcat • Multi-instance applications — One application running multiple times • E.g. Value Game Software
  • 10. Multicore Organization • Number of core processors on chip • Number of levels of cache on chip • Amount of shared cache • Next slide examples of each organization: • (a) ARM11 MPCore • (b) AMD Opteron • (c) Intel Core Duo • (d) Intel Core i7
  • 12. Advantages of shared L2 Cache • Constructive interference reduces overall miss rate • Data shared by multiple cores not replicated at cache level • With proper frame replacement algorithms mean amount of shared cache dedicated to each core is dynamic — Threads with less locality can have more cache • Easy inter-process communication through shared memory • Cache coherency confined to L1 • Dedicated L2 cache gives each core more rapid access — Good for threads with strong locality • Shared L3 cache may also improve performance
  • 13. Individual Core Architecture • Intel Core Duo uses superscalar cores • Intel Core i7 uses simultaneous multi- threading (SMT) —Scales up number of threads supported – 4 SMT cores, each supporting 4 threads appears as 16 core
  • 14. Intel x86 Multicore Organization - Core Duo (1) • 2006 • Two x86 superscalar, shared L2 cache • Dedicated L1 cache per core —32KB instruction and 32KB data • Thermal control unit per core —Manages chip heat dissipation —Maximize performance within constraints —Improved ergonomics • Advanced Programmable Interrupt Controlled (APIC) —Inter-process interrupts between cores —Routes interrupts to appropriate core —Includes timer so OS can interrupt core
  • 15. Intel x86 Multicore Organization - Core Duo (2) • Power Management Logic —Monitors thermal conditions and CPU activity —Adjusts voltage and power consumption —Can switch individual logic subsystems • 2MB shared L2 cache —Dynamic allocation —MESI support for L1 caches —Extended to support multiple Core Duo in SMP – L2 data shared between local cores or external • Bus interface
  • 16. Intel x86 Multicore Organization - Core i7 • November 2008 • Four x86 SMT processors • Dedicated L2, shared L3 cache • Speculative pre-fetch for caches • On chip DDR3 memory controller — Three 8 byte channels (192 bits) giving 32GB/s — No front side bus • QuickPath Interconnection — Cache coherent point-to-point link — High speed communications between processor chips — 6.4G transfers per second, 16 bits per transfer — Dedicated bi-directional pairs — Total bandwidth 25.6GB/s
  • 17. ARM11 MPCore • Up to 4 processors each with own L1 instruction and data cache • Distributed interrupt controller • Timer per CPU • Watchdog — Warning alerts for software failures — Counts down from predetermined values — Issues warning at zero • CPU interface — Interrupt acknowledgement, masking and completion acknowledgement • CPU — Single ARM11 called MP11 • Vector floating-point unit — FP co-processor • L1 cache • Snoop control unit — L1 cache coherency
  • 19. ARM11 MPCore Interrupt Handling • Distributed Interrupt Controller (DIC) collates from many sources • Masking • Prioritization • Distribution to target MP11 CPUs • Status tracking • Software interrupt generation • Number of interrupts independent of MP11 CPU design • Memory mapped • Accessed by CPUs via private interface through SCU • Can route interrupts to single or multiple CPUs • Provides inter-process communication — Thread on one CPU can cause activity by thread on another CPU
  • 20. DIC Routing • Direct to specific CPU • To defined group of CPUs • To all CPUs • OS can generate interrupt to: —All but self —Self —Other specific CPU • Typically combined with shared memory for inter-process communication • 16 interrupt ids available for inter-process communication
  • 21. Interrupt States • Inactive —Non-asserted —Completed by that CPU but pending or active in others • Pending —Asserted —Processing not started on that CPU • Active —Started on that CPU but not complete —Can be pre-empted by higher priority interrupt
  • 22. Interrupt Sources • Inter-process Interrupts (IPI) — Private to CPU — ID0-ID15 — Software triggered — Priority depends on target CPU not source • Private timer and/or watchdog interrupt — ID29 and ID30 • Legacy FIQ line — Legacy FIQ pin, per CPU, bypasses interrupt distributor — Directly drives interrupts to CPU • Hardware — Triggered by programmable events on associated interrupt lines — Up to 224 lines — Start at ID32
  • 23. ARM11 MPCore Interrupt Distributor
  • 24. Cache Coherency • Snoop Control Unit (SCU) resolves most shared data bottleneck issues • L1 cache coherency based on MESI • Direct data Intervention — Copying clean entries between L1 caches without accessing external memory — Reduces read after write from L1 to L2 — Can resolve local L1 miss from rmote L1 rather than L2 • Duplicated tag RAMs — Cache tags implemented as separate block of RAM — Same length as number of lines in cache — Duplicates used by SCU to check data availability before sending coherency commands — Only send to CPUs that must update coherent data cache • Migratory lines — Allows moving dirty data between CPUs without writing to L2 and reading back from external memory
  • 25. Recommended Reading • Stallings chapter 18 • ARM web site
  • 26. Intel Core i& Block Diagram
  • 27. Intel Core Duo Block Diagram
  • 28. Performance Effect of Multiple Cores
  • 29. Recommended Reading • Multicore Association web site • ARM web site