SlideShare a Scribd company logo
1 of 53
Massively Parallel Architectures Colin Egan, Jason McGuiness and Michael Hicks (Compiler Technology and  Computer Architecture Research Group)
Presentation Structure ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The memory wall ,[object Object],[object Object],[object Object],[object Object],[object Object]
The memory wall ,[object Object],[object Object],[object Object]
The memory wall ,[object Object],[object Object]
Overcoming the memory wall ,[object Object],[object Object],[object Object]
Processing in memory ,[object Object]
Processing in memory ,[object Object],[object Object],[object Object]
Processing in memory ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Processing in memory ,[object Object],[object Object],[object Object],[object Object],[object Object]
Processing in memory ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Processing in memory ,[object Object],[object Object],[object Object],[object Object]
Processing in memory ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cellular architectures ,[object Object],[object Object],[object Object],[object Object]
Cellular architectures ,[object Object],[object Object],[object Object],[object Object]
Cellular architectures ,[object Object],[object Object],[object Object],[object Object],[object Object]
Cellular architectures ,[object Object],[object Object],[object Object],[object Object]
Cellular architectures ,[object Object],[object Object]
Cellular architectures ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cellular architectures ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cyclops ,[object Object],[object Object]
Cyclops ,[object Object],[object Object],[object Object]
Logical View of the Cyclops64 Chip Architecture On-chip Memory (SRAM) : 160 x 28KB = 4480 KB Off-chip Memory (DDR2) : 4 x 256MB  = 1GB Hard Drive (IDE) : 1 x 120GB  = 120 GB SP is located in the memory banks and is accessed directly by the TU and  via the crossbar network by the other TUs. Processors : 80 / chip Thread Units (TU) : 2 / processor Floating-Point Unit (FPU) : 1 / processor Shared Registers (SR) : 0 / processor Scratch-Pad Memory (SP) :  n  KB / TU Processor Chip Board Crossbar Network TU TU TU … SP SP SP FPU SR TU TU TU … SP SP SP FPU SR TU TU TU … SP SP SP FPU SR … MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK … 6 * 4 GB/sec 4 GB/sec 50 MB/sec 1 Gbits/sec Off-Chip Memory Other Chips via 3D mesh Off-Chip Memory Off-Chip Memory Off-Chip Memory IDE HDD 4 GB/sec 6 * 4 GB/sec SP SP SP SP SP SP SP SP
DIMES ,[object Object],[object Object]
DIMES ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DIMES ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DIMES ,[object Object]
Jason’s Work with DIMES ,[object Object],[object Object],[object Object],[object Object]
Jason’s Work with DIMES ,[object Object],[object Object],[object Object],[object Object]
Jason’s Work with DIMES Shortly after program start Shortly before work stealing Just after work stealing  More work stealing
Gilgamesh ,[object Object],[object Object],[object Object]
Gilgamesh System Architecture ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Gilgamesh System Architecture
MIND Chip ,[object Object],[object Object],[object Object],[object Object]
MIND Chip Architecture
MIND Chip Architecture ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
MIND Node Structure
MIND Node Structure ,[object Object],[object Object],[object Object],[object Object]
MIND PIM Architecture ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parcels ,[object Object],[object Object],[object Object]
Multithreaded Execution ,[object Object],[object Object],[object Object],[object Object],[object Object]
Shamrock Notre Dame ,[object Object]
Shamrock Architecture Overview
Node Structure
Node Structure ,[object Object],[object Object],[object Object],[object Object]
Shamrock Chip ,[object Object],[object Object]
Shamrock Chip Architecture ,[object Object]
Shamrock Chip ,[object Object],[object Object],[object Object]
picoChip ,[object Object],[object Object]
picoChip ,[object Object],[object Object],[object Object],[object Object],[object Object]
picoChip ,[object Object]
Summary ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Massively Parallel Architectures ,[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

Intel's Nehalem Microarchitecture by Glenn Hinton
Intel's Nehalem Microarchitecture by Glenn HintonIntel's Nehalem Microarchitecture by Glenn Hinton
Intel's Nehalem Microarchitecture by Glenn Hintonparallellabs
 
Multicore processors and its advantages
Multicore processors and its advantagesMulticore processors and its advantages
Multicore processors and its advantagesNitesh Tudu
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architecturesnextlib
 
Single and Multi core processor
Single and Multi core processorSingle and Multi core processor
Single and Multi core processorMunaam Munawar
 
Warehouse scale computer
Warehouse scale computerWarehouse scale computer
Warehouse scale computerHassan A-j
 
Computer architecture multi core processor
Computer architecture multi core processorComputer architecture multi core processor
Computer architecture multi core processorMazin Alwaaly
 
CA presentation of multicore processor
CA presentation of multicore processorCA presentation of multicore processor
CA presentation of multicore processorZeeshan Aslam
 
Multiprocessor architecture and programming
Multiprocessor architecture and programmingMultiprocessor architecture and programming
Multiprocessor architecture and programmingRaul Goycoolea Seoane
 
29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technologySindhu Nathan
 
Multi-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architectureMulti-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architectureUmair Amjad
 
Advanced trends in microcontrollers by suhel
Advanced trends in microcontrollers by suhelAdvanced trends in microcontrollers by suhel
Advanced trends in microcontrollers by suhelSuhel Mulla
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingDESMOND YUEN
 
Intel Roadmap 2010
Intel Roadmap 2010Intel Roadmap 2010
Intel Roadmap 2010Umair Mohsin
 

What's hot (20)

Intel's Nehalem Microarchitecture by Glenn Hinton
Intel's Nehalem Microarchitecture by Glenn HintonIntel's Nehalem Microarchitecture by Glenn Hinton
Intel's Nehalem Microarchitecture by Glenn Hinton
 
Multicore processors and its advantages
Multicore processors and its advantagesMulticore processors and its advantages
Multicore processors and its advantages
 
UNIT 3.docx
UNIT 3.docxUNIT 3.docx
UNIT 3.docx
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architectures
 
Single and Multi core processor
Single and Multi core processorSingle and Multi core processor
Single and Multi core processor
 
Warehouse scale computer
Warehouse scale computerWarehouse scale computer
Warehouse scale computer
 
Massively Parallel Architectures
Massively Parallel ArchitecturesMassively Parallel Architectures
Massively Parallel Architectures
 
Computer architecture multi core processor
Computer architecture multi core processorComputer architecture multi core processor
Computer architecture multi core processor
 
CA presentation of multicore processor
CA presentation of multicore processorCA presentation of multicore processor
CA presentation of multicore processor
 
Cpu spec
Cpu specCpu spec
Cpu spec
 
IEEExeonmem
IEEExeonmemIEEExeonmem
IEEExeonmem
 
Multiprocessor architecture and programming
Multiprocessor architecture and programmingMultiprocessor architecture and programming
Multiprocessor architecture and programming
 
29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology
 
Multi-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architectureMulti-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architecture
 
Sucet os module_5_notes
Sucet os module_5_notesSucet os module_5_notes
Sucet os module_5_notes
 
Nehalem
NehalemNehalem
Nehalem
 
Advanced trends in microcontrollers by suhel
Advanced trends in microcontrollers by suhelAdvanced trends in microcontrollers by suhel
Advanced trends in microcontrollers by suhel
 
Danish presentation
Danish presentationDanish presentation
Danish presentation
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic Computing
 
Intel Roadmap 2010
Intel Roadmap 2010Intel Roadmap 2010
Intel Roadmap 2010
 

Viewers also liked

The Challenges facing Libraries and Imperative Languages from Massively Paral...
The Challenges facing Libraries and Imperative Languages from Massively Paral...The Challenges facing Libraries and Imperative Languages from Massively Paral...
The Challenges facing Libraries and Imperative Languages from Massively Paral...Jason Hearne-McGuiness
 
[Harvard CS264] 01 - Introduction
[Harvard CS264] 01 - Introduction[Harvard CS264] 01 - Introduction
[Harvard CS264] 01 - Introductionnpinto
 
Storage strategy and tsm roadmap
Storage strategy and tsm roadmapStorage strategy and tsm roadmap
Storage strategy and tsm roadmapIBM Danmark
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQInMobi Technology
 
Greenplum- an opensource
Greenplum- an opensourceGreenplum- an opensource
Greenplum- an opensourceRosy Mani
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDAMartin Peniak
 
Vivre en parallèle - Softshake 2013
Vivre en parallèle - Softshake 2013Vivre en parallèle - Softshake 2013
Vivre en parallèle - Softshake 2013Henri Tremblay
 

Viewers also liked (9)

The Challenges facing Libraries and Imperative Languages from Massively Paral...
The Challenges facing Libraries and Imperative Languages from Massively Paral...The Challenges facing Libraries and Imperative Languages from Massively Paral...
The Challenges facing Libraries and Imperative Languages from Massively Paral...
 
[Harvard CS264] 01 - Introduction
[Harvard CS264] 01 - Introduction[Harvard CS264] 01 - Introduction
[Harvard CS264] 01 - Introduction
 
Storage strategy and tsm roadmap
Storage strategy and tsm roadmapStorage strategy and tsm roadmap
Storage strategy and tsm roadmap
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
 
Greenplum- an opensource
Greenplum- an opensourceGreenplum- an opensource
Greenplum- an opensource
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 
Solution(1)
Solution(1)Solution(1)
Solution(1)
 
Vivre en parallèle - Softshake 2013
Vivre en parallèle - Softshake 2013Vivre en parallèle - Softshake 2013
Vivre en parallèle - Softshake 2013
 
1.sma
1.sma1.sma
1.sma
 

Similar to Massively Parallel Architectures

Area Optimized Implementation For Mips Processor
Area Optimized Implementation For Mips ProcessorArea Optimized Implementation For Mips Processor
Area Optimized Implementation For Mips ProcessorIOSR Journals
 
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptxonur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptxsivasubramanianManic2
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersRyousei Takano
 
Cache memory and cache
Cache memory and cacheCache memory and cache
Cache memory and cacheVISHAL DONGA
 
Study of various factors affecting performance of multi core processors
Study of various factors affecting performance of multi core processorsStudy of various factors affecting performance of multi core processors
Study of various factors affecting performance of multi core processorsateeq ateeq
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)Sri Prasanna
 
Free Hardware & Networking Slides by ITE Infotech Private Limited
Free Hardware & Networking Slides by ITE Infotech Private LimitedFree Hardware & Networking Slides by ITE Infotech Private Limited
Free Hardware & Networking Slides by ITE Infotech Private LimitedHemraj Singh Chouhan
 
Intel 8th generation and 7th gen microprocessor full details especially for t...
Intel 8th generation and 7th gen microprocessor full details especially for t...Intel 8th generation and 7th gen microprocessor full details especially for t...
Intel 8th generation and 7th gen microprocessor full details especially for t...Chessin Chacko
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3Shah Zaib
 
Implementation of RISC-Based Architecture for Low power applications
Implementation of RISC-Based Architecture for Low power applicationsImplementation of RISC-Based Architecture for Low power applications
Implementation of RISC-Based Architecture for Low power applicationsIOSR Journals
 
Symmetric multiprocessing and Microkernel
Symmetric multiprocessing and MicrokernelSymmetric multiprocessing and Microkernel
Symmetric multiprocessing and MicrokernelManoraj Pannerselum
 
Intel new processors
Intel new processorsIntel new processors
Intel new processorszaid_b
 
Ch1Intro.pdf Computer organization and org.
Ch1Intro.pdf Computer organization and org.Ch1Intro.pdf Computer organization and org.
Ch1Intro.pdf Computer organization and org.gadisaAdamu
 
Modern processor art
Modern processor artModern processor art
Modern processor artwaqasjadoon11
 

Similar to Massively Parallel Architectures (20)

Multi-Core on Chip Architecture *doc - IK
Multi-Core on Chip Architecture *doc - IKMulti-Core on Chip Architecture *doc - IK
Multi-Core on Chip Architecture *doc - IK
 
Area Optimized Implementation For Mips Processor
Area Optimized Implementation For Mips ProcessorArea Optimized Implementation For Mips Processor
Area Optimized Implementation For Mips Processor
 
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptxonur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
Cache memory and cache
Cache memory and cacheCache memory and cache
Cache memory and cache
 
Study of various factors affecting performance of multi core processors
Study of various factors affecting performance of multi core processorsStudy of various factors affecting performance of multi core processors
Study of various factors affecting performance of multi core processors
 
Cache memory presentation
Cache memory presentationCache memory presentation
Cache memory presentation
 
Module 1 unit 3
Module 1  unit 3Module 1  unit 3
Module 1 unit 3
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)
 
Free Hardware & Networking Slides by ITE Infotech Private Limited
Free Hardware & Networking Slides by ITE Infotech Private LimitedFree Hardware & Networking Slides by ITE Infotech Private Limited
Free Hardware & Networking Slides by ITE Infotech Private Limited
 
Intel 8th generation and 7th gen microprocessor full details especially for t...
Intel 8th generation and 7th gen microprocessor full details especially for t...Intel 8th generation and 7th gen microprocessor full details especially for t...
Intel 8th generation and 7th gen microprocessor full details especially for t...
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3
 
Implementation of RISC-Based Architecture for Low power applications
Implementation of RISC-Based Architecture for Low power applicationsImplementation of RISC-Based Architecture for Low power applications
Implementation of RISC-Based Architecture for Low power applications
 
Coa presentation3
Coa presentation3Coa presentation3
Coa presentation3
 
Symmetric multiprocessing and Microkernel
Symmetric multiprocessing and MicrokernelSymmetric multiprocessing and Microkernel
Symmetric multiprocessing and Microkernel
 
Intel new processors
Intel new processorsIntel new processors
Intel new processors
 
Ch1Intro.pdf Computer organization and org.
Ch1Intro.pdf Computer organization and org.Ch1Intro.pdf Computer organization and org.
Ch1Intro.pdf Computer organization and org.
 
Modern processor art
Modern processor artModern processor art
Modern processor art
 
processor struct
processor structprocessor struct
processor struct
 

More from Jason Hearne-McGuiness

A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.Jason Hearne-McGuiness
 
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...Jason Hearne-McGuiness
 
A DSEL for Addressing the Problems Posed by Parallel Architectures
A DSEL for Addressing the Problems Posed by Parallel ArchitecturesA DSEL for Addressing the Problems Posed by Parallel Architectures
A DSEL for Addressing the Problems Posed by Parallel ArchitecturesJason Hearne-McGuiness
 
Implementing Batcher’s Bitonic Sort in C++: An Investigation into using Libra...
Implementing Batcher’s Bitonic Sort in C++: An Investigation into using Libra...Implementing Batcher’s Bitonic Sort in C++: An Investigation into using Libra...
Implementing Batcher’s Bitonic Sort in C++: An Investigation into using Libra...Jason Hearne-McGuiness
 
Introducing Parallel Pixie Dust: Advanced Library-Based Support for Paralleli...
Introducing Parallel Pixie Dust: Advanced Library-Based Support for Paralleli...Introducing Parallel Pixie Dust: Advanced Library-Based Support for Paralleli...
Introducing Parallel Pixie Dust: Advanced Library-Based Support for Paralleli...Jason Hearne-McGuiness
 

More from Jason Hearne-McGuiness (6)

A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.
 
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
 
A DSEL for Addressing the Problems Posed by Parallel Architectures
A DSEL for Addressing the Problems Posed by Parallel ArchitecturesA DSEL for Addressing the Problems Posed by Parallel Architectures
A DSEL for Addressing the Problems Posed by Parallel Architectures
 
Implementing Batcher’s Bitonic Sort in C++: An Investigation into using Libra...
Implementing Batcher’s Bitonic Sort in C++: An Investigation into using Libra...Implementing Batcher’s Bitonic Sort in C++: An Investigation into using Libra...
Implementing Batcher’s Bitonic Sort in C++: An Investigation into using Libra...
 
Introducing Parallel Pixie Dust: Advanced Library-Based Support for Paralleli...
Introducing Parallel Pixie Dust: Advanced Library-Based Support for Paralleli...Introducing Parallel Pixie Dust: Advanced Library-Based Support for Paralleli...
Introducing Parallel Pixie Dust: Advanced Library-Based Support for Paralleli...
 
Introducing Parallel Pixie Dust
Introducing Parallel Pixie DustIntroducing Parallel Pixie Dust
Introducing Parallel Pixie Dust
 

Massively Parallel Architectures

  • 1. Massively Parallel Architectures Colin Egan, Jason McGuiness and Michael Hicks (Compiler Technology and Computer Architecture Research Group)
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23. Logical View of the Cyclops64 Chip Architecture On-chip Memory (SRAM) : 160 x 28KB = 4480 KB Off-chip Memory (DDR2) : 4 x 256MB = 1GB Hard Drive (IDE) : 1 x 120GB = 120 GB SP is located in the memory banks and is accessed directly by the TU and via the crossbar network by the other TUs. Processors : 80 / chip Thread Units (TU) : 2 / processor Floating-Point Unit (FPU) : 1 / processor Shared Registers (SR) : 0 / processor Scratch-Pad Memory (SP) : n KB / TU Processor Chip Board Crossbar Network TU TU TU … SP SP SP FPU SR TU TU TU … SP SP SP FPU SR TU TU TU … SP SP SP FPU SR … MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK MEMORY BANK … 6 * 4 GB/sec 4 GB/sec 50 MB/sec 1 Gbits/sec Off-Chip Memory Other Chips via 3D mesh Off-Chip Memory Off-Chip Memory Off-Chip Memory IDE HDD 4 GB/sec 6 * 4 GB/sec SP SP SP SP SP SP SP SP
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30. Jason’s Work with DIMES Shortly after program start Shortly before work stealing Just after work stealing More work stealing
  • 31.
  • 32.
  • 34.
  • 36.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.