HETEROGENEOUS SYSTEM
ARCHITECTURE OVERVIEW
Ofer Rosenberg
DISCLAIMER:
This presentation is not an official HSA Foundation presentation.
Most of the material is taken from the HSA presentations at Hot Chips 2013.
Some slides contain my own insights/opinions.
CONTENT
• Introduction
• hUMA
• hQ
• HSAIL
• HSA Software
• HSA Challenges
• HSA Availability
INTRODUCTION
HISTORIC PERSPECTIVE
Accelerated System
• Program runs on CPU
• API to access accelerators
• ASIC or firmware
  • Configurable, but operation is fixed
Heterogeneous System
• Program runs on CPU
• Offloads work to accelerators
  • GPU, DSP, etc.
• Offloaded work is JITed (compiled at runtime)
(Figure labels: "Distributed", "SoC based")
HSA FOUNDATION
• Originated from AMD's FSA (Fusion System Architecture)
• HSA Foundation founded in June 2012
HSA FOUNDATION MEMBERS
(Figure: member logos grouped by tier: Founders, Promoters, Supporters, Contributors, Academic, Associates)
Slide Taken from Phil Rogers HSA Overview, HotChips 2013
WHAT IS HSA ALL ABOUT?
(MY TAKE)
• "Bring accelerators forward as a first-class processor"
  • Unified address space, pageable memory, coherency
  • Eliminate drivers from the dispatch path (user-mode queues)
• Standardized SW stack built on top of a set of HW requirements
  • Improve interoperability between IP vendors
• Unified architecture for accelerators
  • Start from GPU, extend to DSP / FPGA / fixed-function accelerators, etc.
• SoC-centric
  • Major features are optimal for an SoC environment (same memory/die)
  • Support for distributed systems is possible, yet inefficient (PCIe atomics, others)
Slide Taken from Phil Rogers HSA Overview, HotChips 2013
HSA WORKING GROUPS
• HSA Systems Architecture
  • hUMA – Unified Memory Model
  • hQ – HSA Queuing Model
• HSA Programmer Reference Specification
  • HSAIL – HSA Intermediate Language
• HSA System Runtime
• HSA Compliance
• HSA Tools
http://hsafoundation.com/standards/
OPENCL™ AND HSA
• HSA is an optimized platform architecture for OpenCL™
  • Not an alternative to OpenCL™
• OpenCL™ on HSA will benefit from
  • Avoidance of wasteful copies
  • Low-latency dispatch
  • Improved memory model
  • Pointers shared between CPU and GPU
• OpenCL™ 2.0 shows considerable alignment with HSA
  • Many HSA member companies are also active with Khronos in the OpenCL™ working group
Slide Taken from Phil Rogers HSA Overview, HotChips 2013
hUMA
© Copyright 2012 HSA Foundation. All Rights Reserved. 11
hUMA
HSA Unified Memory Architecture
Evolution of CPU/GPU memory systems:
1. CPU uses virtual addresses, GPU uses physical addresses
  • Memory had to be pinned
  • GPU can access only a limited area of CPU memory (aperture)
  • Requires a copy from system memory to GPU-visible memory
  • Pointer-based data structures can't be shared
2. CPU uses virtual addresses, GPU uses virtual addresses (but not the same ones)
  • Memory still had to be pinned
  • GPU can access the entire system memory
  • No copy is required
  • Pointer-based data structures still can't be shared
3. hUMA
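Why pointer-based structures can't be shared in models 1 and 2 is easiest to see with a toy linked list. The C sketch below is a CPU-only simulation with hypothetical names: a list is built in one buffer and copied byte-for-byte into a second buffer that stands in for GPU-visible memory. The copied `next` pointers still refer to the original buffer, so they have to be rebased before the other side can walk the list. Under hUMA both sides would simply use the original pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node type, for illustration only. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Build a list of n nodes inside buf (caller provides n slots). */
static Node *build_list(Node *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        buf[i].value = (int)(i + 1);
        buf[i].next = (i + 1 < n) ? &buf[i + 1] : NULL;
    }
    return n ? &buf[0] : NULL;
}

/* Copy nodes into dst, as a driver copy into GPU-visible memory would.
   Embedded pointers are copied verbatim and still point into src. */
static void copy_nodes(Node *dst, const Node *src, size_t n) {
    memcpy(dst, src, n * sizeof(Node));
}

/* Rebase every embedded pointer from src's address range into dst's. */
static void rebase_pointers(Node *dst, const Node *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (dst[i].next != NULL) {
            ptrdiff_t idx = dst[i].next - src; /* index of the target node */
            dst[i].next = &dst[idx];
        }
    }
}

/* Sum values by walking the list from head. */
static int sum_list(const Node *head) {
    int sum = 0;
    for (const Node *p = head; p != NULL; p = p->next) sum += p->value;
    return sum;
}
```

The rebasing pass is the extra work (or serialization step) that shared virtual memory removes.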
hUMA
HSA Unified Memory Architecture
• Shared virtual memory
  • CPU & GPU see the same addresses
• Pageable memory
  • GPU can initiate a page fault (e.g., serviced through the IOMMUv2 on AMD platforms)
• Cache coherency
SHARED VIRTUAL MEMORY
• Advantages
  • No mapping tricks, no copying back and forth between different physical addresses
  • Send pointers (not data) back and forth between HSA agents
• Note the hardware implications…
  • Common page tables (and common interpretation of architectural semantics such as shareability, protection, etc.)
  • Common mechanisms for address translation (and for servicing address translation faults)
  • Concept of a process address space ID (PASID) to allow multiple, per-process virtual address spaces within the system
Slide Taken from Ian Bratt HSA QUEUEING, HotChips 2013
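These hardware implications boil down to one lookup, keyed by (PASID, virtual address), that every agent performs the same way. A toy C model, assuming 4 KiB pages and using a flat array as a hypothetical stand-in for real multi-level page tables: the same virtual address resolves differently per PASID, and a miss is a translation fault that the OS would service (for example by paging memory in) before the access is retried.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12u                       /* assume 4 KiB pages */
#define PAGE_MASK    ((1u << PAGE_SHIFT) - 1u)
#define MAX_MAPPINGS 16

/* One mapping: (PASID, virtual page number) -> physical frame number. */
typedef struct {
    uint32_t pasid;
    uint64_t vpn;
    uint64_t pfn;
} Mapping;

/* Hypothetical page tables shared by CPU and GPU agents. */
typedef struct {
    Mapping entries[MAX_MAPPINGS];
    size_t count;
} PageTables;

static bool map_page(PageTables *pt, uint32_t pasid, uint64_t vpn, uint64_t pfn) {
    if (pt->count == MAX_MAPPINGS) return false;
    pt->entries[pt->count++] = (Mapping){pasid, vpn, pfn};
    return true;
}

/* Translate va for one process; returns false on a translation fault. */
static bool translate(const PageTables *pt, uint32_t pasid, uint64_t va, uint64_t *pa) {
    uint64_t vpn = va >> PAGE_SHIFT;
    for (size_t i = 0; i < pt->count; i++) {
        const Mapping *m = &pt->entries[i];
        if (m->pasid == pasid && m->vpn == vpn) {
            *pa = (m->pfn << PAGE_SHIFT) | (va & PAGE_MASK);
            return true;
        }
    }
    return false; /* fault: the OS would service it, then the agent retries */
}
```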
CACHE COHERENCY DOMAINS
• Advantages
  • Composability
  • Reduced SW complexity when communicating between agents
  • Lower barrier to entry when porting software
• Note the hardware implications…
  • Hardware coherency support between all HSA agents
  • Can take many forms
    • Stand-alone snoop filters / directories
    • Combined L3/filters
    • Snoop-based systems (no filter)
    • Etc.
Slide Taken from Ian Bratt HSA QUEUEING, HotChips 2013
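The "reduced SW complexity" point shows up clearly in code: with coherent shared memory, one agent can hand data to another using ordinary stores plus a release/acquire flag, with no explicit cache flush or invalidate. The sketch below is only an analogy: two CPU threads stand in for two HSA agents, and C11 atomics stand in for the acquire/release operations of the HSA memory model.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared, coherent memory visible to both "agents". */
static int payload[4];
static atomic_int ready = 0;

/* Producer agent: write the data, then publish it with a release store. */
static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < 4; i++) payload[i] = (i + 1) * 10;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Consumer agent: spin on an acquire load; afterwards payload is visible. */
static void *consumer(void *arg) {
    int *sum = arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* busy-wait; real code would block on a signal instead */
    }
    *sum = 0;
    for (int i = 0; i < 4; i++) *sum += payload[i];
    return NULL;
}

/* Run one hand-off and return what the consumer observed. */
static int run_handoff(void) {
    pthread_t p, c;
    int sum = -1;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &sum);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

Without hardware coherency, the producer would additionally have to flush its caches and the consumer to invalidate its own before reading `payload`.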
hQ
hQ Motivation
1. GPU dispatch has a lot of overhead
  • SW/driver stack overhead
  • User-mode to kernel-mode switch
hQ Motivation
2. The master/slave pattern is limiting (and has a lot of overhead)
  • CPU schedules work to the GPU
  • Communication overhead (GPU reports results → CPU computes the next kernel's grid size)
Slide from “Introduction to Dynamic Parallelism”, Stephen Jones, NVIDIA Corporation
hQ
HSA QUEUING MODEL
• User-mode queuing for low-latency dispatch
  • Application dispatches directly
  • No OS or driver in the dispatch path
• Architected Queuing Layer
  • Single compute dispatch path for all hardware
  • No driver translation, direct to hardware
• Allows dispatch to a queue from any agent
  • CPU or GPU
• GPU can spawn its own work
Picture from AMD blog: "hQ: From Master/Slave to Masterpiece"
ARCHITECTED QUEUEING LANGUAGE
• HSA queues look just like standard shared-memory queues, supporting multi-producer, single-consumer
  • Single-producer, single-consumer support is also allowed
• Queues consist of storage, read/write indices, ID, etc.
• Queues are created/destroyed via calls to the HSA runtime
• "Packets" are placed in queues directly from user mode, via an architected protocol
• Packet format is architected
(Figure: several producers write packets into a queue whose storage is in coherent, shared memory; a single consumer reads them; read and write indices track progress)
Slide Taken from Ian Bratt HSA QUEUEING, HotChips 2013
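A minimal C11 sketch of these queue mechanics, assuming a power-of-two ring and a simplified packet whose first field acts as the header. All names and the packet layout are hypothetical and far smaller than real AQL packets. The protocol roughly mirrors the one described above: a producer reserves a slot by advancing the write index, fills in the packet body, and publishes it by storing the header last with release semantics (a real producer would then ring the queue's doorbell). The single consumer waits for a valid header, copies the packet out, invalidates the slot, and advances the read index.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 8u                   /* must be a power of two */
enum { PKT_INVALID = 0, PKT_DISPATCH = 1 };

typedef struct {
    _Atomic uint16_t header;            /* written last by the producer */
    uint32_t grid_size;                 /* hypothetical body fields */
    uint64_t kernel_arg;
} Packet;

typedef struct {
    Packet storage[QUEUE_SIZE];         /* lives in coherent shared memory */
    _Atomic uint64_t write_index;
    _Atomic uint64_t read_index;
    uint32_t id;
} Queue;

/* Any agent may produce. Returns false if the queue is full. */
static bool queue_submit(Queue *q, uint32_t grid_size, uint64_t arg) {
    uint64_t idx = atomic_load(&q->write_index);
    do {
        if (idx - atomic_load(&q->read_index) >= QUEUE_SIZE) return false;
    } while (!atomic_compare_exchange_weak(&q->write_index, &idx, idx + 1));
    Packet *p = &q->storage[idx & (QUEUE_SIZE - 1)];
    p->grid_size = grid_size;
    p->kernel_arg = arg;
    atomic_store_explicit(&p->header, PKT_DISPATCH, memory_order_release);
    return true;
}

/* Single consumer (the packet processor). Returns false if nothing is ready. */
static bool queue_consume(Queue *q, uint32_t *grid_size, uint64_t *arg) {
    uint64_t idx = atomic_load(&q->read_index);
    Packet *p = &q->storage[idx & (QUEUE_SIZE - 1)];
    if (atomic_load_explicit(&p->header, memory_order_acquire) != PKT_DISPATCH)
        return false;                   /* empty, or reserved but not published */
    *grid_size = p->grid_size;
    *arg = p->kernel_arg;
    atomic_store_explicit(&p->header, PKT_INVALID, memory_order_relaxed);
    atomic_store_explicit(&q->read_index, idx + 1, memory_order_release);
    return true;
}
```

Reserving with a compare-and-swap that checks for space first is a simplification chosen to keep the sketch short. Because the header is published last, the consumer never sees a half-written packet, even when several producers race.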
HSAIL
WHAT IS HSAIL?
• HSAIL is the intermediate language for parallel compute in HSA
  • Generated by a high-level compiler (LLVM, gcc, Java VM, etc.)
  • Low-level IR, close to machine ISA level
  • Compiled down to the target ISA by an IHV "Finalizer"
  • The Finalizer may execute at run time, install time, or build time
• Example: OpenCL™ compilation stack using HSAIL
(Figure: High-level compiler flow (developer): OpenCL™ kernel → EDG or CLANG → SPIR → LLVM → HSAIL. Finalizer flow (runtime): HSAIL → Finalizer → hardware ISA)
Slide Taken from Ben Sander’s HSAIL: Portable Compiler IR FOR HSA, HotChips 2013
HSAIL INSTRUCTION SET HIGHLIGHTS
• "SIMT": Single Instruction, Multiple Threads
  • The ISA is scalar and describes one serial thread; parallelism is provided by the HW
• RISC-like
  • Load-store architecture
  • 136 opcodes
• Fixed number of registers
  • 1-bit control registers ($c)
  • A 512-byte pool shared by single (32-bit, $s), double (64-bit, $d) and quad (128-bit, $q) registers
• 7 memory segments
  • global, readonly, group, spill, private, arg, kernarg
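The register budget can be checked with simple arithmetic. Assuming $s registers take 4 bytes, $d 8 bytes and $q 16 bytes, all drawn from the single 512-byte pool, a kernel fits when 4s + 8d + 16q ≤ 512, i.e. s + 2d + 4q ≤ 128. A tiny C helper (hypothetical name) that a code generator might call before deciding to use the spill segment:

```c
#include <stdbool.h>

#define HSAIL_REG_POOL_BYTES 512u   /* pool shared by $s, $d and $q registers */

/* True if s single, d double and q quad registers fit in the pool.
   Assumes small counts (no overflow handling in this sketch). */
static bool hsail_regs_fit(unsigned s, unsigned d, unsigned q) {
    unsigned bytes = 4u * s + 8u * d + 16u * q;
    return bytes <= HSAIL_REG_POOL_BYTES;
}
```

For example, 16 $s + 8 $d + 24 $q registers use 64 + 64 + 384 = 512 bytes and fit exactly; a 33rd $q register would not.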
Example: HSAIL generated for a Java lambda (HotSpot/Sumatra):

version 0:95: $full : $large;
// static method HotSpotMethod<Main.lambda$2(Player)>
kernel &run (
    kernarg_u64 %_arg0                  // Kernel signature for lambda method
) {
    ld_kernarg_u64    $d6, [%_arg0];     // Move arg to an HSAIL register
    workitemabsid_u32 $s2, 0;            // Read the work-item global "X" coord

    cvt_u64_s32       $d2, $s2;          // Convert X gid to long
    mul_u64           $d2, $d2, 8;       // Adjust index for sizeof ref
    add_u64           $d2, $d2, 24;      // Adjust for actual elements start
    add_u64           $d2, $d2, $d6;     // Add to array ref ptr
    ld_global_u64     $d6, [$d2];        // Load from array element into reg
@L0:
    ld_global_u64     $d0, [$d6 + 120];  // p.getTeam()
    mov_b64           $d3, $d0;
    ld_global_s32     $s3, [$d6 + 40];   // p.getScores()
    cvt_f32_s32       $s16, $s3;
    ld_global_s32     $s0, [$d0 + 24];   // Team getScores()
    cvt_f32_s32       $s17, $s0;
    div_f32           $s16, $s16, $s17;  // p.getScores()/teamScores
    st_global_f32     $s16, [$d6 + 100]; // p.setPctOfTeamScores()
    ret;
};
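Read as ordinary code, the kernel is the body of a Java lambda applied to one array element per work-item. A C sketch of the same logic, using hypothetical struct types in place of the JVM object layout (the HSAIL reaches fields through raw offsets such as +40, +100 and +120, and finds element i at base + 24 + 8·i). Each call to the per-work-item function is the scalar, single-thread view the ISA describes; the serial loop merely emulates what the hardware runs in parallel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Java Team and Player objects. */
typedef struct { int32_t scores; } Team;
typedef struct {
    int32_t scores;
    float pct_of_team_scores;
    Team *team;
} Player;

/* Address of array element gid, as the HSAIL computes it:
   base + 24 (start of elements) + gid * 8 (size of a reference). */
static uint64_t element_address(uint64_t array_base, uint32_t gid) {
    return array_base + 24u + (uint64_t)gid * 8u;
}

/* One work-item: load the player, divide its scores by its team's. */
static void run_work_item(Player **players, uint32_t gid) {
    Player *p = players[gid];
    float player = (float)p->scores;        /* cvt_f32_s32 */
    float team = (float)p->team->scores;    /* cvt_f32_s32 */
    p->pct_of_team_scores = player / team;  /* div_f32, then st_global_f32 */
}

/* Emulate the grid: the hardware would run these work-items in parallel. */
static void run_grid(Player **players, uint32_t grid_size) {
    for (uint32_t gid = 0; gid < grid_size; gid++) run_work_item(players, gid);
}
```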
HSA SOFTWARE
HIGH-LEVEL SOFTWARE STACK
• Programming languages
  • OpenCL 2.0
  • C++ AMP
  • Java (Aparapi/Sumatra)
• HSA Runtime (user-mode driver)
  • System query
  • Access to JIT compilers
  • Access to queues
• JIT compilers
  • Offline or online (JIT)
  • LLVM compiler (LLVM → HSAIL)
  • HSAIL Finalizer (HSAIL → BIN)
• Kernel-mode driver
http://www.hsafoundation.com/hsa-developer-tools/
HSA OPEN SOURCE SOFTWARE
• HSA will feature an open-source Linux execution and compilation stack
  • Allows a single shared implementation for many components
  • Enables university research and collaboration in all areas
  • Because it's the right thing to do

Component Name       | IHV or Common | Rationale
HSA Bolt Library     | Common        | Enable understanding and debug
HSAIL Code Generator | Common        | Enable research
LLVM Contributions   | Common        | Industry and academic collaboration
HSAIL Assembler      | Common        | Enable understanding and debug
HSA Runtime          | Common        | Standardize on a single runtime
HSA Finalizer        | IHV           | Enable research and debug
HSA Kernel Driver    | IHV           | For inclusion in Linux distros
Slide Taken from Phil Rogers “Heterogeneous System Architecture Overview”, HotChips 2013
JAVA HETEROGENEOUS
ENABLEMENT ROADMAP
(Figure: four stages, each running an Application down to CPU ISA / GPU ISA:
1. JVM + APARAPI → OpenCL™ → CPU / GPU
2. JVM + APARAPI → HSAIL → HSA Finalizer
3. JVM + APARAPI → LLVM Optimizer (IR) → HSAIL → HSA Finalizer, with the HSA Runtime
4. Sumatra-enabled JVM → HSAIL → HSA Finalizer)
Slide Taken from Phil Rogers “Heterogeneous System Architecture Overview”, HotChips 2013
HSA Challenges
(My Take)
HSA CHALLENGES –
VENDOR SUPPORT
(Figure: member logos grouped by tier: Founders, Promoters, Supporters, Contributors, Academic)
Slide Taken from Phil Rogers HSA Overview, HotChips 2013
Missing some key players:
Intel, NVIDIA, Apple, Microsoft, Google, …
HSA CHALLENGES –
LANGUAGES SUPPORT
• HSAIL (or LLVM IR) is not an attractive level to code at…
• Leverage existing parallel languages/paradigms to exploit HSA features:
  • C++ AMP
  • OpenCL 2.0 (done!)
  • OpenMP
  • Add your favorite…
• Extend popular languages to exploit HSA:
  • Scripting languages: Python
  • Web languages: HTML5, RoR, JavaScript, …
  • Domain-specific languages (DSLs)
HSA CHALLENGES –
SECURITY
• HSA was designed with some security measures in mind:
  • Accelerators support privilege levels, with user and privileged memory
  • Execute, read and write permissions are enforced by page table entries
  • Support for fixed-time context scheduling (DoS protection)
• But:
  • Advanced features such as hUMA and user-mode queues (hQ) are potential back doors
  • OS and security apps currently do not monitor the accelerators
    • Monitoring may require OS changes
  • The detailed specification can be used to find attack vectors
  • Some accelerator architectures may introduce security flaws
    • Example: local memory on the GPU
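The first two protection measures amount to a check the accelerator's MMU applies on every access, just as the CPU's does. A minimal sketch with hypothetical bit names (real PTE formats are architecture-specific): a user-level access to a privileged page fails regardless of its R/W/X bits, and otherwise the requested access type must be permitted by the entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE permission bits. */
enum {
    PTE_READ  = 1u << 0,
    PTE_WRITE = 1u << 1,
    PTE_EXEC  = 1u << 2,
    PTE_USER  = 1u << 3   /* clear = privileged-only page */
};

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC } AccessType;

/* True if an agent at the given privilege level may perform the access. */
static bool access_allowed(uint32_t pte_flags, AccessType type, bool privileged) {
    if (!privileged && !(pte_flags & PTE_USER)) return false;
    switch (type) {
    case ACCESS_READ:  return (pte_flags & PTE_READ) != 0;
    case ACCESS_WRITE: return (pte_flags & PTE_WRITE) != 0;
    case ACCESS_EXEC:  return (pte_flags & PTE_EXEC) != 0;
    }
    return false;
}
```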
HSA Availability
HSA AVAILABILITY
• AMD released "Kaveri", the first HSA-capable SoC
  • HW supports hUMA, hQ, etc.
  • HSA software stack is not publicly available yet (expected this year)
http://www.tomshardware.com/reviews/a88x-socket-fm2-motherboard,3764.html
HSA AVAILABILITY
Simulators:
• HSAEMU: a full-system emulator for HSA platforms
  • Work done by the System SW Lab at NTHU (National Tsing Hua University)
  • http://hsaemu.org/
  • Code available on GitHub: https://github.com/SSLAB-HSA/HSAemu
• HSAIL Simulator
  • Code available on GitHub: https://github.com/HSAFoundation/HSAIL-Instruction-Set-Simulator
THANK YOU
BACKUPS
REFERENCES
• HSA Foundation: http://hsafoundation.com/
• HSA whitepaper: http://developer.amd.com/wordpress/media/2012/10/hsa10.pdf
• hUMA:
  • http://www.slideshare.net/AMD/amd-heterogeneous-uniform-memory-access
  • http://www.pcper.com/reviews/Processors/AMD-Details-hUMA-HSA-Action
  • http://www.bit-tech.net/news/hardware/2013/04/30/amd-huma-heterogeneous-unified-memory-acces/
  • http://www.amd.com/us/products/technologies/hsa/Pages/hsa.aspx#3
• AnandTech, Hawaii architecture: http://www.anandtech.com/show/7457/the-radeon-r9-290x-review/3
• hQ:
  • http://community.amd.com/community/amd-blogs/amd-business/blog/2013/10/21/hq-from-masterslave-to-masterpiece
  • http://on-demand.gputechconf.com/gtc/2012/presentations/S0338-GTC2012-CUDA-Programming-Model.pdf
• HSA purpose analysis by Moor Insights & Strategy: http://developer.amd.com/apu/wordpress/wp-content/uploads/2012/01/HSAF-Purpose-and-Outlook-by-Moor-Insights-Strategy.pdf
• IOMMUv2 specification: http://developer.amd.com/wordpress/media/2012/10/48882.pdf
hUMA & Discrete GPUs
• hUMA can be extended beyond the SoC if the proper HW exists (such as the Hawaii GPU…)
Slide from "IOMMUv2: the Ins and Outs of Heterogeneous GPU use", AFDS 2012
HSAIL AND SPIR

Feature | HSAIL | SPIR
Intended users | Compiler developers who want to control their own code generation | Compiler developers who want a fast path to acceleration across a wide variety of devices
IR level | Low-level, just above the machine instruction set | High-level, just below LLVM IR
Back-end code generation | Thin, fast, robust | Flexible; can include many optimizations and compiler transformations, including register allocation
Where are compiler optimizations performed? | Mostly in the high-level compiler, before HSAIL generation | Mostly in the back-end code generator, between SPIR and the device machine instruction set
Registers | Fixed-size register pool | Infinite
SSA form | No | Yes
Binary format | Yes | Yes
Code generator for LLVM | Yes | Yes
Back-end device targets | Modern GPU architectures supported by members of the HSA Foundation | Any OpenCL device, including GPUs, CPUs, FPGAs
Memory model | Relaxed consistency with acquire/release, barriers, and fine-grained barriers | Flexible; can support the OpenCL 1.2 memory model
Slide Taken from Ben Sander’s HSAIL: Portable Compiler IR FOR HSA, HotChips 2013
HSA SOFTWARE STACK
(Figure: legacy driver stack vs. HSA software stack.
Legacy: Apps → Domain Libraries → OpenCL™, DX Runtimes, User Mode Drivers → Graphics Kernel Mode Driver → Hardware (APUs, CPUs, GPUs).
HSA: Apps → Task Queuing Libraries; HSA Domain Libraries, OpenCL™ 2.x Runtime → HSA Runtime, HSA JIT → HSA Kernel Mode Driver → Hardware (APUs, CPUs, GPUs).
Legend: user mode component, kernel mode component, components contributed by third parties.)
BOLT: PARALLEL PRIMITIVES
LIBRARY FOR HSA
• Easily leverage the inherent power efficiency of GPU computing
  • Common routines such as scan, sort, reduce, transform
  • More advanced routines like heterogeneous pipelines
  • Bolt library works with OpenCL and C++ AMP
• Enjoy the unique advantages of the HSA platform
  • Move the computation, not the data
• Finally, a single source code base for the CPU and GPU!
  • Developers can focus on core algorithms
• Bolt version 1.0 for OpenCL and C++ AMP is available now at https://github.com/HSA-Libraries/Bolt
Slide Taken from Phil Rogers HSA Overview, HotChips 2013
LINES-OF-CODE AND PERFORMANCE FOR
DIFFERENT PROGRAMMING MODELS
AMD A10-5800K APU with Radeon™ HD Graphics – CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM.
Software – Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta
(Chart: lines of code (LOC, 0–350, stacked by Init, Compile, Copy, Launch, Algorithm and Copy-back) and performance (0–35) for Serial CPU, TBB, Intrinsics+TBB, OpenCL™-C, OpenCL™-C++, C++ AMP and HSA Bolt)
(Exemplary ISV “Hessian” Kernel)
AMD’S FIRST HSA SOC
Heterogeneous computingHeterogeneous computing
Heterogeneous computing
 

Plus de Ofer Rosenberg

Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeOfer Rosenberg
 
Introduction To GPUs 2012
Introduction To GPUs 2012Introduction To GPUs 2012
Introduction To GPUs 2012Ofer Rosenberg
 
Intel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOFIntel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOFOfer Rosenberg
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & FutureOfer Rosenberg
 
Open CL For Haifa Linux Club
Open CL For Haifa Linux ClubOpen CL For Haifa Linux Club
Open CL For Haifa Linux ClubOfer Rosenberg
 
Open CL For Speedup Workshop
Open CL For Speedup WorkshopOpen CL For Speedup Workshop
Open CL For Speedup WorkshopOfer Rosenberg
 

Plus de Ofer Rosenberg (9)

GPU Ecosystem
GPU EcosystemGPU Ecosystem
GPU Ecosystem
 
The GPGPU Continuum
The GPGPU ContinuumThe GPGPU Continuum
The GPGPU Continuum
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
 
From fermi to kepler
From fermi to keplerFrom fermi to kepler
From fermi to kepler
 
Introduction To GPUs 2012
Introduction To GPUs 2012Introduction To GPUs 2012
Introduction To GPUs 2012
 
Intel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOFIntel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOF
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & Future
 
Open CL For Haifa Linux Club
Open CL For Haifa Linux ClubOpen CL For Haifa Linux Club
Open CL For Haifa Linux Club
 
Open CL For Speedup Workshop
Open CL For Speedup WorkshopOpen CL For Speedup Workshop
Open CL For Speedup Workshop
 

Dernier

Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 

Dernier (20)

young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 

HSA Introduction

  • 2. DISCLAIMER: This presentation is not an official HSA Foundation presentation. Most of the material is taken from the HSA talks at HotChips 2013. Some slides contain my own insights and opinions.
  • 3. CONTENT
    - Introduction
    - hUMA
    - hQ
    - HSAIL
    - HSA Software
    - HSA Challenges
    - HSA Availability
  • 5. HISTORIC PERSPECTIVE
    - Accelerated system
        - Program runs on the CPU
        - API to access accelerators
        - Accelerators are ASICs or firmware
        - Configurable, but the operation is fixed
    - Heterogeneous system
        - Program runs on the CPU
        - Offloads work to accelerators (GPU, DSP, etc.)
        - Offloaded work is JITed (compiled at runtime)
    [Diagram labels: distributed; SoC-based]
  • 6. HSA FOUNDATION
    - Originated from AMD’s FSA (Fusion System Architecture)
    - HSA Foundation founded in June 2012
  • 8. WHAT IS HSA ALL ABOUT? (MY TAKE)
    - “Bring accelerators forward as a first-class processor”
        - Unified address space, pageable memory, coherency
        - Eliminate drivers from the dispatch path (user-mode queues)
    - Standardized SW stack built on top of a set of HW requirements
        - Improve interoperability between IP vendors
    - Unified architecture for accelerators
        - Start from the GPU, extend to DSP / FPGA / fixed-function accelerators, etc.
    - SoC-centric
        - Major features are optimal for an SoC environment (same memory/die)
        - Support for distributed systems is possible, yet inefficient (PCI atomics, others)
    (Slide taken from Phil Rogers, HSA Overview, HotChips 2013)
  • 9. HSA WORKING GROUPS
    - HSA Systems Architecture
        - hUMA – Unified Memory Model
        - hQ – HSA Queuing Model
    - HSA Programmer Reference Specification
        - HSAIL – HSA Intermediate Language
    - HSA System Runtime
    - HSA Compliance
    - HSA Tools
    (Source: http://hsafoundation.com/standards/)
  • 10. OPENCL™ AND HSA
    - HSA is an optimized platform architecture for OpenCL™
        - Not an alternative to OpenCL™
    - OpenCL™ on HSA will benefit from:
        - Avoidance of wasteful copies
        - Low-latency dispatch
        - Improved memory model
        - Pointers shared between CPU and GPU
    - OpenCL™ 2.0 shows considerable alignment with HSA
        - Many HSA member companies are also active with Khronos in the OpenCL™ working group
    (Slide taken from Phil Rogers, HSA Overview, HotChips 2013)
  • 11. hUMA © Copyright 2012 HSA Foundation. All Rights Reserved.
  • 12. hUMA – HSA Unified Memory Architecture
    Evolution of CPU / GPU memory systems:
    1. CPU uses virtual addresses, GPU uses physical addresses
        - Memory had to be pinned
        - GPU can access only a limited area of CPU memory (the aperture)
        - Requires a copy from system memory to GPU-visible memory
        - Pointer-based data structures can’t be shared
    2. CPU uses virtual addresses, GPU uses virtual addresses (but not the same ones)
        - Memory still had to be pinned
        - GPU can access the entire system memory
        - Copy is not required
        - Pointer-based data structures still can’t be shared
    3. hUMA
  • 13. hUMA – HSA Unified Memory Architecture
    - Shared virtual memory: CPU and GPU see the same addresses
    - Pageable memory: the GPU can initiate a page fault, so memory no longer has to be pinned
    - Cache coherency
  • 14. SHARED VIRTUAL MEMORY
    - Advantages
        - No mapping tricks, no copying back and forth between different physical address ranges
        - Send pointers (not data) back and forth between HSA agents
    - Note the hardware implications…
        - Common page tables (and a common interpretation of architectural semantics such as shareability, protection, etc.)
        - Common mechanisms for address translation (and for servicing address translation faults)
        - Concept of a process address space ID (PASID) to allow multiple, per-process virtual address spaces within the system
    (Slide taken from Ian Bratt, HSA Queueing, HotChips 2013)
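A minimal Python sketch of what "send pointers, not data" buys. Python object references stand in for virtual addresses, and both consumers are ordinary functions standing in for accelerator code; `Node`, `flatten`, `sum_by_reference` and `sum_by_index` are hypothetical names, not part of any HSA API. Without a shared address space (models 1 and 2 on the hUMA slide), a linked structure has to be rewritten with indices before it is copied; with shared virtual memory the consumer can walk the original links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A pointer-based list node built on the 'CPU' side (hypothetical type)."""
    value: int
    next: Optional["Node"] = None


def sum_by_reference(head: Optional[Node]) -> int:
    """Shared virtual memory: the consumer follows the producer's links directly."""
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total


def flatten(head: Optional[Node]) -> list[tuple[int, int]]:
    """No shared address space: rewrite each link as an array index (-1 ends
    the list) so the data still makes sense after being copied elsewhere."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    return [(v, i + 1 if i + 1 < len(values) else -1) for i, v in enumerate(values)]


def sum_by_index(flat: list[tuple[int, int]]) -> int:
    """The consumer of the copied buffer walks indices instead of links."""
    total = 0
    i = 0 if flat else -1
    while i != -1:
        value, i = flat[i]
        total += value
    return total


# Build the list 1 -> 2 -> 3 -> 4.
head = None
for v in (4, 3, 2, 1):
    head = Node(v, head)

print(sum_by_reference(head))       # 10
print(flatten(head))                # [(1, 1), (2, 2), (3, 3), (4, -1)]
print(sum_by_index(flatten(head)))  # 10
```

The flattening pass, and the copy that would follow it, is the overhead that shared virtual memory removes.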
  • 15. CACHE COHERENCY DOMAINS
    - Advantages
        - Composability
        - Reduced SW complexity when communicating between agents
        - Lower barrier to entry when porting software
    - Note the hardware implications…
        - Hardware coherency support between all HSA agents
        - Can take many forms:
            - Stand-alone snoop filters / directories
            - Combined L3/filters
            - Snoop-based systems (no filter)
            - Etc.
    (Slide taken from Ian Bratt, HSA Queueing, HotChips 2013)
  • 16. hQ
  • 17. hQ MOTIVATION
    1. GPU dispatch has a lot of overhead
        - SW/driver stack overhead
        - User-mode to kernel-mode switch
  • 18. hQ MOTIVATION
    2. The master/slave pattern is limiting (and has a lot of overhead)
        - CPU schedules work to the GPU
        - Communication overhead (report results → next kernel grid size)
    (Slide from “Introduction to Dynamic Parallelism”, Stephen Jones, NVIDIA Corporation)
  • 19. hQ – HSA QUEUING MODEL
    - User-mode queuing for low-latency dispatch
        - Application dispatches directly
        - No OS or driver in the dispatch path
    - Architected Queuing Layer
        - Single compute dispatch path for all hardware
        - No driver translation, direct to hardware
    - Allows dispatch to a queue from any agent (CPU or GPU)
        - GPU can spawn its own work
    (Picture from AMD blog: “hQ: From Master/Slave to Masterpiece”)
  • 20. ARCHITECTED QUEUEING LANGUAGE
    - HSA queues look just like standard shared-memory queues, supporting multi-producer, single-consumer
        - Single-producer, single-consumer is also allowed
    - Queues consist of storage, read/write indices, ID, etc.
    - Queues are created/destroyed via calls to the HSA runtime
    - “Packets” are placed in queues directly from user mode, via an architected protocol
    - Packet format is architected
    [Figure: two producers and one consumer sharing a queue of packets, with read and write indices, stored in coherent, shared memory]
    (Slide taken from Ian Bratt, HSA Queueing, HotChips 2013)
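To make the queue description concrete, here is a toy multi-producer, single-consumer ring buffer in Python. It models only the protocol described above: storage plus read/write indices, producers writing packets directly, and a packet header written last so the consumer never sees a half-written packet. The class, constants and method names are hypothetical (this is not the HSA runtime API); a lock stands in for the atomic fetch-and-add on the write index, and real hardware relies on release/acquire ordering rather than Python's interpreter lock.

```python
import threading
from dataclasses import dataclass

INVALID = 0   # hypothetical header value: slot holds no valid packet
DISPATCH = 1  # hypothetical header value: slot holds a dispatch packet


@dataclass
class Packet:
    header: int = INVALID
    payload: object = None


class UserModeQueue:
    """Toy multi-producer, single-consumer ring buffer: storage plus
    read/write indices, with producers writing packets directly."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [Packet() for _ in range(size)]
        self.write_index = 0  # next slot a producer will claim
        self.read_index = 0   # next slot the consumer will process
        self._reserve = threading.Lock()  # stands in for an atomic fetch-and-add

    def try_enqueue(self, payload) -> bool:
        """Producer side: claim a slot, fill it, then publish it via the header."""
        with self._reserve:
            index = self.write_index
            if index - self.read_index >= self.size:
                return False  # queue full
            self.write_index = index + 1
        slot = self.slots[index & (self.size - 1)]
        slot.payload = payload
        slot.header = DISPATCH  # written last: a release store on real hardware
        return True

    def try_dequeue(self):
        """Consumer side: next payload, or None if nothing is published yet."""
        slot = self.slots[self.read_index & (self.size - 1)]
        if slot.header == INVALID:
            return None
        payload = slot.payload
        slot.header = INVALID  # hand the slot back to the producers
        self.read_index += 1
        return payload


q = UserModeQueue(4)
for job in ("a", "b", "c"):
    q.try_enqueue(job)
print([q.try_dequeue() for _ in range(4)])  # ['a', 'b', 'c', None]
```

A power-of-two size lets the indices wrap with a mask instead of a modulo; the indices themselves only ever increase, so `write_index - read_index` is the number of claimed slots.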
  • 21. HSAIL
  • 22. WHAT IS HSAIL?
    - HSAIL is the intermediate language for parallel compute in HSA
        - Generated by a high-level compiler (LLVM, gcc, Java VM, etc.)
        - Low-level IR, close to the machine ISA level
        - Compiled down to the target ISA by an IHV “Finalizer”
        - The finalizer may execute at run time, install time, or build time
    - Example: OpenCL™ compilation stack using HSAIL
        - High-level compiler flow (developer): OpenCL™ kernel → EDG or Clang → SPIR → LLVM → HSAIL
        - Finalizer flow (runtime): HSAIL → HSAIL Finalizer → hardware ISA
    (Slide taken from Ben Sander, “HSAIL: Portable Compiler IR for HSA”, HotChips 2013)
  • 23. HSAIL INSTRUCTION SET HIGHLIGHTS
    - “SIMT”: Single Instruction, Multiple Threads
        - The ISA is scalar and describes one serial thread; parallelism is provided by the HW
    - RISC-like
        - Load-store architecture
        - 136 opcodes
    - Fixed number of registers
        - 1-bit control registers
        - A 512-byte pool shared by single (32-bit), double (64-bit) and quad (128-bit) registers
    - 7 memory segments
        - global, readonly, group, spill, private, arg, kernarg
    Example: HSAIL generated for a Java lambda (sets each player's percentage of the team's scores):

        version 0:95: $full : $large;
        // static method HotSpotMethod<Main.lambda$2(Player)>
        kernel &run (
            kernarg_u64 %_arg0                  // Kernel signature for lambda method
        ) {
            ld_kernarg_u64 $d6, [%_arg0];       // Move arg to an HSAIL register
            workitemabsid_u32 $s2, 0;           // Read the work-item global “X” coord

            cvt_u64_s32 $d2, $s2;               // Convert X gid to long
            mul_u64 $d2, $d2, 8;                // Adjust index for sizeof ref
            add_u64 $d2, $d2, 24;               // Adjust for actual elements start
            add_u64 $d2, $d2, $d6;              // Add to array ref ptr
            ld_global_u64 $d6, [$d2];           // Load from array element into reg
        @L0:
            ld_global_u64 $d0, [$d6 + 120];     // p.getTeam()
            mov_b64 $d3, $d0;
            ld_global_s32 $s3, [$d6 + 40];      // p.getScores()
            cvt_f32_s32 $s16, $s3;
            ld_global_s32 $s0, [$d0 + 24];      // Team getScores()
            cvt_f32_s32 $s17, $s0;
            div_f32 $s16, $s16, $s17;           // p.getScores()/teamScores
            st_global_f32 $s16, [$d6 + 100];    // p.setPctOfTeamScores()
            ret;
        };
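The HSAIL listing is mostly address arithmetic, so here is a Python sketch that replays the kernel body for one work-item against a flat, byte-addressed memory. The offsets (24-byte array header, 8-byte references, fields at +40, +100, +120 and +24) are copied from the listing's comments and are specific to the JVM object layout that compiler used; the memory image built below (array at address 0, players at 64 and 200, team at 400) and the helper names are made up for illustration. The loop over `gid` runs sequentially here, whereas the hardware runs work-items in parallel.

```python
import struct

# Offsets copied from the HSAIL listing (JVM object layout of that build);
# example values only, not a fixed ABI.
ARRAY_HEADER = 24   # add_u64 $d2, $d2, 24
REF_SIZE = 8        # mul_u64 $d2, $d2, 8
PLAYER_SCORES = 40  # ld_global_s32 $s3, [$d6 + 40]
PLAYER_PCT = 100    # st_global_f32 $s16, [$d6 + 100]
PLAYER_TEAM = 120   # ld_global_u64 $d0, [$d6 + 120]
TEAM_SCORES = 24    # ld_global_s32 $s0, [$d0 + 24]


def to_f32(x: float) -> float:
    """Round to the nearest single-precision value (like cvt_f32_s32)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def run_work_item(mem: bytearray, array_ref: int, gid: int) -> None:
    """One work-item: player = array[gid]; player.pct = player.scores / team.scores."""
    elem_addr = array_ref + ARRAY_HEADER + gid * REF_SIZE            # cvt / mul / add / add
    (player,) = struct.unpack_from("<Q", mem, elem_addr)             # load array element
    (team,) = struct.unpack_from("<Q", mem, player + PLAYER_TEAM)    # p.getTeam()
    (p_scores,) = struct.unpack_from("<i", mem, player + PLAYER_SCORES)
    (t_scores,) = struct.unpack_from("<i", mem, team + TEAM_SCORES)
    # Operands are rounded to f32 first; the double-precision quotient is then
    # rounded to f32 by the store, which matches a single-precision div_f32.
    struct.pack_into("<f", mem, player + PLAYER_PCT, to_f32(p_scores) / to_f32(t_scores))


# Hypothetical memory image: a 2-element array of player references.
mem = bytearray(512)
ARRAY, TEAM = 0, 400
players = {64: 30, 200: 90}  # player address -> player score
struct.pack_into("<i", mem, TEAM + TEAM_SCORES, 120)
for gid, (addr, score) in enumerate(players.items()):
    struct.pack_into("<Q", mem, ARRAY + ARRAY_HEADER + gid * REF_SIZE, addr)
    struct.pack_into("<Q", mem, addr + PLAYER_TEAM, TEAM)
    struct.pack_into("<i", mem, addr + PLAYER_SCORES, score)

for gid in range(len(players)):  # sequential stand-in for parallel work-items
    run_work_item(mem, ARRAY, gid)

print([struct.unpack_from("<f", mem, a + PLAYER_PCT)[0] for a in players])  # [0.25, 0.75]
```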
  • 24. HSA SOFTWARE
  • 25. HIGH-LEVEL SOFTWARE STACK
    - Programming languages
        - OpenCL 2.0
        - C++ AMP
        - Java (Aparapi/Sumatra)
    - HSA Runtime (user-mode driver)
        - System query
        - Access to JIT compilers
        - Access to queues
    - JIT compilers
        - Offline or online (JIT)
        - LLVM compiler (LLVM → HSAIL)
        - HSAIL Finalizer (HSAIL → BIN)
    - Kernel-mode driver
    (Source: http://www.hsafoundation.com/hsa-developer-tools/)
  • 26. HSA OPEN SOURCE SOFTWARE
    - HSA will feature an open source Linux execution and compilation stack
        - Allows a single shared implementation for many components
        - Enables university research and collaboration in all areas
        - Because it’s the right thing to do
    - Components (IHV-specific or common) and rationale:
        - HSA Bolt Library (common): enable understanding and debug
        - HSAIL Code Generator (common): enable research
        - LLVM Contributions (common): industry and academic collaboration
        - HSAIL Assembler (common): enable understanding and debug
        - HSA Runtime (common): standardize on a single runtime
        - HSA Finalizer (IHV): enable research and debug
        - HSA Kernel Driver (IHV): for inclusion in Linux distros
    (Slide taken from Phil Rogers, “Heterogeneous System Architecture Overview”, HotChips 2013)
  • 27. JAVA HETEROGENEOUS ENABLEMENT ROADMAP
    [Figure: four stages, each running a Java application on the JVM across CPU ISA and GPU ISA: (1) APARAPI dispatching through OpenCL™; (2) APARAPI generating HSAIL for the HSA Finalizer; (3) as in (2), adding the HSA Runtime and the LLVM Optimizer IR; (4) a Sumatra-enabled JVM generating HSAIL for the HSA Finalizer, without APARAPI]
    (Slide taken from Phil Rogers, “Heterogeneous System Architecture Overview”, HotChips 2013)
  • 28. HSA Challenges (My Take)
  • 29. HSA CHALLENGES – VENDOR SUPPORT
    [Figure: HSA Foundation member logos by tier: founders, promoters, supporters, contributors, academic]
    (Slide taken from Phil Rogers, HSA Overview, HotChips 2013)
    - Missing some key players: Intel, NVIDIA, Apple, Microsoft, Google, …
  • 30. HSA CHALLENGES – LANGUAGE SUPPORT
    - HSAIL (or LLVM) is not an attractive level to code at…
    - Leverage existing parallel languages/paradigms to exploit HSA features:
        - C++ AMP
        - OpenCL 2.0 (done!)
        - OpenMP
        - Add your favorite…
    - Extend popular languages to exploit HSA:
        - Scripting languages: Python
        - Web languages: HTML5, RoR, JavaScript, …
        - DSLs
  • 31. HSA CHALLENGES – SECURITY
    - The HSA design had some security measures in mind:
        - The accelerator supports privilege levels, with user and privileged memory
        - Execute, read and write permissions are enforced by page table entries
        - Support for fixed-time context scheduling (DoS protection)
    - But:
        - Advanced features such as hUMA and user-mode queues (hQ) are potential back doors
        - The OS and security apps currently do not monitor the accelerators
            - Monitoring may require OS changes
        - A detailed specification can be used to find attack vectors
        - Some accelerator architectures may introduce security flaws
            - Example: local memory on the GPU
  • 32. HSA Availability
  • 33. HSA AVAILABILITY
    - AMD released “Kaveri”, the first HSA-capable SoC
        - HW supports hUMA, hQ, etc.
        - The HSA software stack is not publicly available yet (expected this year)
    (Source: http://www.tomshardware.com/reviews/a88x-socket-fm2-motherboard,3764.html)
  • 34. HSA AVAILABILITY
    Simulators:
    - HSAEMU: a full-system emulator for HSA platforms
        - Work done by the System SW Lab at NTHU (National Tsing Hua University)
        - http://hsaemu.org/
        - Code available on GitHub: https://github.com/SSLAB-HSA/HSAemu
    - HSAIL Simulator
        - Code available on GitHub: https://github.com/HSAFoundation/HSAIL-Instruction-Set-Simulator
  • 37. REFERENCES
    - HSA Foundation: http://hsafoundation.com/
    - HSA whitepaper: http://developer.amd.com/wordpress/media/2012/10/hsa10.pdf
    - hUMA:
        - http://www.slideshare.net/AMD/amd-heterogeneous-uniform-memory-access
        - http://www.pcper.com/reviews/Processors/AMD-Details-hUMA-HSA-Action
        - http://www.bit-tech.net/news/hardware/2013/04/30/amd-huma-heterogeneous-unified-memory-acces/
        - http://www.amd.com/us/products/technologies/hsa/Pages/hsa.aspx#3
    - AnandTech, Hawaii architecture: http://www.anandtech.com/show/7457/the-radeon-r9-290x-review/3
    - hQ:
        - http://community.amd.com/community/amd-blogs/amd-business/blog/2013/10/21/hq-from-masterslave-to-masterpiece
        - http://on-demand.gputechconf.com/gtc/2012/presentations/S0338-GTC2012-CUDA-Programming-Model.pdf
    - HSA purpose analysis by Moor Insights & Strategy: http://developer.amd.com/apu/wordpress/wp-content/uploads/2012/01/HSAF-Purpose-and-Outlook-by-Moor-Insights-Strategy.pdf
    - IOMMUv2 spec: http://developer.amd.com/wordpress/media/2012/10/48882.pdf
  • 38. hUMA & DISCRETE GPUs
    - hUMA can be extended beyond the SoC if the proper HW exists (such as the Hawaii GPU…)
    (Slide from “IOMMUv2: the Ins and Outs of Heterogeneous GPU use”, AFDS 2012)
  • 39. HSAIL AND SPIR
    - Intended users
        - HSAIL: compiler developers who want to control their own code generation
        - SPIR: compiler developers who want a fast path to acceleration across a wide variety of devices
    - IR level
        - HSAIL: low-level, just above the machine instruction set
        - SPIR: high-level, just below LLVM-IR
    - Back-end code generation
        - HSAIL: thin, fast, robust
        - SPIR: flexible; can include many optimizations and compiler transformations, including register allocation
    - Where are compiler optimizations performed?
        - HSAIL: mostly in the high-level compiler, before HSAIL generation
        - SPIR: mostly in the back-end code generator, between SPIR and the device machine instruction set
    - Registers
        - HSAIL: fixed-size register pool
        - SPIR: infinite
    - SSA form: HSAIL no; SPIR yes
    - Binary format: HSAIL yes; SPIR yes
    - Code generator for LLVM: HSAIL yes; SPIR yes
    - Back-end device targets
        - HSAIL: modern GPU architectures supported by members of the HSA Foundation
        - SPIR: any OpenCL device, including GPUs, CPUs, FPGAs
    - Memory model
        - HSAIL: relaxed consistency with acquire/release, barriers, and fine-grained barriers
        - SPIR: flexible; can support the OpenCL 1.2 memory model
    (Slide taken from Ben Sander, “HSAIL: Portable Compiler IR for HSA”, HotChips 2013)
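The "Registers" row is where the two IRs differ most for a compiler writer: a SPIR producer can emit an unbounded number of SSA values and leave allocation to the back end, while an HSAIL producer must fit its live values into the fixed pool described on the HSAIL instruction-set slide. A minimal budget check, assuming (as that slide states) that 32-bit, 64-bit and 128-bit registers all draw on one 512-byte pool; the function names are hypothetical, and real limits should be taken from the HSAIL specification.

```python
# Register sizes in bytes: $s (single), $d (double), $q (quad).
REG_BYTES = {"s": 4, "d": 8, "q": 16}
POOL_BYTES = 512  # pool size quoted on the HSAIL instruction-set slide (assumption)


def pool_bytes_used(s: int, d: int, q: int) -> int:
    """Bytes of the shared pool consumed by s singles, d doubles and q quads."""
    return s * REG_BYTES["s"] + d * REG_BYTES["d"] + q * REG_BYTES["q"]


def fits_register_pool(s: int, d: int, q: int, pool: int = POOL_BYTES) -> bool:
    """True if the register mix fits; otherwise the high-level compiler has to
    spill some values to memory (HSAIL has a dedicated spill segment)."""
    return pool_bytes_used(s, d, q) <= pool


print(pool_bytes_used(64, 32, 0))     # 512
print(fits_register_pool(64, 32, 0))  # True
print(fits_register_pool(64, 32, 1))  # False
```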
  • 40. HSA SOFTWARE STACK
    [Figure: the legacy driver stack (apps; domain libraries; OpenCL™, DX runtimes, user-mode drivers; graphics kernel-mode driver) next to the HSA software stack (apps; task queuing libraries; HSA domain libraries, OpenCL™ 2.x runtime; HSA Runtime and HSA JIT; HSA kernel-mode driver), both running on hardware (APUs, CPUs, GPUs). Legend: user-mode component, kernel-mode component, components contributed by third parties.]
  • 42. BOLT: PARALLEL PRIMITIVES LIBRARY FOR HSA
    - Easily leverage the inherent power efficiency of GPU computing
        - Common routines such as scan, sort, reduce, transform
        - More advanced routines like heterogeneous pipelines
        - Bolt library works with OpenCL and C++ AMP
    - Enjoy the unique advantages of the HSA platform
        - Move the computation, not the data
    - Finally, a single source code base for the CPU and GPU!
        - Developers can focus on core algorithms
    - Bolt version 1.0 for OpenCL and C++ AMP is available now at https://github.com/HSA-Libraries/Bolt
    (Slide taken from Phil Rogers, HSA Overview, HotChips 2013)
  • 44. LINES-OF-CODE AND PERFORMANCE FOR DIFFERENT PROGRAMMING MODELS
    [Chart: lines of code (broken down into Init, Compile, Copy, Launch, Algorithm and Copy-back) and performance for an exemplary ISV “Hessian” kernel, implemented in Serial CPU, TBB, Intrinsics+TBB, OpenCL™-C, OpenCL™-C++, C++ AMP and HSA Bolt]
    Test system: AMD A10-5800K APU with Radeon™ HD Graphics (CPU: 4 cores, 3800 MHz, 4200 MHz Turbo; GPU: AMD Radeon HD 7660D, 6 compute units, 800 MHz; 4 GB RAM). Software: Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta.
  • 45. AMD’S FIRST HSA SOC