THE HETEROGENEOUS SYSTEM ARCHITECTURE
IT’S (NOT) ALL ABOUT THE GPU
PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD 
SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION 
1 | THE HETEROGENEOUS SYSTEM ARCHITECTURE – IT’S (NOT) ALL ABOUT THE GPU | MARCH 1, 2014

THREE ERAS OF PROCESSOR PERFORMANCE
Single-Core Era
Enabled by:
 Moore’s Observation
 Voltage Scaling
 Micro-Architecture
Constrained by:
 Power
 Complexity
[Chart: Single-thread Performance over Time, with a “we are here” marker and a “?” annotation]
Multi-Core Era
Enabled by:
 Moore’s Observation
 Desire for Throughput
 20 years of SMP arch
Constrained by:
 Power
 Parallel SW availability
 Scalability
[Chart: Throughput Performance over Time (# of Processors), with a “we are here” marker]
Heterogeneous Systems Era
Enabled by:
 Moore’s Observation
 Abundant data parallelism
 Power efficient data parallel processing (GPUs)
Constrained by:
 Programming models
 Communication overheads
[Chart: Targeted Application Performance over Time (Data-parallel exploitation), with a “we are here” marker]

THE PROGRESS OF FLOATING POINT PERFORMANCE
[Figure: floating point performance chart]

WHY THE DIFFERENCE? IT’S ABOUT THE PURPOSE
 CPUs: built for general purpose, serial instruction threads, often with high data locality, lots of conditional execution and data interdependency
‒ CPU cache hierarchy is focused on general purpose data access from/to execution units, feeding back previously computed data to the execution units with very low latency
‒ Comparatively few registers (vs GPUs), but large caches keep often used “arbitrary” data close to the execution units
‒ Execution model is synchronous
 GPUs: purpose-built for highly data parallel, latency tolerant workloads
‒ Apply the same sequence of instructions over and over on data with little variation but high throughput (“streaming data”), passing the data from one processing stage to another (latency tolerance)
‒ Compute units have a relatively large register file store
‒ Using a lot of “specialty caches” (constant cache, texture cache, etc), data caches optimized for SW data prefetch
‒ Separate memory spaces (e.g. GPU local memory)
‒ Execution model is asynchronous

THE HSA (= HETEROGENEOUS SYSTEM ARCHITECTURE) GOALS
 To bring accelerators forward as a first class processor within the system
‒ Unified process address space access across all processors
‒ Operates in pageable system memory
‒ Full memory coherency between the CPU and HSA components
‒ Well-defined relaxed consistency memory model suited for many high level languages
‒ User mode dispatch/scheduling (eliminates “drivers” from the dispatch path)
‒ QoS through pre-emption and context switching*
 It is not dictating a specific platform or component microarchitecture!
‒ It defines the “what needs to be supported”, not the “how it needs to be supported”
‒ Focus is on providing a robust data parallel application execution infrastructure for “regular languages”

THE GOALS OF HSA, PART 2
 To attract mainstream programmers
‒ By making it easier to write data parallel code
‒ Support a broader set of languages beyond traditional GPGPU languages
‒ Support for Task Parallel & Nested Data Parallel Runtimes
‒ Rich debugging and performance analysis support
 Create a platform architecture accessible for all accelerator types, not only GPU
‒ Focused on the APU/SOC and compute, but not preventing other accelerator types from participating
 But one can’t do it alone…

WHAT IS THE HSA FOUNDATION?
 This is the short version…
“The HSA Foundation is a not-for-profit consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices”
Feb 2014
 It spans multiple host platform architectures and programmable data parallel components collaborating within the same HSA system architecture
‒ CPU: x86, ARM, MIPS, …
‒ Accelerator types: GPUs, DSPs, …
 Specifications define HW & SW platform requirements for applications to target the feature set from high level languages and APIs
 The System Architecture specification defines the required component and platform features for HSA compliant components
 Other HSA specifications leverage these properties for the SW definition

THE KEY DELIVERABLES OF THE HSA FOUNDATION
 IP, Specifications and APIs
 The primary specifications are
‒ The HSA Platform System Architecture Specification
‒ Focus on hardware requirements and low level system software support
‒ “Small Model” (32bit) and “Large Model” (64bit)
‒ The HSA Programmer Reference Manual
‒ Definition of an HSAIL Virtual ISA
‒ Binary format (BRIG)
‒ Compiler writers guide and Libraries developer guide
‒ The HSA System Runtime Specification
‒ HSA Tools
‒ LLVM to HSAIL Compiler
‒ HSAIL Assembler
 HSA Conformance Suite
 Open source community and “reference” implementation of SW stack (Linux) by AMD
‒ To “jump start” the ecosystem
‒ Allow a single shared implementation where appropriate
‒ Enable university research in all areas
[Diagram: Platform (Software) System Architecture Specification, Programmer’s Reference Manual, System Runtime Specification, Tools, Conformance]

OPENCL™ WITH HSA, NOT OPENCL™ VS HSA!
 HSA is an optimized platform architecture, which will run OpenCL™ very well
‒ It is a complementary standard, not a competitor to OpenCL™
‒ It is focused on the hardware and system platform runtime definition more than an API itself
‒ It is ready to support many more languages than C/C++, including managed code languages
 OpenCL™ on HSA benefits from a rich and consistent platform infrastructure
‒ Pointers shared between CPU and GPU (Shared Virtual Memory), avoidance of wasteful copies
‒ Low latency dispatch
‒ Improved and consistent memory model
‒ Virtual function calls
‒ Flexible control flow
‒ Exception generation and handling
‒ Device and platform atomics

THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
 The membership spans a wide variety of IP and platform architecture owners
‒ Several host platform architectures (x86, ARM, MIPS, …)
 The specifications define a common set of platform properties
‒ Provide a dependable hardware and system foundation for application software, libraries and runtimes
‒ Eliminate “weak points” in the system software and hardware architecture of traditional platforms that cause overhead when processing data parallel workloads
‒ Focusing on a few key concepts (queues, signals, memory) optimized and used throughout the architecture

THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
 The main deliverables are:
‒ Well-defined, consistent and dependable memory model all HSA agents operate in
‒ Share access to process virtual memory between HSA agents (“ptr-is-ptr”)
‒ Low-latency workload dispatch, contained in user-mode queues
‒ Scalability across a wide range of platforms, from smartphones to clients and servers
‒ These properties are leveraged in the “HSA Programmer’s Reference”, HSAIL and HSA Runtime specifications

DESIGN CRITERIA FOR THE HSA INFRASTRUCTURE
 HSA defines platform & HW requirements software can depend on
‒ Comprehensive memory model (well-defined visibility and ordering rules for transactions)
‒ Shared Virtual Memory (same page table mapping, HW access enforcement)
‒ Page fault capability for every HSA agent
‒ Cache Coherency Domains
‒ Architected, memory-based signaling and synchronization
‒ User Mode Queues, Architected Queuing Language (AQL), Service Queues for runtime callbacks
‒ Preemption / Quality of Service*
‒ Error Reporting
‒ Hardware debug and profiling support requirements
‒ Architected Topology Discovery
 Thin System SW Layer enables HW features for use by an application runtime
‒ Mainly responsible for HW init, access enforcement and resource management
‒ Provides a consistent, dependable feature set for application layer through SW primitives

HSA SECURITY AND EXECUTION MODEL
 HSA components operate in the same security infrastructure as the host CPU
‒ User and privileged memory distinction
‒ Hardware enforced process space isolation
‒ Page attribute (read, write, execute) protections are enforced by HW and apply as defined by system software
 Internally, the platform partitions functionality by privilege level
‒ User mode queues can only run AQL packets within the defined process context
 HSA defines Quality of Service requirements
‒ Requires support for mechanisms to schedule both HSA and non-HSA workloads for devices that support both task types with appropriate priority, latency, throughput and scheduling constraints
‒ Context Switch
‒ Preempt
‒ Terminate and Context Reset
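
The access rules on the security slide can be pictured with a small model. This is only an illustrative sketch, not anything taken from the HSA specifications: the names `Access`, `ProcessAddressSpace` and `check_access` are hypothetical, and real enforcement happens in hardware against the page tables set up by system software. The point it shows is that the same check applies whether the request comes from a CPU thread or from a packet on an accelerator queue.

```python
from enum import Flag, auto
from typing import Dict


class Access(Flag):
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


class ProcessAddressSpace:
    """Hypothetical per-process page table: page number -> allowed access."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.pages: Dict[int, Access] = {}

    def map_page(self, page: int, allowed: Access) -> None:
        # In a real system this is done by system software, not the application.
        self.pages[page] = allowed


def check_access(space: ProcessAddressSpace, queue_pid: int,
                 page: int, wanted: Access) -> bool:
    """Return True if a request from a queue owned by queue_pid may proceed."""
    if queue_pid != space.pid:
        return False                      # process space isolation
    allowed = space.pages.get(page)
    if allowed is None:
        return False                      # unmapped; real HW would raise a page fault
    return (wanted & allowed) == wanted   # every requested attribute must be allowed


space = ProcessAddressSpace(pid=1)
space.map_page(0, Access.READ | Access.WRITE)     # data page
space.map_page(1, Access.READ | Access.EXECUTE)   # code page
```

With this setup, a write to page 0 from a queue of process 1 is allowed, while a write to the code page or any access from a queue of another process is refused.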

HSA COMMAND AND DISPATCH FLOW
[Diagram: Applications A, B and C each place packets, optionally via a dispatch buffer, into their own Hardware Queue; the Hardware Queues feed the Accelerator Hardware (GPU)]
SW view:
 User-mode dispatches to HW
 No Kernel Driver overhead
 Low dispatch times
 CPU & GPU dispatch APIs
HW view:
 HW / microcode controlled
 HW scheduling
 Architected Queuing Language (AQL)
 HW-managed protection
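
The dispatch flow above can be sketched as a ring buffer that the application writes into directly and a packet processor drains. This is a simplified single-threaded model under stated assumptions: `Packet` and `UserModeQueue` are hypothetical names, the packet layout is not the real AQL format, and advancing the write index stands in for the doorbell signal a real queue would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Packet:
    """Hypothetical stand-in for an AQL dispatch packet (not the real layout)."""
    kernel: Callable[[int], int]   # work to run on the accelerator
    arg: int


class UserModeQueue:
    """Fixed-size ring buffer shared by an application and a packet processor."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Packet]] = [None] * size
        self.size = size
        self.write_index = 0   # advanced by the application
        self.read_index = 0    # advanced by the packet processor

    def enqueue(self, packet: Packet) -> bool:
        """Application side: write a packet into the queue, no driver call."""
        if self.write_index - self.read_index == self.size:
            return False       # queue full, caller retries later
        self.slots[self.write_index % self.size] = packet
        self.write_index += 1  # plays the role of ringing the doorbell
        return True

    def process_all(self) -> List[int]:
        """Device side: consume pending packets in order, return their results."""
        results = []
        while self.read_index < self.write_index:
            packet = self.slots[self.read_index % self.size]
            results.append(packet.kernel(packet.arg))
            self.read_index += 1
        return results


queue = UserModeQueue(size=4)
for n in range(3):
    queue.enqueue(Packet(kernel=lambda x: x * x, arg=n))
squares = queue.process_all()
```

Each application owning its own queue is what lets the hardware apply protection per process, since every packet in a queue belongs to one process context.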

WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
 An HSA compatible platform consists of “HSA components” and “HSA agents”
‒ Both adhere to the same system architecture requirements (memory model, etc)
‒ HSA agents can submit AQL packets for execution; they may be, but are not required to be, an HSA component (e.g. host CPU cores can be an HSA agent, but “step out” of the role)
‒ HSA components can be a “target” of AQL; by implication they are also HSA agents (= generate AQL packets)
 Defined in the “System Architecture” specification
‒ The HSAIL and “Programmer’s Reference Manual” specifications define the software execution model
‒ Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes in many scenarios
‒ Architected infrastructure allows exchanging data with non-HSA compliant components in a platform
‒ Fundamental data types are naturally aligned

WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
 There are two different machine models (“small” and “large”)
‒ Target different functionality levels
‒ It takes into account different feature requirements for different platform environments
‒ In all cases, the same HSA application programming model is used to target HSA agents and provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
 Applications written to target HSA small model machines will generally work on large model machines
‒ If the large model platform and host Operating System provides a 32bit process environment
Properties | Small Machine Model | Large Machine Model
Platform targets | embedded or personal device space (controllers, smartphones, etc.) | PC, workstation, cloud Server, etc running more demanding workloads
Native pointer size | 32bit | 64bit (+ 32bit ptr if 32bit processes are supported)
Floating point size | Half (FP16*), Single (FP32) precision | Half (FP16*), Single (FP32), Double (FP64) precision
Atomic ops size | 32bit | 32bit, 64bit
*min. Load and store on memory
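
The table and the compatibility rule above can be written down as data plus one check. This is a sketch under assumptions: `MachineModel`, `SMALL`, `LARGE` and `can_run` are hypothetical names, and the check only covers the properties listed in the table, not everything a real platform would have to satisfy.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class MachineModel:
    name: str
    pointer_bits: int
    float_bits: FrozenSet[int]    # FP16 is listed as load/store minimum only
    atomic_bits: FrozenSet[int]


# Values copied from the properties table on the slide.
SMALL = MachineModel("small", 32, frozenset({16, 32}), frozenset({32}))
LARGE = MachineModel("large", 64, frozenset({16, 32, 64}), frozenset({32, 64}))


def can_run(app: MachineModel, platform: MachineModel,
            has_32bit_processes: bool = False) -> bool:
    """Return True if an application built for `app` can run on `platform`."""
    pointer_ok = app.pointer_bits == platform.pointer_bits or (
        app.pointer_bits == 32
        and platform.pointer_bits == 64
        and has_32bit_processes   # OS must offer a 32bit process environment
    )
    return (
        pointer_ok
        and app.float_bits <= platform.float_bits     # subset test
        and app.atomic_bits <= platform.atomic_bits   # subset test
    )
```

For example, `can_run(SMALL, LARGE, has_32bit_processes=True)` is true, while `can_run(LARGE, SMALL)` is false because the small model lacks 64bit pointers, FP64 and 64bit atomics.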

THE HSA MEMORY MODEL REQUIREMENTS
 A memory model defines how writes by one work item or agent become visible to other work items and agents
‒ Rules that need to be adhered to by compilers and application threads
‒ It defines visibility and ordering rules of write and read events across work items, HSA agents and interactions with non-HSA components in the system
‒ Important to define scope for performance optimizations in the compiler and allow reordering in the Finalizer
 HSA is defined to be compatible with C++11, Java and .NET Memory Models
‒ Relaxed consistency memory model for parallel compute performance
‒ Inherently maps to many CPU and device architectures
‒ Efficient sequential consistency mechanisms supported to fit high-level language programming models
‒ Scope visibility controlled by Load.Acquire / Store.Release & Barrier semantics

THE HSA MEMORY MODEL REQUIREMENTS
 A consistent, full set of atomic operations is available
‒ Naturally aligned on size, small machine model supports 32bit, large machine model supports 32bit and 64bit
 Cache Coherency between HSA agents (& host CPU) is maintained by default
‒ Key feature of the HSA system & platform environment
‒ Defined transition points to non-HSA “data domains” (e.g. graphics, audio, …)

THE HSA SIGNALING INFRASTRUCTURE
 HSA supports hardware-assisted signaling and synchronization primitives
‒ Defines consistent, memory based semantics to synchronize with work items processed by HSA agents
‒ e.g. 32bit or 64bit value, content update, wait on value by HSA agents and AQL packets
‒ Typically hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads on HSA agents
 Allows one-to-one and one-to-many signaling
‒ The signaling semantics follow general atomicity requirements defined in the memory model
‒ System Software, runtime & application SW can use this infrastructure to efficiently build higher-level synchronization primitives like mutexes, semaphores, …
Source: wikimedia commons
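
The signaling semantics described above (a value, content updates, waiting on the value, one-to-many wake-up) can be modelled in software. This is only an analogy under assumptions: `Signal` and `run_workers` are hypothetical names, and a host-side `threading.Condition` stands in for what the slide describes as a hardware-assisted mechanism. The second function shows one higher-level primitive, a completion counter, built on top of it.

```python
import threading
from typing import Callable, List, Optional


class Signal:
    """Hypothetical memory-based signal: an integer that agents update and wait on."""

    def __init__(self, initial: int = 0) -> None:
        self._value = initial
        self._cond = threading.Condition()

    def load(self) -> int:
        with self._cond:
            return self._value

    def store(self, value: int) -> None:
        with self._cond:
            self._value = value
            self._cond.notify_all()   # one-to-many: wake every waiter

    def add(self, delta: int) -> int:
        """Atomically add delta and return the new value."""
        with self._cond:
            self._value += delta
            self._cond.notify_all()
            return self._value

    def wait_until(self, predicate: Callable[[int], bool]) -> int:
        """Block until predicate(value) holds, then return the observed value."""
        with self._cond:
            self._cond.wait_for(lambda: predicate(self._value))
            return self._value


def run_workers(count: int) -> List[Optional[int]]:
    """Use one signal as a completion counter for `count` worker threads."""
    done = Signal(initial=count)
    results: List[Optional[int]] = [None] * count

    def worker(i: int) -> None:
        results[i] = i * 2
        done.add(-1)                     # content update by the worker

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    done.wait_until(lambda v: v == 0)    # wait on value
    for t in threads:
        t.join()
    return results
```

The same `Signal` could back a mutex or semaphore by waiting for a nonzero value and decrementing it inside one locked step; that extension is left out to keep the sketch short.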

WHAT IS HSAIL?
 HSAIL is the intermediate language for parallel compute in HSA
‒ HSAIL is a low level instruction set designed for parallel compute in a shared virtual memory environment
‒ Generated by a high level compiler (LLVM, gcc, Java VM, etc)
‒ Compiled down to GPU ISA or other parallel processor ISA by an IHV Finalizer
‒ Finalizer may execute at run time, install time or build time, depending on platform type
 HSAIL is SIMT in form and does not dictate hardware microarchitecture
‒ HSAIL is designed for fast compile time, moving most optimizations to the HL compiler
‒ Limited register set avoids full register allocation in Finalizer
‒ HSAIL is at the same level as PTX: an intermediate assembly or Virtual Machine Target
‒ Represented as bit-code in a BRIG file format with support for late binding of libraries

LOOKING BEYOND THE TRADITIONAL GPU LANGUAGES
 Dynamic Languages are one of the most interesting areas for exploration of heterogeneous parallel runtimes and have many applications
 Dynamic Languages have different compilation foundations targeting HSA / HSAIL
‒ Java/Scala, JavaScript, Dart, etc
 See Project Sumatra - http://openjdk.java.net/projects/sumatra/
‒ Formal Project for GPU Acceleration for Java in OpenJDK
 HSA allows efficient support for other standards based language environments
‒ like OpenMP, Fortran, GO, Haskell
‒ And domain specific languages like Halide, Julia, and many others

LOOKING BEYOND THE DATA PARALLEL COMPUTE APPLICATION
 The initial release of the HSA specifications focuses on data parallel compute language and application scenarios
‒ Focus is on integrating GPUs into the general software infrastructure
‒ But the next generations of the specifications may apply to other domains
‒ With their domain-specific HW processor language focus
 By design the HSA infrastructure is quite easy to extend
‒ Initial focus is on data parallel compute tasks
‒ But other areas of support are under consideration for the future
‒ Architected Topology infrastructure allows domain specific accelerator capabilities to be reliably identified and addressed
 By design the HSA infrastructure is easy to virtualize
‒ Programming model leverages a few simple hardware & platform paradigms (queues, signals, memory) for its operation
‒ Future spec work may add requirements to cover such environments
[Diagram: CPU, GPU, Audio Processor, Image Processor, Video Decode, Encode, DSP, Security Processor and Fixed Function Accelerator attached to a Shared Memory and Coherency Fabric]

IN SUMMARY…
 HSA is not about a specific API, feature or runtime
‒ It is about a paradigm to efficiently access the various heterogeneous components in a system by software
‒ It allows application programmers to use the languages of their choice to efficiently implement their code
 HSA is not about a specific hardware or vendor or Operating System
‒ It defines a few fundamental requirements and concepts as building blocks software at all levels can depend on
‒ HW vendors can efficiently expose their compute acceleration features to software in an architected way
‒ OS, runtimes and application frameworks can build efficient data and task parallel runtimes leveraging these
‒ Application software can more easily use the right tool for the job through high level language support
 HSA is an open and flexible concept
‒ Collaborative participation through the HSA Foundation is encouraged for companies and academia
‒ The first set of standards by the HSA Foundation is either released or imminent
‒ This is a good time to engage

WHERE TO FIND FURTHER INFORMATION ON HSA?
 HSA Foundation Website: http://www.hsafoundation.com
‒ The main location for specs, developer info, tools, publications and many things more
‒ Also the location to apply for membership
 Tools are now available at GitHub – HSA Foundation
‒ libHSA Assembler and Disassembler
‒ https://github.com/HSAFoundation/HSAIL-Tools
‒ HSAIL Instruction Set Simulator
‒ https://github.com/HSAFoundation/HSAIL-Instruction-Set-Simulator
‒ HSA ISS Loader Library for Java and C++ for creation and dispatch of HSAIL Kernels
‒ https://github.com/HSAFoundation/Okra-Interface-to-HSAIL-Simulator
 More to come soon …

 With thanks to Phil Rogers, Greg Stoner, Sasa Marinkovic and others for some materials and feedback
Trademark Attribution
HSA Foundation, the HSA Foundation logo and combinations thereof are trademarks of HSA Foundation, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
©2014 HSA Foundation, Inc. All rights reserved.

ANY QUESTIONS?
 Of course there are, so go ahead

Contenu connexe

Tendances

HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSA Foundation
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelHSA Foundation
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013HSA Foundation
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...HSA Foundation
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013 HSA Foundation
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Foundation
 
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...HSA Foundation
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overviewinside-BigData.com
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”HSA Foundation
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime HSA Foundation
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...AMD Developer Central
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computingRashid Ansari
 
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systemsdairsie
 
Software Variability Management
Software Variability ManagementSoftware Variability Management
Software Variability ManagementXavierDevroey
 
Resume_AlicePancamo2016
Resume_AlicePancamo2016Resume_AlicePancamo2016
Resume_AlicePancamo2016Alice Pancamo
 
Isn’t it Ironic that a Redfish is software defining you
Isn’t it Ironic that a Redfish is software defining you Isn’t it Ironic that a Redfish is software defining you
Isn’t it Ironic that a Redfish is software defining you Bruno Cornec
 
E scala design platform
E scala design platformE scala design platform
E scala design platformesenciatech
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureBruno Cornec
 

Tendances (20)

HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing Model
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
 
Hsa10 whitepaper
Hsa10 whitepaperHsa10 whitepaper
Hsa10 whitepaper
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013
 
HSA Features
HSA FeaturesHSA Features
HSA Features
 
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overview
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computing
 
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
 
Software Variability Management
Software Variability ManagementSoftware Variability Management
Software Variability Management
 
Resume_AlicePancamo2016
Resume_AlicePancamo2016Resume_AlicePancamo2016
Resume_AlicePancamo2016
 
Isn’t it Ironic that a Redfish is software defining you
Isn’t it Ironic that a Redfish is software defining you Isn’t it Ironic that a Redfish is software defining you
Isn’t it Ironic that a Redfish is software defining you
 
E scala design platform
E scala design platformE scala design platform
E scala design platform
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined Infrastructure
 

Keynote: THE HETEROGENEOUS SYSTEM ARCHITECTURE: IT’S (NOT) ALL ABOUT THE GPU

  • 1. THE HETEROGENEOUS SYSTEM ARCHITECTURE: IT’S (NOT) ALL ABOUT THE GPU
      PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD
      SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION
      MARCH 1, 2014
  • 2. THREE ERAS OF PROCESSOR PERFORMANCE
      - Single-Core Era
        ‒ Enabled by: Moore’s Observation, Voltage Scaling, Micro-Architecture
        ‒ Constrained by: Power, Complexity
        ‒ [Chart: Single-thread Performance over Time, marked "we are here"]
      - Multi-Core Era
        ‒ Enabled by: Moore’s Observation, Desire for Throughput, 20 years of SMP arch
        ‒ Constrained by: Power, Parallel SW availability, Scalability
        ‒ [Chart: Throughput Performance over Time (# of Processors), marked "we are here"]
      - Heterogeneous Systems Era
        ‒ Enabled by: Moore’s Observation, Abundant data parallelism, Power efficient data parallel processing (GPUs)
        ‒ Constrained by: Programming models, Communication overheads
        ‒ [Chart: Targeted Application Performance over Time (Data-parallel exploitation), marked "we are here"]
  • 3. THE PROGRESS OF FLOATING POINT PERFORMANCE
  • 4. WHY THE DIFFERENCE? IT’S ABOUT THE PURPOSE
      - CPUs: built for general purpose, serial instruction threads, often high data locality, lots of conditional execution and dealing with data interdependency
        ‒ CPU cache hierarchy is focused on general purpose data access from/to execution units, feeding back previously computed data to the execution units with very low latency
        ‒ Comparatively few registers (vs GPUs), but large caches keep often used “arbitrary” data close to the execution units
        ‒ Execution model is synchronous
      - GPUs: purpose-built for highly data parallel, latency tolerant workloads
        ‒ Apply the same sequence of instructions over and over on data with little variation but high throughput (“streaming data”), passing the data from one processing stage to another (latency tolerance)
        ‒ Compute units have a relatively large register file store
        ‒ Use a lot of “specialty caches” (constant cache, texture cache, etc.); data caches are optimized for SW data prefetch
        ‒ Separate memory spaces (e.g. GPU local memory)
        ‒ Execution model is asynchronous
  • 5. THE HSA (= HETEROGENEOUS SYSTEM ARCHITECTURE) GOALS
      - To bring accelerators forward as a first class processor within the system
        ‒ Unified process address space access across all processors
        ‒ Operates in pageable system memory
        ‒ Full memory coherency between the CPU and HSA components
        ‒ Well-defined relaxed consistency memory model suited for many high level languages
        ‒ User mode dispatch/scheduling (eliminates “drivers” from the dispatch path)
        ‒ QoS through pre-emption and context switching*
      - It is not dictating a specific platform or component microarchitecture!
        ‒ It defines the “what needs to be supported”, not the “how it needs to be supported”
        ‒ Focus is on providing a robust data parallel application execution infrastructure for “regular languages”
  • 6. THE GOALS OF HSA, PART 2
      - To attract mainstream programmers
        ‒ By making it easier to write data parallel code
        ‒ Support a broader set of languages beyond traditional GPGPU languages
        ‒ Support for Task Parallel & Nested Data Parallel Runtimes
        ‒ Rich debugging and performance analysis support
      - Create a platform architecture accessible for all accelerator types, not only GPU
        ‒ Focused on the APU/SOC and compute, but not preventing other accelerator types from participating
      - But one can’t do it alone…
  • 7. WHAT IS THE HSA FOUNDATION?
      - This is the short version… (Feb 2014)
        “The HSA Foundation is a not-for-profit consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices”
      - It spans multiple host platform architectures and programmable data parallel components collaborating within the same HSA system architecture
        ‒ CPU: x86, ARM, MIPS, …
        ‒ Accelerator types: GPUs, DSPs, …
      - Specifications define HW & SW platform requirements for applications to target the feature set from high level languages and APIs
      - The System Architecture specification defines the required component and platform features for HSA compliant components
      - Other HSA specifications leverage these properties for the SW definition
  • 8. THE KEY DELIVERABLES OF THE HSA FOUNDATION
      - IP, Specifications and APIs
      - The primary specifications are
        ‒ The HSA Platform System Architecture Specification
          ‒ Focus on hardware requirements and low level system software support
          ‒ “Small Model” (32-bit) and “Large Model” (64-bit)
        ‒ The HSA Programmer Reference Manual
          ‒ Definition of an HSAIL Virtual ISA
          ‒ Binary format (BRIG)
          ‒ Compiler writers guide and Libraries developer guide
        ‒ The HSA System Runtime Specification
      - HSA Tools
        ‒ LLVM to HSAIL Compiler
        ‒ HSAIL Assembler
      - HSA Conformance Suite
      - Open source community and “reference” implementation of SW stack (Linux) by AMD
        ‒ To “jump start” the ecosystem
        ‒ Allow a single shared implementation where appropriate
        ‒ Enable university research in all areas
      [Diagram labels: Platform (Software); System Architecture Specification; Programmer’s Reference Manual; System Runtime Specification; Tools; Conformance]
  • 9. OPENCL™ WITH HSA, NOT OPENCL™ VS HSA!
    ‒ HSA is an optimized platform architecture, which will run OpenCL™ very well
        ‒ It is a complementary standard, not a competitor to OpenCL™
        ‒ It is focused on the hardware and system platform runtime definition more than on an API itself
        ‒ It is ready to support many more languages than C/C++, including managed code languages
    ‒ OpenCL™ on HSA benefits from a rich and consistent platform infrastructure
        ‒ Pointers shared between CPU and GPU (Shared Virtual Memory), avoidance of wasteful copies
        ‒ Low latency dispatch
        ‒ Improved and consistent memory model
        ‒ Virtual function calls
        ‒ Flexible control flow
        ‒ Exception generation and handling
        ‒ Device and platform atomics
  • 10. THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
    ‒ The membership spans a wide variety of IP and platform architecture owners
        ‒ Several host platform architectures (x86, ARM, MIPS, …)
    ‒ The specifications define a common set of platform properties
        ‒ Provide a dependable hardware and system foundation for application software, libraries and runtimes
        ‒ Eliminate “weak points” in the system software and hardware architecture of traditional platforms that cause overhead when processing data parallel workloads
        ‒ Focus on a few key concepts (queues, signals, memory) that are optimized and used throughout the architecture
  • 11. THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
    ‒ The main deliverables are:
        ‒ A well-defined, consistent and dependable memory model that all HSA agents operate in
        ‒ Shared access to process virtual memory between HSA agents (“ptr-is-ptr”)
        ‒ Low-latency workload dispatch, contained in user-mode queues
        ‒ Scalability across a wide range of platforms, from smartphones to clients and servers
        ‒ These properties are leveraged in the “HSA Programmer’s Reference”, HSAIL and HSA Runtime specifications
  • 12. DESIGN CRITERIA FOR THE HSA INFRASTRUCTURE
    ‒ HSA defines platform & HW requirements that software can depend on
        ‒ Comprehensive memory model (well-defined visibility and ordering rules for transactions)
        ‒ Shared Virtual Memory (same page table mapping, HW access enforcement)
        ‒ Page fault capability for every HSA agent
        ‒ Cache Coherency Domains
        ‒ Architected, memory-based signaling and synchronization
        ‒ User Mode Queues, Architected Queuing Language (AQL), Service Queues for runtime callbacks
        ‒ Preemption / Quality of Service*
        ‒ Error Reporting
        ‒ Hardware debug and profiling support requirements
        ‒ Architected Topology Discovery
    ‒ Thin System SW Layer enables HW features for use by an application runtime
        ‒ Mainly responsible for HW init, access enforcement and resource management
        ‒ Provides a consistent, dependable feature set for the application layer through SW primitives
  • 13. HSA SECURITY AND EXECUTION MODEL
    ‒ HSA components operate in the same security infrastructure as the host CPU
        ‒ User and privileged memory distinction
        ‒ Hardware enforced process space isolation
        ‒ Page attribute (read, write, execute) protections are enforced by HW and apply as defined by system software
    ‒ Internally, the platform partitions functionality by privilege level
        ‒ User mode queues can only run AQL packets within the defined process context
    ‒ HSA defines Quality of Service requirements
        ‒ Requires support for mechanisms to schedule both HSA and non-HSA workloads for devices that support both task types with appropriate priority, latency, throughput and scheduling constraints
        ‒ Context Switch
        ‒ Preempt
        ‒ Terminate and Context Reset
  • 14. HSA COMMAND AND DISPATCH FLOW
    ‒ [Diagram: Application A, Application B and Application C, each feeding its own Hardware Queue on the Accelerator Hardware (GPU); an Optional Dispatch Buffer is also shown]
    ‒ SW view:
        ‒ User-mode dispatches to HW
        ‒ No kernel driver overhead
        ‒ Low dispatch times
        ‒ CPU & GPU dispatch APIs
    ‒ HW view:
        ‒ HW / microcode controlled
        ‒ HW scheduling
        ‒ Architected Queuing Language (AQL)
        ‒ HW-managed protection
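The user-mode dispatch idea on this slide can be sketched as a ring buffer in ordinary process memory: the application writes a packet and bumps a write index, and a packet processor consumes from a read index, with no kernel call on the fast path. This is only a minimal illustration under stated assumptions. `Packet`, its two fields and `UserModeQueue` are hypothetical names, the layout is not the architected AQL packet format, and the sketch handles one producer and one consumer only.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packet: NOT the architected AQL layout, just enough to illustrate.
struct Packet {
    std::uint32_t kernel_id;  // which kernel to launch (illustrative field)
    std::uint32_t grid_size;  // number of work items (illustrative field)
};

// Single-producer, single-consumer ring buffer living in user memory.
template <std::size_t N>
class UserModeQueue {
    static_assert(N != 0 && (N & (N - 1)) == 0, "capacity must be a power of two");

public:
    // Application side: write the packet, then publish it by advancing the
    // write index. Returns false when the queue is full.
    bool enqueue(const Packet& p) {
        const std::uint64_t w = write_index_.load(std::memory_order_relaxed);
        const std::uint64_t r = read_index_.load(std::memory_order_acquire);
        assert(w - r <= N);  // indices only ever differ by at most the capacity
        if (w - r == N) return false;
        slots_[w & (N - 1)] = p;
        write_index_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Packet-processor side: fetch the oldest packet, then retire its slot by
    // advancing the read index. Returns false when the queue is empty.
    bool dequeue(Packet& out) {
        const std::uint64_t r = read_index_.load(std::memory_order_relaxed);
        const std::uint64_t w = write_index_.load(std::memory_order_acquire);
        if (r == w) return false;
        out = slots_[r & (N - 1)];
        read_index_.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<Packet, N> slots_{};
    std::atomic<std::uint64_t> write_index_{0};
    std::atomic<std::uint64_t> read_index_{0};
};
```

The indices grow monotonically and are masked on access, so "full" and "empty" are distinguished without a spare slot. In a real HSA system the consumer side would be hardware or microcode rather than a second thread.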
  • 15. WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
    ‒ An HSA compatible platform consists of “HSA components” and “HSA agents”
        ‒ Both adhere to the same system architecture requirements (memory model, etc.)
        ‒ HSA agents can submit AQL packets for execution; they may be, but are not required to be, an HSA component (e.g. host CPU cores can be an HSA agent, but “step out” of the role)
        ‒ HSA components can be a “target” of AQL; by implication they are also HSA agents (= generate AQL packets)
    ‒ Defined in the “System Architecture” specification
        ‒ The HSAIL and “Programmer’s Reference Manual” specifications define the software execution model
        ‒ Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes in many scenarios
        ‒ Architected infrastructure allows exchanging data with non-HSA compliant components in a platform
        ‒ Fundamental data types are naturally aligned
  • 16. WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
    ‒ There are two different machine models (“small” and “large”)
        ‒ They target different functionality levels
        ‒ They take into account different feature requirements for different platform environments
        ‒ In all cases, the same HSA application programming model is used to target HSA agents, and it provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
    ‒ Applications written to target HSA small model machines will generally work on large model machines
        ‒ If the large model platform and host Operating System provide a 32bit process environment
    ‒ Machine model properties:
        Properties          | Small Machine Model                                                | Large Machine Model
        Platform targets    | Embedded or personal device space (controllers, smartphones, etc.) | PC, workstation, cloud server, etc. running more demanding workloads
        Native pointer size | 32bit                                                              | 64bit (+ 32bit ptr if 32bit processes are supported)
        Floating point size | Half (FP16*), Single (FP32) precision                              | Half (FP16*), Single (FP32), Double (FP64) precision
        Atomic ops size     | 32bit                                                              | 32bit, 64bit
        * min. load and store on memory
  • 17. THE HSA MEMORY MODEL REQUIREMENTS
    ‒ A memory model defines how writes by one work item or agent become visible to other work items and agents
        ‒ Rules that need to be adhered to by compilers and application threads
        ‒ It defines visibility and ordering rules of write and read events across work items, HSA agents and interactions with non-HSA components in the system
        ‒ Important to define scope for performance optimizations in the compiler and allow reordering in the Finalizer
    ‒ HSA is defined to be compatible with C++11, Java and .NET memory models
        ‒ Relaxed consistency memory model for parallel compute performance
        ‒ Inherently maps to many CPU and device architectures
        ‒ Efficient sequential consistency mechanisms supported to fit high-level language programming models
        ‒ Scope visibility controlled by Load.Acquire / Store.Release & Barrier semantics
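Since the slide names C++11 as one of the compatible memory models, the acquire/release pairing it mentions can be shown with plain C++11 atomics. The sketch below is host-side C++ rather than HSAIL. `Mailbox`, `publish`, `consume` and `smoke_check` are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state: ordinary data guarded by an atomic flag.
struct Mailbox {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // synchronization variable
};

// Producer: write the data first, then publish it with a store-release.
// Everything written before the release is visible to a thread that
// observes ready == true through a load-acquire.
void publish(Mailbox& m, int value) {
    m.payload = value;
    m.ready.store(true, std::memory_order_release);
}

// Consumer: spin on a load-acquire, then read the data.
int consume(Mailbox& m) {
    while (!m.ready.load(std::memory_order_acquire)) {
        // busy-wait until the producer has published
    }
    return m.payload;
}

// Single-threaded smoke check; in real use publish() and consume() would run
// on different threads or agents.
inline void smoke_check() {
    Mailbox m;
    publish(m, 42);
    assert(consume(m) == 42);
}
```

Without the release/acquire pair (for example with relaxed ordering on both sides) the consumer could legally see `ready == true` and still read a stale `payload`, which is exactly the kind of reordering a relaxed consistency model permits.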
  • 18. THE HSA MEMORY MODEL REQUIREMENTS
    ‒ A consistent, full set of atomic operations is available
        ‒ Naturally aligned on size; the small machine model supports 32bit, the large machine model supports 32bit and 64bit
    ‒ Cache Coherency between HSA agents (& host CPU) is maintained by default
        ‒ Key feature of the HSA system & platform environment
        ‒ Defined transition points to non-HSA “data domains” (e.g. graphics, audio, …)
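As a rough illustration of what a 32bit and 64bit atomic set lets software build, the following C++11 sketch derives an "atomic maximum" from compare-and-swap. `atomic_fetch_max` and `width_example` are hypothetical helpers, not HSA or HSAIL operations, and the relaxed ordering is an assumption that fits a standalone statistic rather than a synchronization variable.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative helper (not an HSA API): atomically raise `target` to at least
// `candidate` and return the value observed before any update took place.
template <typename T>
T atomic_fetch_max(std::atomic<T>& target, T candidate) {
    T observed = target.load(std::memory_order_relaxed);
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_relaxed)) {
        // On failure compare_exchange_weak stores the current value into
        // `observed`, so the loop condition is re-checked against fresh data.
    }
    return observed;
}

// The same helper serves both atomic widths named on the slide.
inline void width_example() {
    std::atomic<std::uint32_t> small_max{10};  // 32bit atomic
    std::atomic<std::uint64_t> large_max{10};  // 64bit atomic
    assert(atomic_fetch_max<std::uint32_t>(small_max, 25u) == 10u);
    assert(atomic_fetch_max<std::uint64_t>(large_max, 5u) == 10u);
    assert(small_max.load() == 25u && large_max.load() == 10u);
}
```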
  • 19. THE HSA SIGNALING INFRASTRUCTURE
    ‒ HSA supports hardware-assisted signaling and synchronization primitives
        ‒ Defines consistent, memory based semantics to synchronize with work items processed by HSA agents
        ‒ e.g. 32bit or 64bit value, content update, wait on value by HSA agents and AQL packets
        ‒ Typically a hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads on HSA agents
    ‒ Allows one-to-one and one-to-many signaling
        ‒ The signaling semantics follow general atomicity requirements defined in the memory model
        ‒ System software, runtime & application SW can use this infrastructure to efficiently build higher-level synchronization primitives like mutexes, semaphores, …
    ‒ [Image; source: Wikimedia Commons]
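The last point, building higher-level primitives on a signal value, can be sketched in C++11 as follows. `Signal` is a hypothetical software stand-in for an architected signal (a 64bit value with update and wait-on-value), and `SignalMutex` is one possible mutex built on it. The wait is a simple spin here, whereas the slide describes hardware-assisted, power-efficient waiting.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an HSA signal: a 64bit value in shared memory
// that can be updated atomically and waited on.
class Signal {
public:
    explicit Signal(std::int64_t initial) : value_(initial) {}

    std::int64_t load() const { return value_.load(std::memory_order_acquire); }
    void store(std::int64_t v) { value_.store(v, std::memory_order_release); }

    // Atomically replace `expected` with `desired`; true if the swap happened.
    bool compare_swap(std::int64_t expected, std::int64_t desired) {
        return value_.compare_exchange_strong(expected, desired,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire);
    }

    // Wait until the value equals `wanted` (spinning in this sketch).
    void wait_eq(std::int64_t wanted) const {
        while (load() != wanted) {
        }
    }

private:
    std::atomic<std::int64_t> value_;
};

// A mutex layered on the signal: value 1 means free, value 0 means held.
class SignalMutex {
public:
    bool try_lock() { return sig_.compare_swap(1, 0); }

    void lock() {
        while (!try_lock()) sig_.wait_eq(1);  // wait for release, then retry
    }

    void unlock() {
        assert(sig_.load() == 0);  // only a held mutex may be released
        sig_.store(1);
    }

private:
    Signal sig_{1};
};
```

A counting semaphore would follow the same pattern with the signal holding the number of available permits instead of 0 or 1.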
  • 20. WHAT IS HSAIL?
    ‒ HSAIL is the intermediate language for parallel compute in HSA
        ‒ HSAIL is a low level instruction set designed for parallel compute in a shared virtual memory environment
        ‒ Generated by a high level compiler (LLVM, gcc, Java VM, etc.)
        ‒ Compiled down to GPU ISA or other parallel processor ISA by an IHV Finalizer
        ‒ Finalizer may execute at run time, install time or build time, depending on platform type
    ‒ HSAIL is SIMT in form and does not dictate hardware microarchitecture
        ‒ HSAIL is designed for fast compile time, moving most optimizations to the HL compiler
        ‒ Limited register set avoids full register allocation in the Finalizer
        ‒ HSAIL is at the same level as PTX: an intermediate assembly or Virtual Machine Target
        ‒ Represented as bit-code in a BRIG file format with support for late binding of libraries
  • 21. LOOKING BEYOND THE TRADITIONAL GPU LANGUAGES
    ‒ Dynamic languages are one of the most interesting areas for exploration of heterogeneous parallel runtimes and have many applications
    ‒ Dynamic languages have different compilation foundations targeting HSA / HSAIL
        ‒ Java/Scala, JavaScript, Dart, etc.
    ‒ See Project Sumatra - http://openjdk.java.net/projects/sumatra/
        ‒ Formal project for GPU acceleration for Java in OpenJDK
    ‒ HSA allows efficient support for other standards based language environments
        ‒ Like OpenMP, Fortran, Go, Haskell
        ‒ And domain specific languages like Halide, Julia, and many others
  • 22. LOOKING BEYOND THE DATA PARALLEL COMPUTE APPLICATION
    ‒ The initial release of the HSA specifications focuses on data parallel compute language and application scenarios
        ‒ Focus is on integrating GPUs into the general software infrastructure
        ‒ But the next generations of the specifications may apply to other domains
        ‒ With their domain-specific HW processor language focus
    ‒ By design the HSA infrastructure is quite easy to extend
        ‒ Initial focus is on data parallel compute tasks
        ‒ But other areas of support are under consideration for the future
        ‒ The Architected Topology infrastructure allows domain specific accelerator capabilities to be reliably identified and addressed
    ‒ By design the HSA infrastructure is easy to virtualize
        ‒ The programming model leverages a few simple hardware & platform paradigms (queues, signals, memory) for its operation
        ‒ Future spec work may add requirements to cover such environments
    ‒ [Diagram: CPU, GPU, Audio Processor, Image Processor, Video Decode, Encode, DSP, Security Processor and Fixed Function Accelerator attached to a Shared Memory and Coherency Fabric]
  • 23. IN SUMMARY…
    ‒ HSA is not about a specific API, feature or runtime
        ‒ It is about a paradigm to efficiently access the various heterogeneous components in a system by software
        ‒ It allows application programmers to use the languages of their choice to efficiently implement their code
    ‒ HSA is not about a specific hardware, vendor or operating system
        ‒ It defines a few fundamental requirements and concepts as building blocks that software at all levels can depend on
        ‒ HW vendors can efficiently expose their compute acceleration features to software in an architected way
        ‒ OS, runtimes and application frameworks can build efficient data and task parallel runtimes leveraging these
        ‒ Application software can more easily use the right tool for the job through high level language support
    ‒ HSA is an open and flexible concept
        ‒ Collaborative participation through the HSA Foundation is encouraged for companies and academia
        ‒ The first set of standards by the HSA Foundation is either released or imminent
        ‒ This is a good time to engage
  • 24. WHERE TO FIND FURTHER INFORMATION ON HSA?
    ‒ HSA Foundation Website: http://www.hsafoundation.com
        ‒ The main location for specs, developer info, tools, publications and much more
        ‒ Also the location to apply for membership
    ‒ Tools are now available at GitHub – HSA Foundation
        ‒ libHSA Assembler and Disassembler
            ‒ https://github.com/HSAFoundation/HSAIL-Tools
        ‒ HSAIL Instruction Set Simulator
            ‒ https://github.com/HSAFoundation/HSAIL-Instruction-Set-Simulator
        ‒ HSA ISS Loader Library for Java and C++ for creation and dispatch of HSAIL kernels
            ‒ https://github.com/HSAFoundation/Okra-Interface-to-HSAIL-Simulator
    ‒ More to come soon …
  • 25. With thanks to Phil Rogers, Greg Stoner, Sasa Marinkovic and others for some materials and feedback
    ‒ Trademark Attribution: HSA Foundation, the HSA Foundation logo and combinations thereof are trademarks of HSA Foundation, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. ©2014 HSA Foundation, Inc. All rights reserved.
  • 26. ANY QUESTIONS?
    ‒ Of course there are, so go ahead