SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
OpenACC on AMD GPUs and APUs
with the PGI Accelerator Compilers
Michael Wolfe

Michael.Wolfe@pgroup.com
http://www.pgroup.com

APU13
San Jose, November, 2013
 C, C++, Fortran compilers
 Optimizing
 Vectorizing
 Parallelizing

 Graphical parallel tools
 PGDBG debugger
 PGPROF profiler







AMD, Intel, NVIDIA processors
PGI Unified Binary™ technology
Linux, MacOS, Windows
Visual Studio & Eclipse integration
PGI Accelerator support
 OpenACC
 CUDA Fortran

www.pgroup.com
SMP Parallel Programming

for( i = 0; i < n; ++i )
a[i] = sinf(b[i]) + cosf(c[i]);
SMP Parallel Programming

#pragma omp parallel for private(i)
for( i = 0; i < n; ++i )
a[i] = sinf(b[i]) + cosf(c[i]);
% pgcc –mp x.c …
AMD Radeon Block Diagram*
 Multiple Compute Units
 Vector Unit
 Pipelining / Multithreading

 Device Memory
 Cache Hierarchy


SW-managed cache (LDS)

*From “AMD Accelerated Parallel Processing – OpenCL Programming Guide”, © 2012 Advanced Micro Devices, Inc.
Heterogeneous Parallel
Programming

for( i = 0; i < n; ++i )
a[i] = sinf(b[i]) + cosf(c[i]);
Heterogeneous Parallel
Programming
#pragma acc parallel loop private(i) 
pcopyin(b[0:n], c[0:n]) 
pcopyout(a[0:n])
for( i = 0; i < n; ++i )
a[i] = sinf(b[i]) + cosf(c[i]);
% pgcc –acc –ta=radeon x.c
Overview
 Parallel programming
 GPU Architectural highlights
 OpenACC 5 minute summary
 PGI Implementation
 Performance
Abstract CPU+Accelerator Target
Accelerator Architecture Features
 Potentially separate memory (relatively small)
 High bandwidth memory interface
 Many degrees of parallelism
 MIMD parallelism across many cores
 SIMD parallelism within a core
 Multithreading for latency tolerance

 Asynchronous with host
 Performance from Parallelism
 slower clock, less ILP, simpler control unit, smaller caches
OpenACC
Open Programming Standard for Parallel Computing
“PGI OpenACC will enable programmers to easily develop portable applications that
maximize the performance and power efficiency benefits of the hybrid CPU/GPU
architecture of Titan.”
--Buddy Bland, Titan Project Director, Oak Ridge National Lab
“OpenACC is a technically impressive initiative brought together by members of the
OpenMP Working Group on Accelerators, as well as many others. We look forward to
releasing a version of this proposal in the next release of OpenMP.”

--Michael Wong, CEO OpenMP Directives Board
OpenACC Overview

 Directive-based
 Parallel Computation
 Data Management

#pragma acc data copyin( a[0:n] ) 
copy( b(0:n] ) create( tmp[0:n] )
{
for( int i = 0; i < iters; ++i ){
relax( a, b, tmp, n );
relax( b, a, tmp, n );
}
}
relax(float *x,float *y,float *t,int n){
#pragma acc data 
present( x[0:n], y[0:n], t[0:n] )
{
#pragma acc parallel loop
for( int j = 0; j < n; ++j )
t[j] = x[j];
#pragma acc parallel loop
for( int j = 1; j < n-1; ++j
x[j] = 0.25f*(t[j-1]+t[j+1] +
y[n-j+1] + y[n-j-1]);
}
}
OpenACC compared to OpenMP
 Data parallelism

 Thread parallelism

 Parallel per region

 Fixed number of threads

 Flexible || mapping

 Fixed || thread mapping

 Structured parallelism

 Tasks and loops

 Performance portability

 ?
PGI OpenACC Implementation
 C, C++, Fortran
 pgcc, pgc++, pgfortran

 Command line options





-acc
-ta=radeon
-ta=radeon,host
-ta=radeon,nvidia

 Planner
 maps program ||ism to
hardware ||ism

 Code Generator
 OpenCL API

 Runtime
 initialization
 data management
 kernel launches
Planner
 Maps parallel loops
 OpenACC abstractions
 gang, worker, vector

 OpenCL abstractions
 work group, work item

 Hardware abstractions
 wavefront

#pragma acc parallel loop gang
for( int j = 0; j < n; ++j )
t[j] = x[j];

#pragma acc parallel loop gang vector
for( int j = 0; j < n; ++j )
t[j] = x[j];
#pragma acc kernels loop independent
for( int j = 0; j < n; ++j )
t[j] = x[j];
Code Generator
 Low-level OpenCL
 “assembly code in C”

 SPIR interface to AMD
Radeon LLVM back-end

 Uses non-standard
features
 device addresses
Runtime
 Dynamically loads
OpenCL library

 Supports multiple devices
 Multiple command
queues
 Host as a device (*)

 Memory management
 device addresses
 bigbuffer(s) suballocation

 Profiling support
Performance
 AMD Piledriver 5800K
 4.0GHz
 2MB cache
 8 cores

 Single thread/core
 OpenMP parallel
 PGI 13.10 –fast –mp

 AMD Radeon 7970





Tahiti
925 MHz
3GB memory
32 compute units

 OpenACC parallel
 PGI 13.10 –fast –acc
–ta=radeon:tahiti
Cloverleaf Mantevo Miniapp
 Lagrangian-Eulerian hydrodynamics
 compressible Euler equation solver in 2D
 9500 lines of Fortran+C with OpenMP, OpenACC
 Accelerating Hydrocodes with OpenACC, OpenCL and CUDA,
Herdman et al, 2012 SC Companion
DOI: 10.1109/SC.Companion.2012.66
Performance Results
40
35
30
25

Serial

OpenMP

20

R7970
15

S10000

10
5
0

960^2x87

1920^2x87

3840^2x87

960^2x2955

1920^2x2955
OpenACC on AMD GPUs and APUs
 OpenACC is designed for performance portability
 PGI Accelerator compilers provide evidence
 Target-specific tuning still underway
 Open Beta compilers available now
 Product version in January 2014
Copyright Notice

© Contents copyright 2013, NVIDIA Corp. This material may not be
reproduced in any manner without the expressed written
permission of NVIDIA Corp.

Contenu connexe

Tendances

The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
 
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...AMD Developer Central
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoAMD Developer Central
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbr Skip
 
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyAMD Developer Central
 
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...AMD Developer Central
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...AMD Developer Central
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...AMD Developer Central
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...AMD Developer Central
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...AMD Developer Central
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorAMD Developer Central
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...AMD Developer Central
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesAMD Developer Central
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...AMD Developer Central
 
IS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
IS-4081, Rabbit: Reinventing Video Chat, by Philippe ClavelIS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
IS-4081, Rabbit: Reinventing Video Chat, by Philippe ClavelAMD Developer Central
 
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengHC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengAMD Developer Central
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...AMD Developer Central
 
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...AMD Developer Central
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLinaro
 
HSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterHSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterAMD Developer Central
 

Tendances (20)

The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
 
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tb
 
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
 
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
 
IS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
IS-4081, Rabbit: Reinventing Video Chat, by Philippe ClavelIS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
IS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
 
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengHC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
 
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience Report
 
HSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterHSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben Gaster
 

Similaire à PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by Michael Wolfe

Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaKazuaki Ishizaki
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018NVIDIA
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersKazuaki Ishizaki
 
H2O Design and Infrastructure with Matt Dowle
H2O Design and Infrastructure with Matt DowleH2O Design and Infrastructure with Matt Dowle
H2O Design and Infrastructure with Matt DowleSri Ambati
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCinside-BigData.com
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda enKohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storageKohei KaiGai
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseKazuaki Ishizaki
 
Porting and Maintaining your C++ Game on Android without losing your mind
Porting and Maintaining your C++ Game on Android without losing your mindPorting and Maintaining your C++ Game on Android without losing your mind
Porting and Maintaining your C++ Game on Android without losing your mindBeMyApp
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsGanesan Narayanasamy
 
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereNew Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereRodrique Heron
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionAkihiro Hayashi
 
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)Igalia
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStromKohei KaiGai
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to AcceleratorsDilum Bandara
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020John Zedlewski
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLinside-BigData.com
 

Similaire à PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by Michael Wolfe (20)

Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java Programmers
 
H2O Design and Infrastructure with Matt Dowle
H2O Design and Infrastructure with Matt DowleH2O Design and Infrastructure with Matt Dowle
H2O Design and Infrastructure with Matt Dowle
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCC
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to Use
 
Porting and Maintaining your C++ Game on Android without losing your mind
Porting and Maintaining your C++ Game on Android without losing your mindPorting and Maintaining your C++ Game on Android without losing your mind
Porting and Maintaining your C++ Game on Android without losing your mind
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systems
 
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereNew Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
 
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 

Plus de AMD Developer Central

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...AMD Developer Central
 

Plus de AMD Developer Central (20)

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 

Dernier

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Dernier (20)

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by Michael Wolfe

  • 1. OpenACC on AMD GPUs and APUs with the PGI Accelerator Compilers Michael Wolfe Michael.Wolfe@pgroup.com http://www.pgroup.com APU13 San Jose, November, 2013
  • 2.  C, C++, Fortran compilers  Optimizing  Vectorizing  Parallelizing  Graphical parallel tools  PGDBG debugger  PGPROF profiler      AMD, Intel, NVIDIA processors PGI Unified Binary™ technology Linux, MacOS, Windows Visual Studio & Eclipse integration PGI Accelerator support  OpenACC  CUDA Fortran www.pgroup.com
  • 3. SMP Parallel Programming for( i = 0; i < n; ++i ) a[i] = sinf(b[i]) + cosf(c[i]);
  • 4. SMP Parallel Programming #pragma omp parallel for private(i) for( i = 0; i < n; ++i ) a[i] = sinf(b[i]) + cosf(c[i]); % pgcc –mp x.c …
  • 5. AMD Radeon Block Diagram*  Multiple Compute Units  Vector Unit  Pipelining / Multithreading  Device Memory  Cache Hierarchy  SW-managed cache (LDS) *From “AMD Accelerated Parallel Processing – OpenCL Programming Guide”, © 2012 Advanced Micro Devices, Inc.
  • 6. Heterogeneous Parallel Programming for( i = 0; i < n; ++i ) a[i] = sinf(b[i]) + cosf(c[i]);
  • 7. Heterogeneous Parallel Programming #pragma acc parallel loop private(i) pcopyin(b[0:n], c[0:n]) pcopyout(a[0:n]) for( i = 0; i < n; ++i ) a[i] = sinf(b[i]) + cosf(c[i]); % pgcc –acc –ta=radeon x.c
  • 8. Overview  Parallel programming  GPU Architectural highlights  OpenACC 5 minute summary  PGI Implementation  Performance
  • 10. Accelerator Architecture Features  Potentially separate memory (relatively small)  High bandwidth memory interface  Many degrees of parallelism  MIMD parallelism across many cores  SIMD parallelism within a core  Multithreading for latency tolerance  Asynchronous with host  Performance from Parallelism  slower clock, less ILP, simpler control unit, smaller caches
  • 11. OpenACC Open Programming Standard for Parallel Computing “PGI OpenACC will enable programmers to easily develop portable applications that maximize the performance and power efficiency benefits of the hybrid CPU/GPU architecture of Titan.” --Buddy Bland, Titan Project Director, Oak Ridge National Lab “OpenACC is a technically impressive initiative brought together by members of the OpenMP Working Group on Accelerators, as well as many others. We look forward to releasing a version of this proposal in the next release of OpenMP.” --Michael Wong, CEO OpenMP Directives Board
  • 12. OpenACC Overview  Directive-based  Parallel Computation  Data Management #pragma acc data copyin( a[0:n] ) copy( b(0:n] ) create( tmp[0:n] ) { for( int i = 0; i < iters; ++i ){ relax( a, b, tmp, n ); relax( b, a, tmp, n ); } } relax(float *x,float *y,float *t,int n){ #pragma acc data present( x[0:n], y[0:n], t[0:n] ) { #pragma acc parallel loop for( int j = 0; j < n; ++j ) t[j] = x[j]; #pragma acc parallel loop for( int j = 1; j < n-1; ++j x[j] = 0.25f*(t[j-1]+t[j+1] + y[n-j+1] + y[n-j-1]); } }
  • 13. OpenACC compared to OpenMP  Data parallelism  Thread parallelism  Parallel per region  Fixed number of threads  Flexible || mapping  Fixed || thread mapping  Structured parallelism  Tasks and loops  Performance portability  ?
  • 14. PGI OpenACC Implementation  C, C++, Fortran  pgcc, pgc++, pgfortran  Command line options     -acc -ta=radeon -ta=radeon,host -ta=radeon,nvidia  Planner  maps program ||ism to hardware ||ism  Code Generator  OpenCL API  Runtime  initialization  data management  kernel launches
  • 15. Planner  Maps parallel loops  OpenACC abstractions  gang, worker, vector  OpenCL abstractions  work group, work item  Hardware abstractions  wavefront #pragma acc parallel loop gang for( int j = 0; j < n; ++j ) t[j] = x[j]; #pragma acc parallel loop gang vector for( int j = 0; j < n; ++j ) t[j] = x[j]; #pragma acc kernels loop independent for( int j = 0; j < n; ++j ) t[j] = x[j];
  • 16. Code Generator  Low-level OpenCL  “assembly code in C”  SPIR interface to AMD Radeon LLVM back-end  Uses non-standard features  device addresses
  • 17. Runtime  Dynamically loads OpenCL library  Supports multiple devices  Multiple command queues  Host as a device (*)  Memory management  device addresses  bigbuffer(s) suballocation  Profiling support
  • 18. Performance  AMD Piledriver 5800K  4.0GHz  2MB cache  8 cores  Single thread/core  OpenMP parallel  PGI 13.10 –fast –mp  AMD Radeon 7970     Tahiti 925 MHz 3GB memory 32 compute units  OpenACC parallel  PGI 13.10 –fast –acc –ta=radeon:tahiti
  • 19. Cloverleaf Mantevo Miniapp  Lagrangian-Eulerian hydrodynamics  compressible Euler equation solver in 2D  9500 lines of Fortran+C with OpenMP, OpenACC  Accelerating Hydrocodes with OpenACC, OpenCL and CUDA, Herdman et al, 2012 SC Companion DOI: 10.1109/SC.Companion.2012.66
  • 21. OpenACC on AMD GPUs and APUs  OpenACC is designed for performance portability  PGI Accelerator compilers provide evidence  Target-specific tuning still underway  Open Beta compilers available now  Product version in January 2014
  • 22. Copyright Notice © Contents copyright 2013, NVIDIA Corp. This material may not be reproduced in any manner without the expressed written permission of NVIDIA Corp.