Presentation PT-4056, Harnessing Heterogeneous Systems Using C++ AMP – How the Story is Evolving, by Boby George, at the AMD Developer Summit (APU13) November 11-13, 2013.
2. HARNESSING HETEROGENEOUS SYSTEMS USING C++ AMP
Introduction
Updates
• Performance
• Productivity
• Portability
Future
2 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
4. C++ AMP TIMELINE
2011
• Introduced @ AMD Fusion Summit 11
• Announced C++ AMP open specification
2012
• C++ AMP V1 released
• Open Spec V1 released
2013
• C++ AMP V2 released
• Support in additional compilers
5. C++ ACCELERATED MASSIVE PARALLELISM (C++ AMP)
INTRODUCTION
What is C++ AMP?
‒ Programming model for expressing data parallel algorithms
‒ Exploit heterogeneous systems using mainstream tools
‒ Just C++ code, consisting of language extensions and libraries
What does C++ AMP give you?
‒ Productivity: Write C++ code that runs on heterogeneous systems
‒ Portability: Write code once and run on various hardware platforms
‒ Performance: Write C++ code that is massively accelerated
[Diagram labels: C++, Data Parallelism, Microsoft C++ AMP]
6. CODE INTRODUCTION
SEQUENTIAL C++ CODE
#include <iostream>

int main()
{
    int v[11] = {'G', 'd', 'k', 'k', 'n', 31, 'v', 'n', 'q', 'k', 'c'};

    // Increment every element in place, one after another.
    for (int idx = 0; idx < 11; idx++)
    {
        v[idx] += 1;
    }

    // Print the result as characters.
    for (unsigned int i = 0; i < 11; i++)
        std::cout << static_cast<char>(v[i]);
}
7. CODE INTRODUCTION
C++ AMP CODE
#include <iostream>
#include <amp.h>
using namespace concurrency;

int main()
{
    int v[11] = {'G', 'd', 'k', 'k', 'n', 31, 'v', 'n', 'q', 'k', 'c'};

    // Wrap the host array so the accelerator can operate on it.
    array_view<int> av(11, v);

    // Run the lambda once per element of av.extent on the accelerator.
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)
    {
        av[idx] += 1;
    });

    // Print the result as characters, reading through the array_view.
    for (unsigned int i = 0; i < 11; i++)
        std::cout << static_cast<char>(av[i]);
}
Concept Count (5)
‒ array_view: wraps the data to operate on the accelerator. array_view variables are captured and the associated data is copied to the accelerator (on demand)
‒ parallel_for_each: executes the lambda on the accelerator once per thread
‒ extent: the parallel loop bounds or computation “shape”
‒ index: the ID of the thread that is running the lambda, used to index into data
‒ restrict(amp): tells the compiler to check that the code conforms to the C++ subset, and tells the compiler to target the GPU
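The five concepts above can be pictured with a small sketch in standard C++ that uses no C++ AMP at all. Everything in it is a hypothetical stand-in, not the library's API: `toy_for_each` takes the place of parallel_for_each, a plain count takes the place of extent, a `std::size_t` takes the place of index<1>, and a raw pointer captured by value takes the place of array_view. The sketch runs sequentially on the CPU, so it only shows the shape of the programming model, not accelerator execution.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for parallel_for_each: calls `body` once per index in
// [0, extent). A real C++ AMP runtime would run these invocations as
// accelerator threads; here they simply run one after another on the CPU.
template <typename Body>
void toy_for_each(std::size_t extent, Body body)
{
    for (std::size_t idx = 0; idx < extent; ++idx)
        body(idx);  // idx plays the role of index<1>
}

// Applies the "+1" kernel from the slide to a copy of the data and returns
// the incremented values as text.
std::string increment_all(std::vector<int> data)
{
    assert(!data.empty());

    // Plays the role of array_view: a lightweight handle to the data that the
    // lambda captures by value while still writing to the underlying storage.
    int* view = data.data();

    toy_for_each(data.size(), [=](std::size_t idx) { view[idx] += 1; });

    std::string out;
    for (int c : data)
        out += static_cast<char>(c);
    return out;
}
```

Calling `increment_all` with the eleven values from the slide returns the same text that the sequential and C++ AMP versions print.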
8. MAPPING TO HARDWARE
[Diagram labels: Data Parallelism, Vector Lanes, Multicore CPU, GPU]
12. PERFORMANCE
Support for Shared Memory Architecture in Visual Studio 2013
[Chart: execution time in milliseconds vs. problem size in multiples of 1024]
13. PERFORMANCE & PRODUCTIVITY
Enhanced Texture Functionality in Visual Studio 2013
‒ Already used to develop portable 3D Face Scanner
14. PRODUCTIVITY
Tooling Updates
‒ Side-by-side CPU/GPU debugging for WARP accelerator
‒ C++ AMP GPU debugging on Windows 7 and Server 2008 R2
‒ Remote GPU hardware debugging on NVIDIA GPUs
Runtime/Library Updates
‒ array_view API improvements
‒ C++ AMP runtime improvements like faster texture copying
‒ Added scan algorithms to C++ AMP Algorithms Library
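For readers unfamiliar with scan (prefix sum), the sketch below shows what the operation computes, using sequential standard C++ on the CPU. It is not the C++ AMP Algorithms Library API. The function names `inclusive_scan_cpu` and `exclusive_scan_cpu` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive scan: out[i] = in[0] + ... + in[i].
std::vector<int> inclusive_scan_cpu(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
std::vector<int> exclusive_scan_cpu(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of everything before element i
        running += in[i];
    }
    return out;
}
```

For the input 1, 2, 3, 4 the inclusive scan gives 1, 3, 6, 10 and the exclusive scan gives 0, 1, 3, 6. A data-parallel implementation produces the same values but splits the work across many threads.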
15. PORTABILITY
C++ AMP is a high-level language
Announcing…
‒ C++ AMP support in Clang
‒ Via LLVM targeting HSAIL & Khronos SPIR 1.2
‒ AMD is the project sponsor
‒ Attend Ben Sander’s talk for more details
‒ Objectives
‒ Offers consistent C++ AMP programming model across hardware and platforms
‒ Open source work to seed additional support on other compilers and hardware
‒ Microsoft’s Engagement
‒ Collaboration with AMD for design and validation inputs
‒ Preview bits @ https://bitbucket.org/multicoreware/cppamp-driver/
Visual Studio will continue to offer premier C++ AMP dev experience
[Diagram labels: C++ AMP, DirectCompute, Khronos SPIR 1.2, HSAIL, Hardware]
16. PORTABILITY
Announcing PathScale ENZO 2014
‒ Targets NVIDIA hardware directly for higher performance
‒ Plans to target AMD hardware and Windows platform
‒ Currently in Private Beta testing phase
Complete the picture…
[Diagram labels: C++ AMP, Your favorite compiler, DirectCompute, Khronos SPIR 1.2, HSAIL, Native Code Generation, Hardware]
18. C++ AMP GROWTH CHART
[Chart labels: Performance, Productivity, Portability; VS 2012, VS 2013, VS Next, End Goal]
19. PERFORMANCE
Support Shared Virtual Memory Architectures
More performant CPU accelerator
Convergence of CPU/GPU parallelization technology
20. PRODUCTIVITY
Convergence of platforms
‒ Write code once and run across multiple platforms
Enhanced tooling support
Continue to invest in parallel algorithms
21. PORTABILITY
ISO standardization of C++ AMP features like multidimensional arrays, extent, etc.
Update Open Specification to latest version of C++ AMP in Visual Studio
‒ Open Specification v1.2 to be released by November 2013
Engage with partners for C++ AMP implementation on non-Microsoft technologies
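To make the multidimensional array and extent features named in the standardization item above more concrete, here is a minimal sketch in standard C++ of a two-dimensional view over contiguous storage. The types `Extent2` and `View2` and the function `row_sum` are hypothetical names invented for this illustration, and row-major layout is an assumption of the sketch. None of this is the C++ AMP or ISO interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D extent: the "shape" of the data as rows x columns.
struct Extent2 {
    std::size_t rows;
    std::size_t cols;
    std::size_t size() const { return rows * cols; }
};

// Hypothetical non-owning 2-D view over contiguous storage.
// Assumes row-major layout: element (r, c) lives at offset r * cols + c.
class View2 {
public:
    View2(Extent2 e, int* data) : extent_(e), data_(data) {}

    int& operator()(std::size_t r, std::size_t c) const
    {
        assert(r < extent_.rows && c < extent_.cols);
        return data_[r * extent_.cols + c];
    }

    Extent2 extent() const { return extent_; }

private:
    Extent2 extent_;
    int* data_;
};

// Sums one row by indexing through the view.
int row_sum(const View2& v, std::size_t r)
{
    int total = 0;
    for (std::size_t c = 0; c < v.extent().cols; ++c)
        total += v(r, c);
    return total;
}
```

With a six-element buffer holding 1 to 6 and an extent of 2 rows by 3 columns, `v(1, 0)` refers to the fourth element and `row_sum(v, 1)` adds 4, 5 and 6.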