This is a presentation I gave at our last GPGPU workshop, held in April 2013.
GPGPU usage is expanding and now forms a continuum from mobile to HPC. At the same time, the question is whether the current GPGPU languages are the right ones (well, no), and whether we are wasting resources re-developing the same SW stack instead of converging.
6. GPGPU EVOLUTION
Nov 2009 - First Hybrid SC in the Top10: Chinese Tianhe-1
1,024 Intel Xeon E5450 CPUs
5,120 Radeon 4870 X2 GPUs
Nov 2010 – First Hybrid SC reaches #1 on Top500 list: Tianhe-1A
14,336 Xeon X5670 CPUs
7,168 Nvidia Tesla M2050 GPUs
Source: http://www.top500.org/lists/
Tianhe-1 : 563 TFLOPS
Tianhe-1A : 2577 TFLOPS
7. GPGPU EVOLUTION
2013 - OpenCL on: Nexus 4 (Qualcomm Adreno 320), Nexus 10 (ARM Mali T604)
Android 4.2 adds GPU support for Renderscript
2014 – NVIDIA Tegra 5 will support CUDA
2013 – GPGPU Continuum becomes a reality
8. THE GPGPU CONTINUUM
Apple A6 GPU: 25 GFLOPS, < 2 W
AMD G-T16R: 46 GFLOPS*, 4.5 W
Intel i7-3770: 511 GFLOPS*, 77 W
NVIDIA GTX Titan: 4,500 GFLOPS, 250 W
ORNL TITAN SC: 27 PFLOPS, 8,200 kW
* GFLOPS of CPU+GPU
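To make the continuum easier to compare, the figures above can be turned into a rough efficiency number (GFLOPS per watt). The sketch below is only an illustration: it uses the peak figures exactly as quoted on the slide, treats the Apple A6's "< 2 W" as 2 W (so that result is a lower bound), and the helper names are hypothetical. Note that the starred entries count CPU+GPU throughput.

```python
# Peak throughput (GFLOPS) and power (W) as quoted on the slide.
# Assumption: "< 2 W" for the Apple A6 is taken as 2 W, so its
# efficiency is a lower bound.
DEVICES = [
    ("Apple A6 GPU", 25.0, 2.0),
    ("AMD G-T16R", 46.0, 4.5),           # CPU+GPU figure
    ("Intel i7-3770", 511.0, 77.0),      # CPU+GPU figure
    ("NVIDIA GTX Titan", 4500.0, 250.0),
    ("ORNL TITAN SC", 27e6, 8200e3),     # 27 PFLOPS, 8,200 kW
]


def gflops_per_watt(gflops, watts):
    """Peak GFLOPS divided by power draw in watts."""
    return gflops / watts


def efficiency_table(devices=DEVICES):
    """Map each device name to its peak GFLOPS per watt."""
    return {name: gflops_per_watt(g, w) for name, g, w in devices}


if __name__ == "__main__":
    for name, eff in efficiency_table().items():
        print(f"{name:18s} {eff:6.2f} GFLOPS/W")
```

With these numbers the power budget spans more than six orders of magnitude (2 W to 8.2 MW), while peak efficiency stays within roughly a factor of six (about 3.3 to 18 GFLOPS/W).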
Take Intel’s vision of the Compute Continuum and aspire to the same for the GPGPU continuum:
a common ecosystem built on a common (SW) architecture
9. INTRO TO LEADING MOBILE GPU VENDORS
Imagination PowerVR 543
• Apple, Samsung, Motorola, Intel
• Unified Shaders
• Supports OpenCL 1.1 (E)
• 38 GFLOPS (Apple’s MP4 version)
Vivante CG4000
• Unified Shaders
• 4 Cores, SIMD4 each
• Supports OpenCL 1.2
• 48 GFLOPS
Qualcomm Adreno 320
• Part of Snapdragon S4
• Unified Shader
• SIMD4 ?
• Supports OpenCL 1.1 (E)
• 50 GFLOPS
ARM Mali T604
• 4 Cores
• Multiple “pipes” per core
• Supports OpenCL 1.1
• 68 GFLOPS
NVIDIA Tegra 4
• 6 x 4-wide Vertex Shaders
• 4 x 4-wide Pixel Shaders
• No GPGPU support
• 74 GFLOPS
http://kyokojap.myweb.hinet.net/gpu_gflops/
10. MOBILE GPGPU CHALLENGES
• Many Different GPU Architectures
  • Optimizing for each sets a high bar on development costs
• Development Tools
  • Immature (stability, performance)
  • No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
  • Lack of libraries, wizards, middleware → slow & expensive development
• Distribution Model
  • Driver updates are part of the OS distribution (no more per-month updates…)
  • End users are less likely to update versions → higher standards on stability & performance of driver releases
• Security – the unspoken issue (hole) …
11. GPGPU CONTINUUM CHALLENGES
• Many Different GPU Architectures
  • Optimizing for each sets a high bar on development costs
• Development Tools
  • Immature (stability, performance)
  • No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
  • Lack of libraries, wizards, middleware → slow & expensive development
• Distribution Model
  • End users are less likely to update versions → higher standards on stability & performance of driver releases
• Security – the unspoken issue (hole) …
These challenges are a barrier to GPGPU adoption across the continuum
13. TOWARDS THE CONTINUUM (1) - LANGUAGES
• Welcome to the GPGPU (SW) jungle …
[Diagram: languages surrounding the GPU: OpenCL, RenderScript, DirectCompute, CUDA]
14. TOWARDS THE CONTINUUM (1) - LANGUAGES
• Welcome to the GPGPU (SW) jungle …
[Diagram: languages surrounding the GPU: PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, RenderScript, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)]
15. TOWARDS THE CONTINUUM (1) - LANGUAGES
• Welcome to the GPGPU (SW) jungle …
[Diagram: languages surrounding the GPU: PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, RenderScript, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)]
A jungle of languages… but are these the right ones?
16. TOWARDS THE CONTINUUM (1) - LANGUAGES
• Current GPGPU languages are C/C++ based
  • There are bindings to Python, Java, JavaScript, but kernels are still C/C++
• Current developer trends:
  • Managed languages (Java, C#)
  • Scripting languages (Python, PHP)
https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-ofProgramming-Language
• Higher abstraction & manageability:
  • More room for tools to excel at optimization
  • Mitigates differences between GPU architectures
GPGPU languages need to evolve
Data from CodeEval.com, based on 100K+ code samples
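The point about bindings is easy to see in practice. The sketch below uses PyOpenCL (one of the bindings named in the language jungle) and assumes that `numpy` and `pyopencl` are installed and that an OpenCL device is available. The helper name `vector_add` and the kernel name `vadd` are made up for illustration. The host side is Python, but the kernel itself is still a string of OpenCL C.

```python
# Host code in Python; the kernel is still OpenCL C embedded as a string.
# Assumes numpy and pyopencl are installed and an OpenCL device exists.

KERNEL_SRC = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def vector_add(a, b):
    """Add two equal-length 1-D arrays on an OpenCL device (illustrative)."""
    import numpy as np
    import pyopencl as cl  # imported lazily so this module loads without it

    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # Copy the inputs to the device and allocate the output buffer.
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # The C kernel is compiled at run time by the vendor's driver.
    program = cl.Program(ctx, KERNEL_SRC).build()
    program.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)

    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    return out
```

Everything a Python developer would want to express at a higher level (the element-wise add) still has to be written, debugged, and tuned in C.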
18. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
[Diagram: OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
19. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
• Most GPGPU languages already use the LLVM compilation framework
  • Slight “flavors” of LLVM IR
• Most languages also possess a similar “API capabilities” set
[Diagram: OpenACC, RenderScript, OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
20. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
• Most GPGPU languages already use the LLVM compilation framework
  • Slight “flavors” of LLVM IR
• Most languages also possess a similar “API capabilities” set
• Defining a common stack based on LLVM & a common API will:
  • Improve the compiler
  • Increase driver quality & stability
  • Enable unified debugger / profiler
  • …
[Diagram: OpenACC, RenderScript, OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
Define a GPGPU Virtual Machine based on LLVM
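One way to put a rough number on the duplicated effort is to count components. The sketch below is an illustration, not something from the original slides. It assumes every language would otherwise need its own compiler path per vendor, which overstates today's reality since not every language targets every GPU. It uses the four front ends from the diagram and the five mobile GPU vendors listed earlier, and all function names are hypothetical.

```python
# Toy model of the proposed stack: language front ends -> common IR ->
# vendor back ends. Names and counts are illustrative assumptions.

LANGUAGES = ["OpenCL", "CUDA", "RenderScript", "OpenACC"]
VENDORS = ["Imagination", "Vivante", "Qualcomm", "ARM", "NVIDIA"]


def compilers_without_common_ir(languages, vendors):
    """Every language needs its own compiler path for every vendor."""
    return len(languages) * len(vendors)


def components_with_common_ir(languages, vendors):
    """One front end per language plus one back end per vendor."""
    return len(languages) + len(vendors)


def build_pipeline(language, vendor):
    """Stages a kernel passes through in the common-stack model."""
    return [
        f"{language} front end",
        "LLVM IR",
        f"{vendor} IL",
        f"{vendor} GPU",
    ]


if __name__ == "__main__":
    print(compilers_without_common_ir(LANGUAGES, VENDORS))
    print(components_with_common_ir(LANGUAGES, VENDORS))
    print(" -> ".join(build_pipeline("OpenCL", "ARM")))
```

Under these assumptions, adding a new language or a new vendor costs one component instead of a whole row or column of compilers, and a debugger or profiler written against the common IR works for all of them.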
21. TAKEAWAYS
• GPGPU Continuum is here - from Mobile devices to HPC
• Vision: A common ecosystem built on a common (SW) architecture
• Challenges: many architectures, immature tools, ecosystem
22. QUESTIONS
• Q: What about “Heterogeneous Computing”?
• A: Go back, replace each “GPGPU” with “Heterogeneous Computing”, and it all fits…
• More?