The GPGPU Continuum

Ofer Rosenberg

The GPU continuum workshop, April 25 2013

CONTENT
• Intel’s Compute Continuum
• GPGPU Evolution
• The GPGPU Continuum
• Mobile GPGPU challenges
• GPGPU Continuum challenges
• Towards the Continuum

INTEL’S “COMPUTE CONTINUUM” FROM IDF 2010

GPGPU EVOLUTION

2004 – Stanford University: Brook for GPUs
2006 – AMD releases CTM; NVIDIA releases CUDA
2008 – OpenCL 1.0 released

R580 – 375 GFLOPS
G80 – 346 GFLOPS

Nov 2009 – First hybrid supercomputer in the Top 10: Chinese Tianhe-1
  • 1,024 Intel Xeon E5450 CPUs
  • 5,120 Radeon 4870 X2 GPUs
Nov 2010 – First hybrid supercomputer to reach #1 on the Top500 list: Tianhe-1A
  • 14,336 Xeon X5670 CPUs
  • 7,168 NVIDIA Tesla M2050 GPUs
Source: http://www.top500.org/lists/

Tianhe-1 : 563 TFLOPS
Tianhe-1A : 2566 TFLOPS

2013 – OpenCL on: Nexus 4 (Qualcomm Adreno 320), Nexus 10 (ARM Mali T604)
       Android 4.2 adds GPU support for Renderscript
2014 – NVIDIA Tegra 5 will support CUDA

2013 – GPGPU Continuum becomes a reality

THE GPGPU CONTINUUM

Apple A6 GPU
25 GFLOPS
< 2W

AMD G-T16R
46 GFLOPS*
4.5W

Intel i7-3770
511 GFLOPS*
77W

NVIDIA GTX Titan
4500 GFLOPS
250W

ORNL Titan supercomputer
27 PFLOPS
8,200 kW

* GFLOPS of CPU+GPU
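
One way to compare the points on this continuum is performance per watt. The sketch below simply divides the GFLOPS figures listed above by the listed power. It assumes 2 W for the Apple A6 (the slide only says "< 2W", so that result is a lower bound), and the dictionary and function names are illustrative, not part of the original deck.

```python
# GFLOPS and power (watts) as listed on the slide.
# Assumption: the Apple A6 is listed as "< 2W"; 2.0 W is used as an upper
# bound on power, so its efficiency below is a lower bound.
DEVICES = {
    "Apple A6 GPU": (25.0, 2.0),
    "AMD G-T16R": (46.0, 4.5),          # CPU+GPU figure
    "Intel i7-3770": (511.0, 77.0),     # CPU+GPU figure
    "NVIDIA GTX Titan": (4500.0, 250.0),
    "ORNL Titan SC": (27_000_000.0, 8_200_000.0),  # 27 PFLOPS, 8,200 kW
}


def gflops_per_watt(gflops, watts):
    """Return performance per watt for one device."""
    return gflops / watts


def efficiency_table(devices=DEVICES):
    """Map each device name to its GFLOPS/W, rounded to one decimal."""
    return {
        name: round(gflops_per_watt(gflops, watts), 1)
        for name, (gflops, watts) in devices.items()
    }


if __name__ == "__main__":
    for name, eff in efficiency_table().items():
        print(f"{name:18s} {eff:6.1f} GFLOPS/W")
```

With these figures, absolute performance spans about six orders of magnitude from the phone GPU to the supercomputer, while performance per watt stays within roughly one order of magnitude.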

Take Intel’s vision of the Compute Continuum and aim for the same across the GPGPU continuum:

A common ecosystem
built on a common (SW) architecture

INTRO TO LEADING MOBILE GPU VENDORS

Imagination PowerVR 543
• Apple, Samsung, Motorola, Intel
• Unified Shaders
• Supports OpenCL 1.1 (E)
• 38 GFLOPS (Apple’s MP4 version)

Vivante CG4000
• Unified Shaders
• 4 Cores, SIMD4 each
• Supports OpenCL 1.2
• 48 GFLOPS

Qualcomm Adreno 320
• Part of Snapdragon S4
• Unified Shader
• SIMD4 ?
• Supports OpenCL 1.1 (E)
• 50 GFLOPS

ARM Mali T604
• 4 Cores
• Multiple “pipes” per core
• Supports OpenCL 1.1
• 68 GFLOPS

NVIDIA Tegra 4
• 6 x 4-wide Vertex Shaders
• 4 x 4-wide Pixel Shaders
• No GPGPU support
• 74 GFLOPS

Source: http://kyokojap.myweb.hinet.net/gpu_gflops/

MOBILE GPGPU CHALLENGES
• Many Different GPU Architectures
  • Optimizing for each sets a high bar on development costs
• Development Tools
  • Immature (stability, performance)
  • No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
  • Lack of libraries, wizards, middleware → slow & expensive development
• Distribution Model
  • Driver updates are part of the OS distribution (no more per-month updates…)
  • End users are less likely to update versions → higher standards on stability & performance of driver releases
• Security – the unspoken issue (hole) …

GPGPU CONTINUUM CHALLENGES
• Many Different GPU Architectures
  • Optimizing for each sets a high bar on development costs
• Development Tools
  • Immature (stability, performance)
  • No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
  • Lack of libraries, wizards, middleware → slow & expensive development
• Distribution Model
  • End users are less likely to update versions → higher standards on stability & performance of driver releases
• Security – the unspoken issue (hole) …

These challenges are a barrier to GPGPU adoption across the continuum

TOWARDS THE CONTINUUM (1) - LANGUAGES
• Welcome to the GPGPU (SW) jungle …

[Figure: languages and APIs surrounding the GPU: OpenCL, Renderscript, DirectCompute, CUDA, PyOpenCL, WebCL, Aparapi (Java), OpenACC, C++ AMP, Fortran, NumbaPro (Python)]

A jungle of languages… but are these the right ones?

• Current GPGPU languages are C/C++ based
  • There are bindings to Python, Java, JavaScript – but kernels are still C/C++
• Current developer trends:
  • Managed languages (Java, C#)
  • Scripting languages (Python, PHP)
• Higher abstraction & manageability:
  • More room for tools to excel at optimization
  • Mitigate differences between GPU architectures

GPGPU languages need to evolve

Sources: https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-ofProgramming-Language
Data from CodeEval.com, based on 100K+ code samples
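
The point about bindings can be seen in a small PyOpenCL sketch: the host program is Python, but the kernel is still an OpenCL C string. This is an illustration, not code from the talk. The names `KERNEL_SRC`, `vadd_reference` and `vadd_opencl` are made up for the example, and the OpenCL path assumes NumPy, PyOpenCL and an OpenCL device are available; without them only the pure-Python reference runs.

```python
# OpenCL C kernel source: even with a Python host program,
# the device code is still written in a C dialect.
KERNEL_SRC = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def vadd_reference(a, b):
    """Pure-Python reference for what the kernel computes."""
    return [x + y for x, y in zip(a, b)]


def vadd_opencl(a, b):
    """Run KERNEL_SRC via PyOpenCL (needs numpy, pyopencl, an OpenCL device)."""
    import numpy as np
    import pyopencl as cl

    a_np = np.asarray(a, dtype=np.float32)
    b_np = np.asarray(b, dtype=np.float32)
    out_np = np.empty_like(a_np)

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out_np.nbytes)

    # The C source is compiled at run time by the vendor's OpenCL driver.
    program = cl.Program(ctx, KERNEL_SRC).build()
    program.vadd(queue, a_np.shape, None, a_buf, b_buf, out_buf)
    cl.enqueue_copy(queue, out_np, out_buf)
    return out_np.tolist()


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
    print(vadd_reference(a, b))
    try:
        print(vadd_opencl(a, b))
    except Exception as exc:  # missing packages or no OpenCL device
        print("OpenCL path unavailable:", exc)
```

Everything a Python developer would normally lean on (types, memory management, tooling) stops at the boundary of the kernel string, which is the gap the slide argues higher-level GPGPU languages should close.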

TOWARDS THE CONTINUUM (2) - SOFTWARE STACK

[Figure: CUDA, OpenCL, OpenACC and Renderscript → LLVM IR → Vendor X IL → Vendor X GPU]

• Most GPGPU languages already use the LLVM compilation framework
  • Slight “flavors” of LLVM IR
• Most languages also possess a similar “API capabilities” set
• Defining a common stack based on LLVM & a common API will:
  • Improve the compiler
  • Increase driver quality & stability
  • Enable a unified debugger / profiler
  • …

Define a GPGPU Virtual Machine based on LLVM
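
The proposed stack can be pictured as a two-stage lowering: every language front end targets one shared IR, and each vendor supplies one back end from that IR to its own IL. The sketch below is a toy model under that assumption, not a real compiler. The vendor names, the `lower_to_ir` and `lower_to_vendor` helpers and the string form of the "IR" are all hypothetical. The translator count at the end is a common argument for a shared middle layer and is added here only as an illustration.

```python
# Toy model of a shared GPGPU stack: front ends -> common IR -> vendor back ends.
# All names are hypothetical placeholders, not real compiler APIs.
FRONTENDS = ("CUDA", "OpenCL", "OpenACC", "Renderscript")
VENDORS = ("Vendor X", "Vendor Y", "Vendor Z")


def lower_to_ir(language, kernel_name):
    """Front end: every supported language lowers to the same IR form."""
    if language not in FRONTENDS:
        raise ValueError(f"no front end for {language!r}")
    return f"ir:{kernel_name}"


def lower_to_vendor(ir, vendor):
    """Back end: each vendor translates the shared IR to its own IL."""
    if vendor not in VENDORS:
        raise ValueError(f"no back end for {vendor!r}")
    return f"{vendor} IL <- {ir}"


def compile_kernel(language, kernel_name, vendor):
    """Full pipeline: source language -> shared IR -> vendor IL."""
    return lower_to_vendor(lower_to_ir(language, kernel_name), vendor)


def translators_needed(n_languages, n_vendors, shared_ir):
    """Count compilers to build: L + V with a shared IR, L * V without."""
    if shared_ir:
        return n_languages + n_vendors
    return n_languages * n_vendors


if __name__ == "__main__":
    print(compile_kernel("OpenCL", "vadd", "Vendor X"))
    print(translators_needed(len(FRONTENDS), len(VENDORS), shared_ir=True))
    print(translators_needed(len(FRONTENDS), len(VENDORS), shared_ir=False))
```

Because every language passes through the same intermediate form in this model, a debugger or profiler written against that form would serve all of them, which is the "unified debugger / profiler" benefit listed above.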

TAKEAWAYS
• GPGPU Continuum is here - from Mobile devices to HPC
• Vision: A common ecosystem built on a common (SW) architecture

• Challenges: many architectures, immature tools, ecosystem

QUESTIONS
• Q: What about “Heterogeneous Computing” ?
• A: Go back, replace each “GPGPU” with “Heterogeneous Computing” – and it all fits…

• More ?

SOME SOURCES:
• http://www.nordichardware.com/CPU-Chipset/intel-core-i7-3770k-ivy-bridge-and-the-3d-transistor-is-here/Newgraphics-the-biggest-news-in-Ivy-Bridge.html
• http://elrond.informatik.tu-freiberg.de/papers/WorldComp2012/PDP2833.pdf
• http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/5
• http://www.anandtech.com/show/5077/arms-malit658-gpu-in-2013-up-to-10x-faster-than-mali400
• http://www.chipdesignmag.com/pallab/2011/06/30/arm-mali-gpu-unifying-graphics-across-platforms/
• http://en.wikipedia.org/wiki/Adreno#Renaming_to_Adreno
• http://en.wikipedia.org/wiki/PowerVR#Series_5_.28SGX.29
• http://en.wikipedia.org/wiki/Mali_(GPU)
• http://johndayautomotivelectronics.com/?p=12412
• http://www.cnx-software.com/2013/01/19/gpus-comparison-arm-mali-vs-vivante-gcxxx-vs-powervr-sgx-vs-nvidiageforce-ulp/
• http://www.brightsideofnews.com/print/2013/1/30/rise-of-vivante-fastest-tablet-gpu-on-the-market.aspx
• https://www.uplinq.com/2012/schedule/accelerating-your-android-application-renderscript-and-llvm-0
• http://www.androidauthority.com/adreno-320-features-performance-benchmarks-103269/
