Nvidia SC13 Podcast

SUPERCOMPUTING 2013 PRESS DECK

Sumit Gupta | General Manager, Tesla Accelerated Computing

SC13 News

1. IBM Taps GPU Accelerators
2. New Product Announcements
3. New Supercomputer Announcements

Accelerated Computing Growing Fast

2x Growth in One Year
Percent of HPC Systems With Accelerators
[Bar chart, axis 0% to 50%; bar labels: 22%, 24%, 44%]

Hundreds of GPU Accelerated Apps
[Bar chart, axis 0 to 300; bar labels: 113, 182, 242]

NVIDIA GPU is Accelerator of Choice
NVIDIA GPUs: 85%
Intel Phi: 4%
Others: 11%

[Year labels on the charts: 2010 to 2012 and 2011 to 2013]
Source: Intersect360 Research, HPC User Site Census: Systems, July 2013

IBM Using GPUs to Accelerate Enterprise & Data Analytics Applications

Application Infrastructure
Business Intelligence
Predictive Analytics
Risk Analytics

IBM Partners with NVIDIA to Build Next-Generation Supercomputers

Tesla GPU + POWER8 CPU

GPU-Accelerated POWER-Based Systems Available in 2014

GPU Computing in Data Centers

[Timeline chart, 2007 to 2014; platforms shown: x86, ARM64, Power]

Linux GCC Compiler to Support GPU Accelerators
Open Source: OpenACC in GCC by Mentor Graphics & Samsung
Pervasive Impact: Free to all Linux users
Mainstream: Most Widely Used HPC Compiler

“Incorporating OpenACC into GCC is an excellent example of open source and open standards working together to make accelerated computing broadly accessible to all Linux developers.”

Oscar Hernandez, Oak Ridge National Laboratory
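
To make the OpenACC announcement concrete, here is a minimal sketch of what a directive-annotated C loop can look like. The function name saxpy and its parameters are illustrative and do not come from the deck. A compiler without OpenACC enabled ignores the pragma and runs the loop serially, so the same source serves both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
   The pragma asks an OpenACC-enabled compiler to offload the loop:
   x is copied to the accelerator, y is copied there and back.
   Without OpenACC support the pragma is ignored and the loop runs on the CPU. */
void saxpy(int n, float a, const float *x, float *y) {
    assert(n >= 0 && x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With a GCC build that includes OpenACC support, offloading is requested with the -fopenacc flag; whether the loop actually runs on a GPU depends on how that compiler was configured.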

Tesla K40

World’s Fastest Accelerator
for Supercomputing and
Big Data Analytics

CUDA 6

Dramatically Simplifies
Parallel Programming with
Unified Memory

Tesla K40

World’s Fastest Accelerator

FASTER: 1.4 TF | 2880 Cores | 288 GB/s
LARGER: 2x Memory Enables More Apps (6GB to 12GB): Fluid Dynamics, Rendering, Seismic Analysis
SMARTER: Unlock Extra Performance Using Power Headroom (GPU Boost)

[Chart: AMBER Benchmark, ns/day (axis 0 to 5), for CPU, K20X, and K40]
AMBER Benchmark: SPFP-Nucleosome
CPU: Dual E5-2687W @ 3.10GHz, 64GB System Memory, CentOS 6.2, GPU systems: Single Tesla K20X or Single Tesla K40

GPU Boost

Up to 25% Extra Performance on Applications
Use Power Headroom to Run at Higher Clocks

[Chart: performance relative to Tesla K40 (base) = 1.00, axis 0.00 to 1.40, for Tesla K40 with GPU Boost on AMBER SPFP-TRPCage, LAMMPS-EAM, and NAMD 2.9-APOA1; bar labels: 11%, 13%, 14%, 17%, 20%, and 25% Faster]

ANNOUNCING

Unified Memory

CUDA 6

Unified Memory

Dramatically Lower Developer Effort
Developer View Today: System Memory, GPU Memory
Developer View With Unified Memory: Unified Memory

Super Simplified Memory Management Code
CPU Code

    void sortfile(FILE *fp, int N) {
        char *data;
        data = (char *)malloc(N);      /* ordinary host allocation */

        fread(data, 1, N, fp);
        qsort(data, N, 1, compare);

        use_data(data);

        free(data);
    }

CUDA 6 Code with Unified Memory

    void sortfile(FILE *fp, int N) {
        char *data;
        cudaMallocManaged(&data, N);   /* one allocation, used by both CPU and GPU */

        fread(data, 1, N, fp);
        qsort<<<...>>>(data,N,1,compare);
        cudaDeviceSynchronize();       /* wait for the GPU before the CPU reads data */

        use_data(data);

        cudaFree(data);
    }
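
The CPU listing calls compare and use_data without defining them. As a minimal sketch of how the CPU path could be completed into compilable C, the block below supplies a byte-wise comparator and returns the sorted buffer instead of calling use_data. The names compare_bytes, sort_bytes, and sortfile_cpu are hypothetical, as is the choice to return the buffer to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical comparator: orders single bytes ascending, matching the
   element size of 1 that the slide passes to qsort. */
static int compare_bytes(const void *a, const void *b) {
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Sorts n bytes in place with the C library qsort. */
void sort_bytes(char *data, size_t n) {
    assert(data != NULL);
    qsort(data, n, 1, compare_bytes);
}

/* Reads up to n bytes from fp, sorts the bytes actually read, and returns
   the buffer (caller frees it). *out_len receives the number of bytes read.
   Returns NULL if the allocation fails. */
char *sortfile_cpu(FILE *fp, size_t n, size_t *out_len) {
    assert(fp != NULL && out_len != NULL);
    char *data = malloc(n > 0 ? n : 1);
    if (data == NULL) {
        return NULL;
    }
    *out_len = fread(data, 1, n, fp);
    sort_bytes(data, *out_len);
    return data;
}
```

In the slide's CUDA 6 version the same structure is kept: cudaMallocManaged replaces malloc, a kernel launch followed by cudaDeviceSynchronize replaces the qsort call, and cudaFree replaces free.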

Greenest Supercomputer in the World
Tokyo Tech KFC System

4000+ MFLOPS per Watt
25% Higher than #1 Green500 System
160 Tesla K20X GPUs

Oil Immersion Technology
Current Green500 #1: CINECA Eurora System, Italy, 3208 MF/W
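
The 25% claim can be checked against the two efficiency figures quoted above. A minimal sketch of that arithmetic follows; the function names are illustrative and not from the deck.

```c
#include <assert.h>

/* Percent by which value exceeds baseline (both in the same unit,
   here MFLOPS per watt). */
double percent_higher(double value, double baseline) {
    assert(baseline > 0.0);
    return (value / baseline - 1.0) * 100.0;
}

/* Value that is pct percent above baseline. */
double raised_by_percent(double baseline, double pct) {
    return baseline * (1.0 + pct / 100.0);
}
```

With the slide's numbers, percent_higher(4000.0, 3208.0) is about 24.7, and raised_by_percent(3208.0, 25.0) is 4010, so an efficiency of "4000+" MFLOPS per watt is consistent with the stated 25% margin over the 3208 MF/W system.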

ANSYS Fluent Doubles Performance with GPUs

Automobile Drag Simulation Throughput
[Bar chart: Number of Jobs per Day (axis 0 to 30), CPU vs K40; label: 90% Faster]

2x Better Insight for Low Drag Design
2% Less Drag
1.5B Gal. of Fuel Saved/Year

2 x E5-2680 CPUs, 8 cores used; 2 Tesla K40s
Sedan Geometry, 3.6M mixed cells
Steady, turbulent, external aerodynamics; Coupled PBNS, DP Solver

Tesla K40

20-40% Faster than K20X on Applications

[Bar chart: speedup relative to K20X = 1.0 (axis 0.0 to 1.5) for K40 @ base and K40 @ boost; bar labels range from 1.2x to 1.4x]
Benchmarks: ANSYS 14 (SMP-V14sp-4), LAMMPS (EAM), NAMD 2.9 (APOA1), AMBER (SPFP-Nucleosome), LSMS (Fe32), QMCPACK (3x3x1), CUBLAS

First Tesla K40 Customers

CSC Finland
Texas Advanced Computing Center
CEA France
Swinburne Australia

                         K20X           K40
Peak Single Precision    3.93 TF        4.29 TF
Peak SGEMM               2.95 TF        3.22 TF
Peak Double Precision    1.31 TF        1.43 TF
Peak DGEMM               1.22 TF        1.33 TF
Memory size              6 GB           12 GB
Memory BW (ECC off)      250 GB/s       288 GB/s
Memory Clock             2.6 GHz        3.0 GHz
PCIe Gen                 Gen 2          Gen 3
# of Cores               2688           2880
Core Clock               732 MHz        Base: 745 MHz; Boost Clocks: 810 & 875 MHz
Total Board Power        235W           235W
Form Factor              PCIe Passive   PCIe Passive, Active
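
The peak figures in this comparison can be cross-checked from the core counts and clocks. The sketch below assumes each core completes one fused multiply-add (two floating-point operations) per clock, and that memory sits on a 384-bit double-data-rate bus. Neither assumption is stated on the slide, and the function names are illustrative.

```c
#include <assert.h>

/* Peak single-precision GFLOPS, assuming 2 floating-point operations
   (one fused multiply-add) per core per clock. */
double peak_sp_gflops(int cores, double core_clock_mhz) {
    assert(cores > 0 && core_clock_mhz > 0.0);
    return cores * core_clock_mhz * 2.0 / 1000.0;
}

/* Peak memory bandwidth in GB/s, assuming two transfers per memory clock
   (double data rate) across a bus of bus_bits bits. The slide does not
   give the bus width; 384 is an assumption. */
double peak_mem_bw_gbs(double mem_clock_ghz, int bus_bits) {
    assert(mem_clock_ghz > 0.0 && bus_bits > 0);
    return mem_clock_ghz * 2.0 * bus_bits / 8.0;
}
```

Under these assumptions, peak_sp_gflops(2880, 745.0) gives about 4291 GFLOPS and peak_sp_gflops(2688, 732.0) about 3935 GFLOPS, in line with the listed 4.29 TF and 3.93 TF. peak_mem_bw_gbs(3.0, 384) gives 288 GB/s and peak_mem_bw_gbs(2.6, 384) gives 249.6 GB/s, matching the listed 288 and 250 GB/s. The listed double-precision peaks are one third of the single-precision ones.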