GPU programming

Roberto Bonvallet
Departamento de Informática
Universidad Técnica Federico Santa María

June 2010

CPU and GPU architectures

[Figure: CPU and GPU block diagrams; labels: Control, ALU, Cache, DRAM]

Task and data parallelism

Task parallelism:
  distributed processing
  distributed memory
  message passing
Data parallelism:
  same instruction on different data
  shared memory

Thread and memory hierarchies

Thread hierarchy:
  grid of blocks
  blocks of threads
Memory hierarchy:
  global memory (large, slow)
  shared memory (per-block, small, fast)
  registers (per-thread, small, fast)
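To make the two hierarchies concrete, here is a plain-C emulation (not CUDA code) of a common pattern, a per-block sum reduction. The outer loop plays the blocks of the grid, the inner `tid` loops play the threads of one block, the `shared` array plays that block's shared memory, and locals such as `i` play per-thread registers. The block size `BLOCK_DIM` and the name `block_sums` are hypothetical choices for this sketch; letting each inner loop finish before the next one starts stands in for a barrier (`__syncthreads()` in CUDA).

```c
#include <stddef.h>

/* Hypothetical block size for this sketch; must be a power of two. */
#define BLOCK_DIM 4

/* Emulates num_blocks blocks of BLOCK_DIM threads. Each block copies its
 * slice of the global array `in` into a per-block buffer, reduces it with a
 * tree of pairwise additions, and writes one partial sum to `partial`.
 * The caller passes num_blocks = ceil(n / BLOCK_DIM). */
void block_sums(const float *in, size_t n, float *partial, size_t num_blocks)
{
    for (size_t block = 0; block < num_blocks; ++block) {
        float shared[BLOCK_DIM];               /* per-block "shared memory" */

        /* Load phase: each thread copies one element (0 past the end). */
        for (size_t tid = 0; tid < BLOCK_DIM; ++tid) {
            size_t i = block * BLOCK_DIM + tid; /* global index, a "register" */
            shared[tid] = (i < n) ? in[i] : 0.0f;
        }
        /* barrier: all loads finish before any thread reads shared[] */

        /* Tree reduction: active threads halve at every step. Thread tid
         * only reads index tid + stride, which no thread writes in the same
         * step, so the sequential loop matches the parallel result. */
        for (size_t stride = BLOCK_DIM / 2; stride > 0; stride /= 2) {
            for (size_t tid = 0; tid < stride; ++tid)
                shared[tid] += shared[tid + stride];
            /* barrier between steps */
        }

        partial[block] = shared[0];            /* thread 0 writes to global */
    }
}
```

A second pass over `partial` (on the host or in another launch) combines the per-block results into the final sum.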

Matrix-matrix multiplication

$c_{ij} = \sum_k a_{ik} b_{kj}$

Block form, with $A_{ik}$, $B_{kj}$, $C_{ij}$ submatrices (tiles):
$C_{ij} = \sum_k A_{ik} B_{kj}$
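As a reference point, the element-wise formula translates directly into three nested loops. This plain-C sketch assumes n × n matrices stored row-major in flat arrays; the name `matmul_naive` is hypothetical. Each of the n² outputs is an independent dot product, which is what makes the problem data parallel: a GPU version can give each $c_{ij}$ its own thread.

```c
#include <stddef.h>

/* Untiled reference product C = A B for n x n row-major matrices:
 * c[i][j] = sum over k of a[i][k] * b[k][j]. */
void matmul_naive(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;                    /* accumulator for c_ij */
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```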
Multiplication kernel:
  initialize element of C_ij = 0
  for each k:
    fetch element of A_ik, B_kj into shared memory
    synchronize
    compute element of C_ij = C_ij + A_ik B_kj
    synchronize
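The tiled kernel can be checked on the CPU with the plain-C emulation below (a sketch, not CUDA code). The tile width `TILE`, the name `matmul_tiled`, and zero-padding of partial tiles at the matrix edge are assumptions of this sketch. Each `(bi, bj)` iteration stands in for one thread block computing tile C_ij, each `(ty, tx)` pair for one of its threads, and `sa`/`sb` for the shared-memory copies of A_ik and B_kj. Completing each phase for all threads before the next phase starts is exactly what the two synchronize steps guarantee on the GPU.

```c
#include <stddef.h>

#define TILE 2  /* hypothetical tile width (threads per block side) */

/* C = A B for n x n row-major matrices, computed tile by tile. */
void matmul_tiled(const float *a, const float *b, float *c, size_t n)
{
    size_t tiles = (n + TILE - 1) / TILE;      /* tiles per matrix side */

    for (size_t bi = 0; bi < tiles; ++bi)
        for (size_t bj = 0; bj < tiles; ++bj) {
            float acc[TILE][TILE] = {{0}};      /* one C_ij element per thread */
            float sa[TILE][TILE], sb[TILE][TILE]; /* "shared memory" tiles */

            for (size_t k = 0; k < tiles; ++k) {
                /* Phase 1: each thread fetches one element of A_ik and B_kj. */
                for (size_t ty = 0; ty < TILE; ++ty)
                    for (size_t tx = 0; tx < TILE; ++tx) {
                        size_t row = bi * TILE + ty, col = bj * TILE + tx;
                        size_t ka = k * TILE + tx, kb = k * TILE + ty;
                        sa[ty][tx] = (row < n && ka < n) ? a[row * n + ka] : 0.0f;
                        sb[ty][tx] = (kb < n && col < n) ? b[kb * n + col] : 0.0f;
                    }
                /* synchronize */

                /* Phase 2: C_ij = C_ij + A_ik B_kj, read from the tiles only. */
                for (size_t ty = 0; ty < TILE; ++ty)
                    for (size_t tx = 0; tx < TILE; ++tx)
                        for (size_t m = 0; m < TILE; ++m)
                            acc[ty][tx] += sa[ty][m] * sb[m][tx];
                /* synchronize before the tiles are overwritten */
            }

            /* Each thread writes its element, skipping padding. */
            for (size_t ty = 0; ty < TILE; ++ty)
                for (size_t tx = 0; tx < TILE; ++tx) {
                    size_t row = bi * TILE + ty, col = bj * TILE + tx;
                    if (row < n && col < n)
                        c[row * n + col] = acc[ty][tx];
                }
        }
}
```

The benefit on a GPU is reuse: every element staged in shared memory is read TILE times by the block's threads, so each thread issues 2n/TILE global loads instead of 2n.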

Nvidia C1060

Core clock                          602 MHz
Multiprocessors                     30
Thread processors                   240 = 30 × 8
Memory size                         4 GB
Memory bandwidth                    102.4 GB/s
Peak single-precision performance   933.12 Gflop/s
Peak double-precision performance   77.76 Gflop/s
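The peak figures can be reproduced from the unit counts under some assumptions not stated on the slide: a 1.296 GHz processor (shader) clock, separate from the 602 MHz core clock; 3 single-precision flops per thread processor per cycle (a multiply-add plus a multiply); one double-precision unit per multiprocessor doing a 2-flop fused multiply-add per cycle; and a 512-bit memory interface at 1600 MT/s. These are the commonly published C1060 numbers, used here only to show the arithmetic:

```latex
\begin{aligned}
P_{\mathrm{SP}} &= 240 \times 1.296\ \mathrm{GHz} \times 3\ \mathrm{flop/cycle} = 933.12\ \mathrm{Gflop/s} \\
P_{\mathrm{DP}} &= 30 \times 1.296\ \mathrm{GHz} \times 2\ \mathrm{flop/cycle} = 77.76\ \mathrm{Gflop/s} \\
B &= \frac{512\ \mathrm{bit}}{8\ \mathrm{bit/byte}} \times 1.6\ \mathrm{GT/s} = 102.4\ \mathrm{GB/s} \\
\frac{P_{\mathrm{SP}}}{B} &= \frac{933.12}{102.4} \approx 9.1\ \mathrm{flop/byte}
\end{aligned}
```

The last line suggests why data reuse matters: a kernel needs roughly 9 single-precision flops per byte fetched from global memory to be limited by arithmetic rather than bandwidth, which element-wise kernels such as a vector sum do not reach.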

CUDA programming

Array allocation and copying
cudaMalloc((void **) &dev_p, mem_size);

// cudaMemcpy(destination, source, size, direction)
cudaMemcpy(dev_p, host_p, mem_size,
           cudaMemcpyHostToDevice);

[...]

cudaMemcpy(host_p, dev_p, mem_size,
           cudaMemcpyDeviceToHost);

cudaFree(dev_p);

CUDA programming

Kernel deﬁnition
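__global__ void
vector_sum(const float *a, const float *b, float *c, int n) {
    // global index = offset of this block + position within the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // the last block may run past the end of the arrays
        c[i] = a[i] + b[i];
}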

Kernel launch
f<<<grid_size, block_size,
sh_mem_size>>>(a, b, c);
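In the launch, `grid_size` is the number of blocks and `block_size` the number of threads per block; `sh_mem_size` is the dynamic shared memory per block in bytes and may be omitted (it defaults to 0). The plain-C sketch below (not CUDA code) shows how the grid is usually sized with a ceiling division and what the resulting launch does: the two loops stand in for `blockIdx.x` and `threadIdx.x`. The names `grid_size_for` and `vector_sum_emulated` are hypothetical, and the element count `n` with its `i < n` guard is an assumption of this sketch, needed because the last block can extend past the end of the arrays.

```c
#include <stddef.h>

/* Smallest number of blocks with grid_size * block_size >= n. */
size_t grid_size_for(size_t n, size_t block_size)
{
    return (n + block_size - 1) / block_size;
}

/* Sequential stand-in for launching a vector-sum kernel with
 * grid_size_for(n, block_size) blocks of block_size threads. */
void vector_sum_emulated(const float *a, const float *b, float *c,
                         size_t n, size_t block_size)
{
    size_t grid_size = grid_size_for(n, block_size);
    for (size_t block_idx = 0; block_idx < grid_size; ++block_idx)
        for (size_t thread_idx = 0; thread_idx < block_size; ++thread_idx) {
            size_t i = block_idx * block_size + thread_idx; /* global index */
            if (i < n)              /* surplus threads of the last block idle */
                c[i] = a[i] + b[i];
        }
}
```

For example, 1000 elements with 256 threads per block need 4 blocks; the last block has 24 surplus threads that skip the write.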

Vortex Methods

Fluid discretized as vortices, each with position (x, y) and circulation α
Vortex interaction:
$K(x, y) = \frac{1}{2\pi (x^2 + y^2)} (-y, x)$

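Biot-Savart law:
$\mathbf{u}(\mathbf{x}) = \sum_p \alpha_p K(\mathbf{x} - \mathbf{x}_p)$, where $\mathbf{x}_p$ and $\alpha_p$ are the position and circulation of vortex $p$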
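A direct evaluation of the Biot-Savart sum is a loop over all source vortices, with $K$ expanded as $(-\Delta y, \Delta x) / (2\pi r^2)$. The plain-C sketch below assumes a point-vortex model with no core smoothing, so a source located exactly at the evaluation point is simply skipped; the struct `vortex` and the function `induced_velocity` are hypothetical names.

```c
#include <stddef.h>

static const double TWO_PI = 6.283185307179586;

typedef struct {
    double x, y;   /* position */
    double alpha;  /* circulation */
} vortex;

/* Velocity (ux, uy) induced at (px, py) by n point vortices:
 * u = sum_p alpha_p K(x - x_p), K(dx, dy) = (-dy, dx) / (2 pi (dx^2 + dy^2)). */
void induced_velocity(const vortex *v, size_t n, double px, double py,
                      double *ux, double *uy)
{
    *ux = 0.0;
    *uy = 0.0;
    for (size_t p = 0; p < n; ++p) {
        double dx = px - v[p].x, dy = py - v[p].y;
        double r2 = dx * dx + dy * dy;
        if (r2 == 0.0)
            continue;                     /* K is singular at the origin */
        double s = v[p].alpha / (TWO_PI * r2);
        *ux += -dy * s;
        *uy += dx * s;
    }
}
```

Evaluating the velocity at every vortex costs O(N²), the same all-pairs structure as matrix multiplication; a GPU version could plausibly assign one thread per evaluation point and stage blocks of source vortices in shared memory, much like the tiles above.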
