SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
USING GPUS FOR
COLLISION DETECTION
AMD
Takahiro Harada
2 | Eurographics 2012 | Takahiro Harada
GPUS
3 | Eurographics 2012 | Takahiro Harada
GPU RAW PERFORMANCE
§ High memory bandwidth
§ High TFLOPS
§ Radeon HD 7970
–  32x4 SIMD engines
–  64 wide SIMD
–  3.79 TFLOPS
–  264 GB/s
0
50
100
150
200
250
300
0
0.5
1
1.5
2
2.5
3
3.5
4
4890 5870 6970 7970
TFLOPS
GB/s
4 | Eurographics 2012 | Takahiro Harada
RAW PERFORMANCE IS HIGH, BUT
§ GPU performs good only if used correctly
–  Divergence
–  ALU op / Memory op ratio
–  etc
§ Some algorithm requires operation GPUs are not good at
5 | Eurographics 2012 | Takahiro Harada
EX) HASH GRID (LINKED LIST)
§ Low ALU op / Mem op ratio
§ Spatial query using hash grid
do
{
write( node );
node = node->next;
}while( node )
§ CPUs is better for this 0
10
20
30
40
50
60
Query
HD7970
HD6970
HD5870
HD6870
PhenomXII6(1)
PhenomXII6(1)
PhenomXII6(3)
PhenomXII6(4)
PhenomXII6(5)
PhenomXII6(6)
CPU
6 | Eurographics 2012 | Takahiro Harada
EX) HASH GRID (LINKED LIST)
§ Low ALU op / Mem op ratio
§ Spatial query using hash grid
–  while()
–  {
§ Fetch element
§ Copy
–  }
§ CPUs is better for this
0
10
20
30
40
50
60
Query
HD7970
HD6970
HD5870
HD6870
PhenomXII6(1)
PhenomXII6(1)
PhenomXII6(3)
PhenomXII6(4)
PhenomXII6(5)
PhenomXII6(6)
GPU
7 | Eurographics 2012 | Takahiro Harada
REASON OF PERFORMANCE JUMP
§ Key architectural changes
–  No VLIW
–  Improved memory system
8 | Eurographics 2012 | Takahiro Harada
GPU COLLISION DETECTION
9 | Eurographics 2012 | Takahiro Harada
COLLISION DETECTION WAS HARD FOR GPUS
§ Old GPUs were not as flexible as today’s GPUs
–  Fixed function
–  Programmable shader (Vertex shader, pixel shader)
§ Grid based fluid simulation
§ Cloth simulation
Crane et al., Real-Time Simulation and Rendering of 3D Fluids, GPU Gems3 (2007)
10 | Eurographics 2012 | Takahiro Harada
UNIFORM GRID
§ Simplest but useful data structure
§ Pre Compute Language
–  Vertex shader, pixel shader
–  Vertex texture fetch -> Random write
–  Depth test, blending etc -> Atomics
§ Now it is very easy
–  Random write and atomics are supported
struct Cell
{
u32 m_counter;
u32 m_data[N ];
}
11 | Eurographics 2012 | Takahiro Harada
UNIFORM GRID
§ Simplest but useful data structure
§ Pre Compute Language
–  Vertex shader, pixel shader
–  Vertex texture fetch -> Random write
–  Depth test, blending etc -> Atomics
§ Now it is very easy
–  Random write and atomics are supported
2
5
8
12 | Eurographics 2012 | Takahiro Harada
WITH UNIFORM GRID
§ Often used for particle-based simulation
§ Other steps are embarrassingly parallel
§ Distinct Element Method (DEM)
§ Particles can be extended to…
13 | Eurographics 2012 | Takahiro Harada
SMOOTHED PARTICLE HYDRODYNAMICS
§ Fluid simulation
§ Solving the Navier-Stokes equation on particles
Harada et al., Smoothed Particle Hydrodynamics on GPUs, CGI (2007)
14 | Eurographics 2012 | Takahiro Harada
RIGID BODY SIMULATION
§ Represent a rigid body with a set of particles
§ Rigid body collision = Particle collision
Harada et al., Real-time Rigid Body Simulation on GPUs, GPU Gems3 (2007)
15 | Eurographics 2012 | Takahiro Harada
COUPLING RIGID BODIES + FLUID
16 | Eurographics 2012 | Takahiro Harada
CLOTH COLLISION DETECTION
§ Particle v.s. triangle mesh collision
§ Dual uniform grid
–  1st grid: particles
–  2nd grid: triangles
Harada et al., Real-time Fluid Simulation Coupled with Cloth, TPCG (2007)
17 | Eurographics 2012 | Takahiro Harada
SEVERAL PHYSICS PROBLEMS WERE SOLVED
Particle Simulations
Granular Materials Fluids Rigid Bodies
Cloth
Coupling
18 | Eurographics 2012 | Takahiro Harada
PROBLEM SOLVED??
§ Not yet
§ Uniform grid is not a perfect solution
§ More complicated collision detection is necessary
–  E.g., rigid body simulation
–  Broad-phase collision detection
–  Narrow-phase collision detection
19 | Eurographics 2012 | Takahiro Harada
BROAD-PHASE COLLISION DETECTION
§ Sweep & prune
–  Sort end points
–  Sweep sorted list
§ Optimizations
–  Split sweep of a large object
–  Workspace subdivision
–  Cell subdivision
Liu et al., Real-time Collision Culling of a Million Bodies on Graphics Processing Units, TOG(2010)
20 | Eurographics 2012 | Takahiro Harada
NARROW-PHASE COLLISION DETECTION
§ Variety of work
–  So many shape combinations -> divergence
§ Unified shape representation
§ Each collision computation is parallelizable
21 | Eurographics 2012 | Takahiro Harada
MULTIPLE GPU
COLLISION DETECTION
22 | Eurographics 2012 | Takahiro Harada
MULTIPLE GPU PROGRAMMING
23 | Eurographics 2012 | Takahiro Harada
Parallelization using Multiple GPUs
• Two levels of parallelization
• 1GPU
• Multiple GPUs
Memory
MemoryMemoryMemory
24 | Eurographics 2012 | Takahiro Harada
25 | Eurographics 2012 | Takahiro Harada
PARTICLE SIMULATION
ON
MULTIPLE GPUS
26 | Eurographics 2012 | Takahiro Harada
27 | Eurographics 2012 | Takahiro Harada
28 | Eurographics 2012 | Takahiro Harada
Data Management
• Not all the data have to be sent
• Data required for the computation has to be sent
• Two kinds of particles have to be sent to adjacent GPU
1. Escaped particles : Particles get out from adjacent
subdomain (adjacent GPU will be responsible for these
particles in the next time step)
2. Ghost particles in the ghost region
29 | Eurographics 2012 | Takahiro Harada
30 | Eurographics 2012 | Takahiro Harada
31 | Eurographics 2012 | Takahiro Harada
Data Transfer between GPUs
• No direct data transfer is provided on current hardwares
• Transfer via main memory
• The buffers equal to the number of connectors are allocated
• Each GPU writes data to the defined location at the same time
• Each GPU read data of the defined location at the same time
GPU0 GPU1 GPU2 GPU3
Main
Memory
32 | Eurographics 2012 | Takahiro Harada
Computation of 1 Time Step
GPU0 GPU1 GPU2 GPU3
Main
Memory
Compt
Force
GPU0
Compt
Force
GPU1
Compt
Force
GPU2
Compt
Force
GPU3
Update Update Update Update
Prepare
Send
Prepare
Send
Prepare
Send
Prepare
Send
Send Send Send Send
Synchronization
Receive Receive Receive Receive
Post
Receive
Post
Receive
Post
Receive
Post
Receive
33 | Eurographics 2012 | Takahiro Harada
Environment
• 4GPU server (Simulation) + 1GPU (Rendering)
• 1M particles
• 6GPU server boxes (Simulation) + 1GPU (Rendering) @
GDC2008
• 3M particles
GPU
GPU
GPU
GPU
GPU
GPU
GPU
Simulation GPUs
GPU
GPU
GPU
GPU
GPU
Simulation GPUs
34 | Eurographics 2012 | Takahiro Harada
MOVIE
Harada et al., Massive Particles: Particle-based Simulations on Multiple GPUs, SIG Talk(2008)
35 | Eurographics 2012 | Takahiro Harada
Results
Number of Particles (103
particles)
0
10
20
30
40
50
60
70
80
90
100
0 200 400 600 800 1000 1200
Total (1GPU)
Sim (2GPU)
Total (2GPU)
Sim (4GPU)
Total (4GPU)
ComputationTime(ms)
36 | Eurographics 2012 | Takahiro Harada
PROBLEM SOLVED??
§ Not yet
§ Most of the problems had identical object size
–  e.g., Particles
§ The reason is because of GPU architecture
–  Not designed to solve non-uniform problem
37 | Eurographics 2012 | Takahiro Harada
HETEROGENEOUS COLLISION
DETECTION
38 | Eurographics 2012 | Takahiro Harada
§ Large number of particles
§ Particles with identical size
–  Work granularity is almost the same
–  Good for the wide SIMD architecture
PARTICLE BASED SIMULATION ON THE GPU
Harada et al. 2007
39 | Eurographics 2012 | Takahiro Harada
MIXED PARTICLE SIMULATION
§ Not only small particles
§ Difficulty for GPUs
–  Large particles interact with small particles
–  Large-large collision
40 | Eurographics 2012 | Takahiro Harada
CHALLENGE
§ Non uniform work granularity
–  Small-small(SS) collision
§ Uniform, GPU
–  Large-large(LL) collision
§ Non Uniform, CPU
–  Large-small(LS) collision
§ Non Uniform, CPU
41 | Eurographics 2012 | Takahiro Harada
FUSION ARCHITECTURE
§ CPU and GPU are:
–  On the same die
–  Much closer
–  Efficient data sharing
§ CPU and GPU are good at different works
–  CPU: serial computation, conditional branch
–  GPU: parallel computation
§ Able to dispatch works to:
–  Serial work with varying granularity → CPU
–  Parallel work with the uniform granularity → GPU
42 | Eurographics 2012 | Takahiro Harada
METHOD
43 | Eurographics 2012 | Takahiro Harada
TWO SIMULATIONS
§ Small particles
§ Large particles
Build
Acc. Structure
SS
Collision
S
Integration
Build
Acc. Structure
LL
Collision
L
Integration
LS
Collision
Position
Velocity
Force
Grid
Position
Velocity
Force
44 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
Uniform Work
Non Uniform Work
CLASSIFY BY WORK GRANULARITY
Build
Acc. Structure
SS
Collision
S
Integration
L
Integration
Position
Velocity
Force
Grid
Position
Velocity
Force LL
Collision
LS
Collision
Build
Acc. Structure
45 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
GPU
CPU
CLASSIFY BY WORK GRANULARITY, ASSIGN PROCESSOR
Build
Acc. Structure
SS
Collision
S
Integration
L
Integration
Position
Velocity
Force
Grid
Position
Velocity
Force LL
Collision
LS
Collision
Build
Acc. Structure
46 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
§ Grid, small particle data has to be shared with the CPU for LS collision
–  Allocated as zero copy buffer
GPU
CPU
DATA SHARING
Build
Acc. Structure
SS
Collision
S
Integration
L
Integration
Position
Velocity
Force
Grid
Position
Velocity
Force LL
Collision
Build
Acc. Structure
Position
Velocity
Grid
Force
LS
Collision
47 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
§ Grid, small particle data has to be shared with the CPU for LS collision
–  Allocated as zero copy buffer
GPU
CPU
SYNCHRONIZATION
Position
Velocity
Force
Grid
Position
Velocity
Force
SS
Collision
S
Integration
L
Integration
LL
Collision
Position
Velocity
Grid
Force
Synchronization
LS
Collision
Build
Acc. Structure
Build
Acc. Structure
Synchronization
48 | Eurographics 2012 | Takahiro Harada
GPU
CPU
VISUALIZING WORKLOADS
Build
Acc. Structure
SS
Collision
S
Integration
Position
Velocity
Force
Grid
Position
Velocity
Force LL
Collision
LS
Collision
Synchronization
L
Integration
§ Small particles
§ Large particles
§ Grid construction can be moved at the end of the pipeline
–  Unbalanced workload
49 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
§ To get better load balancing
–  The sync is for passing the force buffer filled by the CPU to the GPU
–  Move the LL collision after the sync
GPU
CPU
LOAD BALANCING
Build
Acc. Structure
SS
Collision
S
Integration
Position
Velocity
Force
Grid
Position
Velocity
Force LL
Collision
Synchronization
L
Integration
LS
Collision
50 | Eurographics 2012 | Takahiro Harada
MULTI THREADING
(4 THREADS)
51 | Eurographics 2012 | Takahiro Harada
OPTIMIZATION2: IMPROVING SMALL-SMALL COLLISION
GPU
Build
Acc. Structure
SS
Collision
S
Integ.
CPU0
CPU1
CPU2
LS
Collision
LS
Collision
LS
Collision Synchronization
MergeMergeMerge
LL
Coll.
L
Integ.
Synchronization
S Sorting
S Sorting
S Sorting
Synchronization
Harada, Heterogeneous Particle-Based Simulation, SIG ASIA Talk(2011)
52 | Eurographics 2012 | Takahiro Harada
DEMO
GPUWork
CPUWork
53 | Eurographics 2012 | Takahiro Harada
QUESTIONS?

Contenu connexe

Tendances

Forward+ (EUROGRAPHICS 2012)
Forward+ (EUROGRAPHICS 2012)Forward+ (EUROGRAPHICS 2012)
Forward+ (EUROGRAPHICS 2012)Takahiro Harada
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-renderingmistercteam
 
GDC 2012: Advanced Procedural Rendering in DX11
GDC 2012: Advanced Procedural Rendering in DX11GDC 2012: Advanced Procedural Rendering in DX11
GDC 2012: Advanced Procedural Rendering in DX11smashflt
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanElectronic Arts / DICE
 
Future Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsFuture Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsElectronic Arts / DICE
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingElectronic Arts / DICE
 
GS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauGS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauAMD Developer Central
 
FlameWorks GTC 2014
FlameWorks GTC 2014FlameWorks GTC 2014
FlameWorks GTC 2014Simon Green
 
PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung
PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard HoffnungPG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung
PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard HoffnungAMD Developer Central
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...AMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2Guerrilla
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsHolger Gruen
 
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandPG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandAMD Developer Central
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)Philip Hammer
 
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)Takahiro Harada
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorAMD Developer Central
 

Tendances (20)

Forward+ (EUROGRAPHICS 2012)
Forward+ (EUROGRAPHICS 2012)Forward+ (EUROGRAPHICS 2012)
Forward+ (EUROGRAPHICS 2012)
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-rendering
 
GDC 2012: Advanced Procedural Rendering in DX11
GDC 2012: Advanced Procedural Rendering in DX11GDC 2012: Advanced Procedural Rendering in DX11
GDC 2012: Advanced Procedural Rendering in DX11
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and Vulkan
 
Future Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsFuture Directions for Compute-for-Graphics
Future Directions for Compute-for-Graphics
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
 
GS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauGS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-Bilodeau
 
FlameWorks GTC 2014
FlameWorks GTC 2014FlameWorks GTC 2014
FlameWorks GTC 2014
 
PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung
PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard HoffnungPG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung
PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked Lists
 
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandPG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
 
Beyond porting
Beyond portingBeyond porting
Beyond porting
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
 
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
 

En vedette

A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)Takahiro Harada
 
Physics Tutorial, GPU Physics (GDC2010)
Physics Tutorial, GPU Physics (GDC2010)Physics Tutorial, GPU Physics (GDC2010)
Physics Tutorial, GPU Physics (GDC2010)Takahiro Harada
 
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)Takahiro Harada
 
Introducing Firerender for 3DS Max
Introducing Firerender for 3DS MaxIntroducing Firerender for 3DS Max
Introducing Firerender for 3DS MaxTakahiro Harada
 
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRaysTakahiro Harada
 
確率的ライトカリング 理論と実装 (CEDEC2016)
確率的ライトカリング 理論と実装 (CEDEC2016)確率的ライトカリング 理論と実装 (CEDEC2016)
確率的ライトカリング 理論と実装 (CEDEC2016)Takahiro Harada
 
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...Takahiro Harada
 
Component basedgameenginedesign
Component basedgameenginedesign Component basedgameenginedesign
Component basedgameenginedesign DADA246
 
ゲーム開発とデザインパターン
ゲーム開発とデザインパターンゲーム開発とデザインパターン
ゲーム開発とデザインパターンTakashi Komada
 

En vedette (11)

A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
 
Physics Tutorial, GPU Physics (GDC2010)
Physics Tutorial, GPU Physics (GDC2010)Physics Tutorial, GPU Physics (GDC2010)
Physics Tutorial, GPU Physics (GDC2010)
 
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
 
Introducing Firerender for 3DS Max
Introducing Firerender for 3DS MaxIntroducing Firerender for 3DS Max
Introducing Firerender for 3DS Max
 
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays
 
確率的ライトカリング 理論と実装 (CEDEC2016)
確率的ライトカリング 理論と実装 (CEDEC2016)確率的ライトカリング 理論と実装 (CEDEC2016)
確率的ライトカリング 理論と実装 (CEDEC2016)
 
自由なデータ
自由なデータ自由なデータ
自由なデータ
 
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
 
Component basedgameenginedesign
Component basedgameenginedesign Component basedgameenginedesign
Component basedgameenginedesign
 
GPU最適化入門
GPU最適化入門GPU最適化入門
GPU最適化入門
 
ゲーム開発とデザインパターン
ゲーム開発とデザインパターンゲーム開発とデザインパターン
ゲーム開発とデザインパターン
 

Similaire à Using GPUs for Collision detection, Recent Advances in Real-Time Collision and Proximity Computations for Games and Simulations (EUROGRAPHICS 2012)

“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a FoeHaim Yadid
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architectureDhaval Kaneria
 
Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24Aritra Sarkar
 
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverS4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverPraveen Narayanan
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 
Deploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfDeploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfObject Automation
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Fisnik Kraja
 
Design and implementation of GPU-based SAR image processor
Design and implementation of GPU-based SAR image processorDesign and implementation of GPU-based SAR image processor
Design and implementation of GPU-based SAR image processorNajeeb Ahmad
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14AMD Developer Central
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelPerformance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelKoichi Shirahata
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahAMD Developer Central
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeOfer Rosenberg
 
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...NAVER D2
 
Apache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelApache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelMartin Zapletal
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profilepramodbiligiri
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learningAmgad Muhammad
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsHPCC Systems
 
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Stefano Di Carlo
 

Similaire à Using GPUs for Collision detection, Recent Advances in Real-Time Collision and Proximity Computations for Games and Simulations (EUROGRAPHICS 2012) (20)

“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24
 
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverS4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
Deploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfDeploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdf
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...
 
Design and implementation of GPU-based SAR image processor
Design and implementation of GPU-based SAR image processorDesign and implementation of GPU-based SAR image processor
Design and implementation of GPU-based SAR image processor
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelPerformance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
 
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
 
Apache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelApache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming model
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profile
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC Systems
 
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
 

Dernier

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Dernier (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Using GPUs for Collision detection, Recent Advances in Real-Time Collision and Proximity Computations for Games and Simulations (EUROGRAPHICS 2012)

  • 1. USING GPUS FOR COLLISION DETECTION AMD Takahiro Harada
  • 2. 2 | Eurographics 2012 | Takahiro Harada GPUS
  • 3. 3 | Eurographics 2012 | Takahiro Harada GPU RAW PERFORMANCE § High memory bandwidth § High TFLOPS § Radeon HD 7970 –  32x4 SIMD engines –  64 wide SIMD –  3.79 TFLOPS –  264 GB/s 0 50 100 150 200 250 300 0 0.5 1 1.5 2 2.5 3 3.5 4 4890 5870 6970 7970 TFLOPS GB/s
  • 4. 4 | Eurographics 2012 | Takahiro Harada RAW PERFORMANCE IS HIGH, BUT § GPU performs good only if used correctly –  Divergence –  ALU op / Memory op ratio –  etc § Some algorithm requires operation GPUs are not good at
  • 5. 5 | Eurographics 2012 | Takahiro Harada EX) HASH GRID (LINKED LIST) § Low ALU op / Mem op ratio § Spatial query using hash grid do { write( node ); node = node->next; }while( node ) § CPUs is better for this 0 10 20 30 40 50 60 Query HD7970 HD6970 HD5870 HD6870 PhenomXII6(1) PhenomXII6(1) PhenomXII6(3) PhenomXII6(4) PhenomXII6(5) PhenomXII6(6) CPU
  • 6. 6 | Eurographics 2012 | Takahiro Harada EX) HASH GRID (LINKED LIST) § Low ALU op / Mem op ratio § Spatial query using hash grid –  while() –  { § Fetch element § Copy –  } § CPUs is better for this 0 10 20 30 40 50 60 Query HD7970 HD6970 HD5870 HD6870 PhenomXII6(1) PhenomXII6(1) PhenomXII6(3) PhenomXII6(4) PhenomXII6(5) PhenomXII6(6) GPU
  • 7. 7 | Eurographics 2012 | Takahiro Harada REASON OF PERFORMANCE JUMP § Key architectural changes –  No VLIW –  Improved memory system
  • 8. 8 | Eurographics 2012 | Takahiro Harada GPU COLLISION DETECTION
  • 9. 9 | Eurographics 2012 | Takahiro Harada COLLISION DETECTION WAS HARD FOR GPUS § Old GPUs were not as flexible as today’s GPUs –  Fixed function –  Programmable shader (Vertex shader, pixel shader) § Grid based fluid simulation § Cloth simulation Crane et al., Real-Time Simulation and Rendering of 3D Fluids, GPU Gems3 (2007)
  • 10. 10 | Eurographics 2012 | Takahiro Harada UNIFORM GRID § Simplest but useful data structure § Pre Compute Language –  Vertex shader, pixel shader –  Vertex texture fetch -> Random write –  Depth test, blending etc -> Atomics § Now it is very easy –  Random write and atomics are supported struct Cell { u32 m_counter; u32 m_data[N ]; }
  • 11. 11 | Eurographics 2012 | Takahiro Harada UNIFORM GRID § Simplest but useful data structure § Pre Compute Language –  Vertex shader, pixel shader –  Vertex texture fetch -> Random write –  Depth test, blending etc -> Atomics § Now it is very easy –  Random write and atomics are supported 2 5 8
  • 12. 12 | Eurographics 2012 | Takahiro Harada WITH UNIFORM GRID § Often used for particle-based simulation § Other steps are embarrassingly parallel § Distinct Element Method (DEM) § Particles can be extended to…
  • 13. 13 | Eurographics 2012 | Takahiro Harada SMOOTHED PARTICLE HYDRODYNAMICS § Fluid simulation § Solving the Navier-Stokes equation on particles Harada et al., Smoothed Particle Hydrodynamics on GPUs, CGI (2007)
  • 14. 14 | Eurographics 2012 | Takahiro Harada RIGID BODY SIMULATION § Represent a rigid body with a set of particles § Rigid body collision = Particle collision Harada et al., Real-time Rigid Body Simulation on GPUs, GPU Gems3 (2007)
  • 15. 15 | Eurographics 2012 | Takahiro Harada COUPLING RIGID BODIES + FLUID
  • 16. 16 | Eurographics 2012 | Takahiro Harada CLOTH COLLISION DETECTION § Particle v.s. triangle mesh collision § Dual uniform grid –  1st grid: particles –  2nd grid: triangles Harada et al., Real-time Fluid Simulation Coupled with Cloth, TPCG (2007)
  • 17. 17 | Eurographics 2012 | Takahiro Harada SEVERAL PHYSICS PROBLEMS WERE SOLVED Particle Simulations Granular Materials Fluids Rigid Bodies Cloth Coupling
  • 18. 18 | Eurographics 2012 | Takahiro Harada PROBLEM SOLVED?? § Not yet § Uniform grid is not a perfect solution § More complicated collision detection is necessary –  E.g., rigid body simulation –  Broad-phase collision detection –  Narrow-phase collision detection
  • 19. 19 | Eurographics 2012 | Takahiro Harada BROAD-PHASE COLLISION DETECTION § Sweep & prune –  Sort end points –  Sweep sorted list § Optimizations –  Split sweep of a large object –  Workspace subdivision –  Cell subdivision Liu et al., Real-time Collision Culling of a Million Bodies on Graphics Processing Units, TOG(2010)
  • 20. 20 | Eurographics 2012 | Takahiro Harada NARROW-PHASE COLLISION DETECTION § Variety of work –  So many shape combinations -> divergence § Unified shape representation § Each collision computation is parallelizable
  • 21. 21 | Eurographics 2012 | Takahiro Harada MULTIPLE GPU COLLISION DETECTION
  • 22. 22 | Eurographics 2012 | Takahiro Harada MULTIPLE GPU PROGRAMMING
  • 23. 23 | Eurographics 2012 | Takahiro Harada Parallelization using Multiple GPUs • Two levels of parallelization • 1GPU • Multiple GPUs Memory MemoryMemoryMemory
  • 24. 24 | Eurographics 2012 | Takahiro Harada
  • 25. 25 | Eurographics 2012 | Takahiro Harada PARTICLE SIMULATION ON MULTIPLE GPUS
  • 26. 26 | Eurographics 2012 | Takahiro Harada
  • 27. 27 | Eurographics 2012 | Takahiro Harada
  • 28. 28 | Eurographics 2012 | Takahiro Harada Data Management • Not all the data have to be sent • Data required for the computation has to be sent • Two kinds of particles have to be sent to adjacent GPU 1. Escaped particles : Particles get out from adjacent subdomain (adjacent GPU will be responsible for these particles in the next time step) 2. Ghost particles in the ghost region
  • 29. 29 | Eurographics 2012 | Takahiro Harada
  • 30. 30 | Eurographics 2012 | Takahiro Harada
  • 31. 31 | Eurographics 2012 | Takahiro Harada Data Transfer between GPUs • No direct data transfer is provided on current hardwares • Transfer via main memory • The buffers equal to the number of connectors are allocated • Each GPU writes data to the defined location at the same time • Each GPU read data of the defined location at the same time GPU0 GPU1 GPU2 GPU3 Main Memory
  • 32. 32 | Eurographics 2012 | Takahiro Harada Computation of 1 Time Step GPU0 GPU1 GPU2 GPU3 Main Memory Compt Force GPU0 Compt Force GPU1 Compt Force GPU2 Compt Force GPU3 Update Update Update Update Prepare Send Prepare Send Prepare Send Prepare Send Send Send Send Send Synchronization Receive Receive Receive Receive Post Receive Post Receive Post Receive Post Receive
  • 33. 33 | Eurographics 2012 | Takahiro Harada Environment • 4GPU server (Simulation) + 1GPU (Rendering) • 1M particles • 6GPU server boxes (Simulation) + 1GPU (Rendering) @ GDC2008 • 3M particles GPU GPU GPU GPU GPU GPU GPU Simulation GPUs GPU GPU GPU GPU GPU Simulation GPUs
  • 34. 34 | Eurographics 2012 | Takahiro Harada MOVIE Harada et al., Massive Particles: Particle-based Simulations on Multiple GPUs, SIG Talk(2008)
  • 35. 35 | Eurographics 2012 | Takahiro Harada Results Number of Particles (103 particles) 0 10 20 30 40 50 60 70 80 90 100 0 200 400 600 800 1000 1200 Total (1GPU) Sim (2GPU) Total (2GPU) Sim (4GPU) Total (4GPU) ComputationTime(ms)
  • 36. 36 | Eurographics 2012 | Takahiro Harada PROBLEM SOLVED?? § Not yet § Most of the problems had identical object size –  e.g., Particles § The reason is because of GPU architecture –  Not designed to solve non-uniform problem
  • 37. 37 | Eurographics 2012 | Takahiro Harada HETEROGENEOUS COLLISION DETECTION
  • 38. 38 | Eurographics 2012 | Takahiro Harada § Large number of particles § Particles with identical size –  Work granularity is almost the same –  Good for the wide SIMD architecture PARTICLE BASED SIMULATION ON THE GPU Harada et al. 2007
  • 39. 39 | Eurographics 2012 | Takahiro Harada MIXED PARTICLE SIMULATION § Not only small particles § Difficulty for GPUs –  Large particles interact with small particles –  Large-large collision
  • 40. 40 | Eurographics 2012 | Takahiro Harada CHALLENGE § Non uniform work granularity –  Small-small(SS) collision § Uniform, GPU –  Large-large(LL) collision § Non Uniform, CPU –  Large-small(LS) collision § Non Uniform, CPU
  • 41. 41 | Eurographics 2012 | Takahiro Harada FUSION ARCHITECTURE § CPU and GPU are: –  On the same die –  Much closer –  Efficient data sharing § CPU and GPU are good at different works –  CPU: serial computation, conditional branch –  GPU: parallel computation § Able to dispatch works to: –  Serial work with varying granularity → CPU –  Parallel work with the uniform granularity → GPU
  • 42. 42 | Eurographics 2012 | Takahiro Harada METHOD
  • 43. 43 | Eurographics 2012 | Takahiro Harada TWO SIMULATIONS § Small particles § Large particles Build Acc. Structure SS Collision S Integration Build Acc. Structure LL Collision L Integration LS Collision Position Velocity Force Grid Position Velocity Force
  • 44. 44 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles Uniform Work Non Uniform Work CLASSIFY BY WORK GRANULARITY Build Acc. Structure SS Collision S Integration L Integration Position Velocity Force Grid Position Velocity Force LL Collision LS Collision Build Acc. Structure
  • 45. 45 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles GPU CPU CLASSIFY BY WORK GRANULARITY, ASSIGN PROCESSOR Build Acc. Structure SS Collision S Integration L Integration Position Velocity Force Grid Position Velocity Force LL Collision LS Collision Build Acc. Structure
  • 46. 46 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles § Grid, small particle data has to be shared with the CPU for LS collision –  Allocated as zero copy buffer GPU CPU DATA SHARING Build Acc. Structure SS Collision S Integration L Integration Position Velocity Force Grid Position Velocity Force LL Collision Build Acc. Structure Position Velocity Grid Force LS Collision
  • 47. 47 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles § Grid, small particle data has to be shared with the CPU for LS collision –  Allocated as zero copy buffer GPU CPU SYNCHRONIZATION Position Velocity Force Grid Position Velocity Force SS Collision S Integration L Integration LL Collision Position Velocity Grid Force Synchronization LS Collision Build Acc. Structure Build Acc. Structure Synchronization
  • 48. 48 | Eurographics 2012 | Takahiro Harada GPU CPU VISUALIZING WORKLOADS Build Acc. Structure SS Collision S Integration Position Velocity Force Grid Position Velocity Force LL Collision LS Collision Synchronization L Integration § Small particles § Large particles § Grid construction can be moved at the end of the pipeline –  Unbalanced workload
  • 49. 49 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles § To get better load balancing –  The sync is for passing the force buffer filled by the CPU to the GPU –  Move the LL collision after the sync GPU CPU LOAD BALANCING Build Acc. Structure SS Collision S Integration Position Velocity Force Grid Position Velocity Force LL Collision Synchronization L Integration LS Collision
  • 50. 50 | Eurographics 2012 | Takahiro Harada MULTI THREADING (4 THREADS)
  • 51. 51 | Eurographics 2012 | Takahiro Harada OPTIMIZATION2: IMPROVING SMALL-SMALL COLLISION GPU Build Acc. Structure SS Collision S Integ. CPU0 CPU1 CPU2 LS Collision LS Collision LS Collision Synchronization MergeMergeMerge LL Coll. L Integ. Synchronization S Sorting S Sorting S Sorting Synchronization Harada, Heterogeneous Particle-Based Simulation, SIG ASIA Talk(2011)
  • 52. 52 | Eurographics 2012 | Takahiro Harada DEMO GPUWork CPUWork
  • 53. 53 | Eurographics 2012 | Takahiro Harada QUESTIONS?