BULLET 3 OPENCL™ RIGID BODY SIMULATION
ERWIN COUMANS, AMD
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. OpenCL™ is a trademark of Apple Inc. Windows® and DirectX® are trademarks of Microsoft Corp. Linux is
a trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.

2 | BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL
AGENDA

Introduction, Particles, Rigid Bodies

GPU Collision Detection

GPU Constraint Solving

BULLET 2.82 AND BULLET 3 OPENCL™ ALPHA
 Real-time C++ collision detection and rigid body dynamics library
 Used in movies
‒ Maya, Houdini, Cinema 4D, Blender, Lightwave, Carrara, Posed 3D, thinking Particles, etc.
‒ Disney Animation (Bolt), PDI Dreamworks (Shrek, How to Train Your Dragon), Sony Imageworks (2012)

 Games
‒ GTA IV, Disney Toy Story 3, Cars 2, Riptide GP, GP2

 Industrial applications, Robotics
‒ Siemens NX9 MCD, Gazebo

PARTICLES AND RIGID BODIES

 Position (Center of mass, float3)
 Orientation (Inertia basis frame, float4)

UPDATING THE TRANSFORM

 Linear velocity (float3)

 Angular velocity (float3)

C/C++ VERSUS OPENCL™
void integrateTransforms(Body* bodies, int numNodes, float timeStep)
{
    //C/C++: one loop visits every body
    for (int nodeID=0;nodeID<numNodes;nodeID++) {
        if( bodies[nodeID].m_invMass != 0.f) {
            bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;
        }
    }
}
__kernel void integrateTransformsKernel( __global Body* bodies, int numNodes, float timeStep)
{
    //OpenCL: one work item per body, the loop becomes the global ID
    int nodeID = get_global_id(0);
    if( nodeID < numNodes && (bodies[nodeID].m_invMass != 0.f)) {
        bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;
    }
}

OPENCL™ PARTICLES

One to One mapping

Read Compute Write

UPDATE ORIENTATION
__kernel void integrateTransformsKernel( __global Body* bodies, const int numNodes, float timeStep, float angularDamping, float4 gravityAcceleration)
{
    int nodeID = get_global_id(0);
    if( nodeID < numNodes && (bodies[nodeID].m_invMass != 0.f))
    {
        bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep; //linear velocity
        bodies[nodeID].m_linVel += gravityAcceleration * timeStep; //apply gravity
        float4 angvel = bodies[nodeID].m_angVel; //angular velocity
        bodies[nodeID].m_angVel *= angularDamping; //add some angular damping
        float4 axis;
        float fAngle = native_sqrt(dot(angvel, angvel));
        if(fAngle*timeStep > BT_GPU_ANGULAR_MOTION_THRESHOLD) //limit the angular motion
            fAngle = BT_GPU_ANGULAR_MOTION_THRESHOLD / timeStep;
        if(fAngle < 0.001f)
            axis = angvel * (0.5f*timeStep-(timeStep*timeStep*timeStep)*0.020833333333f * fAngle * fAngle);
        else
            axis = angvel * ( native_sin(0.5f * fAngle * timeStep) / fAngle);
        float4 dorn = axis;
        dorn.w = native_cos(fAngle * timeStep * 0.5f);
        float4 orn0 = bodies[nodeID].m_quat;
        float4 predictedOrn = quatMult(dorn, orn0);
        predictedOrn = quatNorm(predictedOrn);
        bodies[nodeID].m_quat=predictedOrn; //update the orientation
    }
}

See opencl/gpu_rigidbody/kernels/integrateKernel.cl
UPDATE TRANSFORMS, HOST SETUP

//argument indices follow the kernel signature: bodies, numNodes, timeStep, angularDamping, gravityAcceleration
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 0, sizeof(cl_mem), &bodies);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 1, sizeof(int), &numBodies);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 2, sizeof(float), &deltaTime);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 3, sizeof(float), &angularDamping);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 4, sizeof(float4), &gravityAcceleration);
size_t workGroupSize = 64;
//round up to a multiple of the work group size; surplus work items are rejected by the kernel's nodeID < numNodes check
size_t numWorkItems = workGroupSize*((m_numPhysicsInstances + (workGroupSize)) / workGroupSize);
if (workGroupSize>numWorkItems)
    workGroupSize=numWorkItems;
ciErrNum = clEnqueueNDRangeKernel(g_cqCommandQue, g_integrateTransformsKernel, 1, NULL, &numWorkItems, &workGroupSize,0 ,0 ,0);
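
The rounding of the global work size can be sketched in plain C. The helper name round_up_to_multiple is hypothetical and not part of Bullet or OpenCL. It uses the common (n + wg - 1) / wg form, which, unlike the host code above, does not add an extra work group when the count is already an exact multiple of the work group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from Bullet): smallest multiple of
   workGroupSize that is greater than or equal to numItems. */
static size_t round_up_to_multiple(size_t numItems, size_t workGroupSize)
{
    assert(workGroupSize > 0);
    return workGroupSize * ((numItems + workGroupSize - 1) / workGroupSize);
}
```

Either rounding is safe as long as the kernel keeps its bounds check on the global ID.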

MOVING THE CODE TO GPU
 Create an OpenCL™ wrapper
‒ Easier use, fits code style, extra features, learn the API

 Replace C++ by C
 Move data to contiguous memory
 Replace pointers by indices
 Exploit the GPU hardware…
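
As a hedged illustration of "move data to contiguous memory" and "replace pointers by indices", the C sketch below stores a shape index in each body instead of a shape pointer, so the same arrays can be copied to the device unchanged. All type and field names (BodyData, ShapeData, World, shape_of) are made up for this example and are not Bullet's actual structures.

```c
#include <assert.h>

/* Hypothetical, simplified layouts for illustration only. */
typedef struct {
    float pos[4];
    float invMass;
    int   shapeIndex;   /* index into the shapes array instead of a Shape* */
} BodyData;

typedef struct {
    float halfExtents[4];
} ShapeData;

typedef struct {
    BodyData*  bodies;  int numBodies;   /* one contiguous array per type */
    ShapeData* shapes;  int numShapes;
} World;

/* Resolve the index to an element; an index stays valid after the
   arrays are copied to another address space, a pointer does not. */
static const ShapeData* shape_of(const World* w, int bodyIndex)
{
    assert(bodyIndex >= 0 && bodyIndex < w->numBodies);
    int s = w->bodies[bodyIndex].shapeIndex;
    assert(s >= 0 && s < w->numShapes);
    return &w->shapes[s];
}
```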

SHARING DATA STRUCTURES AND CODE BETWEEN OPENCL™ AND C/C++
#include "Bullet3Collision/NarrowPhaseCollision/shared/b3RigidBodyData.h"
#include "Bullet3Dynamics/shared/b3IntegrateTransforms.h"
__kernel void integrateTransformsKernel( __global b3RigidBodyData_t* bodies, const int numNodes, float timeStep, float angularDamping, float4 gravityAcceleration)
{
    int nodeID = get_global_id(0);
    if( nodeID < numNodes)
    {
        integrateSingleTransform(bodies,nodeID, timeStep, angularDamping,gravityAcceleration);
    }
}

PREPROCESSING OF KERNELS WITH INCLUDES IN SINGLE HEADER FILE
 We want the option of embedding kernels in our C/C++ program
 Expand all #include files, recursively into a single stringified header file
‒ This header can be used in OpenCL™ kernels and in regular C/C++ files too
‒ Kernel binary is cached, and the cached version is invalidated based on the time stamp of the embedded kernel file

 Premake, Lua and lcpp: a very small and simple C pre-processor written in Lua
‒ See https://github.com/willsteel/lcpp
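
The "stringified header" step can be pictured with a small C sketch that turns kernel source text into the body of a C string literal, one quoted line per source line. This is only an illustration of the idea; the function name stringify and its buffer-based interface are assumptions and do not describe the Premake/Lua tooling used by Bullet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes src as a C string literal into out.
   Returns the number of characters written (excluding the terminator),
   or 0 if the buffer of size cap is too small. */
static size_t stringify(const char* src, char* out, size_t cap)
{
    size_t n = 0;
#define PUT(ch) do { if (n + 1 >= cap) return 0; out[n++] = (ch); } while (0)
    PUT('"');
    for (const char* p = src; *p; p++) {
        if (*p == '\n') {
            /* close the literal at each source line and open a new one */
            PUT('\\'); PUT('n'); PUT('"'); PUT('\n'); PUT('"');
        } else {
            if (*p == '"' || *p == '\\') PUT('\\');   /* escape quotes and backslashes */
            PUT(*p);
        }
    }
    PUT('"');
#undef PUT
    out[n] = '\0';
    return n;
}
```

The resulting text can be assigned to a const char* in a generated header and passed to the OpenCL program creation call at run time.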

HOST, DEVICE, KERNELS, WORK ITEMS
[Diagram: the Host (CPU, L2 cache, Global Host Memory) next to the Device (GPU, Global Device Memory)]

GPU Collision Detection
RIGID BODY PIPELINE
[Diagram: pipeline stages along a time axis, from Start to End, with the data each stage uses]
 Broad Phase Collision Detection (CD): compute world space object AABB, detect pairs
 Mid Phase CD: cull complex shapes in local space
 Narrow Phase CD: compute contact points
 Constraint Solving: setup constraints, solve constraints
 Integration: integrate position
 Collision Data and Dynamics Data: collision shapes, object AABB, object local space BVH, overlapping pairs, contact points, constraints (contacts, joints), mass and inertia, forces and gravity, world transforms and velocities

BOUNDING VOLUMES AND DETECT PAIRS
 AABB storage, MIN (X,Y,Z) and MAX (X,Y,Z), four values each
‒ X min, Y min, Z min, *
‒ X max, Y max, Z max, Object ID
 Output pairs
‒ (Object ID A, Object ID B)

COMPUTE PAIRS BRUTE FORCE
__kernel void computePairsKernelOriginal( __global const btAabbCL* aabbs,
    __global int2* pairsOut, volatile __global int* pairCount,
    int numObjects, int axis, int maxPairs)
{
    int i = get_global_id(0);
    if (i>=numObjects)
        return;
    //every work item tests its object against all other objects
    for (int j=0;j<numObjects;j++)
    {
        if ( i != j && TestAabbAgainstAabb2GlobalGlobal(&aabbs[i],&aabbs[j])) {
            int2 myPair;
            myPair.x = aabbs[i].m_minIndices[3]; myPair.y = aabbs[j].m_minIndices[3];
            int curPair = atomic_inc (pairCount);
            if (curPair<maxPairs)
                pairsOut[curPair] = myPair; //scatter operation: flush to main memory
        }
    }
}

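
The kernel relies on an AABB overlap test that is not shown on the slide. A minimal C sketch of such a test follows; the Aabb layout and the name aabb_overlap are simplified assumptions for illustration and differ from btAabbCL and TestAabbAgainstAabb2GlobalGlobal in the Bullet source.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical AABB layout (no object ID stored). */
typedef struct { float min[3]; float max[3]; } Aabb;

/* Two boxes overlap only if their intervals overlap on all three axes.
   Boxes that merely touch are reported as overlapping. */
static bool aabb_overlap(const Aabb* a, const Aabb* b)
{
    for (int k = 0; k < 3; k++) {
        if (a->min[k] > b->max[k] || a->max[k] < b->min[k])
            return false;   /* separated along axis k */
    }
    return true;
}
```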
DETECT PAIRS
 Uniform Grid
‒ Very fast
‒ Suitable for GPU
‒ Object size restrictions

 Can be mixed with other algorithms
 See bullet3/src/Bullet3OpenCL/BroadphaseCollision/b3GpuGridBroadphase.cpp
[Figure: objects A to F in a uniform grid with cells numbered 0 to 15]
UNIFORM GRID AND PARALLEL PRIMITIVES

 Radix Sort the particles based on their cell index
 Use a prefix scan to compute the cell size and offset

 Fast OpenCL™ and DirectX® 11 Direct Compute implementation
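
The two primitives can be sketched on the CPU as follows. This is a sequential stand-in, not the GPU implementation: a counting sort replaces the radix sort, and the names build_grid and NUM_CELLS are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

#define NUM_CELLS 16  /* hypothetical grid with 16 cells */

/* cellOfParticle[i] is the cell index of particle i.
   Outputs: cellCount (particles per cell), cellStart (exclusive prefix
   scan of the counts) and sortedIds (particle ids grouped by cell). */
static void build_grid(const int* cellOfParticle, int numParticles,
                       int* cellCount, int* cellStart, int* sortedIds)
{
    memset(cellCount, 0, NUM_CELLS * sizeof(int));
    for (int i = 0; i < numParticles; i++) {
        assert(cellOfParticle[i] >= 0 && cellOfParticle[i] < NUM_CELLS);
        cellCount[cellOfParticle[i]]++;
    }
    /* exclusive prefix scan: offset of the first particle of each cell */
    int running = 0;
    for (int c = 0; c < NUM_CELLS; c++) {
        cellStart[c] = running;
        running += cellCount[c];
    }
    /* stable scatter of the particle ids into their cell ranges */
    int cursor[NUM_CELLS];
    memcpy(cursor, cellStart, sizeof(cursor));
    for (int i = 0; i < numParticles; i++)
        sortedIds[cursor[cellOfParticle[i]]++] = i;
}
```

After this step the particles of cell c are sortedIds[cellStart[c]] up to, but not including, sortedIds[cellStart[c] + cellCount[c]].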
1 AXIS SORT, SWEEP AND PRUNE

 Find the best SAP axis
 Sort the AABBs along this axis

 For each object, find and add overlapping pairs
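
The slide does not say how the best axis is chosen. One common heuristic, used here purely as an assumption, is to pick the axis along which the AABB centers have the largest variance, so that the sorted list separates objects as much as possible. The Aabb layout and the name best_sap_axis are hypothetical.

```c
#include <assert.h>

/* Simplified, hypothetical AABB layout. */
typedef struct { float min[3]; float max[3]; } Aabb;

/* Returns 0, 1 or 2: the axis with the largest variance of AABB centers. */
static int best_sap_axis(const Aabb* aabbs, int n)
{
    assert(n > 0);
    double sum[3] = {0, 0, 0}, sumSq[3] = {0, 0, 0};
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < 3; k++) {
            double c = 0.5 * ((double)aabbs[i].min[k] + (double)aabbs[i].max[k]);
            sum[k] += c;
            sumSq[k] += c * c;
        }
    }
    int best = 0;
    double bestVar = -1.0;
    for (int k = 0; k < 3; k++) {
        double mean = sum[k] / n;
        double var = sumSq[k] / n - mean * mean;   /* E[c^2] - E[c]^2 */
        if (var > bestVar) { bestVar = var; best = k; }
    }
    return best;
}
```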

COMPUTE PAIRS 1-AXIS SORT
__kernel void computePairsKernelOriginal( __global const btAabbCL* aabbs,
    __global int2* pairsOut, volatile __global int* pairCount,
    int numObjects, int axis, int maxPairs)
{
    int i = get_global_id(0);
    if (i>=numObjects)
        return;
    //aabbs are sorted along axis, so only later entries need testing
    for (int j=i+1;j<numObjects;j++)
    {
        //once object j starts beyond the end of object i, no later object can overlap
        if(aabbs[i].m_maxElems[axis] < (aabbs[j].m_minElems[axis]))
            break;
        if (TestAabbAgainstAabb2GlobalGlobal(&aabbs[i],&aabbs[j])) {
            int2 myPair;
            myPair.x = aabbs[i].m_minIndices[3]; myPair.y = aabbs[j].m_minIndices[3];
            int curPair = atomic_inc (pairCount);
            if (curPair<maxPairs)
                pairsOut[curPair] = myPair; //flush to main memory
        }
    }
}

GPU MEMORY HIERARCHY

[Diagram: each Compute Unit has Private Memory (registers) and Shared Local Memory; all Compute Units access Global Device Memory]

BARRIER

A point in the program where all threads stop and wait.
When all threads in the Work Group have reached the barrier, they can proceed.

KERNEL OPTIMIZATIONS FOR 1-AXIS SORT
 Local memory: fetch a block of AABBs and re-use them within a workgroup (barrier)
 Avoid global atomics: use private memory to accumulate overlapping pairs (append buffer)
 Local atomics: determine early exit condition for all work items within a workgroup

KERNEL OPTIMIZATIONS (1-AXIS SORT)
 Load balancing
‒ One work item per object, multiple work items for large objects

 See opencl/gpu_broadphase/kernels/sapFast.cl and sap.cl (contains un-optimized and optimized version of the kernel for comparison)

SEQUENTIAL INCREMENTAL 3-AXIS SAP

PARALLEL INCREMENTAL 3-AXIS SAP

 Parallel sort of 3 axes
 Keep old and new sorted axes
‒ 6 sorted axes in total

PARALLEL INCREMENTAL 3-AXIS SAP

Sorted x-axis old
Sorted x-axis new

 If a begin point or end point has the same index on the old and new axis, do nothing
 Otherwise, range scan on old AND new axis
‒ adding or removing pairs, similar to original SAP

 Read-only scan is embarrassingly parallel
HYBRID CPU/GPU PAIR SEARCH

[Figure: objects A to F of different sizes in a uniform grid with cells numbered 0 to 15]
 Where each pair type is processed
‒ Small versus Small: GPU
‒ Small versus Large: either
‒ Large versus Large: CPU

TRIANGLE MESH COLLISION DETECTION

GPU BVH TRAVERSAL
 Create skip indices for faster traversal
 Create subtrees that fit in Local Memory
 Stream subtrees for entire wavefront/warp

 Quantize Nodes
‒ 16 bytes/node
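
Skip indices allow a traversal without a stack: nodes are stored in depth-first order, and a node that fails the overlap test jumps straight past its subtree. The C sketch below illustrates the idea with an unquantized, hypothetical node layout (BvhNode, its skip and leafId fields, and query_bvh are assumptions, not Bullet's 16-byte quantized node).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node layout, stored in depth-first (pre-order) order. */
typedef struct {
    float min[3], max[3];
    int   skip;     /* number of nodes in this subtree, including itself */
    int   leafId;   /* >= 0 for leaves, -1 for internal nodes */
} BvhNode;

static bool node_overlaps(const BvhNode* n, const float* qmin, const float* qmax)
{
    for (int k = 0; k < 3; k++)
        if (n->min[k] > qmax[k] || n->max[k] < qmin[k]) return false;
    return true;
}

/* Collects the ids of leaves whose AABB overlaps the query box.
   Returns the number of ids written to hits (at most maxHits). */
static int query_bvh(const BvhNode* nodes, int numNodes,
                     const float* qmin, const float* qmax,
                     int* hits, int maxHits)
{
    int numHits = 0;
    int i = 0;
    while (i < numNodes) {
        const BvhNode* n = &nodes[i];
        assert(n->skip >= 1);
        if (node_overlaps(n, qmin, qmax)) {
            if (n->leafId >= 0 && numHits < maxHits)
                hits[numHits++] = n->leafId;
            i++;            /* visit the first child, or move on from a leaf */
        } else {
            i += n->skip;   /* jump over the whole subtree */
        }
    }
    return numHits;
}
```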

COMPOUND VERSUS COMPOUND COLLISION DETECTION

TREE VERSUS TREE: TANDEM TRAVERSAL
for (int p=0;p<numSubTreesA;p++) {
    for (int q=0;q<numSubTreesB;q++) {
        b3Int2 node0; node0.x = startNodeIndexA; node0.y = startNodeIndexB;
        int depth = 0;
        nodeStack[depth++] = node0; //start with the pair of subtree roots
        do {
            b3Int2 node = nodeStack[--depth];
            if (nodeOverlap) {
                if (isInternalA && isInternalB) {
                    //descend both trees: push all four child combinations
                    nodeStack[depth++] = b3MakeInt2(nodeAleftChild, nodeBleftChild);
                    nodeStack[depth++] = b3MakeInt2(nodeArightChild, nodeBleftChild);
                    nodeStack[depth++] = b3MakeInt2(nodeAleftChild, nodeBrightChild);
                    nodeStack[depth++] = b3MakeInt2(nodeArightChild, nodeBrightChild);
                } else {
                    if (isLeafA && isLeafB) processLeaf(…);
                    else { … } //see actual code
                }
            }
        } while (depth);
    }
}

 See __kernel void findCompoundPairsKernel( __global const int4* pairs … in bullet3/src/Bullet3OpenCL/NarrowphaseCollision/kernels/sat.cl
CONTACT GENERATION: GPU CONVEX HEIGHTFIELD
 Dual representation

 SATHE, R. 2006. Collision detection shader using cube-maps. In ShaderX5, Charles River Media

SEPARATING AXIS TEST
 Face normal A
 Face normal B
 Edge-edge normal
[Figure: shapes A and B, a candidate axis and the separating plane]

 Uniform work suits GPU very well: one work unit processes all SAT tests for one pair
 Precise solution and faster than height field approximation for low-resolution convex shapes
 See opencl/gpu_sat/kernels/sat.cl
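
A single SAT test projects the vertices of both convex shapes onto a candidate axis and checks whether the two intervals are disjoint. The C sketch below shows that one step; Vec3, project and is_separating_axis are names invented for this example, and the full test in sat.cl repeats it for all face normals and edge-edge cross products.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;   /* hypothetical vector type */

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Interval covered by the vertices when projected onto axis. */
static void project(const Vec3* verts, int n, Vec3 axis, float* outMin, float* outMax)
{
    assert(n > 0);
    float lo = FLT_MAX, hi = -FLT_MAX;
    for (int i = 0; i < n; i++) {
        float d = dot3(verts[i], axis);
        if (d < lo) lo = d;
        if (d > hi) hi = d;
    }
    *outMin = lo;
    *outMax = hi;
}

/* True if the projections do not overlap, that is, axis separates the shapes. */
static bool is_separating_axis(const Vec3* a, int na, const Vec3* b, int nb, Vec3 axis)
{
    float minA, maxA, minB, maxB;
    project(a, na, axis, &minA, &maxA);
    project(b, nb, axis, &minB, &maxB);
    return maxA < minB || maxB < minA;
}
```

Two convex shapes are disjoint as soon as one tested axis separates them; if no axis does, they overlap.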
COMPUTING CONTACT POSITIONS
 Given the separating normal find incident face
 Clip incident face using Sutherland Hodgman clipping
[Figure: the incident face is clipped against the clipping planes of the reference face; n marks the normals]

 One work unit performs clipping for one pair, reduces contacts and appends to contact buffer
 See opencl/gpu_sat/kernels/satClipHullContacts.cl
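
One Sutherland Hodgman step clips a polygon against a single plane; clipping the incident face repeats it for every side plane of the reference face. The C sketch below is a generic version of that step, with the hypothetical names Vec3, clip_polygon and the plane form dot(n,p) + d <= 0 chosen for this example rather than taken from satClipHullContacts.cl.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;   /* hypothetical vector type */

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 lerp3(Vec3 a, Vec3 b, float t)
{
    Vec3 r = { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
    return r;
}

/* Keeps the part of the polygon with dot(n,p) + d <= 0.
   out must have room for 2 * numIn vertices (a safe upper bound for this
   loop). Vertices lying exactly on the plane may be emitted twice. */
static int clip_polygon(const Vec3* in, int numIn, Vec3 n, float d, Vec3* out)
{
    assert(numIn >= 0);
    if (numIn < 2) return 0;
    int numOut = 0;
    Vec3 prev = in[numIn - 1];
    float dPrev = dot3(n, prev) + d;
    for (int i = 0; i < numIn; i++) {
        Vec3 cur = in[i];
        float dCur = dot3(n, cur) + d;
        if (dPrev <= 0.f) {
            if (dCur <= 0.f) {
                out[numOut++] = cur;                                     /* inside to inside */
            } else {
                out[numOut++] = lerp3(prev, cur, dPrev / (dPrev - dCur)); /* leaving */
            }
        } else if (dCur <= 0.f) {
            out[numOut++] = lerp3(prev, cur, dPrev / (dPrev - dCur));     /* entering */
            out[numOut++] = cur;
        }
        prev = cur;
        dPrev = dCur;
    }
    return numOut;
}
```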
SAT ON GPU
 Break the algorithm into pipeline stages, separated into many kernels
‒ findSeparatingAxisKernel
‒ findClippingFacesKernel
‒ clipFacesKernel
‒ contactReductionKernel

 Concave and compound cases produce even more stages
‒ bvhTraversalKernel, findConcaveSeparatingAxisKernel, findCompoundPairsKernel, processCompoundPairsPrimitivesKernel, processCompoundPairsKernel, findConcaveSphereContactsKernel, clipHullHullConcaveConvexKernel

GPU CONTACT REDUCTION

 See newContactReductionKernel in opencl/gpu_sat/kernels/satClipHullContacts.cl
GPU Constraint Solving
REORDERING CONSTRAINTS REVISITED

[Diagram: constraints 1 to 4 connecting bodies A, B, C and D are reordered into batches in which no body appears twice; Batch 0 holds constraints 1 and 3, Batch 1 holds constraints 2 and 4]

CPU SEQUENTIAL BATCH CREATION
while( nIdxSrc ) {
    nIdxDst = 0;
    int nCurrentBatch = 0;
    for(int i=0; i<N_FLG/32; i++) flg[i] = 0; //clear flag
    for(int i=0; i<nIdxSrc; i++)
    {
        int idx = idxSrc[i]; btAssert( idx < n );
        //check if it can go
        int aIdx = cs[idx].m_bodyAPtr & FLG_MASK;
        int bIdx = cs[idx].m_bodyBPtr & FLG_MASK;
        u32 aUnavailable = flg[ aIdx/32 ] & (1<<(aIdx&31));
        u32 bUnavailable = flg[ bIdx/32 ] & (1<<(bIdx&31));
        if( aUnavailable==0 && bUnavailable==0 )
        {
            flg[ aIdx/32 ] |= (1<<(aIdx&31));
            flg[ bIdx/32 ] |= (1<<(bIdx&31));
            cs[idx].getBatchIdx() = batchIdx;
            sortData[idx].m_key = batchIdx; sortData[idx].m_value = idx;
            nCurrentBatch++;
            if( nCurrentBatch == simdWidth ) {
                nCurrentBatch = 0;
                for(int i=0; i<N_FLG/32; i++) flg[i] = 0;
            }
        }
        else {
            idxDst[nIdxDst++] = idx;
        }
    }
    swap2( idxSrc, idxDst ); swap2( nIdxSrc, nIdxDst );
    batchIdx ++;
}
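
A stripped-down, self-contained version of the same greedy idea is sketched below in C. It is a simplification under stated assumptions: plain body indices instead of masked pointers, a bool array instead of bit flags, and no reset of the flags after simdWidth constraints. The names Constraint, MAX_BODIES and create_batches are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BODIES 64   /* hypothetical upper bound on body indices */

typedef struct { int bodyA, bodyB; } Constraint;

/* Writes the batch index of every constraint into batchOf and returns
   the number of batches. Within one batch no body is used twice. */
static int create_batches(const Constraint* cs, int n, int* batchOf)
{
    int remaining = n, batch = 0;
    for (int i = 0; i < n; i++) batchOf[i] = -1;
    while (remaining > 0) {
        bool used[MAX_BODIES];
        memset(used, 0, sizeof(used));          /* clear flags for the new batch */
        for (int i = 0; i < n; i++) {
            if (batchOf[i] >= 0) continue;      /* already assigned */
            int a = cs[i].bodyA, b = cs[i].bodyB;
            assert(a >= 0 && a < MAX_BODIES && b >= 0 && b < MAX_BODIES);
            if (!used[a] && !used[b]) {         /* both bodies still free */
                used[a] = used[b] = true;
                batchOf[i] = batch;
                remaining--;
            }
        }
        batch++;
    }
    return batch;
}
```

For four constraints forming a loop over bodies A, B, C and D (A-B, B-C, C-D, D-A), this assigns the first and third constraint to batch 0 and the second and fourth to batch 1.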
GPU ITERATIVE BATCHING

 For each batch
‒ For each unassigned constraint
‒ Try to reserve bodies
‒ Append constraint to batch
 Parallel threads in workgroup (same SIMD) use local atomics to lock rigid bodies
 Before locking attempt, first check if bodies are already used in previous iterations
 See “A parallel constraint solver for a rigid body simulation”, Takahiro Harada, http://dl.acm.org/citation.cfm?id=2077378.2077406 and opencl/gpu_rigidbody/kernels/batchingKernels.cl
[Diagram: constraints 1 to 4 between bodies A, B, C and D; a per-batch table marks each body as unused or reserved by a constraint]

GPU PARALLEL TWO STAGE BATCH CREATION

 Cell size > maximum dynamic object size
 Constraints are assigned to a cell
‒ based on the center-of-mass location of the first active rigid body of the pair-wise constraint

 Non-neighboring cells can be processed in parallel

MASS SPLITTING+JACOBI ~= PGS
[Diagram: bodies A, B, C and D connected by constraints 1 to 4; bodies are split into sub-bodies such as B0, B1, C0 and C1, the constraints are solved with parallel Jacobi, and the sub-body velocities are then averaged]

 See “Mass Splitting for Jitter-Free Parallel Rigid Body Simulation” by Tonge et al.
GPU NON-CONTACT CONSTRAINTS, JOINTS

__kernel void getInfo1Kernel(__global unsigned int* infos, __global b3GpuGenericConstraint* constraints, int numConstraints)
__kernel void getInfo2Kernel(__global b3SolverConstraint* solverConstraintRows, ..
switch (constraint->m_constraintType)
{
case B3_GPU_POINT2POINT_CONSTRAINT_TYPE:
case B3_GPU_FIXED_CONSTRAINT_TYPE:
}

 getInfo1Kernel and getInfo2Kernel, each with a switch statement, replace the virtual methods in Bullet 2.x
 See bullet3/src/Bullet3OpenCL/RigidBody/kernels/jointSolver.cl
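
The replacement of virtual methods by a type tag and a switch can be sketched in plain C. The enum, struct and function names below are hypothetical stand-ins, not the identifiers from jointSolver.cl; the row counts reflect that a point-to-point joint removes three degrees of freedom and a fixed joint removes six.

```c
#include <assert.h>

/* Hypothetical type tag that takes the place of a C++ virtual table. */
typedef enum { POINT2POINT_CONSTRAINT, FIXED_CONSTRAINT } ConstraintType;

typedef struct { ConstraintType type; } GenericConstraint;

/* Counterpart of a virtual "how many solver rows do you need" query. */
static int num_constraint_rows(const GenericConstraint* c)
{
    switch (c->type) {
    case POINT2POINT_CONSTRAINT: return 3;   /* locks 3 translational DOF */
    case FIXED_CONSTRAINT:       return 6;   /* locks all 6 DOF */
    }
    assert(!"unknown constraint type");
    return 0;
}
```

Because every constraint has the same struct type, an array of them can live in one contiguous buffer and be processed by a single kernel.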

DETERMINISTIC RESULTS
 Projected Gauss Seidel requires solving rows in the same order
 Sort the constraint rows (contacts, joints)

 Solve constraint batches in the same order

DYNAMICA PLUGIN FOR MAYA WITH OPENCL™

AMD CODEXL OPENCL™ DEBUGGER AND PROFILER

STACKING TEST

FUTURE WORK
 DirectX®11 DirectCompute port
 Multi GPU, multi-core, MPI

 Move over Bullet 2 to Bullet 3, hybrid of CPU and GPU
‒ Featherstone, direct solvers on CPU

 Cloth and Fluid simulation, TressFX hair, with two-way interaction
 Extend GPU-PGS solver to GPU-NNCG
‒ Non-smooth non-linear conjugate gradient solver

 Improve GPU Ray intersection tests

THANK YOU!
Visit http://bulletphysics.org for more information. All source code is available:
 http://github.com/erwincoumans/bullet3
‒ Lets you fork, report issues and request features

 Windows®, Linux®, Mac OSX
 AMD and NVIDIA GPU
‒ Preferably high-end desktop GPU

53 | BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL

More Related Content

What's hot

OOP in C++
OOP in C++OOP in C++
OOP in C++ppd1961
 
UE4 Garbage Collection
UE4 Garbage CollectionUE4 Garbage Collection
UE4 Garbage CollectionQooJuice
 
C++ 프로젝트에 단위 테스트 도입하기
C++ 프로젝트에 단위 테스트 도입하기C++ 프로젝트에 단위 테스트 도입하기
C++ 프로젝트에 단위 테스트 도입하기OnGameServer
 
Android Multimedia Player Project Presentation
Android Multimedia Player Project PresentationAndroid Multimedia Player Project Presentation
Android Multimedia Player Project PresentationRashmi Gupta
 
Level Design Challenges & Solutions - Mirror's Edge
Level Design Challenges & Solutions - Mirror's EdgeLevel Design Challenges & Solutions - Mirror's Edge
Level Design Challenges & Solutions - Mirror's EdgeElectronic Arts / DICE
 
Multimedia on android
Multimedia on androidMultimedia on android
Multimedia on androidRamesh Prasad
 
Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*Intel® Software
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overheadCass Everitt
 
Getting started with High-Definition Render Pipeline for games- Unite Copenha...
Getting started with High-Definition Render Pipeline for games- Unite Copenha...Getting started with High-Definition Render Pipeline for games- Unite Copenha...
Getting started with High-Definition Render Pipeline for games- Unite Copenha...Unity Technologies
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 
Android summer training report
Android summer training reportAndroid summer training report
Android summer training reportShashendra Singh
 
Location-Based Services on Android
Location-Based Services on AndroidLocation-Based Services on Android
Location-Based Services on AndroidJomar Tigcal
 
Reusing your existing software on Android
Reusing your existing software on AndroidReusing your existing software on Android
Reusing your existing software on AndroidTetsuyuki Kobayashi
 
Android presantation
Android presantationAndroid presantation
Android presantationUdayJethva
 
[NDC13] 70명이 커밋하는 라이브 개발하기 (해외 진출 라이브 프로젝트의 개발 관리) - 송창규
[NDC13] 70명이 커밋하는 라이브 개발하기 (해외 진출 라이브 프로젝트의 개발 관리) - 송창규[NDC13] 70명이 커밋하는 라이브 개발하기 (해외 진출 라이브 프로젝트의 개발 관리) - 송창규
[NDC13] 70명이 커밋하는 라이브 개발하기 (해외 진출 라이브 프로젝트의 개발 관리) - 송창규ChangKyu Song
 
How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019
How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019
How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019Unity Technologies
 

What's hot (20)

OOP in C++
OOP in C++OOP in C++
OOP in C++
 
UE4 Garbage Collection
UE4 Garbage CollectionUE4 Garbage Collection
UE4 Garbage Collection
 
C++ 프로젝트에 단위 테스트 도입하기
C++ 프로젝트에 단위 테스트 도입하기C++ 프로젝트에 단위 테스트 도입하기
C++ 프로젝트에 단위 테스트 도입하기
 
Android Multimedia Player Project Presentation
Android Multimedia Player Project PresentationAndroid Multimedia Player Project Presentation
Android Multimedia Player Project Presentation
 
Level Design Challenges & Solutions - Mirror's Edge
Level Design Challenges & Solutions - Mirror's EdgeLevel Design Challenges & Solutions - Mirror's Edge
Level Design Challenges & Solutions - Mirror's Edge
 
Multimedia on android
Multimedia on androidMultimedia on android
Multimedia on android
 
Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overhead
 
Object detection
Object detectionObject detection
Object detection
 
Getting started with High-Definition Render Pipeline for games- Unite Copenha...
Getting started with High-Definition Render Pipeline for games- Unite Copenha...Getting started with High-Definition Render Pipeline for games- Unite Copenha...
Getting started with High-Definition Render Pipeline for games- Unite Copenha...
 
Embedded Android : System Development - Part II (HAL)
Embedded Android : System Development - Part II (HAL)Embedded Android : System Development - Part II (HAL)
Embedded Android : System Development - Part II (HAL)
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Android Camera
Android CameraAndroid Camera
Android Camera
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
Android summer training report
Android summer training reportAndroid summer training report
Android summer training report
 
Location-Based Services on Android
Location-Based Services on AndroidLocation-Based Services on Android
Location-Based Services on Android
 
Reusing your existing software on Android
Reusing your existing software on AndroidReusing your existing software on Android
Reusing your existing software on Android
 
Android presantation
Android presantationAndroid presantation
Android presantation
 
[NDC13] 70명이 커밋하는 라이브 개발하기 (해외 진출 라이브 프로젝트의 개발 관리) - 송창규
[NDC13] 70명이 커밋하는 라이브 개발하기 (해외 진출 라이브 프로젝트의 개발 관리) - 송창규[NDC13] 70명이 커밋하는 라이브 개발하기 (해외 진출 라이브 프로젝트의 개발 관리) - 송창규
[NDC13] 70명이 커밋하는 라이브 개발하기 (해외 진출 라이브 프로젝트의 개발 관리) - 송창규
 
How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019
How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019
How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019
 

Similar to GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaRob Gillen
 
10 ways to make your code rock
10 ways to make your code rock10 ways to make your code rock
10 ways to make your code rockmartincronje
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...AMD Developer Central
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Rob Gillen
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauAMD Developer Central
 
CUDA by Example : Graphics Interoperability : Notes
CUDA by Example : Graphics Interoperability : NotesCUDA by Example : Graphics Interoperability : Notes
CUDA by Example : Graphics Interoperability : NotesSubhajit Sahu
 
Open CL For Haifa Linux Club
Open CL For Haifa Linux ClubOpen CL For Haifa Linux Club
Open CL For Haifa Linux ClubOfer Rosenberg
 
Skiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DSkiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DMithun Hunsur
 
How to write clean & testable code without losing your mind
How to write clean & testable code without losing your mindHow to write clean & testable code without losing your mind
How to write clean & testable code without losing your mindAndreas Czakaj
 
Android Development w/ ArcGIS Server - Esri Dev Meetup - Charlotte, NC
Android Development w/ ArcGIS Server - Esri Dev Meetup - Charlotte, NCAndroid Development w/ ArcGIS Server - Esri Dev Meetup - Charlotte, NC
Android Development w/ ArcGIS Server - Esri Dev Meetup - Charlotte, NCJim Tochterman
 
CUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" courseCUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" courseShuai Yuan
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - CompilationsHSA Foundation
 
CUDA Deep Dive
CUDA Deep DiveCUDA Deep Dive
CUDA Deep Divekrasul
 
#MBLTdev: Разработка первоклассных SDK для Android (Twitter)
#MBLTdev: Разработка первоклассных SDK для Android (Twitter)#MBLTdev: Разработка первоклассных SDK для Android (Twitter)
#MBLTdev: Разработка первоклассных SDK для Android (Twitter)e-Legion
 
Questions On The Code And Core Module
Questions On The Code And Core ModuleQuestions On The Code And Core Module
Questions On The Code And Core ModuleKatie Gulley
 

Similar to GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans (20)

Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
 
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

  • 1. BULLET 3 OPENCL™ RIGID BODY SIMULATION ERWIN COUMANS, AMD
  • 2. DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL™ is a trademark of Apple Inc. Windows® and DirectX® are trademarks of Microsoft Corp. Linux is a trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners. 2 | BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL
  • 3. AGENDA
      • Introduction, Particles, Rigid Bodies
      • GPU Collision Detection
      • GPU Constraint Solving
  • 4. BULLET 2.82 AND BULLET 3 OPENCL™ ALPHA
      • Real-time C++ collision detection and rigid body dynamics library
      • Used in movies
          ‒ Maya, Houdini, Cinema 4D, Blender, Lightwave, Carrara, Posed 3D, thinking Particles, etc.
          ‒ Disney Animation (Bolt), PDI Dreamworks (Shrek, How to Train Your Dragon), Sony Imageworks (2012)
      • Games
          ‒ GTA IV, Disney Toy Story 3, Cars 2, Riptide GP, GP2
      • Industrial applications, Robotics
          ‒ Siemens NX9 MCD, Gazebo
  • 5. PARTICLES AND RIGID BODIES
      • Position (center of mass, float3)
      • Orientation
          ‒ (Inertia basis frame, float4)
  • 6. UPDATING THE TRANSFORM
      • Linear velocity (float3)
      • Angular velocity (float3)
  • 7. C/C++ VERSUS OPENCL™
      C/C++ version, a sequential loop over all bodies:
        void integrateTransforms(Body* bodies, int numNodes, float timeStep)
        {
            for (int nodeID = 0; nodeID < numNodes; nodeID++)
            {
                if (bodies[nodeID].m_invMass != 0.f)   // zero inverse mass marks a static body
                {
                    bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;
                }
            }
        }
      OpenCL™ version, one work item per body:
        __kernel void integrateTransformsKernel(__global Body* bodies, int numNodes, float timeStep)
        {
            int nodeID = get_global_id(0);
            if (nodeID < numNodes && (bodies[nodeID].m_invMass != 0.f))
            {
                bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;
            }
        }
      One-to-one mapping: each work item reads one body, computes its new position, and writes it back.
  • 8. OPENCL™ PARTICLES
  • 9. UPDATE ORIENTATION
        __kernel void integrateTransformsKernel(__global Body* bodies, const int numNodes, float timeStep,
                                                float angularDamping, float4 gravityAcceleration)
        {
            int nodeID = get_global_id(0);
            if (nodeID < numNodes && (bodies[nodeID].m_invMass != 0.f))
            {
                bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;    // linear velocity
                bodies[nodeID].m_linVel += gravityAcceleration * timeStep;     // apply gravity
                float4 angvel = bodies[nodeID].m_angVel;                       // angular velocity
                bodies[nodeID].m_angVel *= angularDamping;                     // add some angular damping
                float4 axis;
                float fAngle = native_sqrt(dot(angvel, angvel));
                if (fAngle * timeStep > BT_GPU_ANGULAR_MOTION_THRESHOLD)       // limit the angular motion
                    fAngle = BT_GPU_ANGULAR_MOTION_THRESHOLD / timeStep;
                if (fAngle < 0.001f)   // small angle: Taylor expansion of sin(0.5*fAngle*timeStep)/fAngle
                    axis = angvel * (0.5f*timeStep - (timeStep*timeStep*timeStep)*0.020833333333f * fAngle * fAngle);
                else
                    axis = angvel * (native_sin(0.5f * fAngle * timeStep) / fAngle);
                float4 dorn = axis;
                dorn.w = native_cos(fAngle * timeStep * 0.5f);
                float4 orn0 = bodies[nodeID].m_quat;
                float4 predictedOrn = quatMult(dorn, orn0);
                predictedOrn = quatNorm(predictedOrn);
                bodies[nodeID].m_quat = predictedOrn;                          // update the orientation
            }
        }
      See opencl/gpu_rigidbody/kernels/integrateKernel.cl
  • 10. UPDATE TRANSFORMS, HOST SETUP
        // bind the five kernel arguments; each call needs its own argument index (0 to 4)
        ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 0, sizeof(cl_mem), &bodies);
        ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 1, sizeof(int), &numBodies);
        ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 2, sizeof(float), &deltaTime);
        ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 3, sizeof(float), &angularDamping);
        ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 4, sizeof(float4), &gravityAcceleration);
        size_t workGroupSize = 64;
        // pad the global size to a multiple of the work-group size;
        // the kernel's nodeID < numNodes check skips the extra work items
        size_t numWorkItems = workGroupSize * ((m_numPhysicsInstances + (workGroupSize)) / workGroupSize);
        if (workGroupSize > numWorkItems)
            workGroupSize = numWorkItems;
        ciErrNum = clEnqueueNDRangeKernel(g_cqCommandQue, g_integrateTransformsKernel, 1, NULL,
                                          &numWorkItems, &workGroupSize, 0, 0, 0);
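OpenCL 1.x rejects a launch whose global work size is not a multiple of the work-group size, which is why the host code pads the work-item count. The padding expression on the slide adds a full extra work group when the count is already a multiple. A tighter round-up could look like the sketch below; `round_up_global_size` is a hypothetical helper name, not part of Bullet or of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of work_group_size that is >= num_items.
   Hypothetical helper for illustration only. */
size_t round_up_global_size(size_t num_items, size_t work_group_size)
{
    assert(work_group_size > 0);
    return ((num_items + work_group_size - 1) / work_group_size) * work_group_size;
}
```

With a work-group size of 64, 100 bodies would launch 128 work items and 128 bodies would launch exactly 128. The bounds check inside the kernel is still needed for the padded work items.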
  • 11. MOVING THE CODE TO GPU
      • Create an OpenCL™ wrapper
          ‒ Easier use, fits code style, extra features, learn the API
      • Replace C++ by C
      • Move data to contiguous memory
      • Replace pointers by indices
      • Exploit the GPU hardware…
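The "contiguous memory" and "indices instead of pointers" steps can be pictured with a small C sketch. The struct and function names here (`BodyData`, `ShapeData`, `body_shape`) are hypothetical and are not Bullet's actual types; the point is only that a body refers to its shape by array index, so the same flat arrays stay valid after being copied to device memory, where host pointers would be meaningless.

```c
#include <assert.h>

/* Plain C structs stored in flat arrays (hypothetical layout). */
typedef struct {
    float halfExtents[3];
    int   unused;
} ShapeData;

typedef struct {
    float pos[3];
    float invMass;
    int   shapeIndex;   /* index into the shapes array, not a pointer */
} BodyData;

/* Resolve a body's shape through its index. */
const ShapeData *body_shape(const BodyData *bodies, int bodyIndex,
                            const ShapeData *shapes, int numShapes)
{
    int s = bodies[bodyIndex].shapeIndex;
    assert(s >= 0 && s < numShapes);
    return &shapes[s];
}
```

Because nothing in these structs is an address, one bulk copy per array is enough to move the whole scene to the GPU.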
  • 12. SHARING DATA STRUCTURES AND CODE BETWEEN OPENCL™ AND C/C++
        #include "Bullet3Collision/NarrowPhaseCollision/shared/b3RigidBodyData.h"
        #include "Bullet3Dynamics/shared/b3IntegrateTransforms.h"

        __kernel void integrateTransformsKernel(__global b3RigidBodyData_t* bodies, const int numNodes,
                                                float timeStep, float angularDamping, float4 gravityAcceleration)
        {
            int nodeID = get_global_id(0);
            if (nodeID < numNodes)
            {
                integrateSingleTransform(bodies, nodeID, timeStep, angularDamping, gravityAcceleration);
            }
        }
  • 13. PREPROCESSING OF KERNELS WITH INCLUDES IN SINGLE HEADER FILE
      • We want the option of embedding kernels in our C/C++ program
      • Expand all #include files recursively into a single stringified header file
          ‒ This header can be used in OpenCL™ kernels and in regular C/C++ files too
          ‒ The kernel binary is cached, and the cached version is invalidated based on the time stamp of the embedded kernel file
      • Premake, Lua and lcpp: a very small and simple C pre-processor written in Lua
          ‒ See https://github.com/willsteel/lcpp
  • 14. HOST, DEVICE, KERNELS, WORK ITEMS
      [Diagram: Host (CPU, L2 cache, Global Host Memory) and Device (GPU, Global Device Memory)]
  • 16. RIGID BODY PIPELINE
      [Diagram: pipeline stages laid out over time, from Start to End]
      • Collision data: collision shapes, object AABBs, object local-space BVH, overlapping pairs, contact points
      • Dynamics data: constraints (contacts, joints), mass and inertia, forces and gravity, world transforms and velocities
      • Stages
          ‒ Broad Phase Collision Detection (CD): compute world-space object AABBs, detect pairs
          ‒ Mid Phase CD: cull complex shapes in local space
          ‒ Narrow Phase CD: compute contact points
          ‒ Constraint Solving: set up constraints, solve constraints
          ‒ Integration: integrate position
  • 17. BOUNDING VOLUMES AND DETECT PAIRS
      • AABB layout: MIN (X min, Y min, Z min, *), MAX (X max, Y max, Z max, Object ID)
      • Output pairs: one (Object ID A, Object ID B) entry per overlapping pair
  • 18. COMPUTE PAIRS BRUTE FORCE
        __kernel void computePairsKernelOriginal(__global const btAabbCL* aabbs, __global int2* pairsOut,
                                                 volatile __global int* pairCount,
                                                 int numObjects, int axis, int maxPairs)
        {
            int i = get_global_id(0);
            if (i >= numObjects)
                return;
            for (int j = 0; j < numObjects; j++)
            {
                if (i != j && TestAabbAgainstAabb2GlobalGlobal(&aabbs[i], &aabbs[j]))
                {
                    int2 myPair;
                    myPair.x = aabbs[i].m_minIndices[3];
                    myPair.y = aabbs[j].m_minIndices[3];
                    int curPair = atomic_inc(pairCount);   // reserve an output slot
                    if (curPair < maxPairs)
                        pairsOut[curPair] = myPair;        // flush to main memory
                }
            }
        }
      Writing pairs through an atomically incremented counter is a scatter operation.
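For checking kernel output on the host, a serial C version of the same brute-force test can be useful. The sketch below is an assumption-laden stand-in, not Bullet code: the `Aabb` struct, `aabb_overlap`, and `find_pairs_brute_force` are hypothetical names, and a plain counter plays the role of `atomic_inc`.

```c
#include <assert.h>

typedef struct {
    float min[3];
    float max[3];
    int   id;
} Aabb;

/* Two boxes overlap only if their intervals overlap on all three axes. */
static int aabb_overlap(const Aabb *a, const Aabb *b)
{
    for (int k = 0; k < 3; k++)
        if (a->min[k] > b->max[k] || a->max[k] < b->min[k])
            return 0;
    return 1;
}

/* Tests every ordered pair (i, j) with i != j. Returns the number of
   overlaps found, which may exceed maxPairs; only the first maxPairs
   entries are stored in pairsOut. */
int find_pairs_brute_force(const Aabb *boxes, int n, int (*pairsOut)[2], int maxPairs)
{
    assert(n >= 0 && maxPairs >= 0);
    int pairCount = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i != j && aabb_overlap(&boxes[i], &boxes[j])) {
                int cur = pairCount++;   /* serial stand-in for atomic_inc */
                if (cur < maxPairs) {
                    pairsOut[cur][0] = boxes[i].id;
                    pairsOut[cur][1] = boxes[j].id;
                }
            }
        }
    }
    return pairCount;
}
```

Like the kernel, this loop visits both (i, j) and (j, i), so each overlapping pair is reported twice, and the cost grows with the square of the object count.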
  • 19. DETECT PAIRS
      • Uniform Grid
          ‒ Very fast
          ‒ Suitable for GPU
          ‒ Object size restrictions
      • Can be mixed with other algorithms
      • See bullet3/src/Bullet3OpenCL/BroadphaseCollision/b3GpuGridBroadphase.cpp
      [Diagram: objects A to F placed in a uniform grid with cells numbered 0 to 15]
  • 20. UNIFORM GRID AND PARALLEL PRIMITIVES
      • Radix sort the particles based on their cell index
      • Use a prefix scan to compute the cell size and offset
      • Fast OpenCL™ and DirectX® 11 Direct Compute implementation
  • 21. 1 AXIS SORT, SWEEP AND PRUNE
      • Find the best SAP axis
      • Sort AABBs along this axis
      • For each object, find and add overlapping pairs
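How the sort and the prefix scan fit together can be seen in a serial C sketch. This is not the GPU implementation: a counting sort stands in for the parallel radix sort, the scan is a plain loop, and `build_grid` and its parameters are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* cellOfParticle[p] is the grid cell of particle p.
   Outputs: cellCount[c] = particles in cell c (the cell size),
            cellStart[c] = offset of cell c in sortedParticles,
            sortedParticles = particle indices grouped by cell. */
void build_grid(const int *cellOfParticle, int numParticles, int numCells,
                int *cellCount, int *cellStart, int *sortedParticles)
{
    /* Histogram: how many particles fall into each cell. */
    for (int c = 0; c < numCells; c++)
        cellCount[c] = 0;
    for (int p = 0; p < numParticles; p++) {
        assert(cellOfParticle[p] >= 0 && cellOfParticle[p] < numCells);
        cellCount[cellOfParticle[p]]++;
    }

    /* Exclusive prefix scan of the counts gives each cell's offset. */
    int running = 0;
    for (int c = 0; c < numCells; c++) {
        cellStart[c] = running;
        running += cellCount[c];
    }

    /* Stable scatter into sorted order; cellCount doubles as a cursor
       and ends up holding the counts again. */
    for (int c = 0; c < numCells; c++)
        cellCount[c] = 0;
    for (int p = 0; p < numParticles; p++) {
        int c = cellOfParticle[p];
        sortedParticles[cellStart[c] + cellCount[c]] = p;
        cellCount[c]++;
    }
}
```

After this, the particles of cell c are `sortedParticles[cellStart[c]]` through `sortedParticles[cellStart[c] + cellCount[c] - 1]`, so a neighbor query only needs to read a few contiguous ranges.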
  • 22. COMPUTE PAIRS 1-AXIS SORT
        __kernel void computePairsKernelOriginal(__global const btAabbCL* aabbs, __global int2* pairsOut,
                                                 volatile __global int* pairCount,
                                                 int numObjects, int axis, int maxPairs)
        {
            int i = get_global_id(0);
            if (i >= numObjects)
                return;
            for (int j = i + 1; j < numObjects; j++)
            {
                // AABBs are sorted along 'axis': once j starts beyond i's maximum, no later j can overlap
                if (aabbs[i].m_maxElems[axis] < (aabbs[j].m_minElems[axis]))
                    break;
                if (TestAabbAgainstAabb2GlobalGlobal(&aabbs[i], &aabbs[j]))
                {
                    int2 myPair;
                    myPair.x = aabbs[i].m_minIndices[3];
                    myPair.y = aabbs[j].m_minIndices[3];
                    int curPair = atomic_inc(pairCount);
                    if (curPair < maxPairs)
                        pairsOut[curPair] = myPair;   // flush to main memory
                }
            }
        }
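A serial C sketch of the sort-then-sweep idea follows, mainly to make the early exit concrete. It assumes the boxes are sorted by their minimum along the chosen axis; `qsort` stands in for the GPU sort, and `Aabb`, `find_pairs_sap`, and the other names are hypothetical rather than Bullet's.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    float min[3];
    float max[3];
    int   id;
} Aabb;

static int g_sortAxis;   /* qsort has no context argument, so the axis is kept here */

static int cmp_min_on_axis(const void *pa, const void *pb)
{
    float a = ((const Aabb *)pa)->min[g_sortAxis];
    float b = ((const Aabb *)pb)->min[g_sortAxis];
    return (a > b) - (a < b);
}

static int aabb_overlap(const Aabb *a, const Aabb *b)
{
    for (int k = 0; k < 3; k++)
        if (a->min[k] > b->max[k] || a->max[k] < b->min[k])
            return 0;
    return 1;
}

/* Sorts boxes in place along 'axis', then sweeps. Returns the number of
   overlapping pairs; only the first maxPairs are stored in pairsOut. */
int find_pairs_sap(Aabb *boxes, int n, int axis, int (*pairsOut)[2], int maxPairs)
{
    assert(n >= 0 && axis >= 0 && axis < 3);
    g_sortAxis = axis;
    qsort(boxes, (size_t)n, sizeof(Aabb), cmp_min_on_axis);

    int pairCount = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (boxes[i].max[axis] < boxes[j].min[axis])
                break;   /* every later box starts even further along the axis */
            if (aabb_overlap(&boxes[i], &boxes[j])) {
                int cur = pairCount++;
                if (cur < maxPairs) {
                    pairsOut[cur][0] = boxes[i].id;
                    pairsOut[cur][1] = boxes[j].id;
                }
            }
        }
    }
    return pairCount;
}
```

Starting the inner loop at `i + 1` reports each pair once, and the `break` keeps the inner loop short when objects are spread out along the sort axis. If many objects share the same extent on that axis, the sweep degrades toward the brute-force cost.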
  • 23. GPU MEMORY HIERARCHY
      [Diagram: Private Memory (registers) per work item, Shared Local Memory per Compute Unit, Global Device Memory shared by all Compute Units]
  • 24. BARRIER
      • A point in the program where all threads stop and wait
      • When all threads in the Work Group have reached the barrier, they can proceed
  • 25. KERNEL OPTIMIZATIONS FOR 1-AXIS SORT
      • LOCAL MEMORY: block to fetch AABBs and re-use them within a workgroup (barrier)
      • AVOID GLOBAL ATOMICS: use private memory to accumulate overlapping pairs (append buffer)
      • LOCAL ATOMICS: determine early exit condition for all work items within a workgroup
  • 26. KERNEL OPTIMIZATIONS (1-AXIS SORT)
      • Load balancing
          ‒ One work item per object, multiple work items for large objects
      • See opencl/gpu_broadphase/kernels/sapFast.cl and sap.cl (contains un-optimized and optimized versions of the kernel for comparison)
  • 27. SEQUENTIAL INCREMENTAL 3-AXIS SAP
  • 28. PARALLEL INCREMENTAL 3-AXIS SAP
      • Parallel sort along 3 axes
      • Keep the old and new sorted axes
          ‒ 6 sorted axes in total
  • 29. PARALLEL INCREMENTAL 3-AXIS SAP
      [Diagram: sorted x-axis (old) next to sorted x-axis (new)]
      • If a begin or end point has the same index on the old and new axis, do nothing
      • Otherwise, range scan on the old AND new axis
          ‒ adding or removing pairs, similar to original SAP
      • Read-only scan is embarrassingly parallel
  • 30. HYBRID CPU/GPU PAIR SEARCH
      [Diagram: objects A to F placed in a uniform grid with cells numbered 0 to 15]
      • Small versus small: GPU
      • Small versus large: either
      • Large versus large: CPU
  • 31. TRIANGLE MESH COLLISION DETECTION
  • 32. GPU BVH TRAVERSAL
      • Create skip indices for faster traversal
      • Create subtrees that fit in Local Memory
      • Stream subtrees for entire wavefront/warp
      • Quantize nodes
          ‒ 16 bytes/node
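As a rough illustration of skip indices and 16-byte quantized nodes, the sketch below walks a tree stored in depth-first order without a stack. The `QuantizedNode` layout and the sign convention of `escapeOrTriangle` are assumptions made for this example (three 16-bit minima, three 16-bit maxima, one 32-bit index); consult the Bullet 3 sources for the real structure.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 16-byte quantized node for this sketch.
struct QuantizedNode {
    std::uint16_t aabbMin[3];
    std::uint16_t aabbMax[3];
    // >= 0: leaf, the value is a triangle index.
    // <  0: internal node, -value is the number of nodes in this subtree
    //       (including this node), i.e. how far to jump to skip it.
    std::int32_t escapeOrTriangle;
};
static_assert(sizeof(QuantizedNode) == 16, "node is expected to be 16 bytes");

static bool overlaps(const QuantizedNode& n, const std::uint16_t qMin[3],
                     const std::uint16_t qMax[3]) {
    for (int k = 0; k < 3; ++k) {
        if (n.aabbMax[k] < qMin[k] || qMax[k] < n.aabbMin[k]) return false;
    }
    return true;
}

// Stackless traversal: nodes are stored in depth-first order, so the first
// child of an internal node is simply the next array element.
std::vector<int> queryTriangles(const std::vector<QuantizedNode>& nodes,
                                const std::uint16_t qMin[3],
                                const std::uint16_t qMax[3]) {
    std::vector<int> hits;
    std::size_t i = 0;
    while (i < nodes.size()) {
        const QuantizedNode& n = nodes[i];
        const bool hit = overlaps(n, qMin, qMax);
        const bool leaf = n.escapeOrTriangle >= 0;
        if (leaf && hit) hits.push_back(n.escapeOrTriangle);
        if (hit || leaf) {
            ++i;  // descend into the first child, or step past a leaf
        } else {
            i += static_cast<std::size_t>(-n.escapeOrTriangle);  // skip the subtree
        }
    }
    return hits;
}
```

Because the loop only ever moves forward through a flat array, a contiguous subtree can be copied into local memory and streamed, which is the point of the second and third bullets above.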
  • 33. COMPOUND VERSUS COMPOUND COLLISION DETECTION
  • 34. TREE VERSUS TREE: TANDEM TRAVERSAL
        for (int p = 0; p < numSubTreesA; p++) {
          for (int q = 0; q < numSubTreesB; q++) {
            b3Int2 node0;
            node0.x = startNodeIndexA;
            node0.y = startNodeIndexB;
            int depth = 0;
            nodeStack[depth++] = node0;  // seed the stack with the pair of subtree roots
            do {
              b3Int2 node = nodeStack[--depth];
              if (nodeOverlap) {
                if (isInternalA && isInternalB) {
                  // both nodes are internal: push all four child combinations
                  nodeStack[depth++] = b3MakeInt2(nodeAleftChild, nodeBleftChild);
                  nodeStack[depth++] = b3MakeInt2(nodeArightChild, nodeBleftChild);
                  nodeStack[depth++] = b3MakeInt2(nodeAleftChild, nodeBrightChild);
                  nodeStack[depth++] = b3MakeInt2(nodeArightChild, nodeBrightChild);
                } else {
                  if (isLeafA && isLeafB)
                    processLeaf(…);
                  else { … }  // see actual code
                }
              }
            } while (depth);
          }
        }
      • See __kernel void findCompoundPairsKernel( __global const int4* pairs … in bullet3/src/Bullet3OpenCL/NarrowphaseCollision/kernels/sat.cl
  • 35. CONTACT GENERATION: GPU CONVEX HEIGHTFIELD
      • Dual representation
      • SATHE, R. 2006. Collision detection shader using cube-maps. In ShaderX5, Charles River Media
  • 36. SEPARATING AXIS TEST [Figure labels: plane, A, B, axis]
      • Face normal A
      • Face normal B
      • Edge-edge normal
      • Uniform work suits GPU very well: one work unit processes all SAT tests for one pair
      • Precise solution and faster than height field approximation for low-resolution convex shapes
      • See opencl/gpu_sat/kernels/sat.cl
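The core of a separating axis test can be sketched as follows: project both hulls onto each candidate axis and compare the intervals. This is a simplified CPU illustration, not the code in sat.cl. The names `Vec3`, `SatResult`, `makeBox` and `testAxes` are hypothetical, and the caller is assumed to supply unit-length candidate axes (face normals of A, face normals of B, and normalized edge-edge cross products).

```cpp
#include <algorithm>
#include <array>
#include <limits>
#include <vector>

using Vec3 = std::array<float, 3>;

static float dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Projects a hull's vertices onto an axis and returns the interval [lo, hi].
static void project(const std::vector<Vec3>& verts, const Vec3& axis,
                    float& lo, float& hi) {
    lo = hi = dot(verts[0], axis);
    for (const Vec3& v : verts) {
        const float d = dot(v, axis);
        lo = std::min(lo, d);
        hi = std::max(hi, d);
    }
}

struct SatResult {
    bool separated;  // true if some axis separates the hulls
    Vec3 axis;       // separating axis, or axis of least overlap
    float depth;     // interval overlap along that axis (negative if separated)
};

// Hypothetical helper: the eight corners of an axis-aligned box.
std::vector<Vec3> makeBox(const Vec3& lo, const Vec3& hi) {
    std::vector<Vec3> v;
    for (int i = 0; i < 8; ++i) {
        v.push_back(Vec3{(i & 1) ? hi[0] : lo[0],
                         (i & 2) ? hi[1] : lo[1],
                         (i & 4) ? hi[2] : lo[2]});
    }
    return v;
}

// Tests every candidate axis for one pair, as a single work unit would.
SatResult testAxes(const std::vector<Vec3>& a, const std::vector<Vec3>& b,
                   const std::vector<Vec3>& unitAxes) {
    SatResult r{false, Vec3{0.0f, 0.0f, 0.0f}, std::numeric_limits<float>::max()};
    for (const Vec3& axis : unitAxes) {
        float minA, maxA, minB, maxB;
        project(a, axis, minA, maxA);
        project(b, axis, minB, maxB);
        const float overlap = std::min(maxA, maxB) - std::max(minA, minB);
        if (overlap < 0.0f) {  // gap found: the hulls cannot intersect
            r.separated = true;
            r.axis = axis;
            r.depth = overlap;
            return r;
        }
        if (overlap < r.depth) {  // remember the axis of least penetration
            r.depth = overlap;
            r.axis = axis;
        }
    }
    return r;
}
```

When no axis separates the pair, the axis of least overlap is a natural candidate for the contact normal used by the clipping stage.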
  • 37. COMPUTING CONTACT POSITIONS [Figure labels: clipping planes, incident face, reference face, normal n]
      • Given the separating normal, find the incident face
      • Clip the incident face using Sutherland-Hodgman clipping
      • One work unit performs clipping for one pair, reduces contacts and appends to the contact buffer
      • See opencl/gpu_sat/kernels/satClipHullContacts.cl
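One Sutherland-Hodgman step, clipping a polygon against a single plane, can be sketched as below. Clipping the incident face against each side plane of the reference face repeats this step once per plane. The `Plane` type and `clipPolygon` function are hypothetical names for this illustration and do not come from satClipHullContacts.cl.

```cpp
#include <array>
#include <vector>

using Vec3 = std::array<float, 3>;

// Points p with dot(n, p) + d <= 0 are considered inside and are kept.
struct Plane {
    Vec3 n;
    float d;
};

static float signedDistance(const Plane& pl, const Vec3& p) {
    return pl.n[0] * p[0] + pl.n[1] * p[1] + pl.n[2] * p[2] + pl.d;
}

static Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return Vec3{a[0] + t * (b[0] - a[0]),
                a[1] + t * (b[1] - a[1]),
                a[2] + t * (b[2] - a[2])};
}

// Clips a polygon against one plane. Vertices lying exactly on the plane
// count as inside and may yield duplicate output points in this sketch.
std::vector<Vec3> clipPolygon(const std::vector<Vec3>& poly, const Plane& pl) {
    std::vector<Vec3> out;
    if (poly.empty()) return out;
    Vec3 prev = poly.back();
    float dPrev = signedDistance(pl, prev);
    for (const Vec3& cur : poly) {
        const float dCur = signedDistance(pl, cur);
        const bool prevIn = dPrev <= 0.0f;
        const bool curIn = dCur <= 0.0f;
        if (prevIn != curIn) {
            // The edge crosses the plane: emit the intersection point.
            out.push_back(lerp(prev, cur, dPrev / (dPrev - dCur)));
        }
        if (curIn) out.push_back(cur);
        prev = cur;
        dPrev = dCur;
    }
    return out;
}
```

The output can grow by at most one vertex per clipping plane, which is worth keeping in mind when sizing fixed per-work-item buffers.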
  • 38. SAT ON GPU
      • Break the algorithm into pipeline stages, separated into many kernels
          ‒ findSeparatingAxisKernel
          ‒ findClippingFacesKernel
          ‒ clipFacesKernel
          ‒ contactReductionKernel
      • Concave and compound cases produce even more stages
          ‒ bvhTraversalKernel, findConcaveSeparatingAxisKernel, findCompoundPairsKernel, processCompoundPairsPrimitivesKernel, processCompoundPairsKernel, findConcaveSphereContactsKernel, clipHullHullConcaveConvexKernel
  • 39. GPU CONTACT REDUCTION
      • See newContactReductionKernel in opencl/gpu_sat/kernels/satClipHullContacts.cl
  • 41. REORDERING CONSTRAINTS REVISITED
        [Figure: bodies A, B, C, D connected by constraints 1 to 4]
                   A   B   C   D
          Batch 0  1   1   3   3
          Batch 1  4   2   2   4
  • 42. CPU SEQUENTIAL BATCH CREATION
        while (nIdxSrc) {
          nIdxDst = 0;
          int nCurrentBatch = 0;
          for (int i = 0; i < N_FLG / 32; i++) flg[i] = 0;  // clear flags
          for (int i = 0; i < nIdxSrc; i++) {
            int idx = idxSrc[i];
            btAssert(idx < n);
            // check if it can go
            int aIdx = cs[idx].m_bodyAPtr & FLG_MASK;
            int bIdx = cs[idx].m_bodyBPtr & FLG_MASK;
            u32 aUnavailable = flg[aIdx / 32] & (1 << (aIdx & 31));
            u32 bUnavailable = flg[bIdx / 32] & (1 << (bIdx & 31));
            if (aUnavailable == 0 && bUnavailable == 0) {
              flg[aIdx / 32] |= (1 << (aIdx & 31));
              flg[bIdx / 32] |= (1 << (bIdx & 31));
              cs[idx].getBatchIdx() = batchIdx;
              sortData[idx].m_key = batchIdx;
              sortData[idx].m_value = idx;
              nCurrentBatch++;
              if (nCurrentBatch == simdWidth) {
                nCurrentBatch = 0;
                for (int j = 0; j < N_FLG / 32; j++) flg[j] = 0;
              }
            } else {
              idxDst[nIdxDst++] = idx;  // defer to a later batch
            }
          }
          swap2(idxSrc, idxDst);
          swap2(nIdxSrc, nIdxDst);
          batchIdx++;
        }
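A stripped-down, self-contained variant of this greedy batching idea may make the control flow easier to follow. It is a sketch under simplifying assumptions: it uses a plain per-body flag array instead of the hashed bit flags, omits the `simdWidth` reset, and treats every body as dynamic. `Constraint` and `assignBatches` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pairwise constraint between two body indices.
struct Constraint {
    int bodyA;
    int bodyB;
};

// Returns the batch index assigned to each constraint. Within one batch no
// body appears twice, so the batch can be solved in parallel.
std::vector<int> assignBatches(const std::vector<Constraint>& cs, int numBodies) {
    std::vector<int> batchOf(cs.size(), -1);
    std::vector<int> pending(cs.size());
    for (std::size_t i = 0; i < cs.size(); ++i) pending[i] = static_cast<int>(i);

    int batch = 0;
    while (!pending.empty()) {
        std::vector<char> used(static_cast<std::size_t>(numBodies), 0);
        std::vector<int> deferred;
        for (int idx : pending) {
            const Constraint& c = cs[static_cast<std::size_t>(idx)];
            if (!used[c.bodyA] && !used[c.bodyB]) {
                used[c.bodyA] = 1;  // reserve both bodies for this batch
                used[c.bodyB] = 1;
                batchOf[static_cast<std::size_t>(idx)] = batch;
            } else {
                deferred.push_back(idx);  // try again in the next batch
            }
        }
        pending.swap(deferred);
        ++batch;
    }
    return batchOf;
}
```

For four bodies joined in a loop by constraints (A,B), (B,C), (C,D), (D,A), this produces two batches: the first and third constraints in batch 0, the second and fourth in batch 1.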
  • 43. GPU ITERATIVE BATCHING
        For each unassigned constraint
          For each batch
            Try to reserve bodies
            Append constraint to batch
      • Parallel threads in workgroup (same SIMD) use local atomics to lock rigid bodies
      • Before locking attempt, first check if bodies are already used in previous iterations
      • See “A parallel constraint solver for a rigid body simulation”, Takahiro Harada, http://dl.acm.org/citation.cfm?id=2077378.2077406 and opencl/gpu_rigidbody/kernels/batchingKernels.cl
  • 44. GPU PARALLEL TWO STAGE BATCH CREATION
      • Cell size > maximum dynamic object size
      • Constraints are assigned to a cell
          ‒ based on the center-of-mass location of the first active rigid body of the pair-wise constraint
      • Non-neighboring cells can be processed in parallel
  • 45. MASS SPLITTING+JACOBI ~= PGS [Figure: bodies A, B, C, D with constraints 1 to 4; bodies are split into copies (B0, B1, C0, C1, ...), solved with parallel Jacobi, then velocities are averaged]
      • See “Mass Splitting for Jitter-Free Parallel Rigid Body Simulation” by Tonge et al.
  • 46. GPU NON-CONTACT CONSTRAINTS, JOINTS
  • 47. GPU NON-CONTACT CONSTRAINTS, JOINTS
        __kernel void getInfo1Kernel(__global unsigned int* infos, __global b3GpuGenericConstraint* constraints, int numConstraints)
        __kernel void getInfo2Kernel(__global b3SolverConstraint* solverConstraintRows, ..
        switch (constraint->m_constraintType) {
          case B3_GPU_POINT2POINT_CONSTRAINT_TYPE:
          case B3_GPU_FIXED_CONSTRAINT_TYPE:
        }
      • getInfo1Kernel and getInfo2Kernel, using a switch statement, replace the virtual methods in Bullet 2.x
      • See bullet3/src/Bullet3OpenCL/RigidBody/kernels/jointSolver.cl
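The idea of replacing virtual dispatch with a type tag and a switch can be sketched on the CPU as follows. This is not the code in jointSolver.cl. The enum, the `GenericConstraint` struct and `countRows` are hypothetical, and the row counts (3 for a point-to-point joint, 6 for a fixed joint, one per removed degree of freedom) are an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the constraint types named on the slide.
enum ConstraintType { POINT2POINT_CONSTRAINT, FIXED_CONSTRAINT };

// Plain data only: no virtual methods, so the same layout could be
// copied to a GPU buffer unchanged.
struct GenericConstraint {
    ConstraintType type;
};

// Plays the role of a getInfo1-style pass: one entry per constraint holding
// the number of solver rows that constraint needs.
std::vector<unsigned int> countRows(const std::vector<GenericConstraint>& constraints) {
    std::vector<unsigned int> infos(constraints.size(), 0u);
    for (std::size_t i = 0; i < constraints.size(); ++i) {
        switch (constraints[i].type) {
            case POINT2POINT_CONSTRAINT:
                infos[i] = 3u;  // assumed: removes 3 translational degrees of freedom
                break;
            case FIXED_CONSTRAINT:
                infos[i] = 6u;  // assumed: removes all 6 relative degrees of freedom
                break;
        }
    }
    return infos;
}
```

A second pass in the style of getInfo2Kernel would then fill in the rows, typically after a prefix sum over these counts gives each constraint its row offset.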
  • 48. DETERMINISTIC RESULTS
      • Projected Gauss-Seidel requires solving rows in the same order
      • Sort the constraint rows (contacts, joints)
      • Solve constraint batches in the same order
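One way to obtain a reproducible row order is to sort on a key that does not depend on the order in which parallel threads appended the rows. The sketch below assumes a hypothetical `ConstraintRow` with two body indices and a per-pair row id; the actual sort keys used in Bullet 3 may differ.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical constraint row: the two bodies it couples plus a row id that
// distinguishes several rows (contacts, joint axes) between the same pair.
struct ConstraintRow {
    int bodyA;
    int bodyB;
    int rowId;
};

// Sorts rows by (bodyA, bodyB, rowId). If the key is unique per row, the
// result is identical no matter how the input was ordered.
void sortRowsDeterministically(std::vector<ConstraintRow>& rows) {
    std::sort(rows.begin(), rows.end(),
              [](const ConstraintRow& l, const ConstraintRow& r) {
                  return std::tie(l.bodyA, l.bodyB, l.rowId) <
                         std::tie(r.bodyA, r.bodyB, r.rowId);
              });
}
```

Running the solver over rows sorted this way, and visiting batches in a fixed order, keeps a Gauss-Seidel style iteration repeatable from run to run.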
  • 49. DYNAMICA PLUGIN FOR MAYA WITH OPENCL™
  • 50. AMD CODEXL OPENCL™ DEBUGGER AND PROFILER
  • 51. STACKING TEST
  • 52. FUTURE WORK
      • DirectX®11 DirectCompute port
      • Multi GPU, multi-core, MPI
      • Move over Bullet 2 to Bullet 3, hybrid of CPU and GPU
          ‒ Featherstone, direct solvers on CPU
      • Cloth and fluid simulation, TressFX hair, with two-way interaction
      • Extend GPU-PGS solver to GPU-NNCG
          ‒ Non-smooth non-linear conjugate gradient solver
      • Improve GPU ray intersection tests
  • 53. THANK YOU! Visit http://bulletphysics.org for more information. All source code is available:
      • http://github.com/erwincoumans/bullet3
          ‒ Lets you fork, report issues and request features
      • Windows®, Linux®, Mac OSX
      • AMD and NVIDIA GPU
          ‒ Preferably high-end desktop GPU