2. ABOUT OpenCL™
OpenCL™ is FUN!
• Parallel compute programming language
• Exposes the massively multithreaded GPU
• A lot of horsepower, optimized for parallel computing
• Order-of-magnitude performance improvement!
ADVANCED OPENCL™ DEBUGGING AND PROFILING USING CODEXL | NOVEMBER 13, 2013
3. OpenCL™ DEBUGGING AND PROFILING CHALLENGES
However,
• Debugging and profiling parallel processing applications is hard
• On-time delivery of robust (bug-free) OpenCL™ applications is challenging
• It is almost impossible to optimize an OpenCL™-based application to fully utilize the available parallel processing system resources
4. OpenCL™ DEBUGGING AND PROFILING CHALLENGES
OpenCL™ is a “Black Box”
• The application enqueues OpenCL™ commands
• The OpenCL™ runtime executes the commands
• Using a host profiler and debugger, the developer cannot
  ‒ Debug and profile the OpenCL™ kernels
  ‒ See the execution details
  ‒ View runtime loads
[Diagram labels: Application, OpenCL™]
5. AMD CodeXL
• APU and GPU Debugging Functionality
  ‒ OpenCL™ and OpenGL API-Level
  ‒ OpenCL™ Kernel Source Code
• APU, CPU and GPU Profiling
• OpenCL™ Static Kernel Analysis
• Provides the information a developer needs to help find bugs and optimize the application’s performance
• Integrated into Microsoft® Visual Studio®
• Standalone application for Windows® and Linux®
7. GPU DEBUGGING WITH AMD CodeXL | DEMO
AMDTTEAPOT
• Sample provided with CodeXL tools suite
  ‒ API-Level debugging
  ‒ Pinpointing OpenCL™ Errors
  ‒ Entering Kernel debugging
  ‒ Locals and Watch views
  ‒ Kernel Source breakpoints
  ‒ Finding problematic work items
  ‒ OpenCL™ Kernel Multiwatch view
8. GPU DEBUGGING WITH AMD CodeXL | VIEWS
API CALLS HISTORY VIEW
• Displays OpenCL™ and OpenGL API calls
  ‒ Supports function calls from OpenCL™ up to version 1.2 and OpenGL up to version 4.3
  ‒ Function parameters
  ‒ Object links in properties
  ‒ API calls are divided per Compute / Render context
  ‒ Calls history recording to an HTML log file
9. GPU DEBUGGING WITH AMD CodeXL | VIEWS
CODEXL EXPLORER
• Displays OpenCL™ and OpenGL allocated objects
  ‒ Object hierarchy and counts
  ‒ Object properties
  ‒ For objects with data / sources, double-click to open a main view
  ‒ Displays detected memory leaks if "Break on Memory Leaks" is selected
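One way to picture what a leak check over allocated objects involves: the sketch below walks a recorded list of create, retain and release calls and reports handles whose reference count never returns to zero. The `(function, handle)` record format and the handle names are hypothetical and do not represent how CodeXL stores or detects leaks; only the OpenCL function names are real.

```python
def find_leaks(calls):
    """Return handles that were created but never fully released.

    `calls` is a list of (function_name, handle) tuples in call order.
    This record shape is an assumption made for illustration.
    """
    create = {"clCreateBuffer", "clCreateImage"}
    retain = {"clRetainMemObject"}
    release = {"clReleaseMemObject"}
    refcount = {}
    for name, handle in calls:
        if name in create:
            refcount[handle] = 1          # a new object starts with one reference
        elif name in retain:
            refcount[handle] = refcount.get(handle, 0) + 1
        elif name in release:
            refcount[handle] = refcount.get(handle, 0) - 1
    return sorted(h for h, n in refcount.items() if n > 0)


# Hypothetical call log: buf0 is retained once but released only once.
calls = [
    ("clCreateBuffer", "buf0"),
    ("clCreateBuffer", "buf1"),
    ("clRetainMemObject", "buf0"),
    ("clReleaseMemObject", "buf0"),
    ("clReleaseMemObject", "buf1"),
]
print(find_leaks(calls))
```

Here `buf0` still holds one reference at the end of the log, so it is the only handle reported.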
10. GPU DEBUGGING WITH AMD CodeXL | VIEWS
SOURCE AND CALL STACK VIEWS
• Displays host code, OpenCL™ kernel source, and OpenGL shader source
  ‒ Set source-level breakpoints in OpenCL™ kernels
  ‒ Display host thread and OpenCL™ kernel wavefront call stacks
  ‒ Visual Studio® integration
11. GPU DEBUGGING WITH AMD CodeXL | VIEWS
OBJECT VIEWS
• Displays image, buffer and texture data
  ‒ Image view for OpenCL™ images and OpenGL textures and render buffers
  ‒ 3D image support with layer selection slider
  ‒ Non-RGB images mapped to grayscale range, with selection of minimum and maximum values clearly displaying out-of-range values
  ‒ Data view for all objects
  ‒ Channel order / type selection for buffer data
  ‒ Connection to image view for objects that support it
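The grayscale mapping with a selectable minimum and maximum can be sketched as a simple window-and-clamp operation. This is an illustrative sketch, not CodeXL's implementation: the function name `to_grayscale`, the 0..255 output range and the sample data are all assumptions.

```python
def to_grayscale(values, lo, hi):
    """Map raw values to 0..255 gray levels using a chosen [lo, hi] window.

    Returns (levels, out_of_range); out_of_range[i] is True when values[i]
    fell outside the window and had to be clamped.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    levels, out_of_range = [], []
    for v in values:
        clamped = min(max(v, lo), hi)
        levels.append(round(255 * (clamped - lo) / (hi - lo)))
        out_of_range.append(v < lo or v > hi)
    return levels, out_of_range


# Hypothetical row of a single-channel float image.
row = [-0.5, 0.0, 0.25, 1.0, 3.0]
levels, flagged = to_grayscale(row, lo=0.0, hi=1.0)
print(levels)
print(flagged)
```

Narrowing the window increases contrast inside it, while the returned flags mark which values a viewer would need to highlight as out of range.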
12. GPU DEBUGGING WITH AMD CodeXL | VIEWS
LOCALS AND WATCH VIEWS
• Display OpenCL™ kernel variables
  ‒ Structure and vector types support
  ‒ Global and Private memory array dereferencing
  ‒ Local and Constant memory support planned for future releases
  ‒ Visual Studio® integration
13. GPU DEBUGGING WITH AMD CodeXL | VIEWS
MULTIWATCH VIEWS
• Display a single OpenCL™ kernel variable value across the current work items
  ‒ Image and Data visualization
  ‒ Range slider, like Object image view
  ‒ Current work item is highlighted and can be changed by double-clicking the data view
14. GPU DEBUGGING WITH AMD CodeXL | FEATURES
NEW IN CODEXL 1.3
• Remote debugging
  ‒ Debug capabilities on a remote machine
  ‒ API-level debugging
  ‒ Kernel debugging
  ‒ Requires a CodeXL agent running on the target machine
  ‒ The agent is included as an option in the CodeXL installer
  ‒ Same agent for remote GPU debugging and remote GPU profiling
  ‒ Currently only supports Windows-to-Windows and Linux-to-Linux debugging
• OpenCL™ API support increased up to OpenCL™ 1.2
  ‒ New API functions
  ‒ New deprecated functions and behaviors
• OpenGL API support increased up to OpenGL 4.3
  ‒ New API functions and tokens
  ‒ New shader types
15. GPU DEBUGGING WITH AMD CodeXL | FEATURES
UPCOMING RELEASES
• Hardware-based kernel debugging
  ‒ Current implementation retrieves hardware values but performs kernel playback for breakpoint implementation
    ‒ Display data for the entire grid
    ‒ Optimized for small- and medium-sized kernels
    ‒ Does not support debugging kernels that can't be replayed consistently (such as kernels using atomics)
  ‒ New implementation will use hardware breakpoints
    ‒ Display data according to the wavefronts executed in the actual hardware
    ‒ Faster for large kernels
    ‒ Stop and resume wavefront execution
    ‒ Can break a running kernel
    ‒ Can support debugging persistent kernels (attach to kernel)
    ‒ Will allow data breakpoints
  ‒ Working development build in the demo area!
17. GPU PROFILING WITH AMD CodeXL
KEY FEATURES
• Analyze and profile OpenCL™ host and device code
  ‒ Collect application trace mode
  ‒ Collect GPU performance counter mode
• Views:
  ‒ API trace: View API calls with inputs and outputs
  ‒ Timeline visualization: View host and device synchronization issues
  ‒ Summary pages: Find the top bottlenecks
  ‒ Warnings/Errors: View performance suggestions
  ‒ Kernel occupancy: Find kernel resource bottlenecks
  ‒ Performance counter: View kernel performance bottlenecks
• Does not require source or project modifications to the application
• Does not even require the application source code
18. GPU PROFILING WITH AMD CodeXL | Views
API TRACE
• Analyze and profile OpenCL™ applications
  ‒ View API input arguments and output results
  ‒ Find API hotspots
  ‒ Determine top ten data transfer and kernel execution operations
  ‒ Identify failed API calls, resource leaks and best practices
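As a rough illustration of the kind of analysis an API trace enables, the sketch below ranks trace entries by duration and filters out calls that did not return `CL_SUCCESS`. The `TraceEntry` record, its field names and the sample timings are hypothetical and do not reflect CodeXL's trace file format; the API names and status codes are standard OpenCL.

```python
from collections import namedtuple

# Hypothetical trace record: API name, start/end timestamps in ns, return status.
TraceEntry = namedtuple("TraceEntry", "api start_ns end_ns status")


def top_by_duration(entries, n=10):
    """Return the n longest-running entries, longest first."""
    return sorted(entries, key=lambda e: e.end_ns - e.start_ns, reverse=True)[:n]


def failed_calls(entries):
    """Return entries whose status is anything other than CL_SUCCESS."""
    return [e for e in entries if e.status != "CL_SUCCESS"]


# Made-up sample trace for illustration only.
trace = [
    TraceEntry("clEnqueueWriteBuffer", 0, 400, "CL_SUCCESS"),
    TraceEntry("clEnqueueNDRangeKernel", 400, 450, "CL_INVALID_WORK_GROUP_SIZE"),
    TraceEntry("clEnqueueNDRangeKernel", 500, 1700, "CL_SUCCESS"),
    TraceEntry("clEnqueueReadBuffer", 1700, 2000, "CL_SUCCESS"),
]

for e in top_by_duration(trace, n=2):
    print(e.api, e.end_ns - e.start_ns)
for e in failed_calls(trace):
    print("failed:", e.api, e.status)
```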
19. GPU PROFILING WITH AMD CodeXL | Views
TIMELINE VISUALIZATION
• Visualize host and device execution in a timeline chart
  ‒ View number of OpenCL™ contexts and command queues created and the relationships between these items
  ‒ View data transfer operations and kernel executions on the device
  ‒ Determine proper synchronization and load balancing
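A timeline makes gaps in device activity visible at a glance. The same idea can be sketched numerically: merge the device's busy intervals and measure the uncovered time between the first start and the last end. The interval list below is invented for illustration, and the function name `device_idle_time` is an assumption, not a CodeXL feature.

```python
def device_idle_time(busy_intervals):
    """Return the time between first start and last end not covered by any interval.

    `busy_intervals` is a list of (start, end) pairs in any order, for example
    data transfers and kernel executions on one device.
    """
    if not busy_intervals:
        return 0
    merged = []
    for start, end in sorted(busy_intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching interval: extend the current busy block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    span = merged[-1][1] - merged[0][0]
    busy = sum(end - start for start, end in merged)
    return span - busy


# Hypothetical device activity in microseconds.
activity = [(0, 40), (30, 60), (100, 150), (150, 170)]
print(device_idle_time(activity))
```

In this made-up example the device sits idle between 60 and 100, which is the sort of gap that often points at a blocking host-side synchronization call.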
20. GPU PROFILING WITH AMD CodeXL | Views
SUMMARY PAGES
• Find top bottlenecks
  ‒ I/O bound
  ‒ Compute bound
21. GPU PROFILING WITH AMD CodeXL | Views
WARNING AND ERROR MESSAGES
• Provide performance improvement suggestions
• Detect errors in an OpenCL™ application
22. GPU PROFILING WITH AMD CodeXL | Views
PERFORMANCE COUNTER
• Analyze the OpenCL™ kernel execution for AMD APUs and GPUs
  ‒ Collect GPU Performance Counters
  ‒ The number of ALU, global and local memory instructions executed
  ‒ GPU utilization and memory access characteristics
  ‒ Show the kernel resource usages
  ‒ View the AMD intermediate language (AMD IL) and hardware disassembly (ISA)
23. GPU PROFILING WITH AMD CodeXL | Views
KERNEL OCCUPANCY
• Estimate OpenCL™ kernel occupancy for AMD APUs and GPUs
  ‒ Visual indication of the limiting kernel resources for number of wavefronts in flight
  ‒ View the maximum number of wavefronts in flight limited by
    ‒ Work group size
    ‒ Number of allocated scalar or vector registers
    ‒ Amount of allocated LDS
  ‒ View the maximum resource limit for the GPU device
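The idea of a limiting resource can be sketched as taking the minimum over several per-resource wavefront limits. The sketch below is a simplified model, not CodeXL's occupancy calculation: every value in `HW` is a placeholder chosen for illustration rather than a specification of any particular device, and the function and key names are assumptions.

```python
import math

# Placeholder per-compute-unit limits; real values depend on the device.
HW = {
    "wavefront_size": 64,
    "max_waves_per_cu": 40,
    "max_groups_per_cu": 16,
    "simds_per_cu": 4,
    "vgpr_budget_per_simd": 256,
    "sgpr_budget_per_simd": 512,
    "lds_bytes_per_cu": 65536,
}


def waves_in_flight(group_size, vgprs, sgprs, lds_bytes, hw=HW):
    """Return (waves, limiter, limits) for one compute unit.

    vgprs and sgprs are the registers allocated per wavefront (both > 0);
    lds_bytes is the LDS allocated per work group.
    """
    waves_per_group = math.ceil(group_size / hw["wavefront_size"])
    limits = {
        "hardware maximum": hw["max_waves_per_cu"],
        "work group size": hw["max_groups_per_cu"] * waves_per_group,
        "vector registers": hw["simds_per_cu"] * (hw["vgpr_budget_per_simd"] // vgprs),
        "scalar registers": hw["simds_per_cu"] * (hw["sgpr_budget_per_simd"] // sgprs),
    }
    if lds_bytes > 0:
        limits["LDS"] = (hw["lds_bytes_per_cu"] // lds_bytes) * waves_per_group
    limiter = min(limits, key=limits.get)
    return limits[limiter], limiter, limits


waves, limiter, limits = waves_in_flight(group_size=256, vgprs=84, sgprs=48, lds_bytes=16384)
occupancy = waves / HW["max_waves_per_cu"]
print(waves, limiter, f"{occupancy:.0%}")
```

With these placeholder numbers the vector register allocation is the limiter, so reducing register use would be the first thing to try before touching work group size or LDS.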
24. GPU PROFILING WITH AMD CodeXL | DEMO
Optimizing AMD teapot application
• Finding and fixing non-optimized kernel launch parameters
  ‒ API Trace and Warning and Error Messages View
• Visualizing host device synchronization
  ‒ Timeline Visualization
• Navigating to find the top bottleneck
  ‒ Summary Pages View
• Optimizing the kernel
  ‒ Kernel Occupancy and GPU Performance Counter View
26. STATIC KERNEL ANALYSIS WITH AMD CodeXL
KEY FEATURES
• Compile, analyze and disassemble an OpenCL™ kernel for AMD APUs, GPUs and CPUs
  ‒ View AMD IL and hardware disassembly (ISA)
  ‒ View compilation warning and error messages
• Generate offline compilation of OpenCL™ kernel binary
• View compiler statistics and estimate performance
• Only requires the OpenCL™ kernel source code as an input
• Does not require a GPU in the system
27. STATIC KERNEL ANALYSIS WITH AMD CodeXL | FEATURES
NEW IN CODEXL 1.3
• Integrated into AMD CodeXL standalone and Visual Studio® extension
• Brand new user experience
  ‒ View OpenCL™ kernel source, IL and ISA simultaneously
  ‒ View overview
  ‒ Generate analysis for SI and CI families of GPUs
  ‒ Estimated cycle count with ISA branch execution classification
  ‒ Navigate compilation and analysis results in tree view
• Support compilation for the latest AMD APUs, GPUs and CPUs
29. CPU PROFILING WITH AMD CodeXL
• Identify and investigate CPU performance hot-spots
• Profiles C, C++, FORTRAN, Java, .NET, OpenCL™ applications
• Profiles software components
  ‒ Applications, Libraries, Dynamically loaded modules
  ‒ OS Kernel modules
• Profile modes
  ‒ Per Process (target application and its children)
  ‒ System Wide Profiling
• Uses HW Performance Monitoring counters
  ‒ Low overhead
• No change to source code required
  ‒ Symbolic information required to attribute the performance data at function/source level
30. CPU PROFILING WITH AMD CodeXL
• Profiling Types
  ‒ Time-based profiling
  ‒ Event-based profiling
  ‒ Instruction Based Sampling (IBS)
  ‒ Cache Line Utilization
  ‒ Call Graph
• Pre-defined profile configuration of HW performance events
  ‒ Assess Performance
  ‒ Investigate Data Access
  ‒ Investigate Branching
31. CPU PROFILING WITH AMD CodeXL
• Performance data are displayed in configurable views
  ‒ Samples attributed at Process and Modules level
  ‒ Drill down to Functions, Source code and Instructions level
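Attributing samples at one level and then drilling down amounts to counting the same samples under progressively longer keys. The following sketch assumes each sample has already been resolved to a hypothetical `(process, module, function)` tuple; the names and data are invented and do not correspond to CodeXL's data model.

```python
from collections import Counter


def attribute(samples, level):
    """Count samples per key, most frequent first.

    level=1 groups by process, level=2 by (process, module),
    level=3 by (process, module, function).
    """
    return Counter(s[:level] for s in samples).most_common()


# Hypothetical resolved samples: (process, module, function).
samples = [
    ("app", "app.exe", "render"),
    ("app", "app.exe", "render"),
    ("app", "libmath.so", "dot"),
    ("app", "app.exe", "update"),
    ("daemon", "daemon.exe", "poll"),
]

print(attribute(samples, 1))     # process level
print(attribute(samples, 2))     # module level
print(attribute(samples, 3)[0])  # hottest function
```

Resolving raw sample addresses to function names is the step that needs symbolic information; this sketch starts after that resolution has happened.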
32. CPU PROFILING WITH AMD CodeXL
• Call Graph view displays the parents and children of hottest function calls
33. CPU PROFILING WITH AMD CodeXL
• Identify Hotspots
  ‒ Where the application spends its time
  ‒ Source level/algorithm related performance issues
  ‒ Use Time-based profiling
• Identify the cause
  ‒ How well the application is using the CPU and Memory resources
  ‒ Performance bottlenecks due to the micro-architectural constraints
  ‒ Use Event-based profiling or Instruction Based Sampling
• Precise instruction level profiling
  ‒ Use Instruction Based Sampling
• Cache-Line Utilization - Data access pattern
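To see why the data access pattern matters for cache-line utilization, the sketch below computes, for each cache line touched, the fraction of its bytes that were actually accessed. It is a deliberately simplified model (no evictions, 64-byte lines assumed) and is not the metric definition CodeXL uses; the function name and access lists are hypothetical.

```python
def cache_line_utilization(accesses, line_size=64):
    """Return {line_base_address: fraction_of_bytes_touched}.

    `accesses` is an iterable of (address, size_in_bytes) pairs.
    """
    touched = {}
    for addr, size in accesses:
        for byte in range(addr, addr + size):
            base = byte - byte % line_size
            touched.setdefault(base, set()).add(byte)
    return {base: len(b) / line_size for base, b in touched.items()}


# Strided pattern: one 4-byte field read every 32 bytes.
strided = [(i * 32, 4) for i in range(4)]
# Packed pattern: sixteen consecutive 4-byte reads.
packed = [(i * 4, 4) for i in range(16)]

print(cache_line_utilization(strided))
print(cache_line_utilization(packed))
```

The strided pattern touches two lines but uses only an eighth of each, while the packed pattern uses every byte of the single line it touches.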