ScicomP 2015 presentation discussing best practices for debugging CUDA and OpenACC applications, with a case study on our collaboration with LLNL to bring debugging support to the OpenPOWER stack and to OMPT.
15. • The following 6 slides are from an SC14 tutorial by:
• Damian Alvarez
– d.alvarez.mallon@fz-juelich.de
• Dr. Mike Ashworth
– mike.ashworth@stfc.ac.uk
• Vincent Betro, Ph. D.
– vbetro@utk.edu
• Chris Gottbrath
– Chris.Gottbrath@roguewave.com
• Nikolay Piskun, Ph.D.
– Nikolay.Piskun@roguewave.com
• Sandra Wienke
– Wienke@itc.rwth-aachen.de
11.17.2014 SC ’14
16. • Setting breakpoints in CUDA kernels
– Start debugging (e.g. “Go”)
– A message box appears when the kernel is loaded
– Set kernel breakpoints just as in host code
17. • Debugger thread IDs in a Linux CUDA process
– Host thread: positive number
– CUDA thread: negative number
• GPU thread navigation
– Logical coordinates: blocks (3 dimensions), threads (3 dimensions)
– Physical coordinates: device, SM, warp, lane
– Only valid selections are permitted
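To connect the two coordinate systems, here is a minimal sketch (not TotalView code) of a kernel that prints, once per warp, its logical block/thread coordinates next to the physical SM, warp slot, and lane read from the PTX special registers %smid, %warpid, and %laneid. It assumes a warp size of 32 and the CUDA rule that warps are formed from consecutive linear thread indices with x varying fastest. The helper names (linear_tid, logical_warp, logical_lane, report_coordinates) are hypothetical. Note that %warpid is the warp's slot on the SM and need not equal the logical warp index.

```cuda
#include <cstdio>

// Assumption: warp size of 32, as on current NVIDIA GPUs.
constexpr unsigned kWarpSize = 32u;

// Linear thread index inside a block; x varies fastest, so consecutive
// linear indices form one warp (t0..t31, t32..t63, ...).
__host__ __device__ inline unsigned linear_tid(unsigned tx, unsigned ty, unsigned tz,
                                               unsigned bdx, unsigned bdy) {
    return tx + ty * bdx + tz * bdx * bdy;
}

// Logical warp index within the block and lane within that warp.
__host__ __device__ inline unsigned logical_warp(unsigned ltid) { return ltid / kWarpSize; }
__host__ __device__ inline unsigned logical_lane(unsigned ltid) { return ltid % kWarpSize; }

// Physical coordinates from PTX special registers.
__device__ inline unsigned sm_id() {
    unsigned v;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(v));
    return v;
}
__device__ inline unsigned warp_slot() {  // warp slot on the SM, not the logical warp
    unsigned v;
    asm volatile("mov.u32 %0, %%warpid;" : "=r"(v));
    return v;
}
__device__ inline unsigned lane_id() {
    unsigned v;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(v));
    return v;
}

// Prints one line per warp: logical coordinates of the warp's first thread
// and the physical SM / warp slot / lane it is running on.
__global__ void report_coordinates() {
    unsigned ltid = linear_tid(threadIdx.x, threadIdx.y, threadIdx.z, blockDim.x, blockDim.y);
    if (logical_lane(ltid) == 0) {
        printf("block (%u,%u,%u) thread (%u,%u,%u) -> logical warp %u | SM %u, warp slot %u, lane %u\n",
               blockIdx.x, blockIdx.y, blockIdx.z,
               threadIdx.x, threadIdx.y, threadIdx.z,
               logical_warp(ltid), sm_id(), warp_slot(), lane_id());
    }
}
```

A launch such as `report_coordinates<<<dim3(2, 1, 1), dim3(32, 2, 1)>>>();` followed by `cudaDeviceSynchronize();` would print four lines, which can be compared with the thread the debugger reports when navigating by logical or physical coordinates.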
18. • Single stepping
– Advances all GPU hardware threads within the same warp
– Stepping over a __syncthreads() call advances all threads within the block
• Advancing more than just one warp
– “Run To” a selected line number in the source pane
– Set a breakpoint and “Continue” the process
• Halt
– Stops all host and device threads
(Figure: a warp is a group of 32 threads that share the same program counter (PC); threads t0 through t31 form one warp, t32 through t63 the next.)
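A block-level reduction is a convenient kernel for trying out the stepping rules above, because it mixes ordinary statements (a single step moves only the focused warp) with __syncthreads() barriers (stepping over one moves the whole block). This is a minimal sketch assuming a launch with exactly kBlock = 64 threads per block, i.e. two warps; block_sum and block_sum_reference are hypothetical names, and the host reference exists only to check the kernel's output.

```cuda
// Assumption: launched with blockDim.x == kBlock, i.e. exactly two warps.
constexpr int kBlock = 64;

// Shared-memory tree reduction: block b sums in[b*kBlock .. b*kBlock + kBlock - 1].
__global__ void block_sum(const int* in, int* out) {
    __shared__ int buf[kBlock];
    int t = threadIdx.x;
    buf[t] = in[blockIdx.x * kBlock + t];
    // Stepping over this barrier advances every thread of the block.
    __syncthreads();
    for (int s = kBlock / 2; s > 0; s /= 2) {
        // Single-stepping this line advances only the focused warp (its 32 threads).
        if (t < s) buf[t] += buf[t + s];
        __syncthreads();
    }
    if (t == 0) out[blockIdx.x] = buf[0];
}

// Host reference for checking out[block] after the kernel has finished.
int block_sum_reference(const int* in, int block) {
    int sum = 0;
    for (int i = 0; i < kBlock; ++i) sum += in[block * kBlock + i];
    return sum;
}
```

Setting a breakpoint on the `if (t == 0)` line and using “Continue” is one way to move both warps past the loop at once instead of stepping them individually.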
19. • Displaying CUDA device properties
– “Tools” > “CUDA Devices”
– Helps map between logical and physical coordinates
• PCs across SMs, warps, and lanes
– Shown as valid, active, or divergent
(Figure: program counter (PC) within a warp.)
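To see a divergent warp in the device view, the threads of one warp have to take different paths. The sketch below is a hypothetical example in which even lanes run a loop whose length depends on the lane while odd lanes return immediately; halting inside lane_work() would typically show lanes of the same warp at different PCs. It assumes a debug build (e.g. `nvcc -g -G`), since an optimized build may turn simple branches into predicated code; lane_work and divergent_kernel are illustrative names only.

```cuda
// Hypothetical per-lane work: even lanes loop, odd lanes return at once,
// so the 32 threads of a warp follow different control paths.
__host__ __device__ inline int lane_work(unsigned lane) {
    if (lane % 2u == 0u) {
        int acc = 0;
        for (unsigned i = 0; i < lane; ++i) acc += static_cast<int>(i);
        return acc;
    }
    return -1;
}

// Each thread stores the result for its lane; out must hold gridDim.x * blockDim.x ints.
__global__ void divergent_kernel(int* out) {
    unsigned lane = threadIdx.x % 32u;
    out[blockIdx.x * blockDim.x + threadIdx.x] = lane_work(lane);
}
```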
20. • Displaying GPU data
– “Dive” into a variable, or watch its “Type” in the “Expression List”
– Device memory spaces are shown with “@” notation
Storage qualifier | Meaning of address
@global | Offset within global storage
@shared | Offset within shared storage
@local | Offset within local storage
@register | PTX register name
@generic | Offset within generic address space (e.g. a pointer to global, local, or shared memory)
@constant | Offset within constant storage
@parameter | Offset within parameter storage (TotalView (TV) built-in type)
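The kernel below is a hypothetical sketch that declares one variable in each memory space from the table, so that diving into each variable shows a different “@” qualifier. The comments give the qualifier one would expect, but this is an assumption: the actual placement depends on the compiler and build flags (e.g. a scalar may live in a register or be spilled to local memory). It assumes a launch of `spaces_kernel<<<1, 64>>>(d_in)` with d_in holding 64 floats; all names are illustrative.

```cuda
// Assumption: launched as spaces_kernel<<<1, kThreads>>>(d_in), d_in holding kThreads floats.
constexpr unsigned kThreads = 64u;

__constant__ float c_scale = 2.0f;      // constant memory: expect @constant
__device__ float g_result[kThreads];    // global memory: expect @global

// Index of the right-hand neighbour, wrapping around at n.
__host__ __device__ inline unsigned neighbor_index(unsigned t, unsigned n) {
    return (t + 1u) % n;
}

// `in` is a kernel parameter (expect @parameter); the data it points to is global.
__global__ void spaces_kernel(const float* in) {
    __shared__ float s_tile[kThreads];  // shared memory: expect @shared
    float r = in[threadIdx.x];          // scalar, usually held in a register: expect @register
    float scratch[4];                   // dynamically indexed array, likely local memory: expect @local
    for (int k = 0; k < 4; ++k) scratch[k] = r * static_cast<float>(k);
    s_tile[threadIdx.x] = scratch[threadIdx.x % 4u] * c_scale;
    __syncthreads();
    // p holds a generic address that points into shared memory: expect @generic when dereferenced.
    const float* p = s_tile;
    g_result[threadIdx.x] = p[neighbor_index(threadIdx.x, kThreads)];
}
```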
21. • Checking GPU memory
– Enable “CUDA Memory checking” at startup or from the “Debug” menu
– Detects global memory addressing violations and misaligned memory accesses
• Further features
– Multi-device support
– Host-pinned memory support
– MPI-CUDA applications
Note: Recent cuda-memcheck versions can also detect race conditions:
cuda-memcheck --tool racecheck <prog>
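For trying out the checkers, here is a minimal sketch with two intentional bugs: a shared-memory race (a neighbour is read without a barrier, so thread 31 may read s[32] before warp 1 has written it) and an off-by-one global write of the kind memory checking reports. It assumes racy_shift and fixed_shift run as one block of kN = 128 threads and that off_by_one is launched with more than n threads over an allocation of exactly n ints; all names are hypothetical.

```cuda
constexpr int kN = 128;  // assumption: racy_shift / fixed_shift use one block of kN threads

// BUG (intentional): no barrier between the shared-memory write and the
// neighbour read, so a thread may read a slot another warp has not written yet.
__global__ void racy_shift(int* out) {
    __shared__ int s[kN];
    int t = threadIdx.x;
    s[t] = t;
    out[t] = s[(t + 1) % kN];
}

// Fixed version: __syncthreads() orders every write before any read.
__global__ void fixed_shift(int* out) {
    __shared__ int s[kN];
    int t = threadIdx.x;
    s[t] = t;
    __syncthreads();
    out[t] = s[(t + 1) % kN];
}

// BUG (intentional): `<=` lets thread n write one element past the end of data.
__global__ void off_by_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= n) data[i] = i;  // should be i < n
}

// Value fixed_shift stores for thread t.
__host__ __device__ inline int expected_shift(int t) { return (t + 1) % kN; }
```

Running the program under `cuda-memcheck --tool racecheck ./prog` is expected to flag racy_shift but not fixed_shift, while the default memcheck tool (`cuda-memcheck ./prog`) targets the out-of-bounds write. On newer CUDA toolkits, where cuda-memcheck has been replaced, the equivalent is `compute-sanitizer --tool racecheck ./prog`.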