SlideShare une entreprise Scribd logo
1  sur  38
Télécharger pour lire hors ligne
Debugging GPU faults: QoL
tools for your driver
Danylo Piliaiev
2023-10-17
1
Who Am I?
Currently implementing Adreno 7XX GPU generation
in Turnip
My blog:
In the past
Worked on mobile video games
Debugging unruly games since 2018
At Igalia since November 2020
blogs.igalia.com/dpiliaiev
2
The Problem
"What if I was able to quickly edit this GPU packet?"
"What if I was able to dump this buffer here?"
Or "It would have been nice to print that shader
register!"
"I'll implement it later...."
3
Unrecoverable Hangs - Roadblocks
Computer completely locks up and has to be rebooted
Last few seconds of logs/anything else are lost
Existing tooling isn't of much use with such
constraints
GFR (Graphics Flight Recorder):
VK layer for breadcrumbs
Dumps command buffers with commands' status
4
Unrecoverable Hangs - Solution
More BREADCRUMBS!
GFR writes results to the disk
GFR logging is far behind what's actually runs on GPU
GFR could be too high level:
Blits/BeginRenderPass/EndRenderPass could
internally use a lot of different 2d and 3d blits
5
Unrecoverable Hangs - Solution
Observations
Unrecoverable hangs are rarely caused by sync issues
Cannot allow GPU to race ahead of the last know
breadcrumb
A hang may happen asynchronously to the GPU
packet that triggered it, e.g.
A job is scheduled to another GPU unit
That GPU unit hangs some time afterwards
There may not be a way to synchronize it
6
Unrecoverable Hangs - Solution
The current solution in Turnip is:
Breadcrumbs are inserted after each GPU command
GPU writes a breadcrumb and immediately waits for
this value to be acknowledged
CPU in a busy loop checks the breadcrumb value
If new one is found, it is sent over the network
The CPU acks the breadcrumb and GPU continues
execution
7
How Our Breadcrumbs Work
BR1 BR2 BR3
BR1
Send
GPU
CPU
GPU executes this
command and
hangs
Kernel BR1
Send over
network
Wait Wait Wait
Read Write
BR2
Send
BR2
Read Write
8
Asynchronous hangs?
Require explicit input in tty for each breadcrumb
GPU is on breadcrumb 18, continue?y
GPU is on breadcrumb 19, continue?y
GPU is on breadcrumb 20, continue?y
GPU is on breadcrumb 21, continue?
9
Breadcrumbs In Practice
Increase GPU hang timeout
Receive breadcrumbs on another machine via bash
spaghetti
nc -lvup $PORT | stdbuf -o0 xxd -pc -c 4 | 
awk -Wposix '{printf("%u:%un", "0x" $0, a[$0]++)}
Launch workload with TU_BREADCRUMBS envvar
TU_BREADCRUMBS=$IP:$PORT,break=$BREAKPOINT:$BREAKPOI
Launch workload and break on the last known
10
Further Material
https://blogs.igalia.com/dpiliaiev/debugging-unrecoverable-hangs/
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15452
11
Faster Way To Debug Hangs
12
Breadcrumbs Shortcoming
Breadcrumbs are good for finding a command that
hangs
They cannot tell which part of the GPU state caused
it
They are useless for misrenderings
Some issues are not reproducible with breadcrumbs
13
Reproducing Hangs
Drivers are already able to capture command streams
And all used buffers
With this it is trivial to replay the submissions back
Ideally requires user-space specified GPU addresses
14
Replaying - Caveats
Multiple queues
When to re-upload memory?
Just force a single queue?
Timeline semaphores?
Recordings may be huge
15
Editing The Command Stream
Even the most minimalistic editing is useful:
/* pkt4: GRAS_2D_RESOLVE_CNTL_2 = { X = 63 | Y = 63 } */
pkt(cs, 4128831);
/* pkt4: RB_BLIT_SCISSOR_TL = { X = 0 | Y = 0 } */
pkt4(cs, 0x88d1, (2), 0);
/* pkt4: RB_BLIT_SCISSOR_BR = { X = 63 | Y = 63 } */
pkt(cs, 4128831);
pkt7(cs, CP_MEM_WRITE, 20);
/* { ADDR_LO = 0x1f7580 } */
pkt(cs, 2061696);
/* { ADDR_HI = 0x40 } */
pkt(cs, 64);
pkt(cs, 1216352390);
pkt(cs, 1107296256);
16
Editing Shaders
const char *source = R"(
shps #l37
getone #l37
cov.u32f32 r1.w, c504.z
cov.u32f32 r2.x, c504.w
cov.u32f32 r1.y, c504.x
....
end
)";
upload_shader(&ctx, 0x100200d80, source);
emit_shader_iova(&ctx, cs, 0x100200d80);
17
Replaying Edited Command Stream
The decompiler emits C code with raw commands
The replay tool takes original submissions capture:
Finds unused memory range
Emits edited command stream there
Overrides target submission
18
Dumping GPU Memory
Dumping GPU memory is simple to implement
Act of copying may disturb GPU caches
Kernel cooperation is needed to implement it
properly:
GPU interrupts execution e.g. by faulting
Now memory could be read undisturbed
19
Dumping Shader's Registers
print %tmp_regs, %src_reg
%tmp_regs - 3 consecutive free regs
For 64b address and 32b tmp offset
%src_reg - a single register to print
Shader Log Entries: 6
[0] 00000004 0.0000
[1] 00000000 0.0000
[2] 00000000 0.0000
[3] 00000000 0.0000
[4] beadc429 -0.3394
[5] beadc429 -0.3394
20
Dumping Shader's Registers
Want a nicer print? Just print $src_reg?
You still need to allocate temporary registers
What if there are no free regs?
Spilling regs may not be that easy at this stage
Too much trouble for a little gain...
21
Short Summary
A tool to replay command stream submissions
A tool to decompile a command stream into C code
An option to replay edit command stream
Helpers to dump GPU memory from the command
stream
Helpers to dump shader registers
22
Stale Regs In Command Stream
23
Debugging Stale Registers
It could be hard to spot stale reg usage:
It may appear as a random geometry flicker
Game hanging at a random moment
Rare CTS test failure
24
Stomping Registers - Caveats
Could be a bit tricky if a combination of regs causes
an issue
VK pipelines could be set outside a renderpass
Doesn't help if stale regs are between draw calls
Default invalid value may be valid for some registers
25
Stomping Registers
We mark each register with where it is used:
<reg32 offset="0x9600" name="VPC_DBG_ECO_CNTL" usage="cmd"/>
<reg32 offset="0xb987" name="HLSQ_CS_CNTL" variants="A6XX" usage="cmd"/>
<reg32 offset="0xb984" name="HLSQ_CONTROL_3_REG" variants="A6XX" usage="rp
<reg32 offset="0xb985" name="HLSQ_CONTROL_4_REG" variants="A6XX" usage="rp
To stomp register you need to specify:
export TU_DEBUG_STALE_REGS_RANGE=0x0c00,0xbe01
export TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass
26
Turnip Tooling - Summary
Unique tooling:
Driver breadcrumbs
Command stream replaying and editing
GPU memory dumping
Shader register dumping
Debug option to find stale reg usage
27
Other Drivers and Tooling
28
Generic
GFR - Graphics Flight Recorder
Instruments command buffers with completion tags
Uses VK_AMD_buffer_marker (nothing vendor
specific)
In vkd3d-proton:
Breadcrumbs
Shader printf
Descriptor debugging
29
Other Mesa Drivers
Feature toggles and debug flags
Shader assembly replacement for debugging
GPU submissions decoding
GPU crash dumps decoding
30
Radeon - UMR
GPU register dumps
SGPR / VGPR shader register dumps
Shader wavefront Debugging
Shader disassembly around the crash site
See Maister's blog post for it in action
https://themaister.net/blog/2023/08/20/hardcor
e-vulkan-debugging-digging-deep-on-linux-amd
gpu/
31
Unreleased - Radeon - Shader Debugger
32
Proprietary - Radeon™ GPU Detective
Postmortem analysis of GPU crashes
Information about page faults
Breadcrumbs reflecting done and in-progress GPU
work
Command Buffer ID: 0x107c
=========================
[>] "Frame 1040 CL0"
├─[X] "Depth + Normal + Motion Vector PrePass"
├─[>] "DownSamplePS"
│ ├─[X] Draw
│ ├─[>] Draw
└─[>] "Bloom"
├─[>] "BlurPS"
│ ├─[>] Draw
│ └─[>] Draw
├─[ ] Draw
├─[ ] "BlurPS"
33
Proprietary - NVIDIA Aftermath
34
Proprietary - NVIDIA Aftermath
Collects GPU “mini-dumps”
Visualizes GPU state at the moment of crash
Collects breadcrumbs
Shows crashing shader and it registers
35
Q&A
Any good tools I haven't mentioned?
Maybe you tried something before?
Maybe you have an idea for a tool to implement?
We're hiring!
igalia.com/jobs/
36
37
Debugging GPU faults: QoL tools for your driver – XDC 2023

Contenu connexe

Similaire à Debugging GPU faults: QoL tools for your driver – XDC 2023

Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Akihiro Hayashi
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development Board
Jian-Hong Pan
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness
Peter Griffin
 

Similaire à Debugging GPU faults: QoL tools for your driver – XDC 2023 (20)

eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
XS Boston 2008 Debugging Xen
XS Boston 2008 Debugging XenXS Boston 2008 Debugging Xen
XS Boston 2008 Debugging Xen
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande Modem
 
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilitiesBlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development Board
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with r
 
Make ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEKMake ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEK
 
Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance
 
Code Red Security
Code Red SecurityCode Red Security
Code Red Security
 
The n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkThe n00bs guide to ovs dpdk
The n00bs guide to ovs dpdk
 
Exploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET ImplementationExploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET Implementation
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
Writing bios
Writing biosWriting bios
Writing bios
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel Awareness
 

Plus de Igalia

Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
Igalia
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
Igalia
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
Igalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Igalia
 

Plus de Igalia (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdf
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 
Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...
 

Dernier

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 

Dernier (20)

AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
BT & Neo4j _ How Knowledge Graphs help BT deliver Digital Transformation.pptx
BT & Neo4j _ How Knowledge Graphs help BT deliver Digital Transformation.pptxBT & Neo4j _ How Knowledge Graphs help BT deliver Digital Transformation.pptx
BT & Neo4j _ How Knowledge Graphs help BT deliver Digital Transformation.pptx
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 

Debugging GPU faults: QoL tools for your driver – XDC 2023

  • 1. Debugging GPU faults: QoL tools for your driver Danylo Piliaiev 2023-10-17 1
  • 2. Who Am I? Currently implementing Adreno 7XX GPU generation in Turnip My blog: In the past Worked on mobile video games Debugging unruly games since 2018 At Igalia since November 2020 blogs.igalia.com/dpiliaiev 2
  • 3. The Problem "What if I was able to quickly edit this GPU packet?" "What if I was able to dump this buffer here?" Or "It would have been nice to print that shader register!" "I'll implement it later...." 3
  • 4. Unrecoverable Hangs - Roadblocks Computer completely locks up and has to be rebooted Last few seconds of logs/anything else are lost Existing tooling isn't of much use with such constraints GFR (Graphics Flight Recorder): VK layer for breadcrumbs Dumps command buffers with commands' status 4
  • 5. Unrecoverable Hangs - Solution More BREADCRUMBS! GFR writes results to the disk GFR logging is far behind what's actually runs on GPU GFR could be too high level: Blits/BeginRenderPass/EndRenderPass could internally use a lot of different 2d and 3d blits 5
  • 6. Unrecoverable Hangs - Solution Observations Unrecoverable hangs are rarely caused by sync issues Cannot allow GPU to race ahead of the last know breadcrumb A hang may happen asynchronously to the GPU packet that triggered it, e.g. A job is scheduled to another GPU unit That GPU unit hangs some time afterwards There may not be a way to synchronize it 6
  • 7. Unrecoverable Hangs - Solution The current solution in Turnip is: Breadcrumbs are inserted after each GPU command GPU writes a breadcrumb and immediately waits for this value to be acknowledged CPU in a busy loop checks the breadcrumb value If new one is found, it is sent over the network The CPU acks the breadcrumb and GPU continues execution 7
  • 8. How Our Breadcrumbs Work BR1 BR2 BR3 BR1 Send GPU CPU GPU executes this command and hangs Kernel BR1 Send over network Wait Wait Wait Read Write BR2 Send BR2 Read Write 8
  • 9. Asynchronous hangs? Require explicit input in tty for each breadcrumb GPU is on breadcrumb 18, continue?y GPU is on breadcrumb 19, continue?y GPU is on breadcrumb 20, continue?y GPU is on breadcrumb 21, continue? 9
  • 10. Breadcrumbs In Practice Increase GPU hang timeout Receive breadcrumbs on another machine via bash spaghetti nc -lvup $PORT | stdbuf -o0 xxd -pc -c 4 | awk -Wposix '{printf("%u:%un", "0x" $0, a[$0]++)} Launch workload with TU_BREADCRUMBS envvar TU_BREADCRUMBS=$IP:$PORT,break=$BREAKPOINT:$BREAKPOI Launch workload and break on the last known 10
  • 12. Faster Way To Debug Hangs 12
  • 13. Breadcrumbs Shortcoming Breadcrumbs are good for finding a command that hangs They cannot tell which part of the GPU state caused it They are useless for misrenderings Some issues are not reproducible with breadcrumbs 13
  • 14. Reproducing Hangs Drivers are already able to capture command streams And all used buffers With this it is trivial to replay the submissions back Ideally requires user-space specified GPU addresses 14
  • 15. Replaying - Caveats Multiple queues When to re-upload memory? Just force a single queue? Timeline semaphores? Recordings may be huge 15
  • 16. Editing The Command Stream Even the most minimalistic editing is useful: /* pkt4: GRAS_2D_RESOLVE_CNTL_2 = { X = 63 | Y = 63 } */ pkt(cs, 4128831); /* pkt4: RB_BLIT_SCISSOR_TL = { X = 0 | Y = 0 } */ pkt4(cs, 0x88d1, (2), 0); /* pkt4: RB_BLIT_SCISSOR_BR = { X = 63 | Y = 63 } */ pkt(cs, 4128831); pkt7(cs, CP_MEM_WRITE, 20); /* { ADDR_LO = 0x1f7580 } */ pkt(cs, 2061696); /* { ADDR_HI = 0x40 } */ pkt(cs, 64); pkt(cs, 1216352390); pkt(cs, 1107296256); 16
  • 17. Editing Shaders const char *source = R"( shps #l37 getone #l37 cov.u32f32 r1.w, c504.z cov.u32f32 r2.x, c504.w cov.u32f32 r1.y, c504.x .... end )"; upload_shader(&ctx, 0x100200d80, source); emit_shader_iova(&ctx, cs, 0x100200d80); 17
  • 18. Replaying Edited Command Stream The decompiler emits C code with raw commands The replay tool takes original submissions capture: Finds unused memory range Emits edited command stream there Overrides target submission 18
  • 19. Dumping GPU Memory Dumping GPU memory is simple to implement Act of copying may disturb GPU caches Kernel cooperation is needed to implement it properly: GPU interrupts execution e.g. by faulting Now memory could be read undisturbed 19
  • 20. Dumping Shader's Registers print %tmp_regs, %src_reg %tmp_regs - 3 consecutive free regs For 64b address and 32b tmp offset %src_reg - a single register to print Shader Log Entries: 6 [0] 00000004 0.0000 [1] 00000000 0.0000 [2] 00000000 0.0000 [3] 00000000 0.0000 [4] beadc429 -0.3394 [5] beadc429 -0.3394 20
  • 21. Dumping Shader's Registers Want a nicer print? Just print $src_reg? You still need to allocate temporary registers What if there are no free regs? Spilling regs may not be that easy at this stage Too much trouble for a little gain... 21
  • 22. Short Summary A tool to replay command stream submissions A tool to decompile a command stream into C code An option to replay edit command stream Helpers to dump GPU memory from the command stream Helpers to dump shader registers 22
  • 23. Stale Regs In Command Stream 23
  • 24. Debugging Stale Registers It could be hard to spot stale reg usage: It may appear as a random geometry flicker Game hanging at a random moment Rare CTS test failure 24
  • 25. Stomping Registers - Caveats Could be a bit tricky if a combination of regs causes an issue VK pipelines could be set outside a renderpass Doesn't help if stale regs are between draw calls Default invalid value may be valid for some registers 25
  • 26. Stomping Registers We mark each register with where it is used: <reg32 offset="0x9600" name="VPC_DBG_ECO_CNTL" usage="cmd"/> <reg32 offset="0xb987" name="HLSQ_CS_CNTL" variants="A6XX" usage="cmd"/> <reg32 offset="0xb984" name="HLSQ_CONTROL_3_REG" variants="A6XX" usage="rp <reg32 offset="0xb985" name="HLSQ_CONTROL_4_REG" variants="A6XX" usage="rp To stomp register you need to specify: export TU_DEBUG_STALE_REGS_RANGE=0x0c00,0xbe01 export TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass 26
  • 27. Turnip Tooling - Summary Unique tooling: Driver breadcrumbs Command stream replaying and editing GPU memory dumping Shader register dumping Debug option to find stale reg usage 27
  • 28. Other Drivers and Tooling 28
  • 29. Generic GFR - Graphics Flight Recorder Instruments command buffers with completion tags Uses VK_AMD_buffer_marker (nothing vendor specific) In vkd3d-proton: Breadcrumbs Shader printf Descriptor debugging 29
  • 30. Other Mesa Drivers Feature toggles and debug flags Shader assembly replacement for debugging GPU submissions decoding GPU crash dumps decoding 30
  • 31. Radeon - UMR GPU register dumps SGPR / VGPR shader register dumps Shader wavefront Debugging Shader disassembly around the crash site See Maister's blog post for it in action https://themaister.net/blog/2023/08/20/hardcor e-vulkan-debugging-digging-deep-on-linux-amd gpu/ 31
  • 32. Unreleased - Radeon - Shader Debugger 32
  • 33. Proprietary - Radeon™ GPU Detective Postmortem analysis of GPU crashes Information about page faults Breadcrumbs reflecting done and in-progress GPU work Command Buffer ID: 0x107c ========================= [>] "Frame 1040 CL0" ├─[X] "Depth + Normal + Motion Vector PrePass" ├─[>] "DownSamplePS" │ ├─[X] Draw │ ├─[>] Draw └─[>] "Bloom" ├─[>] "BlurPS" │ ├─[>] Draw │ └─[>] Draw ├─[ ] Draw ├─[ ] "BlurPS" 33
  • 34. Proprietary - NVIDIA Aftermath 34
  • 35. Proprietary - NVIDIA Aftermath Collects GPU “mini-dumps” Visualizes GPU state at the moment of crash Collects breadcrumbs Shows crashing shader and it registers 35
  • 36. Q&A Any good tools I haven't mentioned? Maybe you tried something before? Maybe you have an idea for a tool to implement? We're hiring! igalia.com/jobs/ 36
  • 37. 37