HKG18-TR14 - Postmortem Debugging with Coresight

Linaro
LinaroLinaro
Postmortem Debugging with Coresight
HKG18-TR14
Leo Yan, Linaro Support and Solutions Engineering
Introduction
This session discusses postmortem
debugging techniques in the Linux kernel.
Firstly we will review ramoops (aka.
pstore) with software tracing for
postmortem debugging, then go through
two Coresight debugging methods with
hardware assisted tracing.
We will finish this material + 3 demos in
25 minutes.
Kernel
panic
System
hang
Dump
capture
kernel
Second
boot
kernel
Crash
Coresight tracer
Perf + OpenCSD
ramoops
Coresight
CPU debug
Working flow with debugging tools
Overview
● Discussion for practical scenarios
○ Finding program execution flow for system hang
○ CPU is dead, how to read current CPU state?
○ Offline analysis of program execution flow
● You can extend for your SoC
The scenario - Finding program execution flow for system hang
When a system hangs, we cannot always rely
on the console for printing log messages
Modern CPUs improve performance by using
asynchronous operations on external bus or
memory system (early acknowledge, OoO,
etc), such systems may not hang immediately
after running a ‘bad’ instruction
To narrow down the cause, we need to
reverse program execution flow to find hints.
● Ramoops (aka. pstore) is general
framework to dump logs into
persistent RAM and which survive
after a restart
● Ramoops can dump:
○ Console message
○ Oops and panic log
○ Function tracing
● Ramoops function tracing is used
to reverse program execution flow
Demo for ramoops
Step 1: Enable all required kconfig options
CONFIG_PSTORE=y
CONFIG_PSTORE_FTRACE=y
CONFIG_DEBUG_FS=y
Step 2: Prepare for testing
Enable ramoops for function tracing
# mount -t debugfs debugfs /sys/kernel/debug/
# echo 1 > /sys/kernel/debug/pstore/record_ftrace
Enable watchdog
# echo 1 > /dev/watchdog
Step 3: Run testing case until system hangs
Step 4: Reboot with watchdog timeout
Step 5: Analyze tracing data
# mount -t pstore pstore /mnt
# cat /mnt/ftrace-ramoops-0 > tracing.log
Demo and ramoops log
CPU:0 ts:1000329 ffff0000082c87ec ffff0000082c9030 single_start <- seq_read+0x1a0/0x4c0
CPU:0 ts:1000330 ffff0000087214d4 ffff0000082c9058 dbg_ws_hang_show <- seq_read+0x1c8/0x4c0
CPU:0 ts:1000331 ffff0000080a428c ffff00000872151c __ioremap <- dbg_ws_hang_show+0x54/0xa8
CPU:0 ts:1000332 ffff0000080a41b0 ffff0000080a42a0 __ioremap_caller <- __ioremap+0x38/0x50
CPU:0 ts:1000333 ffff0000080a3c44 ffff0000080a41d8 pfn_valid <- __ioremap_caller+0x60/0xf0
CPU:0 ts:1000334 ffff0000082569c4 ffff0000080a3c4c memblock_is_map_memory <- pfn_valid+0x1c/0x30
CPU:0 ts:1000335 ffff0000082517f8 ffff0000080a41ec get_vm_area_caller <- __ioremap_caller+0x74/0xf0
CPU:0 ts:1000336 ffff000008251438 ffff00000825182c __get_vm_area_node <- get_vm_area_caller+0x54/0x68
https://youtu.be/2_SIdeQrG-Y
Overview
● Discussion for practical scenarios
○ Finding program execution flow for system hang
○ CPU is dead, how to read current CPU state?
○ Offline analysis of program execution flow
● You can extend for your SoC
The scenario - CPU is dead, how to read current CPU state?
When a CPU dies, perhaps locked up with
IRQ/FIQ masked or a stuck waiting for a bus
access, the dead CPU cannot react to SMP
inter-processor interrupt to dump backtrace.
JTAG debugger could be used to check dead
CPU state, but we can explore a more
convenient method in kernel for systems that
have other CPUs are still alive so can dump
PC value for dead CPU.
● CORESIGHT_CPU_DEBUG:
Coresight CPU debug module
allows Linux SMP partners to
watch each other (enhanced
LOCKUP_DETECTOR).
● Can be used to dump CPU state
based on ARM sample-based
profiling extension.
● Last CPU PC before failure is
stored in its debug unit and other
processors can extract this too (no
cache problems)!
Demo for Coresight CPU debug
Step 1: Enable all required kconfig options
CONFIG_CORESIGHT_CPU_DEBUG=y
Step 2: Prepare for testing
Disable CPU idle states in command line (if need): nohlt
Enable panic on RCU stall if system doesn’t support
hard lock detection:
# sysctl -w kernel.panic_on_rcu_stall=1
Step 3: Run test case until system panic
Step 4: Read debug info logged during panic
Demo video and Coresight CPU debug log
[ 79.171171] coresight-cpu-debug f65d0000.debug: CPU[4]:
[ 79.176408] coresight-cpu-debug f65d0000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
[ 79.184512] coresight-cpu-debug f65d0000.debug: EDPCSR: [<ffff00000809560c>] handle_IPI+0x27c/0x2d8
[ 79.193743] coresight-cpu-debug f65d0000.debug: EDCIDSR: 00000000
[ 79.199931] coresight-cpu-debug f65d0000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
[ 79.210468] coresight-cpu-debug f65d2000.debug: CPU[5]:
[ 79.215705] coresight-cpu-debug f65d2000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
[ 79.223810] coresight-cpu-debug f65d2000.debug: EDPCSR: [<ffff00000809560c>] handle_IPI+0x27c/0x2d8
[ 79.233041] coresight-cpu-debug f65d2000.debug: EDCIDSR: 00000000
[ 79.239229] coresight-cpu-debug f65d2000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
[ 79.249765] coresight-cpu-debug f65d4000.debug: CPU[6]:
[ 79.255003] coresight-cpu-debug f65d4000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
[ 79.263107] coresight-cpu-debug f65d4000.debug: EDPCSR: [<ffff00000809560c>] handle_IPI+0x27c/0x2d8
[ 79.272338] coresight-cpu-debug f65d4000.debug: EDCIDSR: 00000000
[ 79.278526] coresight-cpu-debug f65d4000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
[ 79.289062] coresight-cpu-debug f65d6000.debug: CPU[7]:
[ 79.294299] coresight-cpu-debug f65d6000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
[ 79.302407] coresight-cpu-debug f65d6000.debug: EDPCSR: [<ffff00000871ee54>] cpu_lock+0x44/0x50
[ 79.311290] coresight-cpu-debug f65d6000.debug: EDCIDSR: 00000000
[ 79.317478] coresight-cpu-debug f65d6000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
https://youtu.be/mFuvWrUrlwo
Overview
● Discussion for practical scenarios
○ Finding program execution flow for system hang
○ CPU is dead, how to know CPU current state?
○ Offline analysis of program execution flow
● You can extend for your SoC
The scenario - Offline analysis of program execution flow
If hang or panic issues happen in production
release, we usually need to use analyse
program execution flow offline using host tools.
Ramoops with function tracing is often no
enabled for production testing because it might
downgrade performance seriously; and thus
alter or prevent reproduction for heisenbugs.
Coresight hardware tracer is a good candidate
to record program execution flow with
minimising overhead.
● Coresight + kdump can store hardware
tracing data in dump file for kernel panic
and we can rely on crash tool to extract
tracing data
● Perf + OpenCSD tool decodes tracing
data and outputs readable program flow
● It smoothly supports kernel panic
debugging but so far it isn’t for
debugging system hang
● Trick: If Coresight RAM is preserved after
reset then useful Coresight tracing data
can still be extracted for debugging hang
issues
Demo for Coresight + Kdump
Step 1: Enable all required kconfig options
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINKS_AND_SINKS=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_SOURCE_ETM4X=y
CONFIG_CORESIGHT_PANIC_KDUMP=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
Step 2: Prepare for testing
Enable Coresight tracer and sink
Use kexec to load capture-dump kernel and dtb
Step 3: Run test case until system panic
Step 4: Boot capture-dump kernel, save core file
Step 5: Extract Coresight tracing data with crash
# crash vmlinux vmcore
crash> extend csdump.so
crash> csdump out_folder
Step 6: Boot original kernel for analysis with perf
To avoid kernel build id mismatch when analysing
coresight trace data, we can run original kernel with
kernel symbol file:
# perf script -v -a -F cpu,event,ip,sym,symoff
-i perf.data --kallsyms /proc/kallsyms
Demo video and perf decodes program flow
# ./perf script -f -v -a -F cpu,event,ip,sym,symoff --kallsyms /proc/kallsyms
build id event received for [kernel.kallsyms]: 32eeb5c9c99d00a63d0921cbd815c32385c36710
Using /proc/kcore for kernel object code
Using /proc/kallsyms for symbols
Frame deformatter: Found 4 FSYNCS
[000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc
[000] branches: ffff000008afa6e8 __delay+0x90
[000] branches: ffff000008afa6dc __delay+0x84
[000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc
[000] branches: ffff000008afa6e8 __delay+0x90
[000] branches: ffff000008afa6dc __delay+0x84
[000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc
[000] branches: ffff000008afa6e8 __delay+0x90
[000] branches: ffff000008afa6dc __delay+0x84
[000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc
[000] branches: ffff000008afa6e8 __delay+0x90
[000] branches: ffff000008afa6dc __delay+0x84
[000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc
[000] branches: ffff000008afa6e8 __delay+0x90
[000] branches: ffff000008afa6dc __delay+0x84
https://youtu.be/oLlBpzVGeFU
Overview
● Discussion for practical scenarios
○ Finding program execution flow for system hang
○ CPU is dead, how to know CPU current state?
○ Offline analysis of program execution flow
● You can extend for your SoC
You can extend for your SoC
You can extend to use other bus masters
(DSP, MCU, etc) to access coresight CPU
debug module to recover last PC in case
all CPUs for Linux are dead.
You can extend other bus masters to
support support a kdump-like workflow to
preserve DDR and coresight trace data
(preserved after reset) which can then be
analysed using open source tools.
Kernel
panic
System
hang
Dump
capture
kernel
Second
boot
kernel
Crash
Coresight tracer
Perf + OpenCSD
ramoops
Coresight
CPU debug
Extended working flow
MCU
Related materials
● Coresight related patches and testing case
○ https://git.linaro.org/people/leo.yan/linux-debug-workshop.git/log/?h=acme_perf_core_cs_dev
○ Ramoops and coresight CPU debug module has been merged into mainline kernel
○ Coresight for kdump supporting patches are working in progress
● This session adds new ideas not found in our previous debugging session:
BUD17-TR04: Kernel Debug Stories
○ Presented by Daniel Thompson: http://connect.linaro.org/resource/bud17/bud17-tr04/
○ Kernel Debug Stories covers far too many techniques to allow time for live demos ;-)
Linaro Limited Lifetime Warranty
This training presentation comes with a
lifetime warranty.
Everyone here today can send questions
about today’s session and suggestions for
future topics to support@linaro.org .
Member engineers can also use this address to get support on
any other Linaro output. Engineers from club and core members
are welcome to contact us to discuss how Support and Solutions
Engineering can help you with additional services and training.
With thanks to Andrew Hennigan for introducing the idea of placing a guarantee on training.
Graphic by Jost, CC0-PD
Thank You
#HKG18
HKG18 keynotes and videos on: connect.linaro.org
For further information: support@linaro.org or www.linaro.org
1 sur 19

Recommandé

Advanced Diagnostics 2 par
Advanced Diagnostics 2Advanced Diagnostics 2
Advanced Diagnostics 2Aero Plane
1.9K vues45 diapositives
OSMC 2014: Server Hardware Monitoring done right | Werner Fischer par
OSMC 2014: Server Hardware Monitoring done right | Werner FischerOSMC 2014: Server Hardware Monitoring done right | Werner Fischer
OSMC 2014: Server Hardware Monitoring done right | Werner FischerNETWAYS
643 vues85 diapositives
Embedded Recipes 2019 - Introduction to JTAG debugging par
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingAnne Nicolas
1.3K vues27 diapositives
Kernel Features for Reducing Power Consumption on Embedded Devices par
Kernel Features for Reducing Power Consumption on Embedded DevicesKernel Features for Reducing Power Consumption on Embedded Devices
Kernel Features for Reducing Power Consumption on Embedded DevicesRyo Jin
3.7K vues42 diapositives
TIP1 - Overview of C/C++ Debugging/Tracing/Profiling Tools par
TIP1 - Overview of C/C++ Debugging/Tracing/Profiling ToolsTIP1 - Overview of C/C++ Debugging/Tracing/Profiling Tools
TIP1 - Overview of C/C++ Debugging/Tracing/Profiling ToolsXiaozhe Wang
11.4K vues44 diapositives
Linux Performance Profiling and Monitoring par
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringGeorg Schönberger
8.5K vues79 diapositives

Contenu connexe

Tendances

Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust... par
Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...
Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...Linaro
3.4K vues56 diapositives
SCADA Strangelove: взлом во имя par
SCADA Strangelove: взлом во имяSCADA Strangelove: взлом во имя
SCADA Strangelove: взлом во имяEkaterina Melnik
718 vues127 diapositives
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2) par
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)Simen Li
3.3K vues96 diapositives
建構嵌入式Linux系統於SD Card par
建構嵌入式Linux系統於SD Card建構嵌入式Linux系統於SD Card
建構嵌入式Linux系統於SD Card艾鍗科技
5.9K vues39 diapositives
Demystifying Secure enclave processor par
Demystifying Secure enclave processorDemystifying Secure enclave processor
Demystifying Secure enclave processorPriyanka Aash
791 vues116 diapositives
Esp with thepic16f877 par
Esp with thepic16f877Esp with thepic16f877
Esp with thepic16f877ravi shankar ambati
493 vues196 diapositives

Tendances(20)

Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust... par Linaro
Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...
Q2.12: Idling ARMs in a busy world: Linux Power Management for ARM Multiclust...
Linaro3.4K vues
SCADA Strangelove: взлом во имя par Ekaterina Melnik
SCADA Strangelove: взлом во имяSCADA Strangelove: взлом во имя
SCADA Strangelove: взлом во имя
Ekaterina Melnik718 vues
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2) par Simen Li
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)
Simen Li3.3K vues
建構嵌入式Linux系統於SD Card par 艾鍗科技
建構嵌入式Linux系統於SD Card建構嵌入式Linux系統於SD Card
建構嵌入式Linux系統於SD Card
艾鍗科技5.9K vues
Demystifying Secure enclave processor par Priyanka Aash
Demystifying Secure enclave processorDemystifying Secure enclave processor
Demystifying Secure enclave processor
Priyanka Aash791 vues
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU par Linaro
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMUSFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
Linaro1.6K vues
Breaking hardware enforced security with hypervisors par Priyanka Aash
Breaking hardware enforced security with hypervisorsBreaking hardware enforced security with hypervisors
Breaking hardware enforced security with hypervisors
Priyanka Aash626 vues
⭐⭐⭐⭐⭐ PROTEUS PCB DESIGN par Victor Asanza
⭐⭐⭐⭐⭐ PROTEUS PCB DESIGN⭐⭐⭐⭐⭐ PROTEUS PCB DESIGN
⭐⭐⭐⭐⭐ PROTEUS PCB DESIGN
Victor Asanza80.3K vues
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC par DefconRussia
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC [DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC
DefconRussia1.4K vues
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks par inside-BigData.com
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other AttacksExploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks
Stop the Guessing: Performance Methodologies for Production Systems par Brendan Gregg
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production Systems
Brendan Gregg32.1K vues
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ... par Orgad Kimchi
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
Orgad Kimchi1.6K vues
Learn How to Develop Embedded System for ARM @ 2014.12.22 JuluOSDev par Jian-Hong Pan
Learn How to Develop Embedded System for ARM @ 2014.12.22 JuluOSDevLearn How to Develop Embedded System for ARM @ 2014.12.22 JuluOSDev
Learn How to Develop Embedded System for ARM @ 2014.12.22 JuluOSDev
Jian-Hong Pan4.3K vues

Similaire à HKG18-TR14 - Postmortem Debugging with Coresight

Performance Analysis Tools for Linux Kernel par
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernellcplcp1
1.6K vues59 diapositives
Android Boot Time Optimization par
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time OptimizationKan-Ru Chen
13.1K vues59 diapositives
Troubleshooting Linux Kernel Modules And Device Drivers par
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversSatpal Parmar
13.2K vues35 diapositives
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1 par
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Jagadisha Maiya
845 vues35 diapositives
Linux Kernel Crashdump par
Linux Kernel CrashdumpLinux Kernel Crashdump
Linux Kernel CrashdumpMarian Marinov
2.4K vues41 diapositives
Debugging 2013- Jesper Brouer par
Debugging 2013- Jesper BrouerDebugging 2013- Jesper Brouer
Debugging 2013- Jesper BrouerMediehuset Ingeniøren Live
320 vues28 diapositives

Similaire à HKG18-TR14 - Postmortem Debugging with Coresight(20)

Performance Analysis Tools for Linux Kernel par lcplcp1
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernel
lcplcp11.6K vues
Android Boot Time Optimization par Kan-Ru Chen
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time Optimization
Kan-Ru Chen13.1K vues
Troubleshooting Linux Kernel Modules And Device Drivers par Satpal Parmar
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device Drivers
Satpal Parmar13.2K vues
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1 par Jagadisha Maiya
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Troubleshooting linux-kernel-modules-and-device-drivers-1233050713693744-1
Jagadisha Maiya845 vues
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring par NETWAYS
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringOSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
NETWAYS141 vues
Kernel Recipes 2015 - Kernel dump analysis par Anne Nicolas
Kernel Recipes 2015 - Kernel dump analysisKernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysis
Anne Nicolas2.5K vues
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer par NETWAYS
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
NETWAYS134 vues
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer par NETWAYS
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
NETWAYS45 vues
Debugging linux kernel tools and techniques par Satpal Parmar
Debugging linux kernel tools and  techniquesDebugging linux kernel tools and  techniques
Debugging linux kernel tools and techniques
Satpal Parmar9.1K vues
Crash_Report_Mechanism_In_Tizen par Lex Yu
Crash_Report_Mechanism_In_TizenCrash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_Tizen
Lex Yu538 vues
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick par 44CON
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON2.7K vues
YOW2020 Linux Systems Performance par Brendan Gregg
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
Brendan Gregg1.9K vues
Linux Crash Dump Capture and Analysis par Paul V. Novarese
Linux Crash Dump Capture and AnalysisLinux Crash Dump Capture and Analysis
Linux Crash Dump Capture and Analysis
Paul V. Novarese14.6K vues
OSSNA 2017 Performance Analysis Superpowers with Linux BPF par Brendan Gregg
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
Brendan Gregg5.1K vues
Advanced Root Cause Analysis par Eric Sloof
Advanced Root Cause AnalysisAdvanced Root Cause Analysis
Advanced Root Cause Analysis
Eric Sloof3.3K vues

Plus de Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo par
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
7K vues54 diapositives
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria par
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaLinaro
3K vues8 diapositives
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora par
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
3.7K vues20 diapositives
Bud17 113: distribution ci using qemu and open qa par
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaLinaro
662 vues63 diapositives
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018 par
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018Linaro
2.3K vues16 diapositives
HPC network stack on ARM - Linaro HPC Workshop 2018 par
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
2.6K vues21 diapositives

Plus de Linaro(20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo par Linaro
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Linaro7K vues
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria par Linaro
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Linaro3K vues
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora par Linaro
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Linaro3.7K vues
Bud17 113: distribution ci using qemu and open qa par Linaro
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
Linaro662 vues
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018 par Linaro
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
Linaro2.3K vues
HPC network stack on ARM - Linaro HPC Workshop 2018 par Linaro
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro2.6K vues
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ... par Linaro
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
Linaro1.9K vues
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ... par Linaro
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro2.4K vues
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant... par Linaro
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Linaro2.4K vues
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su... par Linaro
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro2.7K vues
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline par Linaro
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro4.4K vues
HKG18-100K1 - George Grey: Opening Keynote par Linaro
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
Linaro839 vues
HKG18-318 - OpenAMP Workshop par Linaro
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
Linaro608 vues
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline par Linaro
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro866 vues
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all par Linaro
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
Linaro219 vues
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor par Linaro
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro1.5K vues
HKG18-TR08 - Upstreaming SVE in QEMU par Linaro
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro476 vues
HKG18-113- Secure Data Path work with i.MX8M par Linaro
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
Linaro1.4K vues
HKG18-120 - Devicetree Schema Documentation and Validation par Linaro
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro1.3K vues
HKG18-223 - Trusted FirmwareM: Trusted boot par Linaro
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro1.4K vues

Dernier

Info Session November 2023.pdf par
Info Session November 2023.pdfInfo Session November 2023.pdf
Info Session November 2023.pdfAleksandraKoprivica4
13 vues15 diapositives
Piloting & Scaling Successfully With Microsoft Viva par
Piloting & Scaling Successfully With Microsoft VivaPiloting & Scaling Successfully With Microsoft Viva
Piloting & Scaling Successfully With Microsoft VivaRichard Harbridge
12 vues160 diapositives
Design Driven Network Assurance par
Design Driven Network AssuranceDesign Driven Network Assurance
Design Driven Network AssuranceNetwork Automation Forum
15 vues42 diapositives
Democratising digital commerce in India-Report par
Democratising digital commerce in India-ReportDemocratising digital commerce in India-Report
Democratising digital commerce in India-ReportKapil Khandelwal (KK)
18 vues161 diapositives
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors par
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensorssugiuralab
21 vues15 diapositives
Data Integrity for Banking and Financial Services par
Data Integrity for Banking and Financial ServicesData Integrity for Banking and Financial Services
Data Integrity for Banking and Financial ServicesPrecisely
25 vues26 diapositives

Dernier(20)

Piloting & Scaling Successfully With Microsoft Viva par Richard Harbridge
Piloting & Scaling Successfully With Microsoft VivaPiloting & Scaling Successfully With Microsoft Viva
Piloting & Scaling Successfully With Microsoft Viva
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors par sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab21 vues
Data Integrity for Banking and Financial Services par Precisely
Data Integrity for Banking and Financial ServicesData Integrity for Banking and Financial Services
Data Integrity for Banking and Financial Services
Precisely25 vues
PharoJS - Zürich Smalltalk Group Meetup November 2023 par Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi132 vues
Future of AR - Facebook Presentation par ssuserb54b561
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
ssuserb54b56115 vues
Unit 1_Lecture 2_Physical Design of IoT.pdf par StephenTec
Unit 1_Lecture 2_Physical Design of IoT.pdfUnit 1_Lecture 2_Physical Design of IoT.pdf
Unit 1_Lecture 2_Physical Design of IoT.pdf
StephenTec12 vues
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... par Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker40 vues
Serverless computing with Google Cloud (2023-24) par wesley chun
Serverless computing with Google Cloud (2023-24)Serverless computing with Google Cloud (2023-24)
Serverless computing with Google Cloud (2023-24)
wesley chun11 vues
Special_edition_innovator_2023.pdf par WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2218 vues
STPI OctaNE CoE Brochure.pdf par madhurjyapb
STPI OctaNE CoE Brochure.pdfSTPI OctaNE CoE Brochure.pdf
STPI OctaNE CoE Brochure.pdf
madhurjyapb14 vues
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... par James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson92 vues

HKG18-TR14 - Postmortem Debugging with Coresight

  • 1. Postmortem Debugging with Coresight HKG18-TR14 Leo Yan, Linaro Support and Solutions Engineering
  • 2. Introduction This session discusses postmortem debugging techniques in the Linux kernel. Firstly we will review ramoops (aka. pstore) with software tracing for postmortem debugging, then go through two Coresight debugging methods with hardware assisted tracing. We will finish this material + 3 demos in 25 minutes. Kernel panic System hang Dump capture kernel Second boot kernel Crash Coresight tracer Perf + OpenCSD ramoops Coresight CPU debug Working flow with debugging tools
  • 3. Overview ● Discussion for practical scenarios ○ Finding program execution flow for system hang ○ CPU is dead, how to read current CPU state? ○ Offline analysis of program execution flow ● You can extend for your SoC
  • 4. The scenario - Finding program execution flow for system hang When a system hangs, we cannot always rely on the console for printing log messages Modern CPUs improve performance by using asynchronous operations on external bus or memory system (early acknowledge, OoO, etc), such systems may not hang immediately after running a ‘bad’ instruction To narrow down the cause, we need to reverse program execution flow to find hints. ● Ramoops (aka. pstore) is general framework to dump logs into persistent RAM and which survive after a restart ● Ramoops can dump: ○ Console message ○ Oops and panic log ○ Function tracing ● Ramoops function tracing is used to reverse program execution flow
  • 5. Demo for ramoops Step 1: Enable all required kconfig options CONFIG_PSTORE=y CONFIG_PSTORE_FTRACE=y CONFIG_DEBUG_FS=y Step 2: Prepare for testing Enable ramoops for function tracing # mount -t debugfs debugfs /sys/kernel/debug/ # echo 1 > /sys/kernel/debug/pstore/record_ftrace Enable watchdog # echo 1 > /dev/watchdog Step 3: Run testing case until system hangs Step 4: Reboot with watchdog timeout Step 5: Analyze tracing data # mount -t pstore pstore /mnt # cat /mnt/ftrace-ramoops-0 > tracing.log
  • 6. Demo and ramoops log CPU:0 ts:1000329 ffff0000082c87ec ffff0000082c9030 single_start <- seq_read+0x1a0/0x4c0 CPU:0 ts:1000330 ffff0000087214d4 ffff0000082c9058 dbg_ws_hang_show <- seq_read+0x1c8/0x4c0 CPU:0 ts:1000331 ffff0000080a428c ffff00000872151c __ioremap <- dbg_ws_hang_show+0x54/0xa8 CPU:0 ts:1000332 ffff0000080a41b0 ffff0000080a42a0 __ioremap_caller <- __ioremap+0x38/0x50 CPU:0 ts:1000333 ffff0000080a3c44 ffff0000080a41d8 pfn_valid <- __ioremap_caller+0x60/0xf0 CPU:0 ts:1000334 ffff0000082569c4 ffff0000080a3c4c memblock_is_map_memory <- pfn_valid+0x1c/0x30 CPU:0 ts:1000335 ffff0000082517f8 ffff0000080a41ec get_vm_area_caller <- __ioremap_caller+0x74/0xf0 CPU:0 ts:1000336 ffff000008251438 ffff00000825182c __get_vm_area_node <- get_vm_area_caller+0x54/0x68 https://youtu.be/2_SIdeQrG-Y
  • 7. Overview ● Discussion for practical scenarios ○ Finding program execution flow for system hang ○ CPU is dead, how to read current CPU state? ○ Offline analysis of program execution flow ● You can extend for your SoC
  • 8. The scenario - CPU is dead, how to read current CPU state? When a CPU dies, perhaps locked up with IRQ/FIQ masked or a stuck waiting for a bus access, the dead CPU cannot react to SMP inter-processor interrupt to dump backtrace. JTAG debugger could be used to check dead CPU state, but we can explore a more convenient method in kernel for systems that have other CPUs are still alive so can dump PC value for dead CPU. ● CORESIGHT_CPU_DEBUG: Coresight CPU debug module allows Linux SMP partners to watch each other (enhanced LOCKUP_DETECTOR). ● Can be used to dump CPU state based on ARM sample-based profiling extension. ● Last CPU PC before failure is stored in its debug unit and other processors can extract this too (no cache problems)!
  • 9. Demo for Coresight CPU debug Step 1: Enable all required kconfig options CONFIG_CORESIGHT_CPU_DEBUG=y Step 2: Prepare for testing Disable CPU idle states in command line (if need): nohlt Enable panic on RCU stall if system doesn’t support hard lock detection: # sysctl -w kernel.panic_on_rcu_stall=1 Step 3: Run test case until system panic Step 4: Read debug info logged during panic
  • 10. Demo video and Coresight CPU debug log [ 79.171171] coresight-cpu-debug f65d0000.debug: CPU[4]: [ 79.176408] coresight-cpu-debug f65d0000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) [ 79.184512] coresight-cpu-debug f65d0000.debug: EDPCSR: [<ffff00000809560c>] handle_IPI+0x27c/0x2d8 [ 79.193743] coresight-cpu-debug f65d0000.debug: EDCIDSR: 00000000 [ 79.199931] coresight-cpu-debug f65d0000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) [ 79.210468] coresight-cpu-debug f65d2000.debug: CPU[5]: [ 79.215705] coresight-cpu-debug f65d2000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) [ 79.223810] coresight-cpu-debug f65d2000.debug: EDPCSR: [<ffff00000809560c>] handle_IPI+0x27c/0x2d8 [ 79.233041] coresight-cpu-debug f65d2000.debug: EDCIDSR: 00000000 [ 79.239229] coresight-cpu-debug f65d2000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) [ 79.249765] coresight-cpu-debug f65d4000.debug: CPU[6]: [ 79.255003] coresight-cpu-debug f65d4000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) [ 79.263107] coresight-cpu-debug f65d4000.debug: EDPCSR: [<ffff00000809560c>] handle_IPI+0x27c/0x2d8 [ 79.272338] coresight-cpu-debug f65d4000.debug: EDCIDSR: 00000000 [ 79.278526] coresight-cpu-debug f65d4000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) [ 79.289062] coresight-cpu-debug f65d6000.debug: CPU[7]: [ 79.294299] coresight-cpu-debug f65d6000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) [ 79.302407] coresight-cpu-debug f65d6000.debug: EDPCSR: [<ffff00000871ee54>] cpu_lock+0x44/0x50 [ 79.311290] coresight-cpu-debug f65d6000.debug: EDCIDSR: 00000000 [ 79.317478] coresight-cpu-debug f65d6000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) https://youtu.be/mFuvWrUrlwo
  • 11. Overview ● Discussion for practical scenarios ○ Finding program execution flow for system hang ○ CPU is dead, how to know CPU current state? ○ Offline analysis of program execution flow ● You can extend for your SoC
  • 12. The scenario - Offline analysis of program execution flow If hang or panic issues happen in production release, we usually need to use analyse program execution flow offline using host tools. Ramoops with function tracing is often no enabled for production testing because it might downgrade performance seriously; and thus alter or prevent reproduction for heisenbugs. Coresight hardware tracer is a good candidate to record program execution flow with minimising overhead. ● Coresight + kdump can store hardware tracing data in dump file for kernel panic and we can rely on crash tool to extract tracing data ● Perf + OpenCSD tool decodes tracing data and outputs readable program flow ● It smoothly supports kernel panic debugging but so far it isn’t for debugging system hang ● Trick: If Coresight RAM is preserved after reset then useful Coresight tracing data can still be extracted for debugging hang issues
  • 13. Demo for Coresight + Kdump Step 1: Enable all required kconfig options CONFIG_CORESIGHT=y CONFIG_CORESIGHT_LINKS_AND_SINKS=y CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y CONFIG_CORESIGHT_SOURCE_ETM4X=y CONFIG_CORESIGHT_PANIC_KDUMP=y CONFIG_KEXEC=y CONFIG_CRASH_DUMP=y Step 2: Prepare for testing Enable Coresight tracer and sink Use kexec to load capture-dump kernel and dtb Step 3: Run test case until system panic Step 4: Boot capture-dump kernel, save core file Step 5: Extract Coresight tracing data with crash # crash vmlinux vmcore crash> extend csdump.so crash> csdump out_folder Step 6: Boot original kernel for analysis with perf To avoid kernel build id mismatch when analysing coresight trace data, we can run original kernel with kernel symbol file: # perf script -v -a -F cpu,event,ip,sym,symoff -i perf.data --kallsyms /proc/kallsyms
  • 14. Demo video and perf decodes program flow # ./perf script -f -v -a -F cpu,event,ip,sym,symoff --kallsyms /proc/kallsyms build id event received for [kernel.kallsyms]: 32eeb5c9c99d00a63d0921cbd815c32385c36710 Using /proc/kcore for kernel object code Using /proc/kallsyms for symbols Frame deformatter: Found 4 FSYNCS [000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc [000] branches: ffff000008afa6e8 __delay+0x90 [000] branches: ffff000008afa6dc __delay+0x84 [000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc [000] branches: ffff000008afa6e8 __delay+0x90 [000] branches: ffff000008afa6dc __delay+0x84 [000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc [000] branches: ffff000008afa6e8 __delay+0x90 [000] branches: ffff000008afa6dc __delay+0x84 [000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc [000] branches: ffff000008afa6e8 __delay+0x90 [000] branches: ffff000008afa6dc __delay+0x84 [000] branches: ffff000008972e7c arch_counter_get_cntpct+0xc [000] branches: ffff000008afa6e8 __delay+0x90 [000] branches: ffff000008afa6dc __delay+0x84 https://youtu.be/oLlBpzVGeFU
  • 15. Overview ● Discussion for practical scenarios ○ Finding program execution flow for system hang ○ CPU is dead, how to know CPU current state? ○ Offline analysis of program execution flow ● You can extend for your SoC
  • 16. You can extend for your SoC You can extend to use other bus masters (DSP, MCU, etc) to access coresight CPU debug module to recover last PC in case all CPUs for Linux are dead. You can extend other bus masters to support support a kdump-like workflow to preserve DDR and coresight trace data (preserved after reset) which can then be analysed using open source tools. Kernel panic System hang Dump capture kernel Second boot kernel Crash Coresight tracer Perf + OpenCSD ramoops Coresight CPU debug Extended working flow MCU
  • 17. Related materials ● Coresight related patches and testing case ○ https://git.linaro.org/people/leo.yan/linux-debug-workshop.git/log/?h=acme_perf_core_cs_dev ○ Ramoops and coresight CPU debug module has been merged into mainline kernel ○ Coresight for kdump supporting patches are working in progress ● This session adds new ideas not found in our previous debugging session: BUD17-TR04: Kernel Debug Stories ○ Presented by Daniel Thompson: http://connect.linaro.org/resource/bud17/bud17-tr04/ ○ Kernel Debug Stories covers far too many techniques to allow time for live demos ;-)
  • 18. Linaro Limited Lifetime Warranty This training presentation comes with a lifetime warranty. Everyone here today can send questions about today’s session and suggestions for future topics to support@linaro.org . Member engineers can also use this address to get support on any other Linaro output. Engineers from club and core members are welcome to contact us to discuss how Support and Solutions Engineering can help you with additional services and training. With thanks to Andrew Hennigan for introducing the idea of placing a guarantee on training. Graphic by Jost, CC0-PD
  • 19. Thank You #HKG18 HKG18 keynotes and videos on: connect.linaro.org For further information: support@linaro.org or www.linaro.org