Kernel Debugging
Hao-Ran Liu
Choices of debugging tools
• Add debug code, recompile and run
– printk, but a timing-sensitive bug may disappear when the messages are
written to a slow serial console
– In that case, set the console log level to 0 and read the messages with dmesg instead
• Patch code at runtime to print or gather data
– Ftrace, Kprobes
• Patch code at runtime to stop kernel and analyze
– KDB, KGDB
• Run the kernel under the control of a VM such as QEMU or
VirtualBox
printk()
• Kernel-space equivalent of printf()
• Each kernel message is prepended with a
string representing its loglevel n
– “<n>Hello world!”
• Loglevel determines the severity of the
message
Printk loglevel
• Messages with level lower than console_loglevel are
shown to the console
• console_loglevel can be changed via
– dmesg -n level
– syslog system call
– echo n > /proc/sys/kernel/printk
Name          String  Alias macro                        Meaning
KERN_EMERG    "0"     pr_emerg()                         Emergency messages; the system is about to crash or is unstable
KERN_ALERT    "1"     pr_alert()                         Something bad happened and action must be taken immediately
KERN_CRIT     "2"     pr_crit()                          A serious hardware/software failure
KERN_ERR      "3"     pr_err()                           Often used by drivers to indicate difficulties with the hardware
KERN_WARNING  "4"     pr_warning() (pr_warn() on newer kernels)  Nothing serious by itself, but might indicate problems
KERN_NOTICE   "5"     pr_notice()                        Nothing serious. Often used to report security events
KERN_INFO     "6"     pr_info()                          Informational message, e.g. startup information at driver initialization
KERN_DEBUG    "7"     pr_debug() (if DEBUG is defined)   Debug messages
KERN_DEFAULT  "d"     -                                  The default kernel loglevel
KERN_CONT     ""      pr_cont()                          "Continued" line after a line that had no enclosing \n
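The prefix and the console filter described above can be illustrated with a small user-space sketch. It is not kernel code: the helper names are hypothetical, the "<n>" form follows the slide, and the fallback level of 4 is an assumption about the default message loglevel.

```python
import re

# Level names follow the table above; index == numeric loglevel.
LEVEL_NAMES = ["KERN_EMERG", "KERN_ALERT", "KERN_CRIT", "KERN_ERR",
               "KERN_WARNING", "KERN_NOTICE", "KERN_INFO", "KERN_DEBUG"]

def split_loglevel(raw, default_level=4):
    """Split '<n>text' into (level, text).

    default_level=4 is an assumption used when no prefix is present.
    """
    m = re.match(r"<([0-7])>(.*)", raw, re.S)
    if m:
        return int(m.group(1)), m.group(2)
    return default_level, raw

def shown_on_console(level, console_loglevel):
    """A message reaches the console only if its level is numerically lower."""
    return level < console_loglevel

level, text = split_loglevel("<3>Hello world!")
print(LEVEL_NAMES[level], repr(text), shown_on_console(level, 7))
```

With console_loglevel set to 7, a KERN_DEBUG message (level 7) is not shown, while KERN_ERR (level 3) is.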
Kernel log buffer
• kernel log buffer stores kernel messages
• It is a circular buffer. Old messages are
overwritten when the buffer is full
– Use the klogd daemon to keep old messages in a file
– Log buffer size is configurable
• Kernel log buffer can be manipulated via
syslog system call
– or dmesg command line tool
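The overwrite behaviour of a circular log buffer can be modelled in a few lines. This is only a sketch: the class name is hypothetical, and capacity is counted in messages here, whereas the real kernel buffer is sized in bytes.

```python
from collections import deque

class LogRing:
    """Toy model of a circular log buffer (capacity counted in messages)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest entry when full
        self.buf = deque(maxlen=capacity)

    def write(self, msg):
        self.buf.append(msg)

    def read_all(self):
        return list(self.buf)

    def clear(self):
        self.buf.clear()

ring = LogRing(3)
for i in range(5):
    ring.write("msg %d" % i)
print(ring.read_all())   # the two oldest messages have been overwritten
```

This is why a daemon that drains the buffer into a file is needed if old messages must be kept.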
syslog system call
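• int syslog(int type, char *bufp, int len)
– glibc exposes this system call as klogctl(), because syslog() in the C library is the user-space logging function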
/*
* Commands to sys_syslog:
*
* 0 -- Close the log. Currently a NOP.
* 1 -- Open the log. Currently a NOP.
* 2 -- Read from the log (wait until the buffer is nonempty)
* 3 -- Read all messages remaining in the ring buffer
* 4 -- Read and clear all messages remaining in the ring buffer
* 5 -- Clear ring buffer.
* 6 -- Disable printk to console
* 7 -- Enable printk to console
* 8 -- Set level of messages printed to console
* 9 -- Return number of unread characters in the log buffer
*/
Klogd and syslogd
• klogd is the "kernel log daemon". It receives kernel
messages via the syslog system call (or /proc/kmsg) and
redirects them to syslogd
• syslogd differentiates messages by facility.priority (e.g.
LOG_KERN.LOG_ERR) and consults /etc/syslog.conf to
decide how to deal with them (discard or save in a file)
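[Diagram: Kernel space: the kernel log buffer, exposed through /proc/kmsg and sys_syslog(). User space: klogd reads the kernel messages and passes them to syslogd, which writes them to files. Other daemons reach syslogd through the C library calls openlog(), syslog() and closelog().]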
Use printk macros
• Do not remove debug printk
– you may need it later to debug another related issue
• Undefine DEBUG to remove debug messages in
a production kernel
• For drivers, use dev_dbg() instead
Limit the rate of your printk
• printk may overwhelm the console if it is called
– in code that gets executed very often
– in a frequently triggered IRQ handler (e.g. the timer)
• printk_ratelimit() returns 0 when the message to be
printed should be suppressed
if (printk_ratelimit())
    printk(KERN_NOTICE "The printer is still on fire\n");
• printk_once()
– no matter how often you call it, it prints once and
never again
printk_ratelimit() implementation
• The two variables can be modified via
/proc/sys/kernel/ (printk_ratelimit and printk_ratelimit_burst)
/* minimum time in jiffies between messages */
int printk_ratelimit_jiffies = 5*HZ;
/* number of messages we send before ratelimiting */
int printk_ratelimit_burst = 10;
int printk_ratelimit(void)
{
return __printk_ratelimit(printk_ratelimit_jiffies,
printk_ratelimit_burst);
}
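The effect of the interval and burst settings can be shown with a simplified user-space model. This is a sketch, not the kernel's algorithm: the body of __printk_ratelimit is not shown in the slides, so a plain fixed-window counter is assumed here, and the class name is hypothetical.

```python
class RateLimit:
    """Fixed-window model: at most `burst` messages per `interval` seconds."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.printed = 0

    def allow(self, now):
        """Return True if a message at time `now` may be printed."""
        if self.window_start is None or now - self.window_start >= self.interval:
            # start a new window and reset the counter
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        return False

rl = RateLimit(interval=5.0, burst=10)          # values from the excerpt above
results = [rl.allow(0.1 * i) for i in range(20)]
print(sum(results), "of", len(results), "messages printed")
```

Twenty calls within two seconds let only the first ten through; the next window opens five seconds after the first message.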
/proc file system
• A software-created, pseudo file system
• Contains a great deal of system information, e.g.:
– /proc/<pid>/maps
– /proc/sys/kernel/*
– /proc/interrupts
– /proc/meminfo
• Adding new files to /proc is discouraged; it should
contain only information about processes
• You should use sysfs or debugfs instead
debugfs
• a simple way to make information
available to user space
– Unlike sysfs, which has strict one-value-per-file rules
– NOT a stable API for user space
– mount -t debugfs none /sys/kernel/debug
debugfs example
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/debugfs.h>

#define len 200

static u64 intvalue, hexvalue;
static struct dentry *dirret;
static char _buf[len];

/* copy the kernel buffer attached to the file out to user space */
static ssize_t myreader(struct file *fp,
        char __user *user_buffer, size_t count, loff_t *pos)
{
    char *kbuf = (char *)file_inode(fp)->i_private;

    return simple_read_from_buffer(user_buffer, count, pos,
            kbuf, len);
}

/* copy user data into the kernel buffer attached to the file */
static ssize_t mywriter(struct file *fp,
        const char __user *user_buffer, size_t count, loff_t *pos)
{
    char *kbuf = (char *)file_inode(fp)->i_private;

    return simple_write_to_buffer(kbuf, len, pos,
            user_buffer, count);
}

static const struct file_operations fops_debug = {
    .owner = THIS_MODULE,
    .read = myreader,
    .write = mywriter,
};

static int __init init_debug(void)
{
    /* create a directory in /sys/kernel/debug */
    dirret = debugfs_create_dir("mydebug", NULL);
    if (IS_ERR_OR_NULL(dirret))
        return -ENODEV;
    /* create a file in the above directory;
       this requires read and write file operations */
    debugfs_create_file("text", 0644, dirret, _buf, &fops_debug);
    /* create a file which holds a 64-bit integer value */
    debugfs_create_u64("number", 0644, dirret, &intvalue);
    /* same, but the value is shown in hexadecimal */
    debugfs_create_x64("hexnum", 0644, dirret, &hexvalue);
    return 0;
}

static void __exit exit_debug(void)
{
    /* remove mydebug dir recursively */
    debugfs_remove_recursive(dirret);
}

module_init(init_debug);
module_exit(exit_debug);
/* the debugfs API is exported GPL-only, so a GPL-compatible license is required */
MODULE_LICENSE("GPL");
strace: system call trace
• Intercepts and records
– system calls issued by a process
– signals a process received
• Where to use
– To gain an in-depth understanding of the exact behavior of a program
– To check the exact system calls and arguments a program issued
– When you don't have access to the source code
• Syntax
– strace [option] <command [args]>
• Common options
– -c -- count time, calls, and errors for each syscall and report summary
– -f -- follow forks
– -T -- print time spent in each syscall
– -e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...
(options: trace, abbrev, verbose, raw, signal, read, or write)
strace output example
execve("/bin/dmesg", ["dmesg"], [/* 22 vars */]) = 0
...
syslog(0x3, 0x95d3858, 0x4008) = 16384
write(1, "amily 2\nIP: routing cache hash t"..., 4096amily
write(1, "to accept 2 bytes to c1bd7f9e fr"...,
...
munmap(0xb7d6b000, 4096) = 0
exit_group(0) = ?
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 92.75    0.013263          35       374           write
  5.02    0.000718         718         1           syslog
  0.51    0.000073          18         4         1 open
  0.47    0.000067          34         2           munmap
  0.41    0.000058          12         5           old_mmap
  0.34    0.000048          24         2           mmap2
  0.11    0.000016           4         4           fstat64
  0.10    0.000015          15         1           read
  0.10    0.000015           8         2           mprotect
  0.08    0.000012           3         4           brk
  0.04    0.000006           2         3           close
  0.04    0.000006           6         1           uname
  0.02    0.000003           3         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.014300                   404         1 total
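The summary produced by strace -c is easy to post-process. The sketch below parses a few rows copied from the output above; the function name is hypothetical, and it assumes the errors column is simply absent when a syscall never failed.

```python
# Rows copied from the strace -c summary above.
SAMPLE = """\
92.75 0.013263 35 374 write
5.02 0.000718 718 1 syslog
0.51 0.000073 18 4 1 open"""

def parse_summary_row(line):
    """Parse one data row of `strace -c` output into a dict."""
    fields = line.split()
    # A row has six fields only when the errors column is filled in.
    errors = int(fields[4]) if len(fields) == 6 else 0
    return {
        "syscall": fields[-1],
        "pct_time": float(fields[0]),
        "seconds": float(fields[1]),
        "usecs_per_call": int(fields[2]),
        "calls": int(fields[3]),
        "errors": errors,
    }

rows = [parse_summary_row(l) for l in SAMPLE.splitlines()]
for r in rows:
    print(r["syscall"], r["calls"], r["errors"])
```

Here write dominates the run time with 374 calls, while open is the only syscall in the sample that reported an error.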
Kernel oops
• Raised when the kernel detects a bug in itself
– Fault: the kernel kills the faulting process and tries to continue
• Some locks and data structures may not be released
properly; the system cannot be trusted anymore
– Panic: the system halts; this usually happens in interrupt context or in
the idle or init task, where the kernel thinks it cannot recover
• An oops message contains
– Error message
– Contents of registers
– Stack dump
– Function call trace
• Enable CONFIG_KALLSYMS in the kernel
configuration to get a symbolic call trace
(otherwise all you see are raw addresses)
Kernel Oops Example
• Code below will trigger an oops
ssize_t faulty_write (struct file *filp, const char __user *buf, size_t count,
loff_t *pos)
{
/* make a simple fault by dereferencing a NULL pointer */
*(int *)0 = 0;
return 0;
}
struct file_operations faulty_fops = {
.read = faulty_read,
.write = faulty_write,
.owner = THIS_MODULE
};
Kernel Oops Example
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Internal error: Oops: 817 [#1] SMP ARM
Modules linked in: faulty(O) bnep hci_uart btbcm bluetooth brcmfmac brcmutil
CPU: 1 PID: 835 Comm: bash Tainted: G O 4.4.21-v7+ #911
task: b6a605c0 ti: b6ae8000 task.ti: b6ae8000
PC is at faulty_write+0x18/0x20 [faulty]
pc : [<7f33c018>] lr : [<8015736c>] sp : b6ae9ed0 ip : b6ae9ee0 fp : b6ae9edc
r10: 00000000 r9 : b6ae8000 r8 : 8000fd08
r7 : b6ae9f80 r6 : 01493c08 r5 : b6ae9f80 r4 : b93953c0
r3 : b6ae9f80 r2 : 00000002 r1 : 01493c08 r0 : 00000000
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5383d Table: 36b3806a DAC: 00000055
Process bash (pid: 835, stack limit = 0xb6ae8210)
Stack: (0xb6ae9ed0 to 0xb6aea000)
9ec0: b6ae9f4c b6ae9ee0 8015736c 7f33c00c
9ee0: 00000000 0000000a b934f600 80174fc0 b6ae9f3c b6ae9f00 80174fc0 805b66fc
9f00: b6ae9f3c 801559f8 00000000 80157c34 00000000 00000000 b6ae9f44 b6ae9f28
9f20: 80155a0c 80159158 b93953c0 b93953c0 00000002 01493c08 b6ae9f80 8000fd08
9f40: b6ae9f7c b6ae9f50 80157c64 80157344 80155a0c 801752e8 b93953c0 b93953c0
9f60: 00000002 01493c08 8000fd08 b6ae8000 b6ae9fa4 b6ae9f80 801585d4 80157bd0
[<7f33c018>] (faulty_write [faulty]) from [<8015736c>] (__vfs_write+0x34/0xe8)
[<8015736c>] (__vfs_write) from [<80157c64>] (vfs_write+0xa0/0x1a8)
[<80157c64>] (vfs_write) from [<801585d4>] (SyS_write+0x54/0xb0)
[<801585d4>] (SyS_write) from [<8000fb40>] (ret_fast_syscall+0x0/0x1c)
Code: e24cb004 e52de004 e8bd4000 e3a00000 (e5800000)
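The "PC is at" line already tells where the fault happened: symbol, offset into the function, function size and module. A small sketch of pulling those fields out follows; the helper name is hypothetical. The module base address 0x7f33c000 is the one that /proc/modules reports for faulty in the GDB example of this deck.

```python
import re

def parse_pc_line(line):
    """Extract symbol, offset, size and module from an oops 'PC is at' line."""
    m = re.search(r"PC is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?", line)
    if m is None:
        raise ValueError("not a 'PC is at' line")
    return {
        "symbol": m.group(1),
        "offset": int(m.group(2), 16),   # bytes into the function
        "size": int(m.group(3), 16),     # total size of the function
        "module": m.group(4),            # None for built-in kernel code
    }

loc = parse_pc_line("PC is at faulty_write+0x18/0x20 [faulty]")
pc = 0x7f33c018            # pc register from the oops
module_base = 0x7f33c000   # load address of the faulty module
print(loc, hex(pc - module_base))
```

In this oops the offset within the module equals the offset within faulty_write, which indicates that the function sits at the very start of the module's text.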
Calling convention
• A low-level scheme for how subroutines
receive parameters from their caller and
how they return a result
• ARM 32 register allocation:
Register   Use                              Comment
r15        Program counter
r14        Link register                    Used by BL instruction
r13        Stack pointer                    Must be 8-byte aligned
r12        For intra-procedure calls
r4 to r11  For local variables              Callee saved
r0 to r3   For arguments and return values  Caller saved
ARM32 Calling convention
decodecode
• A script for disassembling oops code
pi@raspberrypi:~/linux $ dmesg | scripts/decodecode
[ 80.573075] Code: e24cb004 e52de004 e8bd4000 e3a00000 (e5800000)
All code
========
0: e24cb004 sub fp, ip, #4
4: e52de004 push {lr} ; (str lr, [sp, #-4]!)
8: e8bd4000 ldmfd sp!, {lr}
c: e3a00000 mov r0, #0
10:* e5800000 str r0, [r0] <-- trapping instruction
Code starting with the faulting instruction
===========================================
0: e5800000 str r0, [r0]
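To see why e5800000 is a store to address 0, the instruction word can be decoded by hand. The sketch below covers only A32 LDR/STR with an immediate offset and pre-indexing, ignores the condition field, and uses a hypothetical function name; any other encoding returns None.

```python
# Register names as printed by the disassembly above.
REG = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
       "r8", "r9", "r10", "fp", "ip", "sp", "lr", "pc"]

def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR (immediate offset, pre-indexed) instruction word."""
    if (word >> 26) & 0b11 != 0b01 or (word >> 25) & 1:
        return None                      # not an immediate-offset load/store
    pre, up, byte, wb, load = [(word >> b) & 1 for b in (24, 23, 22, 21, 20)]
    if not pre:
        return None                      # post-indexed forms are left out
    rn = (word >> 16) & 0xF              # base register
    rd = (word >> 12) & 0xF              # register being stored or loaded
    imm = word & 0xFFF                   # 12-bit offset
    op = ("ldr" if load else "str") + ("b" if byte else "")
    offset = "" if imm == 0 else ", #%s%d" % ("" if up else "-", imm)
    return "%s %s, [%s%s]%s" % (op, REG[rd], REG[rn], offset, "!" if wb else "")

print(decode_ldr_str_imm(0xe5800000))
print(decode_ldr_str_imm(0xe52de004))
```

The trapping word decodes to a store of r0 through the pointer in r0, and the register dump in the oops shows r0 : 00000000, which matches the NULL dereference in faulty_write.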
Finding oops code with GDB
• Module should be compiled with “-g”
– Add “ccflags-y := -g” to module’s Makefile
pi@raspberrypi:~/sunplus/oops $ cat /proc/modules
faulty 1367 0 - Live 0x7f33c000 (O)
bnep 10340 2 - Live 0x7f335000
...
pi@raspberrypi:~/sunplus/oops $ gdb
GNU gdb (Raspbian 7.7.1+dfsg-5) 7.7.1
(gdb) add-symbol-file faulty.ko 0x7f33c000
add symbol table from file "faulty.ko" at
.text_addr = 0x7f33c000
(y or n) y
Reading symbols from faulty.ko...done.
(gdb) list *0x7f33c018
0x7f33c018 is in faulty_write (/home/pi/sunplus/oops/faulty.c:51).
46
47 ssize_t faulty_write (struct file *filp, const char __user *buf,
size_t count,
48 loff_t *pos)
49 {
50 /* make a simple fault by dereferencing a NULL pointer */
51 *(int *)0 = 0;
52 return 0;
53 }
gdb – observe kernel variables
• gdb can observe variables in the kernel
• How to use?
– gdb /usr/src/linux/vmlinux /proc/kcore
– p jiffies /* print the value of the jiffies variable */
– p jiffies /* you get the same value, since gdb caches values read
from the core file */
– core-file /proc/kcore /* flush the gdb cache */
– p jiffies /* you get a different value of jiffies */
• vmlinux is the name of the uncompressed ELF kernel
executable, not bzImage
• kcore represents the memory of the running kernel in the format of a
core file
• Disadvantage
– Read-only access to the kernel
Introduction of KGDB and KDB
● The Linux kernel has two different debugger front ends
(kdb and kgdb) which interface to the debug core
● KDB
– Used on a system console or serial console
– Not a source-level debugger; aimed at doing simple
analysis or diagnosis
– Functions
  ● Data: read/write memory, registers
  ● Linux: process lists, backtrace, dmesg
  ● Control: set breakpoints, single-step instructions
KGDB
● Source-level debugger, used with GDB to debug a
Linux kernel
● Two machines (physical or virtual) are required for
using KGDB
– They communicate via a network or serial connection
– The target machine runs the kernel to be debugged
– The development machine runs an instance of GDB against the
vmlinux file which contains the symbols
KGDB Kernel Configuration (1)
● CONFIG_DEBUG_INFO=y
– Required by GDB for source-level debugging. This
adds debug symbols to the kernel and modules (gcc -g)
● CONFIG_KALLSYMS=y
– Required by KDB to access symbols by name
● CONFIG_FRAME_POINTER=y
– Saves frame information in registers or on the stack to allow GDB to
construct stack backtraces more accurately
● CONFIG_DEBUG_RODATA=n
– Page tables will disallow writes to kernel read-only data.
If this is enabled, you cannot use software breakpoints
KGDB Kernel Configuration (2)
● CONFIG_EXPERIMENTAL=y
● CONFIG_KGDB=y
● CONFIG_KGDB_SERIAL_CONSOLE=y
– kgdboc is a KGDB I/O driver for using KGDB/KDB over a
serial console
● CONFIG_SERIAL_8250=y
– Driver for standard serial ports
● CONFIG_SERIAL_8250_CONSOLE=y
– Allows the use of a serial port as the system console
KDB Kernel Configuration
● KGDB must be enabled before KDB can be
enabled. To use KDB on a serial console, kgdboc
and a serial port driver are also needed
● CONFIG_KGDB_KDB=y
– Includes the kdb front end for kgdb
● CONFIG_KDB_KEYBOARD=y
– KDB can use a PS/2 type keyboard as an input device
Kernel Parameters - kgdboc
● kgdboc=[kms][[,]kbd][[,]serial_device][,baud]
– Designed to work with a single serial port which is used
for your primary console and for kernel debugging
– kms (kernel mode setting) integration allows entering
kdb on a graphic console
– Can be configured in the kernel boot parameters or at
runtime with sysfs
– Does not support interrupting the target via the gdb
remote protocol. You must manually send a sysrq-g
● Enable / Disable kgdboc
– echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc
– echo "" > /sys/module/kgdboc/parameters/kgdboc
Kernel Parameters - kgdbwait
● Makes the kernel stop as early as the I/O driver supports
and wait for a debugger connection while the kernel
is booting
● Useful for debugging kernel initialization
● Note
– A KGDB I/O driver must be compiled into the kernel, and
kgdbwait should always follow the parameter for the
KGDB I/O driver on the kernel command line
● Example
– kgdboc=ttyS0,115200 kgdbwait
Using KDB on serial port
● Configure the I/O driver
– Boot the kernel with kgdboc parameters or
– Configure kgdboc via sysfs
● Enter the kernel debugger manually by sending a
sysrq-g, or by waiting for an oops or fault
– echo g > /proc/sysrq-trigger
– Minicom: Ctrl-a, f, g
– Telnet: Ctrl-], send break<RET>, g
● At the KDB prompt, enter "help" to see a list of
commands, "go" to resume kernel execution
Some KDB commands
Command  Usage               Description
----------------------------------------------------------
md       <vaddr>             Display Memory Contents
mm       <vaddr> <contents>  Modify Memory Contents
go       [<vaddr>]           Continue Execution
rd                           Display Registers
rm       <reg> <contents>    Modify Registers
bt       [<vaddr>]           Stack traceback
help                         Display Help Message
kgdb                         Enter kgdb mode
ps       [<flags>|A]         Display active task list
pid      <pidnum>            Switch to another task
lsmod                        List loaded kernel modules
dmesg    [lines]             Display syslog buffer
kill     <-signal> <pid>     Send a signal to a process
summary                      Summarize the system
bp       [<vaddr>]           Set/Display breakpoints
ss                           Single Step
Screenshot of KDB with GDB
Using KGDB and GDB (1)
● Configure kgdboc
– kgdb, like kdb, will only hook up to the kernel trap
hooks if a KGDB I/O driver is loaded and configured
● Stop kernel execution
– Send a sysrq-g; if you see a kdb prompt, enter "kgdb"
– or you can use kgdbwait for debugging kernel boot
● Connect from gdb
Serial port:
$ gdb ./vmlinux
(gdb) set remotebaud 115200
(gdb) target remote /dev/ttyS0
TCP port:
$ gdb ./vmlinux
(gdb) target remote 192.168.1.99:1234
Using KGDB and GDB (2)
● Reminder
– If you “continue” in gdb, and need to "break in" again,
you need to issue another sysrq-g
– You can put a breakpoint at sys_sync and then run
"sync" from a shell to break into the debugger
Screenshot of KGDB and GDB
Kernel profiling with perf
• perf is a command-line profiling tool based on
perf_events kernel interface
– It’s event-based sampling.
When a PMU counter overflows, a sample is recorded.
usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
The most commonly used perf commands are:
annotate Read perf.data (created by perf record) and display annotated code
archive Create archive with object files with build-ids found in perf.data file
data Data file related processing
diff Read perf.data files and display the differential profile
evlist List the event names in a perf.data file
kmem Tool to trace/measure kernel memory properties
list List all symbolic event types
lock Analyze lock events
mem Profile memory accesses
record Run a command and record its profile into perf.data
report Read perf.data (created by perf record) and display the profile
sched Tool to trace/measure scheduler properties (latencies)
script Read perf.data (created by perf record) and display trace output
stat Run a command and gather performance counter statistics
timechart Tool to visualize total system behavior during a workload
top System profiling tool.
trace strace inspired tool
probe Define new dynamic tracepoints
Use perf_events for CPU profiling
• Flame Graphs visualize profiled code
$ git clone --depth 1 https://github.com/brendangregg/FlameGraph
$ sudo perf record -F 99 -a -g -- sleep 30
$ perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > perf.svg
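The middle step of the pipeline above turns stack samples into a "folded" form, one line per unique stack with a sample count. The sketch below models only that idea: the function name is hypothetical, the sample stacks are illustrative (function names borrowed from the perf report output in this deck), and real stackcollapse-perf.pl output also carries the command name.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse stack samples (root first) into 'frame;frame;... count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return ["%s %d" % (stack, n) for stack, n in sorted(counts.items())]

# Illustrative samples, not measured data.
samples = [
    ["cpu_startup_entry", "default_idle_call", "arch_cpu_idle"],
    ["cpu_startup_entry", "default_idle_call", "arch_cpu_idle"],
    ["cpu_startup_entry", "default_idle_call"],
]
for line in fold_stacks(samples):
    print(line)
```

Each folded line becomes one tower in the flame graph, with its width proportional to the count.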
Example of perf report
pi@raspberrypi:~/sunplus $ sudo perf record -g -a sleep 10
pi@raspberrypi:~/sunplus $ sudo perf report
Samples: 5K of event 'cycles:ppp', Event count (approx.): 184814613
Children Self Command Shared Object Symbol
+ 83.86% 1.97% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 70.22% 0.00% swapper [kernel.kallsyms] [k] secondary_start_kernel
+ 70.22% 0.00% swapper [unknown] [k] 0x000095ac
+ 67.09% 0.45% swapper [kernel.kallsyms] [k] default_idle_call
+ 66.11% 61.30% swapper [kernel.kallsyms] [k] arch_cpu_idle
...
pi@raspberrypi:~/sunplus $ sudo perf kmem record
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.199 MB perf.data (814 samples) ]
pi@raspberrypi:~/sunplus $ sudo perf kmem stat --caller
Failed to read max nodes, using default of 8
---------------------------------------------------------------------------------------------------------
Callsite | Total_alloc/Per | Total_req/Per | Hit | Ping-pong | Frag
---------------------------------------------------------------------------------------------------------
kthread_create_on_node+5c | 64/64 | 28/28 | 1 | 0 | 56.250%
bcm2835_dma_create_cb_chain+54 | 832/277 | 560/186 | 3 | 3 | 32.692%
alloc_worker+30 | 128/128 | 88/88 | 1 | 0 | 31.250%
alloc_skb_with_frags+58 | 512/512 | 384/384 | 1 | 0 | 25.000%
...
SUMMARY (SLAB allocator)
========================
Total bytes requested: 419,528
Total bytes allocated: 420,216
Total bytes wasted on internal fragmentation: 688
Internal fragmentation: 0.163725%
Cross CPU allocations: 0/326
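The Frag column and the summary percentage can be reproduced from the allocated and requested byte counts. The sketch below assumes that fragmentation is the share of allocated bytes that was not requested; the function name is hypothetical and the figures are taken from the output above.

```python
def internal_fragmentation(allocated, requested):
    """Percentage of allocated bytes that the caller did not ask for."""
    return (allocated - requested) / allocated * 100.0

# (callsite, total bytes allocated, total bytes requested) from perf kmem stat
callsites = [
    ("kthread_create_on_node+5c", 64, 28),
    ("bcm2835_dma_create_cb_chain+54", 832, 560),
    ("alloc_worker+30", 128, 88),
    ("alloc_skb_with_frags+58", 512, 384),
]
for name, alloc, req in callsites:
    print("%-32s %.3f%%" % (name, internal_fragmentation(alloc, req)))

# Summary line: 420,216 bytes allocated for 419,528 bytes requested
print("total: %.6f%%" % internal_fragmentation(420216, 419528))
```

The 688 wasted bytes in the summary are simply 420,216 minus 419,528.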
ftrace
• Useful for event tracing, analyzing latencies and performance issues
• The proc sysctl ftrace_enabled is a big on/off switch. Default is enabled
– To disable: echo 0 > /proc/sys/kernel/ftrace_enabled
• Summary of /sys/kernel/debug/tracing
Filename             Description
current_tracer       Set or display the current tracer that is configured
available_tracers    Tracers listed here can be configured by echoing their name into current_tracer
tracing_on           Enables or disables writing to the ring buffer (tracing overhead may still be occurring)
trace                Output of the trace in a human-readable format
tracing_max_latency  Some of the tracers record the max latency, for example the time interrupts are disabled
tracing_thresh       Latency tracers will record a trace whenever the latency is greater than the number (in microseconds) in this file
set_ftrace_pid       Have the function tracer only trace a single thread
set_graph_function   Set a "trigger" function where tracing should start with the function graph tracer
stack_trace          The stack backtrace of the largest stack that was encountered when the stack tracer is activated
trace_marker         Very useful for synchronizing user space with events happening in the kernel. Strings written into this file are written into the ftrace buffer
List of tracers
Name of tracer  Description
function        Function call tracer to trace all kernel functions
function_graph  Traces both entry and exit of the functions. It then provides the ability to draw a graph of function calls like C source code
irqsoff         Traces the areas that disable interrupts and saves the trace with the longest max latency. See tracing_max_latency
preemptoff      Traces and records the amount of time for which preemption is disabled
preemptirqsoff  Traces and records the largest time for which irqs and/or preemption is disabled
wakeup          Traces and records the max latency that it takes for the highest priority task to get scheduled after it has been woken up
wakeup_rt       Traces and records the max latency that it takes for just RT tasks
nop             To remove all tracers from tracing, simply echo "nop" into current_tracer
Example of function tracer
# echo SyS_nanosleep hrtimer_interrupt > set_ftrace_filter
# echo function > current_tracer
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_on
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 5/5 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
usleep-2665 [001] .... 4186.475355: sys_nanosleep <-system_call_fastpath
<idle>-0 [001] d.h1 4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
usleep-2665 [001] d.h1 4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
<idle>-0 [003] d.h1 4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
<idle>-0 [002] d.h1 4186.475427: hrtimer_interrupt <-smp_apic_timer_interrupt
Note: the function tracer uses ring buffers to store
entries. The newest data may overwrite the
oldest data. Sometimes using echo to stop the
trace is not sufficient because the tracing could
have overwritten the data that you wanted to
record. For this reason, it is sometimes better
to disable tracing directly from a program.
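Disabling tracing from a program amounts to writing "0" to tracing_on right after the code of interest has run. A minimal sketch follows; the helper names are hypothetical, the path is the tracing directory used in this deck, and it assumes debugfs is mounted and the program runs with sufficient privileges.

```python
# Path taken from the slides; assumes debugfs is mounted at /sys/kernel/debug.
TRACING_ON = "/sys/kernel/debug/tracing/tracing_on"

def set_tracing(enabled, path=TRACING_ON):
    """Write '1' or '0' to tracing_on, like `echo 1 > tracing_on`."""
    with open(path, "w") as f:
        f.write("1" if enabled else "0")

def run_traced(workload, path=TRACING_ON):
    """Enable tracing, run workload(), and stop tracing immediately afterwards."""
    set_tracing(True, path)
    try:
        return workload()
    finally:
        # Stop as soon as the workload returns so that later events
        # cannot overwrite the interesting part of the ring buffer.
        set_tracing(False, path)
```

A call such as run_traced(lambda: do_work()), where do_work is whatever code is under investigation, keeps the window between the workload finishing and tracing being switched off as small as possible.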
Example of function-graph tracer
• This tracer can also measure execution time of a function
• To trace only one function and all of its children:
# echo __do_fault > set_graph_function
# echo function_graph > current_tracer
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_on
# cat trace
#
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) | __do_fault() {
0) | filemap_fault() {
0) 0.408 us | find_get_page();
0) 0.085 us | _cond_resched();
0) 2.462 us | }
0) 0.087 us | _raw_spin_lock();
0) 0.104 us | add_mm_counter_fast();
0) 0.106 us | page_add_file_rmap();
0) 0.090 us | _raw_spin_unlock();
0) | unlock_page() {
0) 0.103 us | page_waitqueue();
0) 0.146 us | __wake_up_bit();
0) 1.508 us | }
0) 8.403 us | }
Example of irqsoff tracer
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 16 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: run_timer_softirq
# => ended at: run_timer_softirq
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
<idle>-0 0d.s2 0us+: _raw_spin_lock_irq <-run_timer_softirq
<idle>-0 0dNs3 17us : _raw_spin_unlock_irq <-run_timer_softirq
<idle>-0 0dNs3 17us+: trace_hardirqs_on <-run_timer_softirq
<idle>-0 0dNs3 25us : <stack trace>
=> _raw_spin_unlock_irq
=> run_timer_softirq
=> __do_softirq
...
# echo 0 > options/function-trace
# echo irqsoff > current_tracer
# echo 1 > tracing_on
# echo 0 > tracing_max_latency
# ls -ltr
[...]
# echo 0 > tracing_on
# cat trace
Note the above example had function-trace not set. If we set
function-trace, we get a much larger output
Example of stack tracer
• ftrace makes it convenient to check the stack size at
every function call
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
After running it for a few minutes, the output looks like:
# cat stack_max_size
2928
# cat stack_trace
Depth Size Location (18 entries)
----- ---- --------
0) 2928 224 update_sd_lb_stats+0xbc/0x4ac
1) 2704 160 find_busiest_group+0x31/0x1f1
2) 2544 256 load_balance+0xd9/0x662
3) 2288 80 idle_balance+0xbb/0x130
4) 2208 128 __schedule+0x26e/0x5b9
5) 2080 16 schedule+0x64/0x66
6) 2064 128 schedule_timeout+0x34/0xe0
7) 1936 112 wait_for_common+0x97/0xf1
8) 1824 16 wait_for_completion+0x1d/0x1f
9) 1808 128 flush_work+0xfe/0x119
10) 1680 16 tty_flush_to_ldisc+0x1e/0x20
11) 1664 48 input_available_p+0x1d/0x5c
12) 1616 48 n_tty_poll+0x6d/0x134
13) 1568 64 tty_poll+0x64/0x7f
14) 1504 880 do_select+0x31e/0x511
15) 624 400 core_sys_select+0x177/0x216
16) 224 96 sys_select+0x91/0xb9
17) 128 128 system_call_fastpath+0x16/0x1b
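The stack_trace listing can be checked and mined mechanically: each entry's depth minus its own frame size gives the depth of the next entry, and the largest Size value points at the function that uses the most stack. The sketch below parses the listing above; the helper name is hypothetical.

```python
import re

# Listing copied from the stack_trace output above.
STACK_TRACE = """\
  0)     2928     224   update_sd_lb_stats+0xbc/0x4ac
  1)     2704     160   find_busiest_group+0x31/0x1f1
  2)     2544     256   load_balance+0xd9/0x662
  3)     2288      80   idle_balance+0xbb/0x130
  4)     2208     128   __schedule+0x26e/0x5b9
  5)     2080      16   schedule+0x64/0x66
  6)     2064     128   schedule_timeout+0x34/0xe0
  7)     1936     112   wait_for_common+0x97/0xf1
  8)     1824      16   wait_for_completion+0x1d/0x1f
  9)     1808     128   flush_work+0xfe/0x119
 10)     1680      16   tty_flush_to_ldisc+0x1e/0x20
 11)     1664      48   input_available_p+0x1d/0x5c
 12)     1616      48   n_tty_poll+0x6d/0x134
 13)     1568      64   tty_poll+0x64/0x7f
 14)     1504     880   do_select+0x31e/0x511
 15)      624     400   core_sys_select+0x177/0x216
 16)      224      96   sys_select+0x91/0xb9
 17)      128     128   system_call_fastpath+0x16/0x1b
"""

def parse_stack_trace(text):
    """Return a list of (depth, size, location) tuples."""
    rows = []
    for line in text.splitlines():
        m = re.match(r"\s*\d+\)\s+(\d+)\s+(\d+)\s+(\S+)", line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return rows

rows = parse_stack_trace(STACK_TRACE)
biggest = max(rows, key=lambda r: r[1])
print("largest frame:", biggest[2], biggest[1], "bytes")
```

In this trace do_select alone accounts for 880 of the 2928 bytes, which makes it the first place to look when stack usage has to be reduced.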
ftrace homework
• Read https://www.kernel.org/doc/Documentation/trace/events.txt
This document is about event tracing (static tracepoints)
• perf-tools is a collection of performance analysis tools for Linux
ftrace and perf_events. Try to find a good use of it in your work. You
can download it from https://github.com/brendangregg/perf-tools.git
• Write a small program using ftrace to track the number of context
switches per second for each CPU.
$ sudo ./ftrace_ctxt_switches.py
...
Duration (sec): 61.386, Context switches (per sec): CPU0: 1130 ( 18) CPU1: 5875 ( 96) CPU2: 183 ( 3) CPU3: 230 ( 4)
Duration (sec): 63.784, Context switches (per sec): CPU0: 1138 ( 18) CPU1: 6028 ( 95) CPU2: 188 ( 3) CPU3: 230 ( 4)
...
References
• Linux Device Drivers, 3rd Edition, Jonathan
Corbet
• Linux kernel source,
http://lxr.free-electrons.com
• Choose a Linux tracer, Brendan Gregg
– http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-tracer.html
• KDB and KGDB kernel documentation
– http://kernel.org/pub/linux/kernel/people/jwessel/kdb/

More Related Content

What's hot

malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in LinuxAdrian Huang
 
Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicJoseph Lu
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux KernelAdrian Huang
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...Adrian Huang
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
Kernel Recipes 2017 - An introduction to the Linux DRM subsystem - Maxime Ripard
Kernel Recipes 2017 - An introduction to the Linux DRM subsystem - Maxime RipardKernel Recipes 2017 - An introduction to the Linux DRM subsystem - Maxime Ripard
Kernel Recipes 2017 - An introduction to the Linux DRM subsystem - Maxime RipardAnne Nicolas
 
Linux Crash Dump Capture and Analysis
Linux Crash Dump Capture and AnalysisLinux Crash Dump Capture and Analysis
Linux Crash Dump Capture and AnalysisPaul V. Novarese
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecturehugo lu
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdfAdrian Huang
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBshimosawa
 
Introduction to char device driver
Introduction to char device driverIntroduction to char device driver
Introduction to char device driverVandana Salve
 
Embedded_Linux_Booting
Embedded_Linux_BootingEmbedded_Linux_Booting
Embedded_Linux_BootingRashila Rr
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux SystemJian-Hong Pan
 
DWARF Data Representation
DWARF Data RepresentationDWARF Data Representation
DWARF Data RepresentationWang Hsiangkai
 
Arm device tree and linux device drivers
Arm device tree and linux device driversArm device tree and linux device drivers
Arm device tree and linux device driversHoucheng Lin
 
linux device driver
linux device driverlinux device driver
linux device driverRahul Batra
 
Jagan Teki - U-boot from scratch
Jagan Teki - U-boot from scratchJagan Teki - U-boot from scratch
Jagan Teki - U-boot from scratchlinuxlab_conf
 

What's hot (20)

malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in Linux
 
Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panic
 
Linux Internals - Part II
Linux Internals - Part IILinux Internals - Part II
Linux Internals - Part II
 
Slab Allocator in Linux Kernel
Kernel Debugging Tools and Techniques

  • 2. Choices of debugging tools
    • Add debug code, recompile and run
      – printk, but bug may disappear if it's timing sensitive and data is written to a serial console
      – Set console log level to 0 and use dmesg instead
    • Patch code at runtime to print or gather data
      – Ftrace, Kprobes
    • Patch code at runtime to stop kernel and analyze
      – KDB, KGDB
    • Run the kernel under the control of a VM like QEMU, VirtualBox
  • 3. printk()
    • Kernel-space equivalent of printf()
    • Each kernel message is prefixed with a string representing its loglevel n
      – "<n>Hello world!"
    • Loglevel determines the severity of the message
  • 4. Printk loglevel
    • Messages with a level numerically lower than console_loglevel are shown on the console
    • console_loglevel can be changed via
      – dmesg -n level
      – syslog system call
      – echo n > /proc/sys/kernel/printk

    Name          String  Meaning                                                             Alias macro
    KERN_EMERG    "0"     Emergency messages, system is about to crash or is unstable         pr_emerg()
    KERN_ALERT    "1"     Something bad happened and action must be taken immediately         pr_alert()
    KERN_CRIT     "2"     A serious hardware/software failure                                 pr_crit()
    KERN_ERR      "3"     Often used by drivers to indicate difficulties with the hardware    pr_err()
    KERN_WARNING  "4"     Nothing serious by itself but might indicate problems               pr_warning()
    KERN_NOTICE   "5"     Nothing serious. Often used to report security events.              pr_notice()
    KERN_INFO     "6"     Informational message, e.g. startup info at driver initialization   pr_info()
    KERN_DEBUG    "7"     Debug messages                                                      pr_debug() if DEBUG is defined
    KERN_DEFAULT  "d"     The default kernel loglevel
    KERN_CONT     ""      "continued" line after a line that had no enclosing \n              pr_cont()
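The console filtering rule on this slide can be sketched in user-space C. This is only an illustration of the "<n>" prefix and the "lower than console_loglevel" comparison, not the kernel's own parsing code; the function names parse_loglevel and shown_on_console are hypothetical.

```c
#include <stdbool.h>

/*
 * Parse a leading "<n>" prefix where n is '0'..'7'.
 * Returns the level, or default_level when no such prefix is present.
 * *body is set to the message text after the prefix.
 * (Hypothetical helper, not a kernel function.)
 */
static int parse_loglevel(const char *msg, int default_level, const char **body)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
        *body = msg + 3;
        return msg[1] - '0';
    }
    *body = msg;
    return default_level;
}

/* A message reaches the console only if its level is numerically
 * lower than console_loglevel. */
static bool shown_on_console(const char *msg, int console_loglevel, int default_level)
{
    const char *body;

    return parse_loglevel(msg, default_level, &body) < console_loglevel;
}
```

With console_loglevel set to 4, a "<3>" (KERN_ERR) message would be shown while a "<4>" (KERN_WARNING) message would only land in the log buffer.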
  • 5. Kernel log buffer
    • kernel log buffer stores kernel messages
    • It is a circular buffer. Old messages are overwritten when the buffer is full
      – Use klogd daemon to keep old msgs in a file
      – Log buffer size is configurable
    • Kernel log buffer can be manipulated via syslog system call
      – or dmesg command line tool
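The overwrite behaviour of a circular buffer can be shown with a toy user-space model. This is a sketch under simplifying assumptions (a byte-oriented buffer of a hypothetical 16 bytes, no locking, no record headers); the real kernel log buffer is more elaborate, and the names log_ring, log_write and log_read_all are invented for the example.

```c
#include <stddef.h>

#define LOG_BUF_LEN 16          /* hypothetical, tiny on purpose */

struct log_ring {
    char   buf[LOG_BUF_LEN];
    size_t head;                /* total number of bytes ever written */
};

/* Append a string; once the buffer is full, the oldest bytes are overwritten. */
static void log_write(struct log_ring *r, const char *s)
{
    for (; *s; s++) {
        r->buf[r->head % LOG_BUF_LEN] = *s;
        r->head++;
    }
}

/*
 * Copy the surviving bytes, oldest first, into out and NUL-terminate.
 * out must hold at least LOG_BUF_LEN + 1 bytes. Returns the byte count.
 */
static size_t log_read_all(const struct log_ring *r, char *out)
{
    size_t n = r->head < LOG_BUF_LEN ? r->head : LOG_BUF_LEN;
    size_t start = r->head - n;
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = r->buf[(start + i) % LOG_BUF_LEN];
    out[n] = '\0';
    return n;
}
```

Writing 20 bytes into this 16-byte ring leaves only the last 16 readable, which is why a daemon such as klogd is needed to preserve older messages.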
  • 6. syslog system call
    • int syslog(int type, char *bufp, int len)

      /*
       * Commands to sys_syslog:
       *
       * 0 -- Close the log. Currently a NOP.
       * 1 -- Open the log. Currently a NOP.
       * 2 -- Read from the log (wait until the buffer is nonempty)
       * 3 -- Read all messages remaining in the ring buffer
       * 4 -- Read and clear all messages remaining in the ring buffer
       * 5 -- Clear ring buffer.
       * 6 -- Disable printk to console
       * 7 -- Enable printk to console
       * 8 -- Set level of messages printed to console
       * 9 -- Return number of unread characters in the log buffer
       */
  • 7. Klogd and syslogd
    • Klogd is the "kernel log daemon". It receives kernel messages via the syslog system call (or /proc/kmsg) and redirects them to syslogd
    • syslogd differentiates messages by facility.priority (e.g. LOG_KERN.LOG_ERR) and consults /etc/syslog.conf to decide how to deal with them (discard or save in a file)
    [Diagram: in kernel space, the kernel log buffer is exposed through /proc/kmsg and sys_syslog(); in user space, klogd feeds syslogd, which writes to files; other daemons use the C library's openlog(), closelog() and syslog()]
  • 8. Use printk macros
    • Do not remove debug printk
      – you may need it later to debug another related issue
    • Undefine DEBUG to remove debug messages in a production kernel
    • For drivers, use dev_dbg() instead
  • 9. Limit the rate of your printk
    • Printk may overwhelm the console if
      – printk is in code that gets executed very often
      – printk is in a frequently-triggered IRQ handler (e.g. timer)
    • printk_ratelimit() returns 0 when the message to be printed should be suppressed
    • printk_once()
      – no matter how often you call it, it prints once and never again

      if (printk_ratelimit())
              printk(KERN_NOTICE "The printer is still on fire\n");
  • 10. printk_ratelimit() implementation
    • The two variables can be modified via /proc/sys/kernel/

      /* minimum time in jiffies between messages */
      int printk_ratelimit_jiffies = 5*HZ;

      /* number of messages we send before ratelimiting */
      int printk_ratelimit_burst = 10;

      int printk_ratelimit(void)
      {
              return __printk_ratelimit(printk_ratelimit_jiffies,
                                        printk_ratelimit_burst);
      }
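How an interval plus a burst count limits output can be modelled in a few lines of user-space C. This is a simplified sketch, not the body of __printk_ratelimit(), which the slides do not show; it assumes a fixed window that allows at most `burst` messages per `interval` ticks, and the names ratelimit and ratelimit_ok are hypothetical.

```c
/* Simplified model: at most `burst` messages per window of `interval` ticks. */
struct ratelimit {
    unsigned long interval;     /* window length in ticks (stand-in for jiffies) */
    int           burst;        /* messages allowed per window */
    unsigned long begin;        /* tick at which the current window started */
    int           printed;      /* messages let through in this window */
};

/* Returns 1 if the caller may print at time `now`, 0 if it should stay quiet. */
static int ratelimit_ok(struct ratelimit *rs, unsigned long now)
{
    if (now - rs->begin >= rs->interval) {
        /* The window has elapsed: start a new one. */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return 1;
    }
    return 0;
}
```

With interval = 500 and burst = 10 (the slide's 5*HZ and 10, assuming HZ = 100), the eleventh call inside one window returns 0 and printing resumes once 500 ticks have passed.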
  • 11. /proc file system
    • A software-created, pseudo file system
    • Contains a lot of system information, e.g.:
      – /proc/<pid>/maps
      – /proc/sys/kernel/*
      – /proc/interrupts
      – /proc/meminfo
    • Adding new files to /proc is discouraged; it should contain only information about processes
    • You should use sysfs or debugfs instead
  • 12. debugfs
    • a simple way to make information available to user space
      – Unlike sysfs, which has strict one-value-per-file rules
      – NOT a stable API for user space
      – mount -t debugfs none /sys/kernel/debug
  • 13. debugfs example

      #include <linux/module.h>
      #include <linux/debugfs.h>

      #define len 200

      u64 intvalue, hexvalue;
      struct dentry *dirret, *fileret, *u64int, *u64hex;
      char _buf[len];

      /* copy from the kernel buffer attached to the file to user space */
      static ssize_t myreader(struct file *fp, char __user *user_buffer,
                              size_t count, loff_t *pos)
      {
              char *kbuf = (char *)file_inode(fp)->i_private;

              return simple_read_from_buffer(user_buffer, count, pos, kbuf, len);
      }

      /* copy from user space into the kernel buffer attached to the file */
      static ssize_t mywriter(struct file *fp, const char __user *user_buffer,
                              size_t count, loff_t *pos)
      {
              char *kbuf = (char *)file_inode(fp)->i_private;

              return simple_write_to_buffer(kbuf, len, pos, user_buffer, count);
      }

      static const struct file_operations fops_debug = {
              .read = myreader,
              .write = mywriter,
      };

      static int __init init_debug(void)
      {
              /* create a directory in /sys/kernel/debug */
              dirret = debugfs_create_dir("mydebug", NULL);
              if (IS_ERR_OR_NULL(dirret))
                      return -ENODEV;

              /* create a file in the above directory;
                 this requires read and write file operations */
              fileret = debugfs_create_file("text", 0644, dirret, _buf, &fops_debug);

              /* create a file which takes in an int(64) value */
              u64int = debugfs_create_u64("number", 0644, dirret, &intvalue);

              /* takes a hexadecimal value */
              u64hex = debugfs_create_x64("hexnum", 0644, dirret, &hexvalue);

              return 0;
      }

      static void __exit exit_debug(void)
      {
              /* remove mydebug dir recursively */
              debugfs_remove_recursive(dirret);
      }

      module_init(init_debug);
      module_exit(exit_debug);

      /* the debugfs functions are exported to GPL-compatible modules only */
      MODULE_LICENSE("GPL");
  • 14. strace: system call trace
    • Intercepts and records
      – system calls issued by a process
      – signals a process received
    • Where to use
      – To get an in-depth understanding of the exact behavior of a program
      – To debug the exact arguments or system calls a program issued
      – When you don't have access to the source code
    • Syntax
      – strace [option] <command [args]>
    • Common options
      – -c -- count time, calls, and errors for each syscall and report summary
      – -f -- follow forks
      – -T -- print time spent in each syscall
      – -e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...
        (options: trace, abbrev, verbose, raw, signal, read, or write)
  • 15. strace output example

      execve("/bin/dmesg", ["dmesg"], [/* 22 vars */]) = 0
      ...
      syslog(0x3, 0x95d3858, 0x4008) = 16384
      write(1, "amily 2\nIP: routing cache hash t"..., 4096amily
      write(1, "to accept 2 bytes to c1bd7f9e fr"..., ...
      munmap(0xb7d6b000, 4096) = 0
      exit_group(0) = ?

      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
       92.75    0.013263          35       374           write
        5.02    0.000718         718         1           syslog
        0.51    0.000073          18         4         1 open
        0.47    0.000067          34         2           munmap
        0.41    0.000058          12         5           old_mmap
        0.34    0.000048          24         2           mmap2
        0.11    0.000016           4         4           fstat64
        0.10    0.000015          15         1           read
        0.10    0.000015           8         2           mprotect
        0.08    0.000012           3         4           brk
        0.04    0.000006           2         3           close
        0.04    0.000006           6         1           uname
        0.02    0.000003           3         1           set_thread_area
      ------ ----------- ----------- --------- --------- ----------------
      100.00    0.014300                   404         1 total
  • 16. Kernel oops
    • Raised when the kernel detects a bug in itself
      – Fault: the kernel kills the faulting process and tries to continue
        • Some locks and data structures may not be released properly; the system cannot be trusted anymore
      – Panic: the system halts, usually when the fault occurs in interrupt context or in the idle or init task, where the kernel thinks it cannot recover
    • Oops message contains
      – Error message
      – Contents of registers
      – Stack dump
      – Function call trace
    • Enable CONFIG_KALLSYMS at kernel configuration to have a symbolic call trace (otherwise all you see are binary addresses)
  • 17. Kernel Oops Example
    • Code below will trigger an oops

      ssize_t faulty_write(struct file *filp, const char __user *buf,
                           size_t count, loff_t *pos)
      {
              /* make a simple fault by dereferencing a NULL pointer */
              *(int *)0 = 0;
              return 0;
      }

      struct file_operations faulty_fops = {
              .read = faulty_read,
              .write = faulty_write,
              .owner = THIS_MODULE
      };
  • 18. Kernel Oops Example

      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      Internal error: Oops: 817 [#1] SMP ARM
      Modules linked in: faulty(O) bnep hci_uart btbcm bluetooth brcmfmac brcmutil
      CPU: 1 PID: 835 Comm: bash Tainted: G O 4.4.21-v7+ #911
      task: b6a605c0 ti: b6ae8000 task.ti: b6ae8000
      PC is at faulty_write+0x18/0x20 [faulty]
      pc : [<7f33c018>] lr : [<8015736c>]
      sp : b6ae9ed0 ip : b6ae9ee0 fp : b6ae9edc
      r10: 00000000 r9 : b6ae8000 r8 : 8000fd08
      r7 : b6ae9f80 r6 : 01493c08 r5 : b6ae9f80 r4 : b93953c0
      r3 : b6ae9f80 r2 : 00000002 r1 : 01493c08 r0 : 00000000
      Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
      Control: 10c5383d Table: 36b3806a DAC: 00000055
      Process bash (pid: 835, stack limit = 0xb6ae8210)
      Stack: (0xb6ae9ed0 to 0xb6aea000)
      9ec0:                                     b6ae9f4c b6ae9ee0 8015736c 7f33c00c
      9ee0: 00000000 0000000a b934f600 80174fc0 b6ae9f3c b6ae9f00 80174fc0 805b66fc
      9f00: b6ae9f3c 801559f8 00000000 80157c34 00000000 00000000 b6ae9f44 b6ae9f28
      9f20: 80155a0c 80159158 b93953c0 b93953c0 00000002 01493c08 b6ae9f80 8000fd08
      9f40: b6ae9f7c b6ae9f50 80157c64 80157344 80155a0c 801752e8 b93953c0 b93953c0
      9f60: 00000002 01493c08 8000fd08 b6ae8000 b6ae9fa4 b6ae9f80 801585d4 80157bd0
      [<7f33c018>] (faulty_write [faulty]) from [<8015736c>] (__vfs_write+0x34/0xe8)
      [<8015736c>] (__vfs_write) from [<80157c64>] (vfs_write+0xa0/0x1a8)
      [<80157c64>] (vfs_write) from [<801585d4>] (SyS_write+0x54/0xb0)
      [<801585d4>] (SyS_write) from [<8000fb40>] (ret_fast_syscall+0x0/0x1c)
      Code: e24cb004 e52de004 e8bd4000 e3a00000 (e5800000)
  • 19. Calling convention
    • A low-level scheme for how subroutines receive parameters from their caller and how they return a result
    • ARM 32 register allocation:

      Register    Use                                Comment
      r15         Program counter
      r14         Link register                      Used by BL instruction
      r13         Stack pointer                      Must be 8-byte aligned
      r12         For intra-procedure calls
      r4 to r11   For local variables                Callee saved
      r0 to r3    For arguments and return values    Caller saved
  • 21. decodecode
    • A script for disassembling oops code

      pi@raspberrypi:~/linux $ dmesg | scripts/decodecode
      [ 80.573075] Code: e24cb004 e52de004 e8bd4000 e3a00000 (e5800000)
      All code
      ========
         0: e24cb004 sub fp, ip, #4
         4: e52de004 push {lr} ; (str lr, [sp, #-4]!)
         8: e8bd4000 ldmfd sp!, {lr}
         c: e3a00000 mov r0, #0
        10:* e5800000 str r0, [r0] <-- trapping instruction

      Code starting with the faulting instruction
      ===========================================
         0: e5800000 str r0, [r0]
  • 22. Finding oops code with GDB
    • Module should be compiled with "-g"
      – Add "ccflags-y := -g" to module's Makefile

      pi@raspberrypi:~/sunplus/oops $ cat /proc/modules
      faulty 1367 0 - Live 0x7f33c000 (O)
      bnep 10340 2 - Live 0x7f335000
      ...
      pi@raspberrypi:~/sunplus/oops $ gdb
      GNU gdb (Raspbian 7.7.1+dfsg-5) 7.7.1
      (gdb) add-symbol-file faulty.ko 0x7f33c000
      add symbol table from file "faulty.ko" at
              .text_addr = 0x7f33c000
      (y or n) y
      Reading symbols from faulty.ko...done.
      (gdb) list *0x7f33c018
      0x7f33c018 is in faulty_write (/home/pi/sunplus/oops/faulty.c:51).
      46
      47      ssize_t faulty_write (struct file *filp, const char __user *buf, size_t count,
      48                      loff_t *pos)
      49      {
      50              /* make a simple fault by dereferencing a NULL pointer */
      51              *(int *)0 = 0;
      52              return 0;
      53      }
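The address handed to gdb above ties together two pieces of the oops: the "PC is at faulty_write+0x18/0x20" line and the module load address from /proc/modules. The user-space sketch below only illustrates that arithmetic; the struct oops_loc and the helpers parse_oops_loc and module_offset are hypothetical names, and the parser assumes the "symbol+0xOFF/0xSIZE" form seen in the example oops.

```c
#include <stdio.h>

/* One "symbol+0xOFF/0xSIZE" location as printed in an oops. */
struct oops_loc {
    char          sym[64];   /* function name */
    unsigned long offset;    /* offset of the faulting instruction in the function */
    unsigned long size;      /* size of the function in bytes */
};

/* Parse e.g. "faulty_write+0x18/0x20". Returns 0 on success, -1 otherwise. */
static int parse_oops_loc(const char *text, struct oops_loc *loc)
{
    int n = sscanf(text, "%63[^+]+%lx/%lx", loc->sym, &loc->offset, &loc->size);

    return n == 3 ? 0 : -1;
}

/* Offset of a faulting pc from the load address listed in /proc/modules. */
static unsigned long module_offset(unsigned long pc, unsigned long module_base)
{
    return pc - module_base;
}
```

For the example oops, pc 0x7f33c018 minus the load address 0x7f33c000 gives 0x18, which matches the +0x18 in the "PC is at" line because faulty_write sits at the start of the module's text in this trace.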
  • 23. gdb – observe kernel variables
    • gdb can observe variables in the kernel
    • How to use?
      – gdb /usr/src/linux/vmlinux /proc/kcore
      – p jiffies /* print the value of the jiffies variable */
      – p jiffies /* you get the same value, since gdb caches values read from the core file */
      – core-file /proc/kcore /* flush gdb cache */
      – p jiffies /* you get a different value of jiffies */
    • vmlinux is the name of the uncompressed ELF kernel executable, not bzImage
    • kcore represents the kernel executable in the format of a core file
    • Disadvantage
      – Read-only access to the kernel
  • 24. Introduction of KGDB and KDB
    ● Linux kernel has two different debugger front ends (kdb and kgdb) which interface to the debug core
    ● KDB
      – Use on a system console or serial console
      – Not a source level debugger, aimed at doing simple analysis or diagnosis
      – Function
        ● Data: Read/write memory, registers
        ● Linux: process lists, backtrace, dmesg
        ● Control: set breakpoints, single step instruction
  • 25. KGDB
    ● Source level debugger, used with GDB to debug a Linux kernel
    ● Two machines (physical or virtual) are required for using KGDB
      – Communicate via network or serial connection
      – Target machine runs the kernel to be debugged
      – Development machine runs an instance of GDB against the vmlinux file which contains the symbols
  • 26. KGDB Kernel Configuration (1)
    ● CONFIG_DEBUG_INFO=y
      – Required by GDB for source level debugging. This adds debug symbols to kernel and modules (gcc -g)
    ● CONFIG_KALLSYMS=y
      – Required by KDB to access symbols by name
    ● CONFIG_FRAME_POINTER=y
      – Saves frame info in registers or on the stack to allow GDB to construct stack back traces more accurately
    ● CONFIG_DEBUG_RODATA=n
      – Page tables will disallow writes to kernel read-only data. If this is enabled, you cannot use software breakpoints
  • 27. KGDB Kernel Configuration (2)
    ● CONFIG_EXPERIMENTAL=y
    ● CONFIG_KGDB=y
    ● CONFIG_KGDB_SERIAL_CONSOLE=y
      – kgdboc is a KGDB I/O driver for using KGDB/KDB over a serial console
    ● CONFIG_SERIAL_8250=y
      – Driver for standard serial ports
    ● CONFIG_SERIAL_8250_CONSOLE=y
      – Allow the use of a serial port as system console
  • 28. KDB Kernel Configuration
    ● KGDB must first be enabled before KDB is enabled. To use KDB on a serial console, kgdboc and a serial port driver are also needed
    ● CONFIG_KGDB_KDB=y
      – include kdb frontend for kgdb
    ● CONFIG_KDB_KEYBOARD=y
      – KDB can use a PS/2 type keyboard for an input device
  • 29. Kernel Parameters - kgdboc
    ● kgdboc=[kms][[,]kbd][[,]serial_device][,baud]
      – Designed to work with a single serial port which is used for your primary console and for kernel debugging
      – kms (kernel mode setting) integration to allow entering kdb on a graphic console
      – Can be configured in kernel boot parameters or at runtime with sysfs
      – Does not support interrupting the target via the gdb remote protocol. You must manually send a sysrq-g
    ● Enable / Disable kgdboc
      – echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc
      – echo "" > /sys/module/kgdboc/parameters/kgdboc
  • 30. Kernel Parameters - kgdbwait
    ● Makes the kernel stop as early as the I/O driver supports and wait for a debugger connection during boot
    ● Useful for debugging kernel initialization
    ● Note
      – A KGDB I/O driver must be compiled into the kernel, and kgdbwait should always follow the parameter for the KGDB I/O driver on the kernel command line
    ● Example
      – kgdboc=ttyS0,115200 kgdbwait
  • 31. Using KDB on serial port
    ● Configure I/O driver
      – Boot kernel with kgdboc parameters or
      – Configure kgdboc via sysfs
    ● Enter the kernel debugger manually by sending a sysrq-g or by waiting for an oops or fault
      – echo g > /proc/sysrq-trigger
      – Minicom: Ctrl-a, f, g
      – Telnet: Ctrl-], send break<RET>, g
    ● At KDB prompt, enter "help" to see a list of commands, "go" to resume kernel execution
  • 32. Some KDB commands

      Command  Usage               Description
      ----------------------------------------------------------
      md       <vaddr>             Display Memory Contents
      mm       <vaddr> <contents>  Modify Memory Contents
      go       [<vaddr>]           Continue Execution
      rd                           Display Registers
      rm       <reg> <contents>    Modify Registers
      bt       [<vaddr>]           Stack traceback
      help                         Display Help Message
      kgdb                         Enter kgdb mode
      ps       [<flags>|A]         Display active task list
      pid      <pidnum>            Switch to another task
      lsmod                        List loaded kernel modules
      dmesg    [lines]             Display syslog buffer
      kill     <-signal> <pid>     Send a signal to a process
      summary                      Summarize the system
      bp       [<vaddr>]           Set/Display breakpoints
      ss                           Single Step
  • 33. Screenshot of KDB with GDB
  • 34. Using KGDB and GDB (1)
    ● Configure kgdboc
      – kgdb, like kdb, will only hook up to the kernel trap hooks if a KGDB I/O driver is loaded and configured
    ● Stop kernel execution
      – Send a sysrq-g; if you see a kdb prompt, enter "kgdb"
      – or you can use kgdbwait for debugging kernel boot
    ● Connect from gdb

      Serial port:
      $ gdb ./vmlinux
      (gdb) set remotebaud 115200
      (gdb) target remote /dev/ttyS0

      TCP port:
      $ gdb ./vmlinux
      (gdb) target remote 192.168.1.99:1234
  • 35. Using KGDB and GDB (2)
    ● Reminder
      – If you "continue" in gdb, and need to "break in" again, you need to issue another sysrq-g
      – You can put a breakpoint at sys_sync and then run "sync" from a shell to break into the debugger
  • 37. Kernel profiling with perf
    • perf is a command-line profiling tool based on the perf_events kernel interface
      – It's event-based sampling. When a PMU counter overflows, a sample is recorded.

      usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

      The most commonly used perf commands are:
        annotate   Read perf.data (created by perf record) and display annotated code
        archive    Create archive with object files with build-ids found in perf.data file
        data       Data file related processing
        diff       Read perf.data files and display the differential profile
        evlist     List the event names in a perf.data file
        kmem       Tool to trace/measure kernel memory properties
        list       List all symbolic event types
        lock       Analyze lock events
        mem        Profile memory accesses
        record     Run a command and record its profile into perf.data
        report     Read perf.data (created by perf record) and display the profile
        sched      Tool to trace/measure scheduler properties (latencies)
        script     Read perf.data (created by perf record) and display trace output
        stat       Run a command and gather performance counter statistics
        timechart  Tool to visualize total system behavior during a workload
        top        System profiling tool.
        trace      strace inspired tool
        probe      Define new dynamic tracepoints
  • 38. Use perf_events for CPU profiling
    • Flame Graphs visualize profiled code
      $ git clone --depth 1 https://github.com/brendangregg/FlameGraph
      $ sudo perf record -F 99 -a -g -- sleep 30
      $ perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > perf.svg
  • 39. Example of perf report
    pi@raspberrypi:~/sunplus $ sudo perf record -g -a sleep 10
    pi@raspberrypi:~/sunplus $ sudo perf report
    Samples: 5K of event 'cycles:ppp', Event count (approx.): 184814613
      Children      Self  Command  Shared Object      Symbol
    +   83.86%     1.97%  swapper  [kernel.kallsyms]  [k] cpu_startup_entry
    +   70.22%     0.00%  swapper  [kernel.kallsyms]  [k] secondary_start_kernel
    +   70.22%     0.00%  swapper  [unknown]          [k] 0x000095ac
    +   67.09%     0.45%  swapper  [kernel.kallsyms]  [k] default_idle_call
    +   66.11%    61.30%  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
    ...
    pi@raspberrypi:~/sunplus $ sudo perf kmem record
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.199 MB perf.data (814 samples) ]
    pi@raspberrypi:~/sunplus $ sudo perf kmem stat --caller
    Failed to read max nodes, using default of 8
    ------------------------------------------------------------------------------------------
     Callsite                       | Total_alloc/Per | Total_req/Per | Hit | Ping-pong | Frag
    ------------------------------------------------------------------------------------------
     kthread_create_on_node+5c      |           64/64 |         28/28 |   1 |         0 | 56.250%
     bcm2835_dma_create_cb_chain+54 |         832/277 |       560/186 |   3 |         3 | 32.692%
     alloc_worker+30                |         128/128 |         88/88 |   1 |         0 | 31.250%
     alloc_skb_with_frags+58        |         512/512 |       384/384 |   1 |         0 | 25.000%
     ...
    SUMMARY (SLAB allocator)
    ========================
    Total bytes requested: 419,528
    Total bytes allocated: 420,216
    Total bytes wasted on internal fragmentation: 688
    Internal fragmentation: 0.163725%
    Cross CPU allocations: 0/326
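    The Frag column and the summary line in the perf kmem output are consistent with "bytes allocated but not requested, as a share of bytes allocated". The following is a minimal sketch that reproduces those figures from the numbers shown on the slide; the function name is hypothetical and the formula is inferred from the output, not taken from the perf source.

```python
def internal_fragmentation(requested: int, allocated: int) -> float:
    """Return the percentage of allocated bytes that the caller did not request."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    return 100.0 * (allocated - requested) / allocated


# (Total_req, Total_alloc) pairs copied from the perf kmem output on the slide.
callsites = {
    "kthread_create_on_node+5c": (28, 64),
    "bcm2835_dma_create_cb_chain+54": (560, 832),
    "alloc_worker+30": (88, 128),
    "alloc_skb_with_frags+58": (384, 512),
}

for name, (req, alloc) in callsites.items():
    print(f"{name:32s} {internal_fragmentation(req, alloc):7.3f}%")

# The summary line applies the same ratio to the byte totals.
print(f"{'SUMMARY':32s} {internal_fragmentation(419_528, 420_216):.6f}%")
```

    A high Frag value on a hot call site suggests the requested size falls just above a slab size class, so a small change in the request size may save memory.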
  • 40. ftrace
    • Useful for event tracing and for analyzing latencies and performance issues
    • The proc sysctl ftrace_enabled is a big on/off switch. Default is enabled
      – To disable: echo 0 > /proc/sys/kernel/ftrace_enabled
    • Summary of /sys/kernel/debug/tracing
      Filename             Description
      current_tracer       Set or display the tracer that is currently configured
      available_tracers    Tracers listed here can be configured by echoing their name into current_tracer
      tracing_on           Enables or disables writing to the ring buffer (tracing overhead may still occur)
      trace                Output of the trace in a human-readable format
      tracing_max_latency  Some of the tracers record the max latency, for example the time interrupts are disabled
      tracing_thresh       Latency tracers record a trace whenever the latency is greater than the number (in microseconds) in this file
      set_ftrace_pid       Have the function tracer trace only a single thread
      set_graph_function   Set a "trigger" function where tracing should start with the function graph tracer
      stack_trace          The stack backtrace of the largest stack encountered while the stack tracer is active
      trace_marker         Very useful for synchronizing user space with events happening in the kernel. Strings written to this file are recorded in the ftrace buffer
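    The relationship between available_tracers and current_tracer can be sketched as a small helper that checks a tracer name before selecting it. This is an illustration only: the function names are hypothetical, the directory is the one named on the slide, and writing to it requires root and a mounted debugfs.

```python
from pathlib import Path

# Directory named on the slide; an assumption that debugfs is mounted there.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def available_tracers(tracing_dir: Path = TRACING_DIR) -> list:
    """Return the tracer names listed in available_tracers."""
    return (tracing_dir / "available_tracers").read_text().split()


def set_tracer(name: str, tracing_dir: Path = TRACING_DIR) -> None:
    """Write a tracer name into current_tracer after checking it is offered."""
    offered = available_tracers(tracing_dir)
    if name not in offered:
        raise ValueError(f"tracer {name!r} not available; choose from {offered}")
    (tracing_dir / "current_tracer").write_text(name)


# Example (run as root): set_tracer("function_graph")
```

    Checking first gives a readable error instead of the bare "Invalid argument" that the kernel returns for an unknown tracer name.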
  • 41. List of tracers
    Name of tracer   Description
    function         Function call tracer that traces all kernel functions
    function_graph   Traces both entry and exit of functions, which makes it possible to draw a graph of function calls that resembles C source code
    irqsoff          Traces the areas that disable interrupts and saves the trace with the longest max latency. See tracing_max_latency.
    preemptoff       Traces and records the amount of time for which preemption is disabled
    preemptirqsoff   Traces and records the largest time for which irqs and/or preemption is disabled
    wakeup           Traces and records the max latency that it takes for the highest-priority task to get scheduled after it has been woken up
    wakeup_rt        Traces and records the max latency that it takes for just RT tasks
    nop              To remove all tracers from tracing, simply echo "nop" into current_tracer
  • 42. Example of function tracer
    # echo SyS_nanosleep hrtimer_interrupt > set_ftrace_filter
    # echo function > current_tracer
    # echo 1 > tracing_on
    # usleep 1
    # echo 0 > tracing_on
    # cat trace
    # tracer: function
    #
    # entries-in-buffer/entries-written: 5/5   #P:4
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
              usleep-2665  [001] ....  4186.475355: sys_nanosleep <-system_call_fastpath
              <idle>-0     [001] d.h1  4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
              usleep-2665  [001] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
              <idle>-0     [003] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
              <idle>-0     [002] d.h1  4186.475427: hrtimer_interrupt <-smp_apic_timer_interrupt
    Note: the function tracer stores entries in ring buffers, so the newest data may overwrite the oldest data. Stopping the trace with echo is sometimes not fast enough, because tracing may already have overwritten the data you wanted to record. For this reason, it is sometimes better to disable tracing directly from a program.
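    Disabling tracing from inside the program, as the note suggests, could look like the sketch below. It brackets the code of interest with writes to tracing_on and labels the region through trace_marker. The helper name and the label format are hypothetical, the path assumes debugfs is mounted at /sys/kernel/debug, and the process needs root.

```python
import contextlib
from pathlib import Path

# Directory used throughout the slides; an assumption about the mount point.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


@contextlib.contextmanager
def trace_region(label: str, tracing_dir: Path = TRACING_DIR):
    """Enable the ring buffer for the enclosed block and stop it right after.

    Stopping from the same process avoids the delay of a separate
    "echo 0 > tracing_on", during which newer entries could overwrite
    the ones of interest.
    """
    tracing_on = tracing_dir / "tracing_on"
    marker = tracing_dir / "trace_marker"
    tracing_on.write_text("1")
    marker.write_text(f"{label}: start\n")
    try:
        yield
    finally:
        marker.write_text(f"{label}: end\n")
        tracing_on.write_text("0")


# Example (run as root, after choosing a tracer):
#     with trace_region("critical-section"):
#         do_the_work()   # hypothetical function under investigation
```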
  • 43. Example of function-graph tracer
    • This tracer can also measure the execution time of a function
    • To trace only one function and all of its children:
    # echo __do_fault > set_graph_function
    # echo function_graph > current_tracer
    # echo 1 > tracing_on
    # usleep 1
    # echo 0 > tracing_on
    # cat trace
    # tracer: function_graph
    #
    # CPU  DURATION                  FUNCTION CALLS
    # |     |   |                     |   |   |   |
     0)               |  __do_fault() {
     0)               |    filemap_fault() {
     0)   0.408 us    |      find_get_page();
     0)   0.085 us    |      _cond_resched();
     0)   2.462 us    |    }
     0)   0.087 us    |    _raw_spin_lock();
     0)   0.104 us    |    add_mm_counter_fast();
     0)   0.106 us    |    page_add_file_rmap();
     0)   0.090 us    |    _raw_spin_unlock();
     0)               |    unlock_page() {
     0)   0.103 us    |      page_waitqueue();
     0)   0.146 us    |      __wake_up_bit();
     0)   1.508 us    |    }
     0)   8.403 us    |  }
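    Because each leaf call in the function_graph output carries its own duration, the trace can be post-processed to find where time goes. The sketch below is a best-effort parser for lines shaped like the ones on the slide; the regular expression and the function name are assumptions, and real traces may contain extra columns depending on the trace options in effect.

```python
import re

LINE_RE = re.compile(
    r"^\s*(?P<cpu>\d+)\)\s*"          # CPU number, e.g. "0)"
    r"(?:[+!#*@$]\s*)?"               # optional overhead marker before the duration
    r"(?:(?P<us>\d+\.\d+) us)?\s*"    # duration; absent on function-entry lines
    r"\|\s*(?P<call>.*?)\s*$"         # function column after the "|" separator
)


def leaf_durations(trace_text: str) -> list:
    """Return (function, microseconds) for each leaf call, i.e. lines ending in '();'."""
    leaves = []
    for line in trace_text.splitlines():
        m = LINE_RE.match(line)
        if m is None or m["us"] is None:
            continue  # header, comment, or function-entry line
        call = m["call"]
        if call.endswith("();"):
            leaves.append((call[:-3], float(m["us"])))
    return leaves


# A few lines copied from the trace on the slide.
SAMPLE = """\
 0)               |  __do_fault() {
 0)               |    filemap_fault() {
 0)   0.408 us    |      find_get_page();
 0)   0.085 us    |      _cond_resched();
 0)   2.462 us    |    }
 0)   0.087 us    |    _raw_spin_lock();
 0)   8.403 us    |  }
"""

slowest = max(leaf_durations(SAMPLE), key=lambda item: item[1])
print(slowest)
```

    Lines that close a block ("}") are skipped on purpose: their duration covers the whole nested call and would double-count the leaves inside it.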
  • 44. Example of irqsoff tracer
    # echo 0 > options/function-trace
    # echo irqsoff > current_tracer
    # echo 1 > tracing_on
    # echo 0 > tracing_max_latency
    # ls -ltr
    [...]
    # echo 0 > tracing_on
    # cat trace
    # tracer: irqsoff
    #
    # irqsoff latency trace v1.1.5 on 3.8.0-test+
    # --------------------------------------------------------------------
    # latency: 16 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
    #    -----------------
    #    | task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
    #    -----------------
    #  => started at: run_timer_softirq
    #  => ended at:   run_timer_softirq
    #
    #
    #                  _------=> CPU#
    #                 / _-----=> irqs-off
    #                | / _----=> need-resched
    #                || / _---=> hardirq/softirq
    #                ||| / _--=> preempt-depth
    #                |||| /     delay
    #  cmd     pid   ||||| time  |   caller
    #     \   /      |||||  \    |   /
      <idle>-0       0d.s2    0us+: _raw_spin_lock_irq <-run_timer_softirq
      <idle>-0       0dNs3   17us : _raw_spin_unlock_irq <-run_timer_softirq
      <idle>-0       0dNs3   17us+: trace_hardirqs_on <-run_timer_softirq
      <idle>-0       0dNs3   25us : <stack trace>
     => _raw_spin_unlock_irq
     => run_timer_softirq
     => __do_softirq
     ...
    Note that the above example had function-trace not set. If we set function-trace, we get a much larger output.
  • 45. Example of stack tracer
    • ftrace makes it convenient to check the stack size at every function call
    # echo 1 > /proc/sys/kernel/stack_tracer_enabled
    After running it for a few minutes, the output looks like:
    # cat stack_max_size
    2928
    # cat stack_trace
            Depth    Size   Location    (18 entries)
            -----    ----   --------
      0)     2928     224   update_sd_lb_stats+0xbc/0x4ac
      1)     2704     160   find_busiest_group+0x31/0x1f1
      2)     2544     256   load_balance+0xd9/0x662
      3)     2288      80   idle_balance+0xbb/0x130
      4)     2208     128   __schedule+0x26e/0x5b9
      5)     2080      16   schedule+0x64/0x66
      6)     2064     128   schedule_timeout+0x34/0xe0
      7)     1936     112   wait_for_common+0x97/0xf1
      8)     1824      16   wait_for_completion+0x1d/0x1f
      9)     1808     128   flush_work+0xfe/0x119
     10)     1680      16   tty_flush_to_ldisc+0x1e/0x20
     11)     1664      48   input_available_p+0x1d/0x5c
     12)     1616      48   n_tty_poll+0x6d/0x134
     13)     1568      64   tty_poll+0x64/0x7f
     14)     1504     880   do_select+0x31e/0x511
     15)      624     400   core_sys_select+0x177/0x216
     16)      224      96   sys_select+0x91/0xb9
     17)      128     128   system_call_fastpath+0x16/0x1b
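    In the stack_trace output, Depth is the cumulative stack usage at that frame and Size is the frame's own contribution, so sorting by Size points at the functions worth slimming down. A minimal sketch of that post-processing, with hypothetical function names and a sample shortened from the slide:

```python
import re

# Matches entry lines such as " 14)     1504     880   do_select+0x31e/0x511".
ENTRY_RE = re.compile(r"^\s*\d+\)\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_stack_trace(text: str) -> list:
    """Parse stack_trace entries into dicts with depth, size and location."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            depth, size, location = m.groups()
            entries.append({"depth": int(depth), "size": int(size), "location": location})
    return entries


def largest_frames(entries: list, n: int = 3) -> list:
    """Return the n entries with the biggest per-frame stack usage."""
    return sorted(entries, key=lambda e: e["size"], reverse=True)[:n]


# A shortened excerpt of the output shown on the slide.
SAMPLE = """\
        Depth    Size   Location    (18 entries)
        -----    ----   --------
  0)     2928     224   update_sd_lb_stats+0xbc/0x4ac
  2)     2544     256   load_balance+0xd9/0x662
 14)     1504     880   do_select+0x31e/0x511
 15)      624     400   core_sys_select+0x177/0x216
 17)      128     128   system_call_fastpath+0x16/0x1b
"""

for frame in largest_frames(parse_stack_trace(SAMPLE), n=2):
    print(f"{frame['size']:5d} bytes  {frame['location']}")
```

    On the slide's data this singles out do_select and core_sys_select, which together account for 1280 of the 2928 bytes.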
  • 46. ftrace homework
    • Read https://www.kernel.org/doc/Documentation/trace/events.txt
      – This document covers event tracing (static tracepoints)
    • perf-tools is a collection of performance analysis tools for Linux ftrace and perf_events. Try to find a good use for them in your work. You can download it from https://github.com/brendangregg/perf-tools.git
    • Write a small program that uses ftrace to track the number of context switches per second on each CPU.
      $ sudo ./ftrace_ctxt_switches.py
      ...
      Duration (sec): 61.386, Context switches (per sec):
      CPU0: 1130 ( 18) CPU1: 5875 ( 96) CPU2: 183 ( 3) CPU3: 230 ( 4)
      Duration (sec): 63.784, Context switches (per sec):
      CPU0: 1138 ( 18) CPU1: 6028 ( 95) CPU2: 188 ( 3) CPU3: 230 ( 4)
      ...
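    One possible starting point for the last exercise, not the author's ftrace_ctxt_switches.py: count sched_switch events per CPU in text read from the trace file. The sketch assumes the event has been enabled beforehand (echo 1 > events/sched/sched_switch/enable in the tracing directory) and that lines follow the default trace format; the sample lines are made up for illustration.

```python
import re
from collections import Counter

# "[001]" CPU field, a flags field, then "timestamp: sched_switch:".
EVENT_RE = re.compile(r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+sched_switch:")


def count_switches(trace_text: str):
    """Return (Counter of context switches per CPU, seconds between first and last event)."""
    per_cpu = Counter()
    first = last = None
    for line in trace_text.splitlines():
        m = EVENT_RE.search(line)
        if m is None:
            continue
        per_cpu[int(m["cpu"])] += 1
        last = float(m["ts"])
        if first is None:
            first = last
    duration = (last - first) if first is not None else 0.0
    return per_cpu, duration


# Hypothetical sample in the default ftrace event format.
SAMPLE = """\
          <idle>-0     [001] d..3  100.000000: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=usleep next_pid=2665 next_prio=120
          usleep-2665  [001] d..3  100.500000: sched_switch: prev_comm=usleep prev_pid=2665 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0     [003] d..3  102.000000: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=1200 next_prio=120
"""

counts, seconds = count_switches(SAMPLE)
for cpu in sorted(counts):
    rate = counts[cpu] / seconds if seconds else 0.0
    print(f"CPU{cpu}: {counts[cpu]} ({rate:.0f})")
```

    A complete solution would read the trace continuously (for example from trace_pipe) and print a running total at intervals, as in the sample output on the slide.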
  • 47. References
    • Linux Device Drivers, 3rd Edition, Jonathan Corbet
    • Linux kernel source, http://lxr.free-electrons.com
    • Choose a Linux tracer, Brendan Gregg
      – http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-tracer.html
    • KDB and KGDB kernel documentation
      – http://kernel.org/pub/linux/kernel/people/jwessel/kdb/