printk() considered harmful

printk() considered harmful
KS/LPC tech topic
November 2016
Sergey Senozhatsky
Samsung Electronics

Table of contents 1. Deadlocks and recursion in printk()
2. Async printk()
3. pr_cont() *
4. Console semaphore *
* bonus topics

1. Deadlocks in printk()
● There are several scenarios that can cause deadlocks in printk()
● For instance:
○ Re-entrant printk() [resolved]
■ Per-cpu buffers for safe printk() from NMI context
○ Recursions in printk() [unresolved]
■ A proposal [RFC]

● An extension of NMI-printk() idea [1]
○ Use additional per-cpu buffers to handle printk() recursion
○ Mark unsafe regions with printk_safe_enter()/exit() calls, that
set special per-cpu flags
○ printk()->vprintk_funct() then checks those per-cpu flags
○ And “redirects” unsafe printk() call to one of the safe
callbacks:
■ vprintk_safe_nmi()
■ vprintk_safe()

● This handles printk() recursion only:
○ printk() -> /* anything that calls printk() again */ -> printk()
● But what about printk() deadlocks...

SyS_ioctl
tty_ioctl
tty_mode_ioctl
uart_set_termios
uart_change_speed
FOO_serial_set_termios
spin_lock_irqsave(&port->lock) // lock the output port
/* WARN_ON() / BUG_ON() / printk() */
vprintk_emit()
console_unlock()
call_console_drivers()
FOO_write()
spin_lock_irqsave(&port->lock) // deadlock

● The fundamental issue is that printk() depends
on two different lock groups - internal and external locks
● We can handle internal locks with printk_safe
● External locks are out of control: scheduler locks, clock locks, etc.
● printk_deferred(), basically, tells printk() to acquire ONLY
internal locks (logbuf spin lock), avoiding any external locks

● A proposal:
○ Make printk() behave like printk_deferred()
○ IOW, avoid any external locks in printk() if we are in atomic,
queueing irq_work for printing
○ So we can eventually remove printk_deferred(), which is
always soooo hard

● Thoughts/questions?

2. Async printk()
● Printing CPU does an “endless” loop in console_unlock()
● console_unlock() is not always preemptible
● Things get worse once you have a slow serial console
attached

2. Async printk()
● You never know how long will it take to printk() a message
● printk() can spin lockup, soft/hard lockup the system, etc.
● It’s not always safe to call printk() under spin_lock, RCU lock, etc.
○ It, thus, can provoke memory pressure, etc.
● It’s not always safe to call printk() from IRQ
● It’s not always safe to call printk_deferred()

2. Async printk()
● A proposal (started by Jan Kara) [2]:
○ Offload printing to a special printk kthread
○ printk() stores the message and queue irq_work
/* or wake_up() the printk kthread if the proposal from #1 won’t fly */
○ wake_up_klogd_work_func() wakes up the printk
kthread instead of doing console_unlock() from IRQ

2. Async printk()
● A proposal (continue):
○ So there is no time-unbound console_unlock() possibly
standing behind printk() and deferred_printk()
○ The printk kthread is always preemptible
○ We switch back to sync printk mode when in panic.
● printk() already depends on scheduler (wake_up() in semaphore)
● With printk_safe we also minimize the impact of wake_up()
○ Recursive printk() from wake_up()

2. Async printk()

3. pr_cont()
● pr_cont() is NOT SMP safe
● “That said, we should strive to avoid the need for KERN_CONT. It does result in
real problems for logging, and should generally not be seen as a good feature.
If we someday can get rid of the feature entirely, because nobody does any
fragmented printk calls, that would be lovely.”
Linus Torvalds

3. pr_cont()
● pr_cont() is not going to die
version git grep “pr_cont(“ git grep “KERN_CONT”
tags/v3.8 761 547
tags/v4.5 1123 515
tags/v4.9-rc2 1194 501

3. pr_cont()
● Proposals:
○ Make pr_cont buffer per-thread [hardly possible]
○ Use fake pr_cont buffers [Petr Mladek posted a PoC patch set [3]]
○ Add a new cont API that will use a temp kmalloc()-d or per-CPU
LOG_LINE_MAX buffer [an extended version of [4]]
■ kmalloc() might not always work. OOM pr_cont(), so
per-CPU buffers look a bit better
■ Stop caring about pr_cont() in the meantime
■ Probably add pr_cont() usage WARN to checkpatch.pl

3. pr_cont()

4. Console semaphore
● Console driver list is protected by a single console_sem semaphore
● Not every console_lock() caller:
○ Modifies the consoles list (read-only access)
○ Is a kernel thread (user-space can access that list in syscalls)
○ Want to print the messages from console_unlock()
○ Want to wait in TASK_UNINTERRUPTIBLE for current console_sem owner
■ Which possibly can be in console_unlock() printing loop
● Example:
○ cat /proc/consoles does console_lock()/console_unlock()

● Do we want to replace console sem with a READ/WRITE lock?
○ So processes that don’t modify the list can acquire console_sem
in read mode
● Direct console_unlock() callers will still do that “endless”
printing loop
● [flashback]
○ In async printk() we wake_up() printk kthread from wake_up_klogd_work_func()

● Do we want to split console_unlock() function into
○ async_flush_console_unlock() function
■ Basically, just unlock console_sem and wake_up()
printk kthread
○ sync_flush_console_unlock() function
■ Flush the logbuf and then unlock console_sem
■ There are paths that probably want to flush logbuf
from console_unlock()

Links
1. https://marc.info/?l=linux-kernel&m=147758348104960

printk() considered harmful

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (16)

Similar to printk() considered harmful

Similar to printk() considered harmful (20)

Recently uploaded

Recently uploaded (20)

printk() considered harmful