1. Understanding a Kernel Oops and a Kernel panic
Linux Kernel Version 4.9.8
Joseph Lu
joseph78715@gmail.com
2. Page 2
Understanding a Kernel Oops and a Kernel panic
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In user space : Segmentation fault
3. Page 3
Understanding a Kernel Oops and a Kernel panic
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel space : kernel oops.
– In user space : Segmentation fault
4. Page 4
Understanding a Kernel Oops and a Kernel panic
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel space : kernel oops.
– In user space : Segmentation fault
• A "kernel oops" kills the offending process to keep the system running, unless the oops is bad enough to cause a "kernel panic".
[Diagram: Oops → moderate: do_exit()]
5. Page 5
Understanding a Kernel Oops and a Kernel panic
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel space : kernel oops.
– In user space : Segmentation fault
• A "kernel oops" kills the offending process to keep the system running, unless the oops is bad enough to cause a "kernel panic".
• A "kernel panic" means the system decides to stop running immediately.
[Diagram: Oops → moderate: do_exit(); severe: Panic]
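The user-space half of this picture can be observed directly. The following is a minimal sketch, assuming a POSIX system where address 0 is not mapped: a child process writes through a NULL pointer, the kernel delivers SIGSEGV, and the parent reads the terminating signal from the wait status. The helper name child_fault_signal() is hypothetical and is not part of the original slides.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that performs an illegal memory
 * access and return the signal that terminated it, or -1 on error or
 * if the child was not killed by a signal. */
int child_fault_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: write through a NULL pointer. The volatile qualifier
         * keeps the compiler from optimising the store away. */
        volatile int *p = NULL;
        *p = 1;
        _exit(0); /* not reached if the fault is delivered */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Only the child dies here; the parent and the rest of the system keep running, which is the user-space counterpart of the oops behaviour described on the slide.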
6. Page 6
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
7. Page 7
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
8. Page 8
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
• bad_mode()
→ die("Oops - bad mode", …);
[Diagram: illegal instruction → die() → Oops]
9. Page 9
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
• bad_mode()
→ die("Oops - bad mode", …);
→ panic("bad mode");
[Diagram: illegal instruction → die() → Oops → panic()]
11. Page 11
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
12. Page 12
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes oops:
1. Call BUG()
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
[Diagram: arm_notify_die() → user_mode? Yes: force_sig_info(); No: Oops]
13. Page 13
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls :
• arm_syscall() → arm_notify_die("Oops - bad syscall(2)", …)
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
[Diagram: arm_notify_die() → user_mode? Yes: force_sig_info(); No: Oops]
14. Page 14
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls :
• arm_syscall() → arm_notify_die("Oops - bad syscall(2)", …)
3. undefined cpu instruction :
• do_undefinstr() → arm_notify_die("Oops - undefined instruction", …)
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
[Diagram: arm_notify_die() → user_mode? Yes: force_sig_info(); No: Oops]
15. Page 15
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls :
• arm_syscall() → arm_notify_die("Oops - bad syscall(2)", …)
3. undefined cpu instruction :
• do_undefinstr() → arm_notify_die("Oops - undefined instruction", …)
4. unknown data abort : data accesses (load or store)
• alignment faults, translation faults, access bit faults, domain faults, permission faults.
• E.g. unaligned memory access, memory access to reserved areas, a write to ROM (flash) space
• E.g. baddataabort() → arm_notify_die("unknown data abort code", …)
• E.g. do_DataAbort() → arm_notify_die("", …);
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
[Diagram: arm_notify_die() → user_mode? Yes: force_sig_info(); No: Oops]
16. Page 16
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls :
• arm_syscall() → arm_notify_die("Oops - bad syscall(2)", …)
3. undefined cpu instruction :
• do_undefinstr() → arm_notify_die("Oops - undefined instruction", …)
4. unknown data abort : data accesses (load or store)
• alignment faults, translation faults, access bit faults, domain faults, permission faults.
• E.g. unaligned memory access, memory access to reserved areas, a write to ROM (flash) space
• E.g. baddataabort() → arm_notify_die("unknown data abort code", …)
• E.g. do_DataAbort() → arm_notify_die("", …);
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
The instruction translation lookaside buffer (ITLB) and the data translation lookaside buffer (DTLB) aren't seeing the same picture.
[Diagram: arm_notify_die() → user_mode? Yes: force_sig_info(); No: Oops]
17. Page 17
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls :
• arm_syscall() → arm_notify_die("Oops - bad syscall(2)", …)
3. undefined cpu instruction :
• do_undefinstr() → arm_notify_die("Oops - undefined instruction", …)
4. unknown data abort : data accesses (load or store)
• alignment faults, translation faults, access bit faults, domain faults, permission faults.
• E.g. unaligned memory access, memory access to reserved areas, a write to ROM (flash) space
• E.g. baddataabort() → arm_notify_die("unknown data abort code", …)
• E.g. do_DataAbort() → arm_notify_die("", …);
5. prefetch-abort :
• do_PrefetchAbort() → arm_notify_die("", …);
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
[Diagram: arm_notify_die() → user_mode? Yes: force_sig_info(); No: Oops]
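The branch drawn for arm_notify_die() can be modelled in plain user-space C. This is only a sketch of the decision, not the kernel's code: struct model_regs, enum fault_outcome and model_notify_die() are hypothetical names introduced for illustration, and the real function works on struct pt_regs and calls force_sig_info() or die() rather than returning a value.

```c
#include <stdbool.h>

/* Possible outcomes of the dispatch, for illustration only. */
enum fault_outcome {
    OUTCOME_SIGNAL_TO_TASK, /* user mode: the force_sig_info() branch */
    OUTCOME_KERNEL_OOPS     /* kernel mode: the Oops branch */
};

/* Hypothetical stand-in for struct pt_regs: records only the mode
 * the CPU was in when the exception was taken. */
struct model_regs {
    bool user_mode;
};

/* Model of the user_mode decision shown for arm_notify_die(). */
enum fault_outcome model_notify_die(const struct model_regs *regs)
{
    if (regs->user_mode)
        return OUTCOME_SIGNAL_TO_TASK;
    return OUTCOME_KERNEL_OOPS;
}
```

The point of the model is that the same exception is either a signal or an oops depending only on where it was taken.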
18. Page 18
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
[Diagram: arm_notify_die() → user_mode? Yes: force_sig_info(); No: Oops]
19. Page 19
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
[Diagram: arm_notify_die() → user_mode? Yes: force_sig_info(); No: Oops. __do_kernel_fault() → Oops]
23. Page 23
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls
3. undefined cpu instruction
4. unknown data abort
5. prefetch-abort
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
[Diagram: __do_kernel_fault() → Oops]
24. Page 24
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal memory access
[Diagram: do_translation_fault() (e.g. NULL pointer dereference) → __do_kernel_fault() → Oops]
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls
3. undefined cpu instruction
4. unknown data abort
5. prefetch-abort
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
25. Page 25
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal memory access
[Diagram: do_translation_fault() (e.g. NULL pointer dereference) → do_bad_area(): where does it happen? → __do_kernel_fault() → fixup the exception? No: Oops]
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls
3. undefined cpu instruction
4. unknown data abort
5. prefetch-abort
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
26. Page 26
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal memory access
[Diagram: do_translation_fault() (e.g. NULL pointer dereference) → do_bad_area(): where does it happen? In user space: __do_user_fault() → Segmentation fault. __do_kernel_fault() → fixup the exception? No: Oops]
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls
3. undefined cpu instruction
4. unknown data abort
5. prefetch-abort
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
27. Page 27
Linux kernel oops
• illegal instruction (SIGILL), illegal memory access (SIGSEGV)
– In kernel : Cause of an oops.
illegal memory access
[Diagram: do_translation_fault() (e.g. NULL pointer dereference) → do_bad_area(): where does it happen? In user space: __do_user_fault() → Segmentation fault. In kernel space: __do_kernel_fault() → fixup the exception? No: Oops]
illegal instruction
illegal instruction causes oops:
1. Call BUG()
2. unrecognised system calls
3. undefined cpu instruction
4. unknown data abort
5. prefetch-abort
illegal instruction causes panic:
bad_mode() handles the impossible case in the interrupt vectors
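The illegal-memory-access flow can be written down as a small decision function. This is a hedged user-space model of the diagram, not kernel code: enum bad_area_outcome and model_do_bad_area() are hypothetical names, and the two boolean inputs stand in for the "where does it happen?" check and for whether the exception could be fixed up.

```c
#include <stdbool.h>

/* Outcomes of the do_bad_area() flow, for illustration only. */
enum bad_area_outcome {
    BAD_AREA_SEGFAULT, /* user space: __do_user_fault(), segmentation fault */
    BAD_AREA_FIXED_UP, /* kernel space: the exception was fixed up */
    BAD_AREA_OOPS      /* kernel space: no fixup, so the kernel oopses */
};

/* Model of the flow: first ask where the fault happened, then, for a
 * kernel-space fault, whether the exception can be fixed up. */
enum bad_area_outcome model_do_bad_area(bool user_mode, bool has_fixup)
{
    if (user_mode)
        return BAD_AREA_SEGFAULT;
    if (has_fixup)
        return BAD_AREA_FIXED_UP;
    return BAD_AREA_OOPS;
}
```

Note that the fixup question is only asked on the kernel-space branch; a user-space fault always ends in a signal in this model.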
33. Page 33
Linux Kernel panic
[Diagram: __do_kernel_fault() → Oops → oops_end() → moderate: do_exit(); severe: panic() → …]
34. Page 34
Linux Kernel panic
[Diagram: __do_kernel_fault() → Oops → oops_end() → in process context: do_exit(); severe: panic() → …]
35. Page 35
Linux Kernel panic
[Diagram: __do_kernel_fault() → Oops → oops_end() → in process context: do_exit(); in interrupt context or panic_on_oops is set: panic() → …]
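The choice made at the end of an oops can be summarised as a two-input decision. The sketch below is a user-space model under stated assumptions: enum oops_outcome and model_oops_end() are hypothetical names, and the booleans stand in for the interrupt-context check and the panic_on_oops setting named on the slide.

```c
#include <stdbool.h>

/* What happens after the oops has been reported, for illustration only. */
enum oops_outcome {
    OOPS_KILL_TASK, /* process context: do_exit() kills the current task */
    OOPS_PANIC      /* interrupt context or panic_on_oops set: panic() */
};

/* Model of the decision at the end of an oops. */
enum oops_outcome model_oops_end(bool in_interrupt, bool panic_on_oops)
{
    if (in_interrupt || panic_on_oops)
        return OOPS_PANIC;
    return OOPS_KILL_TASK;
}
```

An oops in interrupt context has no process that can be killed on its behalf, which is why that case escalates to a panic regardless of the panic_on_oops setting.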
37. Page 37
Linux Kernel panic
[Diagram: __do_kernel_fault() → Oops → oops_end() → in process context: do_exit(); in interrupt context or panic_on_oops is set: panic()]
[Diagram: panic() → console_verbose() → dump_stack() → panic_smp_self_stop() & smp_send_stop() (shut down other CPUs) → panic_timeout == 0? Yes: while(1); No: delay timeout seconds → emergency_restart()]
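The last step of the panic() flow depends only on panic_timeout. The following is a minimal user-space model, assuming the semantics of the kernel's panic_timeout variable: enum panic_tail and model_panic_tail() are hypothetical names, and the negative-timeout case is an addition that the slide does not draw.

```c
/* How the panic path ends, for illustration only. */
enum panic_tail {
    PANIC_HANG_FOREVER,       /* panic_timeout == 0: loop in while(1) */
    PANIC_DELAY_THEN_RESTART, /* panic_timeout > 0: wait, then emergency_restart() */
    PANIC_RESTART_IMMEDIATELY /* panic_timeout < 0: not shown on the slide */
};

/* Model of the panic_timeout check at the end of panic(). */
enum panic_tail model_panic_tail(int panic_timeout)
{
    if (panic_timeout == 0)
        return PANIC_HANG_FOREVER;
    if (panic_timeout > 0)
        return PANIC_DELAY_THEN_RESTART;
    return PANIC_RESTART_IMMEDIATELY;
}
```

With the default of 0 the machine stays halted so that the console output can be read; a positive value trades that for unattended recovery.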
38. Page 38
Introduction to Crash Dump Mechanisms in Linux
1. Kdump : build a separate custom dump-capture kernel for capturing the kernel core dump
2. Ramoops : log data into persistent RAM storage
3. Mtdoops : log data into an MTD partition
4. Reserved-memory