CRASH DUMP ANALYSIS 101
JOHN S. HOWARD
JOHN.HOWARD@NEXENTA.COM




1        © Copyright Nexenta 2012
AGENDA


- Terminology
- Core Dumps and Crash Dumps
- C Language Basics
- The Mechanism of a Panic
- mdb Overview
- Basic Crash Dump Analysis




PROCESS, THREAD, LWP


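- Process
  - A program in execution
  - Consists of one or more threads (LWPs)
- Thread
  - The smallest unit of scheduling
  - Threads in a process share its address space and resources
- Light Weight Process (LWP)
  - The kernel-visible context in which a user thread runs; each LWP
    is backed by one kernel thread
  - Older thread libraries multiplexed many user threads onto fewer
    LWPs (the M:N model)
  - Provides user-level multitasking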
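Because threads share one address space, memory written by one thread is immediately visible to the others, which is also why one misbehaving thread can corrupt state for the whole process. A minimal sketch, assuming POSIX threads (the slide is not tied to a particular API), in which several threads of one process update a single counter under a mutex:

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define NITERS   10000

/* One counter in the process's address space, visible to every thread. */
static long shared_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the same counter; the mutex serializes updates. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/*
 * Start NTHREADS threads in this process and wait for them.
 * Returns the final counter, or -1 if not every thread could be created.
 */
static long run_threads(void)
{
    pthread_t tids[NTHREADS];
    int started = 0;

    shared_counter = 0;
    while (started < NTHREADS &&
        pthread_create(&tids[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == NTHREADS ? shared_counter : -1;
}
```

Without the mutex the total would typically come up short, because concurrent increments of the same memory location can be lost.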


INTERRUPTS AND TRAPS


- Interrupts are asynchronous messages notifying the kernel of
  external device events
  - Some interrupts are handled as traps
- Traps are synchronous: they are raised by the instruction being
  executed (for example, a page fault or a divide error) and are
  essentially a software interrupt
- Bus errors are issued to a processor when it references a
  location that can't be resolved or located
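The synchronous nature of a trap is easy to observe from user space: a NULL pointer dereference faults on the very instruction that performs the load, and the kernel delivers the result to that same thread as a signal (normally SIGSEGV). The sketch below is a POSIX user-space analogue, not kernel code; it catches the signal with sigaction() and recovers with sigsetjmp()/siglongjmp() so the effect can be inspected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t fault_signal;

/* Runs on the thread that executed the faulting instruction. */
static void on_fault(int sig)
{
    fault_signal = sig;
    siglongjmp(recover_point, 1);
}

/*
 * Deliberately load from address 0, catch the resulting trap (delivered
 * as a signal), and return the signal number (0 if no fault was seen).
 */
static int trap_on_null_deref(void)
{
    struct sigaction sa, old_segv, old_bus;
    /* volatile on both levels: the compiler must really perform the load */
    volatile int *volatile p = NULL;

    memset(&sa, 0, sizeof (sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);   /* some systems report SIGBUS */

    fault_signal = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        int value = *p;                  /* page fault at address 0 */
        (void)value;
    }

    /* Restore the previous handlers. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return fault_signal;
}
```

The signal arrives at a predictable point (the load), whereas a device interrupt can arrive between any two instructions.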




HANGS, CRASHES, AND PANICS


- Hang
  - Potentially limited or no forensic information
  - System up, but unresponsive
- Crash
  - Potentially limited forensic information
  - System down or rebooted
- Panic
  - Maximum potential forensic information
  - System down or rebooted


FORENSIC INFORMATION SOURCES


- Forensic information sources
  - Console
  - syslog, typically logged to /var/adm/messages
  - Core file or crash dump




CORE FILE


- A dump of the contents of all memory allocated to the process
- An inert, static record of the process's state
- By default, process core files are written to the process's
  current working directory
- Core file properties are managed with coreadm
- Reading a core file requires the same libraries the process
  was using
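Whether a crashing process leaves a core file at all also depends on its per-process core size limit, RLIMIT_CORE (the value shown by `ulimit -c`): a soft limit of 0 suppresses the file regardless of where coreadm would have put it. A small sketch that reports the current limit with the POSIX getrlimit() call; the function name is illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Report the soft core-file size limit for this process.
 * Returns 1 if the limit permits a core file (other settings, such as
 * coreadm's, still apply), 0 if the limit is zero, and -1 on error.
 */
static int core_dumps_allowed(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("core file size limit: unlimited\n");
    else
        printf("core file size limit: %llu bytes\n",
            (unsigned long long)rl.rlim_cur);
    return rl.rlim_cur != 0;
}
```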




CRASH DUMP


- A dump of the contents of all memory allocated to the kernel
- An inert, static record of the kernel's state
- Written to the pre-specified dump device or swap partition
  - Written "backwards"
- Reading it requires the same OS version
- The kernel core file facility is managed with dumpadm




DUMPADM


- dumpadm with no options shows the current settings:
    # dumpadm
        Dump content: kernel pages
        Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
        Savecore directory: /var/crash/myhost
        Savecore enabled: yes
- To take a crash dump of the running system without panicking it:
    # savecore -L
  - Note that savecore -L does not quiesce the system, so memory
    contents change while the dump is being taken
- To force a panic and crash dump:
    # uadmin 5 0
  or, to dump and then reboot without syncing file systems:
    # reboot -dn

PANIC


- The kernel has detected an inconsistency it cannot safely recover from
- It protects the system, and user data, by stopping rather than
  continuing in an inconsistent state
- Three major tasks are performed in a system panic:
  - Record information about the panic in memory (making it part of
    the crash dump)
  - Synchronize the file systems to preserve user file data
  - Generate the crash dump
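The three tasks above can be mimicked in a user-space program, which is a convenient way to see their order. This is only an analogue, not kernel code: `my_panic()`, `user_panicbuf` and `panic_in_child()` are hypothetical names, stdio flushing stands in for syncing file systems, and abort() (which may leave a core file) stands in for generating the crash dump.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical in-memory record of the panic message. */
static char user_panicbuf[256];

/* Hypothetical panic(): the three tasks, in user-space form. */
static void my_panic(const char *fmt, ...)
{
    va_list ap;

    /* 1. Record the panic message in memory (it ends up in any core file). */
    va_start(ap, fmt);
    vsnprintf(user_panicbuf, sizeof (user_panicbuf), fmt, ap);
    va_end(ap);
    fprintf(stderr, "panic: %s\n", user_panicbuf);

    /* 2. Flush buffered output; the kernel syncs file systems instead. */
    fflush(NULL);

    /* 3. Terminate with SIGABRT, which may produce a core file. */
    abort();
}

/* Run my_panic() in a child process and return the signal that ended it. */
static int panic_in_child(void)
{
    int status;
    pid_t pid;

    fflush(NULL);               /* don't let the child re-flush our buffers */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Keep this demonstration from leaving a core file behind. */
        struct rlimit no_core = { 0, 0 };

        setrlimit(RLIMIT_CORE, &no_core);
        my_panic("cpu%d: %s", 5, "divide error");
        _exit(1);               /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```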




C PROGRAMMING LANGUAGE DATATYPES


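- Built-ins
  - int, float, char
- struct
  - A grouping of data items, stored in declaration order
- union
  - Variant records
  - All constituent data items are overlaid (they share the same storage)
- typedef
  - Gives a new name to an existing type
- Pointers
  - A reference to (the address of) a memory location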

C DATATYPES EXAMPLES




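int ap;                         /* a built-in type */
char buf[128];                  /* an array of 128 chars */
int *user = &ap;                /* a pointer to an int (here, ap's address) */

typedef struct smb_mtype {      /* a struct, named smb_mtype_t via typedef */
        char    *mt_name;
        int     mt_namelen;
        int     mt_flags;
} smb_mtype_t;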
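The difference between a struct and a union is purely one of layout: struct members follow one another in memory, while every union member starts at offset 0. Knowing member offsets is what lets you map an address such as `base+0x8` in a dump back to a field. The sketch below checks both layouts with offsetof(); smb_mtype_t is copied from the slide, and `example_variant_t` is a hypothetical union added for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The structure from the slide: members are laid out one after another. */
typedef struct smb_mtype {
    char *mt_name;
    int   mt_namelen;
    int   mt_flags;
} smb_mtype_t;

/* Hypothetical union: all members share the same starting address. */
typedef union example_variant {
    uint64_t v_u64;
    uint32_t v_u32;
    char     v_bytes[8];
} example_variant_t;

/* Return 1 if every union member sits at offset 0 (i.e. they overlay). */
static int union_members_overlay(void)
{
    return offsetof(example_variant_t, v_u64) == 0 &&
        offsetof(example_variant_t, v_u32) == 0 &&
        offsetof(example_variant_t, v_bytes) == 0;
}

/* Return 1 if the struct members appear in declaration order. */
static int struct_members_in_order(void)
{
    return offsetof(smb_mtype_t, mt_name) <
        offsetof(smb_mtype_t, mt_namelen) &&
        offsetof(smb_mtype_t, mt_namelen) <
        offsetof(smb_mtype_t, mt_flags);
}
```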


C FUNCTIONS


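- Declaration: states the function's name, return type, and parameter types
- Definition: supplies the function body
- Parameters are passed by value (the callee receives copies)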
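Pass by value means each parameter is a copy: a function can change the caller's data only through a pointer, and even then the pointer itself is a copy. This is why kernel functions such as smb_tree_log() take pointers like `smb_request_t *sr`. A minimal illustration with two hypothetical helpers:

```c
#include <assert.h>

/* Receives a copy of the caller's int: changing it has no effect outside. */
static void set_by_value(int n)
{
    n = 42;
    (void)n;
}

/* Receives a copy of a pointer: writing through it changes the caller's int. */
static void set_through_pointer(int *np)
{
    *np = 42;
}
```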




C FUNCTION EXAMPLES




Declaration
  /* A prototype: parameter types only; names are optional */
  static void smb_tree_log(smb_request_t *, const char *,
                           const char *, ...);
Definition
  static void
  smb_tree_log(smb_request_t *sr, const char *sharename,
               const char *fmt, ...)
  {
          /* ... function body elided ... */
  }



PANIC()

- panic(), cmn_err(CE_PANIC, ...)
  - Common entry points for vpanic()
  - Responsible for providing the panic information
- die()
  - Called from trap() when a trap taken in the kernel is fatal
- vpanic()
  - Assembly language function that saves the register state
- ASSERT(condition)
  - Panics the kernel if condition is false
  - Compiled in only when the DEBUG compilation symbol is defined;
    otherwise it expands to nothing and condition is not evaluated
- VERIFY(condition)
  - Similar to ASSERT, but active even when DEBUG isn't defined
  - The stack will contain assfail() near the top
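A user-space sketch of the ASSERT/VERIFY split described above. MY_ASSERT, MY_VERIFY and my_assfail() are hypothetical stand-ins; the real illumos macros live in <sys/debug.h>, call the kernel's assfail(), and differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for assfail(): report the failed expression and abort. */
static int my_assfail(const char *expr, const char *file, int line)
{
    fprintf(stderr, "assertion failed: %s, file: %s, line: %d\n",
        expr, file, line);
    abort();
    return 0;  /* not reached */
}

/* VERIFY: always compiled in, so the expression is always evaluated. */
#define MY_VERIFY(EX) ((void)((EX) || my_assfail(#EX, __FILE__, __LINE__)))

/* ASSERT: compiled in only when DEBUG is defined; otherwise a no-op. */
#ifdef DEBUG
#define MY_ASSERT(EX) MY_VERIFY(EX)
#else
#define MY_ASSERT(EX) ((void)0)
#endif
```

Compiled without `-DDEBUG`, MY_ASSERT() never evaluates its argument, so an expression with side effects inside an ASSERT behaves differently in debug and non-debug builds.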

EXAMPLE 1: PANIC STRING




panic[cpu1]/thread=ffffff000e4e7c60:
BAD TRAP: type=e (#pf Page fault)
rp=ffffff000e4e77c0 addr=0 occurred in module
"unix" due to a NULL pointer dereference




EXAMPLE 1: STACK TRACE




ffffff000e4e76a0   unix:die+dd ()
ffffff000e4e77b0   unix:trap+177b ()
ffffff000e4e77c0   unix:cmntrap+e6 ()
ffffff000e4e78c0   unix:strcasecmp+16 ()
ffffff000e4e7a50   smbsrv:smb_tree_log+b3 ()
ffffff000e4e7a90   smbsrv:smb_tree_connect_core+14a ()
ffffff000e4e7ac0   smbsrv:smb_tree_connect+35 ()
ffffff000e4e7ae0   smbsrv:smb_com_tree_connect_andx+16 ()
ffffff000e4e7b80   smbsrv:smb_dispatch_request+4a9 ()
ffffff000e4e7bb0   smbsrv:smb_session_worker+6c ()
ffffff000e4e7c40   genunix:taskq_d_thread+b1 ()
ffffff000e4e7c50   unix:thread_start+8 ()
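Each line reads `frame-address module:function+offset ()`, with the offset in hex bytes into the function. Reading top-down, die(), trap() and cmntrap() are the trap-handling frames; the first frame below cmntrap(), strcasecmp+16, is where the page fault occurred, and smb_tree_log+b3 is the smbsrv code that called it, most likely with a NULL string. A small parser for this line format, assuming exactly the layout shown here (the STACK section of Example 2 uses a different `function+0x...(args)` form):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct frame {
    unsigned long long fp;    /* frame address (first column) */
    char module[64];          /* kernel module, e.g. "smbsrv" */
    char func[64];            /* function name */
    unsigned long offset;     /* offset into func, printed in hex */
};

/* Parse a line such as "ffffff000e4e7a50   smbsrv:smb_tree_log+b3 ()". */
static int parse_frame(const char *line, struct frame *f)
{
    int n = sscanf(line, "%llx %63[^:]:%63[^+]+%lx",
        &f->fp, f->module, f->func, &f->offset);

    return n == 4 ? 0 : -1;
}

/* die(), trap() and cmntrap() are the trap-handling frames, not the culprit. */
static int is_trap_path(const struct frame *f)
{
    return strcmp(f->module, "unix") == 0 &&
        (strcmp(f->func, "die") == 0 || strcmp(f->func, "trap") == 0 ||
        strcmp(f->func, "cmntrap") == 0);
}
```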

MDB – MODULAR DEBUGGER


- An extensible utility for low-level debugging and editing
- On the live kernel:
    # mdb -k
    # mdb -kw      (allows editing: VERY DANGEROUS)
- On a core file:
    mdb syseventd.core.125
- On a crash dump:
    # mdb -k unix.3 vmcore.3




ANALYZE-CRASH.SH


- Extracts the crash dump from the dump device, if necessary
  (savecore -vf filename)
- Runs scripted mdb commands for basic crash information:
  - Panic string and registers
  - dmesg buffer
  - Stack
  - Thread list
- Executed automatically by the NMC `support` command
  (NS 3.1.2 and later)


HAVE I SEEN THIS BEFORE?


- Footprints
- Known problem or new?
  - Redmine
  - Search the illumos issue tracker:
    https://www.illumos.org/issues/
  - SunSolve is gone; however, "We Sun Solve" is rescuing
    the data from SunSolve.Sun.COM:
    http://wesunsolve.net/bsearch
- illumos source browser:
  http://src.illumos.org/source/

EXAMPLE 2: PANIC INFO

panic[cpu5]/thread=ffffff000fd72c60:
BAD TRAP: type=0 (#de Divide error) rp=ffffff000fd72a40 addr=ffffff02da92e900

sched:
#de Divide error
addr=0xffffff02da92e900
pid=0, pc=0xfffffffff7ad977b, sp=0xffffff000fd72b30, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: fffffd7fff2a60c8
cr3: 5000000
cr8: c
        rdi: ffffff02d282e840 rsi:                0 rdx:                0
        rcx:               64  r8: ffffff000fd72c60  r9:                0
        rax:                0 rbx:                0 rbp: ffffff000fd72b90
        r10:                0 r11: ffffff02f46e8264 r12: ffffff02da316338
        r13: ffffff02da3163d0 r14: ffffff02d5061a50 r15: ffffff02da92e900
        fsb:                0 gsb: ffffff02da9a1540  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                0 err:                0 rip: fffffffff7ad977b
         cs:               30 rfl:            10246 rsp: ffffff000fd72b30
         ss:               38
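Both panic strings so far begin with `BAD TRAP: type=N`, where N is the x86 exception vector in hex: type=e (vector 14) is a page fault (#pf) and type=0 is a divide error (#de). A small sketch that extracts the type and addr fields from such a line and names the vector. The parsing assumes exactly the layout shown on these slides, the mnemonic strings are illustrative, and the 4096-byte first-page threshold in the NULL-dereference heuristic is an assumption (a common x86 page size).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* x86 exception vectors 0-19, in the style of the panic string. */
static const char *const x86_trap_names[] = {
    "#de Divide error", "#db Debug", "NMI", "#bp Breakpoint",
    "#of Overflow", "#br BOUND range exceeded", "#ud Invalid opcode",
    "#nm Device not available", "#df Double fault",
    "Coprocessor segment overrun", "#ts Invalid TSS",
    "#np Segment not present", "#ss Stack fault",
    "#gp General protection", "#pf Page fault", "(reserved)",
    "#mf x87 floating-point error", "#ac Alignment check",
    "#mc Machine check", "#xm SIMD floating-point exception",
};

struct trap_info {
    unsigned int type;          /* hex value after "type=" */
    unsigned long long addr;    /* hex value after "addr=" */
};

/* Extract "type=" and "addr=" from a BAD TRAP panic string; 0 on success. */
static int parse_bad_trap(const char *s, struct trap_info *ti)
{
    const char *t = strstr(s, "type=");
    const char *a = strstr(s, "addr=");

    if (t == NULL || a == NULL)
        return -1;
    ti->type = (unsigned int)strtoul(t + 5, NULL, 16);
    ti->addr = strtoull(a + 5, NULL, 16);
    return 0;
}

/* Name an x86 exception vector, or "unknown" if it is out of range. */
static const char *trap_name(unsigned int type)
{
    if (type < sizeof (x86_trap_names) / sizeof (x86_trap_names[0]))
        return x86_trap_names[type];
    return "unknown";
}

/*
 * Heuristic: a page fault (vector 14) on an address in the first page
 * (assumed 4096 bytes) usually means a NULL pointer, possibly plus a
 * small member offset, was dereferenced.
 */
static int looks_like_null_deref(const struct trap_info *ti)
{
    return ti->type == 14 && ti->addr < 4096;
}
```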

EXAMPLE 2: STACK




 ffffff000fd72920 unix:die+10f ()
 ffffff000fd72a30 unix:trap+1555 ()
 ffffff000fd72a40 unix:cmntrap+e6 ()
 ffffff000fd72b90 cpudrv:cpudrv_monitor+1cb ()
 ffffff000fd72c40 genunix:taskq_thread+285 ()
 ffffff000fd72c50 unix:thread_start+8 ()
 syncing file systems...
  done
 dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc

 STACK
 ---
 ffffff000fd72b90 cpudrv_monitor+0x1cb(ffffff02da316338)
 ffffff000fd72c40 taskq_thread+0x285(ffffff02da859140)
 ffffff000fd72c50 thread_start+8()



EXAMPLE 2: THREAD LIST




ffffff000fd72c60 fffffffffbc2dbf0                0   0  60                0
  PC: panicsys+0x9b    TASKQ: cpudrv_cpudrv_monitor
  stack pointer for thread ffffff000fd72c60: ffffff000fd726e0
    xc_insert+0x36()
    0xffffff0200000000()
    cpudrv_monitor+0x1cb()
    taskq_thread+0x285()
    thread_start+8()




EXAMPLE 2: SOURCE CODE




From cpudrv_monitor():

        /*
         * Adjust counts based on the delay added by timeout and taskq.
         */
        idle_cnt = (idle_cnt * cur_spd->quant_cnt) / tick_cnt;
        user_cnt = (user_cnt * cur_spd->quant_cnt) / tick_cnt;
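A divide error (#de) at cpudrv_monitor+0x1cb, together with this code, suggests that tick_cnt was 0 when one of these divisions ran. The sketch below shows the general defensive pattern with a hypothetical helper, scale_count(); the uint64_t types are an assumption (the real variable types are not shown on the slide), and this is not presented as the actual illumos fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute (count * quant_cnt) / tick_cnt as the
 * slide's code does, but refuse to divide by zero.
 * Returns 0 on success, EINVAL (leaving *result untouched) if tick_cnt is 0.
 */
static int scale_count(uint64_t count, uint64_t quant_cnt, uint64_t tick_cnt,
    uint64_t *result)
{
    if (tick_cnt == 0)
        return EINVAL;  /* on x86 the unguarded division traps with #de */
    *result = (count * quant_cnt) / tick_cnt;
    return 0;
}
```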




HARDWARE, FIRMWARE, OR SOFTWARE?


- Crash dumps are inconclusive on hardware errors
- Correlate the crash with fmdump output (for example, fmdump -eV
  for the detailed error reports)
- PCI-X panics are the most common hardware-caused panic
- PCI Vendor Database: http://pcidatabase.com
- KB Article: "Understanding and decoding PCI(-X) Express
  Fatal Error panics"




EXAMPLE 3: PANIC STRING AND STACK TRACE


      panic[cpu7]/thread=ffffff005cbdbc60:
      pcieb-3: PCI(-X) Express Fatal Error. (0x101)

      ffffff005cbdbbb0     pcieb:pcieb_intr_handler+228 ()
      ffffff005cbdbc00     unix:av_dispatch_autovect+7c ()
      ffffff005cbdbc40     unix:dispatch_hardint+33 ()
      ffffff005cbaba80     unix:switch_sp_and_call+13 ()
      ffffff005cbabad0     unix:do_interrupt+b8 ()
      ffffff005cbabae0     unix:_interrupt+b8 ()
      ffffff005cbabbd0     unix:i86_mwait+d ()
      ffffff005cbabc20     unix:cpu_idle_mwait+f1 ()
      ffffff005cbabc40     unix:idle+114 ()
      ffffff005cbabc50     unix:thread_start+8 ()

IDENTIFYING THE PCI-X COMPONENT


     Mar 30 2011 00:53:53.606674454 ereport.io.pci.fabric
     nvlist version: 0
             class = ereport.io.pci.fabric
             ena = 0xbcd565541a801401
             detector = (embedded nvlist)
             nvlist version: 0
                     version = 0x0
                     scheme = dev
                     device-path = /pci@0,0/pci8086,3408@1
             (end detector)

             bdf = 0x8
             device_id = 0x3408
             vendor_id = 0x8086
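The bdf field packs the PCI bus, device and function numbers into 16 bits: bus in bits 15-8, device in bits 7-3, function in bits 2-0. Decoding bdf = 0x8 gives bus 0, device 1, function 0, which matches the `@1` unit address in the device-path above. A small sketch:

```c
#include <assert.h>
#include <stdint.h>

struct pci_bdf {
    unsigned int bus;       /* bits 15-8 */
    unsigned int dev;       /* bits 7-3 */
    unsigned int func;      /* bits 2-0 */
};

/* Split a 16-bit PCI bus/device/function value into its fields. */
static struct pci_bdf decode_bdf(uint16_t bdf)
{
    struct pci_bdf b;

    b.bus = (bdf >> 8) & 0xff;
    b.dev = (bdf >> 3) & 0x1f;
    b.func = bdf & 0x7;
    return b;
}
```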

IDENTIFYING THE VENDOR

     Device ID   Chip Description                    Vendor ID   Vendor Name
     0x3408      Intel 7500 Chipset PCIe Root Port   0x8086      Intel Corporation


     device-path = /pci@0,0/pci8086,3408@1
     device-path = /pci@0,0/pci8086,3408@1/pci108e,484c@0
     device-path = /pci@0,0/pci8086,3408@1/pci108e,484c@0,1
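Each device-path component has the form `pciVVVV,DDDD@dev[,func]`. Under the IEEE 1275 PCI binding convention that these node names follow, the hex pair is the subsystem vendor and subsystem ID when the device provides them, and otherwise the vendor and device ID. So pci8086,3408 reads as Intel (0x8086) device 0x3408, matching the ereport, while pci108e,484c is most likely a subsystem pair (0x108e is Sun Microsystems' vendor ID) rather than the chip's own IDs. A sketch that decodes the last component of a path; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct pci_node {
    unsigned int vid;   /* vendor (or subsystem vendor) ID */
    unsigned int did;   /* device (or subsystem) ID */
    unsigned int dev;   /* device number from the unit address */
    unsigned int func;  /* function number; 0 when omitted */
};

/* Decode the last component of a device path, e.g. ".../pci108e,484c@0,1". */
static int parse_pci_node(const char *path, struct pci_node *n)
{
    const char *last = strrchr(path, '/');
    int fields;

    last = (last != NULL) ? last + 1 : path;
    n->func = 0;
    fields = sscanf(last, "pci%x,%x@%x,%x",
        &n->vid, &n->did, &n->dev, &n->func);
    return fields >= 3 ? 0 : -1;
}
```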

If neither the PCI vendor database nor `/usr/share/hwdata/pci.ids`
has an entry, grep `/etc/path_to_inst`:

   "/pci@0,0/pci8086,3408@1" 0 "pcie_pci"
   "/pci@0,0/pci8086,3408@1/pci108e,484c@0" 0 "igb"
   "/pci@0,0/pci8086,3408@1/pci108e,484c@0,1" 1 "igb"

igb is the Intel Gigabit Ethernet NIC driver.
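Each /etc/path_to_inst line has the form `"physical-path" instance-number "driver-name"`. A sketch that parses one line and checks whether the device sits below a given path, which is what grepping for the root-port path does by hand; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct inst_entry {
    char path[256];     /* physical device path */
    int  instance;      /* driver instance number */
    char driver[64];    /* driver name, e.g. "igb" */
};

/* Parse one path_to_inst line: "path" instance "driver". 0 on success. */
static int parse_path_to_inst(const char *line, struct inst_entry *e)
{
    int n = sscanf(line, " \"%255[^\"]\" %d \"%63[^\"]\"",
        e->path, &e->instance, e->driver);

    return n == 3 ? 0 : -1;
}

/* Return 1 if this entry's device path starts with the given prefix. */
static int under_path(const struct inst_entry *e, const char *prefix)
{
    return strncmp(e->path, prefix, strlen(prefix)) == 0;
}
```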
DETERMINE DRIVER AND PACKAGE DETAILS

# dpkg -S igb | grep '/kernel'
sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel/drv/igb.conf
sunwigb: /kernel/drv/amd64/igb
sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel/drv
sunwigb: /kernel/drv/igb
sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel

Examine the package details:

# dpkg -l sunwigb
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-f/Unpacked/Failed-cfg/Half-inst/t-aWait/T-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name                    Version                 Description
+++-=======================-======================-======================================
ii  sunwigb                 5.11.134-31-8234-1      Intel 82575 1Gb PCI Express NIC Driver


A PCI-X CONCLUSION, OF SORTS


- Searching Redmine for "igb driver" will find a bug, but also
  check for any Intel 82575 gigabit issues
- Next, determine:
  - Is the driver down-revision?
  - Is the firmware down-revision?
- If the driver and firmware are current, this is most likely
  a hardware problem
- Crash dump analysis (CDA) alone is inconclusive for proving
  hardware failures





Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Crash Dump Analysis 101

  • 1. CRASH DUMP ANALYSIS 101
    John S. Howard, john.howard@nexenta.com
    © Copyright Nexenta 2012
  • 2. AGENDA
    • Terminology
    • Core Dumps and Crash Dumps
    • C Language Basics
    • The Mechanism of a Panic
    • mdb Overview
    • Basic Crash Dump Analysis
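  • 3. PROCESS, THREAD, LWP
    • Process
      • A program in execution
      • Contains one or more threads, each running on an LWP
    • Thread
      • The smallest unit of scheduling
      • Threads in a process share its address space and resources
    • Light Weight Process (LWP)
      • The kernel-visible execution context of a user thread; each LWP is backed by a kernel thread
      • Older Solaris thread libraries multiplexed many user threads onto fewer LWPs; since Solaris 9 the mapping of user threads to LWPs is 1:1
      • Provides user-level multitasking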
  • 4. INTERRUPTS AND TRAPS
    • Interrupts are asynchronous messages notifying the kernel of external device events
      • Some interrupts are handled as traps
    • Traps are synchronous events caused by the instruction currently executing (for example, a page fault or a divide error); in effect, a software interrupt
    • Bus errors are issued to a processor when it references a location that can't be resolved or located
  • 5. HANGS, CRASHES, AND PANICS
    • Hang
      • Potentially limited or no forensic information
      • System up, but unresponsive
    • Crash
      • Potentially limited forensic information
      • System down or rebooted
    • Panic
      • Maximum potential forensic information
      • System down or rebooted
  • 6. FORENSIC INFORMATION SOURCES
    • Console
    • syslog, typically logged to /var/adm/messages
    • Core file or crash dump
  • 7. CORE FILE
    • A dump of the contents of all memory allocated to the process
    • An inert, static record of the process's state
    • By default, process core files are written to the process's current working directory
    • Core file naming and location are managed with coreadm
    • Reading a core file requires the same libraries the process was using
  • 8. CRASH DUMP
    • A dump of the contents of all memory allocated to the kernel
    • An inert, static record of state
    • Written to the pre-specified dump device or swap partition
    • Written “backwards”
    • Reading it requires the same OS version
    • The kernel crash dump facility is managed with dumpadm
  • 9. DUMPADM
    • dumpadm with no options shows the current settings:
          # dumpadm
                Dump content: kernel pages
                 Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
          Savecore directory: /var/crash/myhost
            Savecore enabled: yes
    • To save a crash dump of the live system without rebooting (requires a dedicated dump device):
          # savecore -L
      Note that savecore does not quiesce the system, so memory contents keep changing while the dump is written.
    • To force a panic and a crash dump, use either of:
          # uadmin 5 0
          # reboot -dn
  • 10. PANIC
    • The kernel has detected an inconsistency it cannot safely recover from
      • It protects the system and its data by stopping
    • Three major tasks are performed in a system panic:
      • Record information about the panic in memory (making it part of the crash dump)
      • Synchronize the file systems to preserve user file data
      • Generate the crash dump
  • 11. C PROGRAMMING LANGUAGE DATATYPES
    • Built-ins: int, float, char
    • struct
      • A grouping of data
    • union
      • Variant records
      • All constituent data items are overlaid on the same storage
    • typedef
      • A new name (alias) for an existing type
    • Pointers
      • A reference to a memory location
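To make "all constituent data items are overlaid" concrete, here is a small user-level C sketch; the type and function names are hypothetical, chosen only for illustration. A union is only as large as its largest member, while a struct with the same members has room for each of them. Writing one union member and reading another reinterprets the same bytes, so the result depends on the CPU's byte order.

```c
#include <stdint.h>

/* Hypothetical types for illustration. */

/* All members of a union share (overlay) the same storage. */
typedef union word_view {
    uint32_t u32;        /* the whole 32-bit value */
    uint8_t  bytes[4];   /* the same 4 bytes, viewed one at a time */
} word_view_t;

/* A struct with the same members gives each one its own storage. */
typedef struct word_pair {
    uint32_t u32;
    uint8_t  bytes[4];
} word_pair_t;

/*
 * Store v through one union member and read it back through the other.
 * Returns the byte at the lowest address: 0x78 for 0x12345678 on a
 * little-endian CPU such as x86, 0x12 on a big-endian one.
 */
uint8_t
low_address_byte(uint32_t v)
{
    word_view_t w;

    w.u32 = v;
    return (w.bytes[0]);
}
```

The same overlay effect matters when reading a dump: the bytes at a union's address mean whatever the member currently in use says they mean.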
  • 12. C DATATYPES EXAMPLES
        int ap;
        char buf[128];
        int *user = sr;
        typedef struct smb_mtype {
                char    *mt_name;
                int     mt_namelen;
                int     mt_flags;
        } smb_mtype_t;
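In a crash dump, a structure member lives at the structure's base address plus the member's offset. The sketch below reuses the slide's smb_mtype_t to show how offsetof() yields those offsets; the helper member_addr() is hypothetical. mdb should report the same layout for kernel types through its ::sizeof and ::offsetof dcmds.

```c
#include <stddef.h>
#include <stdint.h>

/* The slide's structure, reproduced for illustration. */
typedef struct smb_mtype {
    char *mt_name;
    int   mt_namelen;
    int   mt_flags;
} smb_mtype_t;

/*
 * Hypothetical helper: the address of a member, given the address of
 * the enclosing structure and the member's offset within it.
 */
uintptr_t
member_addr(uintptr_t base, size_t offset)
{
    return (base + offset);
}
```

On an LP64 system such as amd64, this layout gives offsets 0, 8, and 12 and a total size of 16 bytes, so mt_flags of a structure at 0x1000 would be read from 0x100c.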
  • 13. C FUNCTIONS
    • Declaration
    • Definition
    • Parameters are passed by value: the function receives a copy of each argument (for a pointer, a copy of the address)
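A minimal sketch of what pass-by-value means in practice (both function names are hypothetical): assigning to a parameter changes only the callee's copy, while a pointer parameter lets the callee write through the copied address into the caller's object.

```c
/* Changing a parameter changes only the callee's local copy. */
void
set_by_value(int x)
{
    x = 42;        /* the caller's variable is unaffected */
    (void) x;
}

/*
 * The pointer itself is also passed by value (the address is copied),
 * but writing through it modifies the caller's object.
 */
void
set_by_pointer(int *xp)
{
    *xp = 42;
}
```

For example, after `int v = 1; set_by_value(v);` v is still 1; after `set_by_pointer(&v);` it is 42.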
  • 14. C FUNCTION EXAMPLES
    • Declaration:
          static void smb_tree_log(smb_request_t *, const char *,
              const char *, ...);
    • Definition:
          smb_tree_log(smb_request_t *sr, const char *sharename,
              const char *fmt, ...)
          {
                  .
                  .
                  .
          }
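smb_tree_log() takes fixed parameters followed by a printf-style format and a variable argument list. The sketch below is a hypothetical user-level function with the same shape, not the smbsrv implementation: it formats the variable part with vsnprintf() and guards the string argument, since passing NULL to a string routine such as strcasecmp() is undefined behaviour (Example 1's panic is a NULL pointer dereference inside strcasecmp() called from smb_tree_log()).

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical logging helper shaped like smb_tree_log(): writes
 * "<sharename>: <message>" into buf. The formatted message is
 * truncated to 255 characters. Returns what snprintf() returns: the
 * length of the full line, even if buf was too small to hold it.
 */
int
tree_log_fmt(char *buf, size_t len, const char *sharename,
    const char *fmt, ...)
{
    char msg[256];
    va_list ap;

    /* Never hand a NULL string to a string routine. */
    if (sharename == NULL)
        sharename = "(null)";

    va_start(ap, fmt);
    (void) vsnprintf(msg, sizeof (msg), fmt, ap);
    va_end(ap);

    return (snprintf(buf, len, "%s: %s", sharename, msg));
}
```

For example, `tree_log_fmt(buf, sizeof (buf), "public", "uid %d", 7)` produces "public: uid 7".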
  • 15. PANIC()
    • panic(), cmn_err() (with CE_PANIC)
      • Common entry points into vpanic()
      • Responsible for providing the panic information
      • die(): reached from trap() on a fatal kernel trap (see the Example 1 stack)
    • vpanic()
      • Assembly language function that saves the register state
    • ASSERT(condition)
      • Halts execution of the kernel if condition is false
      • Evaluated and executed only when the DEBUG compilation symbol is defined
    • VERIFY(condition)
      • Similar to ASSERT, but active even when DEBUG isn't defined
    • After an ASSERT or VERIFY failure, the stack will contain assfail() near the top
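The ASSERT/VERIFY split can be sketched at user level. Everything prefixed with MY_ or my_ below is hypothetical; the macros follow the usual illumos pattern of calling a failure routine named after the failed expression, but where the kernel's assfail() panics, this handler only reports and counts failures so the behaviour can be observed without stopping the program.

```c
#include <stdio.h>

static int my_failures;

/* Hypothetical stand-in for assfail(): report, count, and continue. */
int
my_assfail(const char *expr, const char *file, int line)
{
    (void) fprintf(stderr, "assertion failed: %s, file: %s, line: %d\n",
        expr, file, line);
    my_failures++;
    return (0);
}

/* VERIFY is always compiled in. */
#define MY_VERIFY(EX) ((void)((EX) || my_assfail(#EX, __FILE__, __LINE__)))

/* ASSERT is compiled in only when DEBUG is defined. */
#ifdef DEBUG
#define MY_ASSERT(EX) MY_VERIFY(EX)
#else
#define MY_ASSERT(EX) ((void)0)
#endif

int
failure_count(void)
{
    return (my_failures);
}
```

Built without -DDEBUG, `MY_ASSERT(0)` compiles to nothing, while `MY_VERIFY(0)` still calls the failure routine; in the kernel, that call is why assfail() shows up near the top of the stack.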
  • 16. EXAMPLE 1: PANIC STRING
        panic[cpu1]/thread=ffffff000e4e7c60: BAD TRAP: type=e (#pf Page fault) rp=ffffff000e4e77c0 addr=0 occurred in module "unix" due to a NULL pointer dereference
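The type field in a BAD TRAP panic string is the x86 exception vector number in hexadecimal: type=e is vector 14, a page fault, and type=0 in Example 2 is a divide error. A small decoder sketch with hypothetical function names; the vector numbers are architecturally defined by Intel, but apart from the two strings quoted from this deck's panic strings, the descriptions are paraphrases rather than the kernel's exact wording.

```c
#include <stdio.h>
#include <string.h>

/* x86 exception vectors 0x0 through 0x13 and short descriptions. */
const char *
x86_trap_name(unsigned int type)
{
    static const char *const names[] = {
        "#de Divide error",              /* 0x0 */
        "#db Debug",                     /* 0x1 */
        "NMI",                           /* 0x2 */
        "#bp Breakpoint",                /* 0x3 */
        "#of Overflow",                  /* 0x4 */
        "#br BOUND range exceeded",      /* 0x5 */
        "#ud Invalid opcode",            /* 0x6 */
        "#nm Device not available",      /* 0x7 */
        "#df Double fault",              /* 0x8 */
        "Coprocessor segment overrun",   /* 0x9 */
        "#ts Invalid TSS",               /* 0xa */
        "#np Segment not present",       /* 0xb */
        "#ss Stack fault",               /* 0xc */
        "#gp General protection",        /* 0xd */
        "#pf Page fault",                /* 0xe */
        "Reserved",                      /* 0xf */
        "#mf x87 floating-point error",  /* 0x10 */
        "#ac Alignment check",           /* 0x11 */
        "#mc Machine check",             /* 0x12 */
        "#xm SIMD floating-point",       /* 0x13 */
    };

    if (type < sizeof (names) / sizeof (names[0]))
        return (names[type]);
    return ("unknown");
}

/* Extract the hex value after "type=" in a panic string; -1 if absent. */
long
panic_trap_type(const char *panic)
{
    const char *p = strstr(panic, "type=");
    unsigned int t;

    if (p == NULL || sscanf(p + 5, "%x", &t) != 1)
        return (-1);
    return ((long)t);
}
```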
  • 17. EXAMPLE 1: STACK TRACE
        ffffff000e4e76a0 unix:die+dd ()
        ffffff000e4e77b0 unix:trap+177b ()
        ffffff000e4e77c0 unix:cmntrap+e6 ()
        ffffff000e4e78c0 unix:strcasecmp+16 ()
        ffffff000e4e7a50 smbsrv:smb_tree_log+b3 ()
        ffffff000e4e7a90 smbsrv:smb_tree_connect_core+14a ()
        ffffff000e4e7ac0 smbsrv:smb_tree_connect+35 ()
        ffffff000e4e7ae0 smbsrv:smb_com_tree_connect_andx+16 ()
        ffffff000e4e7b80 smbsrv:smb_dispatch_request+4a9 ()
        ffffff000e4e7bb0 smbsrv:smb_session_worker+6c ()
        ffffff000e4e7c40 genunix:taskq_d_thread+b1 ()
        ffffff000e4e7c50 unix:thread_start+8 ()
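Each line of the panic stack trace is a frame address, module:function, and a hexadecimal offset into that function; smb_tree_log+b3 means 0xb3 bytes into smb_tree_log(). A hypothetical C parser for this line format, which can help when comparing the stacks of several dumps for footprints. It handles only the panic-message form shown here, not mdb's function+0x1cb(args) form.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One frame, e.g. "ffffff000e4e7a50 smbsrv:smb_tree_log+b3 ()". */
typedef struct frame {
    uint64_t fp;          /* frame address */
    char     module[64];  /* e.g. "smbsrv" */
    char     func[128];   /* e.g. "smb_tree_log" */
    uint64_t offset;      /* bytes into func, e.g. 0xb3 */
} frame_t;

/* Returns 1 on success, 0 if the line does not match the format. */
int
parse_frame(const char *line, frame_t *f)
{
    return (sscanf(line, "%" SCNx64 " %63[^:]:%127[^+]+%" SCNx64,
        &f->fp, f->module, f->func, &f->offset) == 4);
}
```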
  • 18. MDB – MODULAR DEBUGGER
    • Extensible utility for low-level debugging and editing
    • On the live kernel:
          # mdb -k
          # mdb -kw        (to edit; VERY DANGEROUS)
    • On a core file:
          mdb syseventd.core.125
    • On a crash dump:
          # mdb -k unix.3 vmcore.3
  • 19. ANALYZE-CRASH.SH
    • Extracts the crash dump from the dump device if necessary (savecore -vf filename)
    • Runs scripted mdb commands for basic crash information:
      • Panic string and registers
      • dmesg buffer
      • Stack
      • Thread list
    • Executed automatically by the NMC `support` command (NS 3.1.2 and later)
  • 20. HAVE I SEEN THIS BEFORE?
    • Footprints
      • Known problem or new?
    • Redmine
    • Search illumos Hg issues: https://www.illumos.org/issues/
    • SunSolve is gone, but “We Sun Solve” is rescuing the data from SunSolve.Sun.COM: http://wesunsolve.net/bsearch
    • illumos source browser: http://src.illumos.org/source/
  • 21. EXAMPLE 1: PANIC STRING (same as slide 16)
  • 22. EXAMPLE 1: STACK TRACE (same as slide 17)
  • 23. EXAMPLE 2: PANIC INFO
        panic[cpu5]/thread=ffffff000fd72c60: BAD TRAP: type=0 (#de Divide error) rp=ffffff000fd72a40 addr=ffffff02da92e900
        sched: #de Divide error
        addr=0xffffff02da92e900
        pid=0, pc=0xfffffffff7ad977b, sp=0xffffff000fd72b30, eflags=0x10246
        cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
        cr2: fffffd7fff2a60c8 cr3: 5000000 cr8: c
                rdi: ffffff02d282e840 rsi:                0 rdx:                0
                rcx:               64  r8: ffffff000fd72c60  r9:                0
                rax:                0 rbx:                0 rbp: ffffff000fd72b90
                r10:                0 r11: ffffff02f46e8264 r12: ffffff02da316338
                r13: ffffff02da3163d0 r14: ffffff02d5061a50 r15: ffffff02da92e900
                fsb:                0 gsb: ffffff02da9a1540  ds:               4b
                 es:               4b  fs:                0  gs:              1c3
                trp:                0 err:                0 rip: fffffffff7ad977b
                 cs:               30 rfl:            10246 rsp: ffffff000fd72b30
                 ss:               38
  • 24. EXAMPLE 2: STACK
        ffffff000fd72920 unix:die+10f ()
        ffffff000fd72a30 unix:trap+1555 ()
        ffffff000fd72a40 unix:cmntrap+e6 ()
        ffffff000fd72b90 cpudrv:cpudrv_monitor+1cb ()
        ffffff000fd72c40 genunix:taskq_thread+285 ()
        ffffff000fd72c50 unix:thread_start+8 ()
        syncing file systems... done
        dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc

        STACK ---
        ffffff000fd72b90 cpudrv_monitor+0x1cb(ffffff02da316338)
        ffffff000fd72c40 taskq_thread+0x285(ffffff02da859140)
        ffffff000fd72c50 thread_start+8()
  • 25. EXAMPLE 2: THREAD LIST
        ffffff000fd72c60 fffffffffbc2dbf0                0   0  60                0
          PC: panicsys+0x9b    TASKQ: cpudrv_cpudrv_monitor
          stack pointer for thread ffffff000fd72c60: ffffff000fd726e0
            xc_insert+0x36()
            0xffffff0200000000()
            cpudrv_monitor+0x1cb()
            taskq_thread+0x285()
            thread_start+8()
  • 26. EXAMPLE 2: SOURCE CODE
    • From cpudrv_monitor(), source lines 1109 to 1113:
          /*
           * Adjust counts based on the delay added by timeout and taskq.
           */
          idle_cnt = (idle_cnt * cur_spd->quant_cnt) / tick_cnt;
          user_cnt = (user_cnt * cur_spd->quant_cnt) / tick_cnt;
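On x86, only a division instruction raises #de, either because the divisor is zero or because the quotient overflows. In the Example 2 register dump both rax and rdx are 0, so the dividend was 0 and the quotient cannot have overflowed. That is consistent with tick_cnt being 0 when one of these lines executed. A hedged sketch of a guarded version follows; scale_count() is a hypothetical name, and returning the count unscaled when tick_cnt is 0 is an assumed policy for illustration, not the upstream fix.

```c
#include <stdint.h>

/*
 * Guarded form of the scaling on the slide:
 *     cnt = (cnt * quant_cnt) / tick_cnt;
 * Dividing by a tick_cnt of 0 would raise #de, so in that case the
 * count is returned unscaled (an assumed policy, for illustration).
 */
uint64_t
scale_count(uint64_t cnt, uint64_t quant_cnt, uint64_t tick_cnt)
{
    if (tick_cnt == 0)
        return (cnt);
    return ((cnt * quant_cnt) / tick_cnt);
}
```

For example, `scale_count(50, 100, 200)` returns 25, and `scale_count(50, 100, 0)` returns 50 instead of trapping.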
  • 27. HARDWARE, FIRMWARE, OR SOFTWARE?
    • Crash dumps are inconclusive for hardware errors
    • Correlate with fmdump output
    • PCI-X panics are the most common hardware-caused panics
    • PCI Vendor Database: http://pcidatabase.com
    • KB Article: “Understanding and decoding PCI(-X) Express Fatal Error panics”
  • 28. EXAMPLE 3: PANIC STRING AND STACK TRACE
        panic[cpu7]/thread=ffffff005cbdbc60: pcieb-3: PCI(-X) Express Fatal Error. (0x101)
        ffffff005cbdbbb0 pcieb:pcieb_intr_handler+228 ()
        ffffff005cbdbc00 unix:av_dispatch_autovect+7c ()
        ffffff005cbdbc40 unix:dispatch_hardint+33 ()
        ffffff005cbaba80 unix:switch_sp_and_call+13 ()
        ffffff005cbabad0 unix:do_interrupt+b8 ()
        ffffff005cbabae0 unix:_interrupt+b8 ()
        ffffff005cbabbd0 unix:i86_mwait+d ()
        ffffff005cbabc20 unix:cpu_idle_mwait+f1 ()
        ffffff005cbabc40 unix:idle+114 ()
        ffffff005cbabc50 unix:thread_start+8 ()
  • 29. IDENTIFYING THE PCI-X COMPONENT
        Mar 30 2011 00:53:53.606674454 ereport.io.pci.fabric
        nvlist version: 0
                class = ereport.io.pci.fabric
                ena = 0xbcd565541a801401
                detector = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = dev
                        device-path = /pci@0,0/pci8086,3408@1
                (end detector)
                bdf = 0x8
                device_id = 0x3408
                vendor_id = 0x8086
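The node names in a device-path carry PCI IDs: pci8086,3408 corresponds to vendor 0x8086, device 0x3408, matching the vendor_id and device_id fields of the ereport. A hypothetical C helper that pulls the two hex IDs out of the last node of a device-path. One assumption to keep in mind: when a card reports subsystem IDs, the node name may carry the subsystem vendor and subsystem ID instead, so the pair may not be found under the chip's own vendor and device in a database.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the last "pciXXXX,YYYY@..." node of a device path such as
 * "/pci@0,0/pci8086,3408@1" into its two hex IDs.
 * Returns 1 on success, 0 if the last node is not of that form.
 */
int
pci_node_ids(const char *path, unsigned int *id1, unsigned int *id2)
{
    const char *node = strrchr(path, '/');

    node = (node != NULL) ? node + 1 : path;
    return (sscanf(node, "pci%x,%x", id1, id2) == 2);
}
```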
  • 30. IDENTIFYING THE VENDOR
    • PCI vendor database lookup:
          Device ID  Chip Description                   Vendor ID  Vendor Name
          0x3408     Intel 7500 Chipset PCIe Root Port  0x8086     Intel Corporation
    • Related device paths:
          device-path = /pci@0,0/pci8086,3408@1
          device-path = /pci@0,0/pci8086,3408@1/pci108e,484c@0
          device-path = /pci@0,0/pci8086,3408@1/pci108e,484c@0,1
    • If there is no entry in either the PCI vendor database or /usr/share/hwdata/pci.ids, grep /etc/path_to_inst:
          "/pci@0,0/pci8086,3408@1" 0 "pcie_pci"
          "/pci@0,0/pci8086,3408@1/pci108e,484c@0" 0 "igb"
          "/pci@0,0/pci8086,3408@1/pci108e,484c@0,1" 1 "igb"
    • igb is the Intel Gigabit NIC driver
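Each /etc/path_to_inst entry is a quoted physical path, an instance number, and a quoted driver name, so the driver bound to a device-path can be read directly off the matching line. A small C parser sketch for that format; the type and function names are hypothetical, and lines that do not match (such as comments) are rejected.

```c
#include <stdio.h>

/* One entry:  "<physical path>" <instance> "<driver>" */
typedef struct inst_entry {
    char path[256];
    int  instance;
    char driver[64];
} inst_entry_t;

/* Returns 1 if line is a well-formed entry, 0 otherwise. */
int
parse_path_to_inst(const char *line, inst_entry_t *e)
{
    return (sscanf(line, " \"%255[^\"]\" %d \"%63[^\"]\"",
        e->path, &e->instance, e->driver) == 3);
}
```

Applied to the last line above, this yields instance 1 of driver igb, i.e. igb1.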
  • 31. DETERMINE DRIVER AND PACKAGE DETAILS
    • Find the package that delivers the driver:
          # dpkg -S igb | grep '/kernel'
          sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel/drv/igb.conf
          sunwigb: /kernel/drv/amd64/igb
          sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel/drv
          sunwigb: /kernel/drv/igb
          sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel
    • Examine the package details:
          # dpkg -l sunwigb
          Desired=Unknown/Install/Remove/Purge/Hold
          | Status=Not/Installed/Config-f/Unpacked/Failed-cfg/Half-inst/t-aWait/T-pend
          |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
          ||/ Name                    Version                Description
          +++-=======================-======================-======================================
          ii  sunwigb                 5.11.134-31-8234-1     Intel 82575 1Gb PCI Express NIC Driver
  • 32. A PCI-X CONCLUSION, OF SORTS
    • Searching Redmine for “igb driver” will find a bug, but also check for any Intel 82575 gigabit issues
    • Next, determine:
      • Is the driver down-revision?
      • Is the firmware down-revision?
    • If the driver and firmware are current, this is most likely a hardware problem
    • CDA is inconclusive for proving hardware failures