Under the Hood
of Oracle Clusterware
Miracle OpenWorld 2010
15-Apr-2010
Alex Gorbachev, The Pythian Group
Alex Gorbachev

    • CTO, The Pythian Group
    • Blogger

    • OakTable Network member
    • Oracle ACE Director

    • BattleAgainstAnyGuess.com

    • Vice-president, Oracle RAC SIG




2                             © 2009/2010 Pythian
Why Companies Trust Pythian
    • Recognized Leader:
    •   Global industry leader in remote database administration services and consulting for Oracle,
        Oracle Applications, MySQL and SQL Server
    •   Works with over 150 multinational companies such as Forbes.com, Fox Interactive Media, and
        MDS Inc. to help manage their complex IT deployments

    • Expertise:
    •   One of the world’s largest concentrations of dedicated, full-time DBA expertise.

    • Global Reach & Scalability:
    •   24/7/365 global remote support for DBA and consulting, systems administration, special
        projects or emergency response




Agenda

    • Place of Clusterware in Oracle RAC


    • Node membership and evictions


    • Clusterware startup sequence


    • Oracle Cluster Registry


    • Resources Management and troubleshooting


    • 11gR2 Grid Infrastructure

Agenda

     [Chart: "Need to memorize" (Low to High) plotted against "Understanding" (Shallow to In-depth), captioned "The more you understand, the less you need to memorize".]

Architecture

     [Diagram: three nodes, each with an OS running VIP, Listener, Service, Instance, ASM and Clusterware. The nodes are linked by the interconnect and have storage access to shared storage holding the OCR and the voting disk.]

     [Diagram, built up step by step: Clusterware runs on top of the OS and is made up of the components below.]

     • CSSD - Cluster Synchronization Services
     • CRSD - Cluster Ready Services
     • RACG, VIP - HA Framework scripts
     • EVMD - Event Manager
     • OPROCD - Oracle Process Monitor

     [Diagram: two nodes, each running VIP, RACG, EVMD, CRSD, CSSD and OPROCD on top of the OS. The interconnect links the two CSSD daemons.]

     [Diagram: the same two nodes, now with the voting disk drawn below the interconnect.]

     Shoot The Other Node In The Head

     [Diagram sequence: the two-node picture with the voting disk is repeated, then the left node is reduced to its CSSD only.]

     Ask The Other Node To Reboot Itself    (c) known quote

     [Diagram sequence: the left node's CSSD is drawn askew, OCLSOMON appears on that node, and in the last frame only OCLSOMON remains there.]

     [Diagram sequence: the left node is first reduced to CSSD, then OPROCD is shown next to it, and in the last frame only OPROCD remains on that node.]

     [Diagram sequence: the two-node picture is stripped down, first dropping the voting disk, until only the two CSSD daemons and the interconnect between them remain.]

Evictions

     • Network heartbeat lost
     • Voting disk access lost
     • CSSD is not healthy
     • OS is not healthy
      •   OPROCD - Unix, Windows, 11g Linux
      •   hangcheck-timer - 10g Linux

DEMO
     NHB (network heartbeat) failure

     • Simulate with “ifconfig eth1 down”
     • Both nodes notice the loss
     • The nodes race to evict each other
      •   from the voting disk => 2 equal sub-clusters
      •   the sub-cluster with the lowest leader # survives
          •   the leader is the node with the lowest # in its sub-cluster
     • The winner evicts the other node
      •   by setting a kill-block in the voting disk
     • CSSD and OCLSOMON race to commit suicide

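The tie-break rule on this slide can be sketched in a few lines. This is only an illustration of the rule as stated (equal-sized sub-clusters, lowest leader number wins); the function names are hypothetical and nothing here reflects how CSSD is actually implemented.

```python
def leader(subcluster):
    """The leader is the node with the lowest number in its sub-cluster."""
    return min(subcluster)


def surviving_subcluster(subclusters):
    """Among equal-sized sub-clusters, the one with the lowest leader survives."""
    sizes = {len(s) for s in subclusters}
    if len(sizes) != 1:
        # The slide only describes the equal-sized case, so nothing else is modelled.
        raise ValueError("this sketch only covers equal-sized sub-clusters")
    return min(subclusters, key=leader)


if __name__ == "__main__":
    # Two-node cluster after the interconnect goes down:
    # each node ends up alone in its own sub-cluster.
    split = [{2}, {1}]
    winner = surviving_subcluster(split)
    evicted = [n for s in split if s is not winner for n in sorted(s)]
    print("survives:", sorted(winner), "evicted:", evicted)
```

With a four-node cluster split as {1, 2} and {3, 4}, the same function returns {1, 2}, because node 1 is the lowest-numbered leader.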
NHB failure symptoms

     • NHB failure is reported on several nodes
      •   ocssd.log
     • The evicted node can contain other traces
      •   maybe - syslog (Linux - /var/log/messages)
      •   maybe - oclsomon.log
      •   almost always - console
     • Network is only a *possible* root cause
      •   check syslog, ifconfig, netstat
      •   network engineering - switch logs

DEMO
     CSSD is not healthy

     • Simulate using kill -STOP <ocssd.bin pid>
     • The other node observes NHB loss
      •   after misscount seconds => attempts eviction
          •   but CSSD is frozen and can’t commit suicide
     • OCLSOMON detects the CSSD timeout
      •   and commits suicide

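Why a stopped CSSD cannot react to its own eviction: kill -STOP freezes a process without terminating it, so it stays in the process table but executes nothing until it receives SIGCONT. A minimal sketch on a harmless stand-in child process (a sleeping Python interpreter, not a Clusterware daemon), assuming a POSIX system:

```python
import os
import signal
import subprocess
import sys


def freeze_and_resume():
    """Stop a stand-in child with SIGSTOP, confirm it is frozen, then resume it."""
    # Stand-in for a daemon: a child that only sleeps. Not a Clusterware process.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        os.kill(child.pid, signal.SIGSTOP)               # same as: kill -STOP <pid>
        _, status = os.waitpid(child.pid, os.WUNTRACED)  # wait until the stop is reported
        stopped = os.WIFSTOPPED(status)                  # frozen by a signal
        still_alive = child.poll() is None               # but it has not exited
        os.kill(child.pid, signal.SIGCONT)               # same as: kill -CONT <pid>
        return stopped, still_alive
    finally:
        child.kill()   # clean up the stand-in child
        child.wait()


if __name__ == "__main__":
    stopped, still_alive = freeze_and_resume()
    print("stopped:", stopped, "still alive:", still_alive)
```

Do not point this at a real CSSD or OPROCD pid outside a test cluster: as the demo shows, freezing those daemons leads to a node reboot.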
OCSSD sick - symptoms

     • Error in oclsomon.log
     • OCSSD log might be clean on the evicted node
     • syslog might contain an OCLSOMON diagnostic error
     • Console often contains the diagnostic error
      •   depending on syslogd settings
     • Set diagwait to more than 3 for better diagnosability
      •   3 seconds is reboottime
      •   increases the risk of corruption

DEMO
     Host sick - CPU stalled

     • Simulate by pausing OPROCD
      •   kill -STOP <oprocd pid>
      •   sleep 1 or 2
      •   kill -CONT <oprocd pid>
     • oprocd.log
      •   usually nothing if the node is reset
     • Immediate reboot
      •   console might contain a diagnostic message

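The demo works because a process monitor of this kind notices that it woke up late. The sketch below assumes a simple sleep-and-check design: sleep for a fixed interval, then compare the actual wake-up time with the expected one and react if the delay exceeds a margin. The names `interval`, `margin`, `watchdog` and `FakeClock` are illustrative, and the slides do not show OPROCD's real implementation.

```python
import time


def watchdog(interval, margin, cycles, sleep=time.sleep, clock=time.monotonic):
    """Sleep `interval` seconds per cycle and report the first cycle that overslept.

    Returns (cycle_index, seconds_late) or None if every wake-up was on time.
    """
    for cycle in range(cycles):
        expected = clock() + interval
        sleep(interval)
        late_by = clock() - expected
        if late_by > margin:
            # A real watchdog would reset the node at this point.
            return cycle, late_by
    return None


class FakeClock:
    """Simulated time, so the example runs instantly and repeatably."""

    def __init__(self, stall_on_sleep, stall_seconds):
        self.now = 0.0
        self.sleeps = 0
        self.stall_on_sleep = stall_on_sleep
        self.stall_seconds = stall_seconds

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.sleeps += 1
        self.now += seconds
        if self.sleeps == self.stall_on_sleep:
            # Models kill -STOP ... sleep 2 ... kill -CONT hitting this cycle.
            self.now += self.stall_seconds


if __name__ == "__main__":
    fake = FakeClock(stall_on_sleep=3, stall_seconds=2.0)
    print(watchdog(interval=1.0, margin=0.5, cycles=5,
                   sleep=fake.sleep, clock=fake.clock))
```

The same check fires whether the delay comes from a paused process or from a genuinely stalled host, which is why the monitor alone cannot tell you the root cause.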
Killed by OPROCD - symptoms

     • Hard to confirm (nothing in oprocd.log)
     • Console output often helps
      •   “SysRq: resetting” could be in syslog as well
     • Root cause
      •   faulty hardware or drivers, stalls caused by IO/network
      •   kernel bugs, NTP bugs
      •   investigate syslog messages
     • Margin can be tuned
      •   diagwait and reboottime CSSD parameters

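How the two parameters combine is not spelled out on the slide. A commonly described relationship, treated here as an assumption, is that the OPROCD margin is diagwait minus reboottime. The helper name `oprocd_margin` is hypothetical.

```python
def oprocd_margin(diagwait, reboottime=3):
    """Assumed relationship (not stated on the slide): margin = diagwait - reboottime.

    Both values are in seconds; 3 is the reboottime quoted in the slides.
    """
    if diagwait <= reboottime:
        raise ValueError("diagwait must exceed reboottime to leave any margin")
    return diagwait - reboottime


if __name__ == "__main__":
    for diagwait in (4, 8, 13):
        margin = oprocd_margin(diagwait)
        print(f"diagwait={diagwait:>2}s  reboottime=3s  ->  margin={margin}s")
```

A larger margin leaves more time to write diagnostics before the reset, at the cost noted earlier: a sick node stays up longer, which increases the risk of corruption.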
10g on Linux - hangcheck-timer

     • Replaced by OPROCD in 11g and 10.2.0.4+
     • Most of the time useless and inactive!
     • Metalink Note 726833.1
      •   Updated 21-JUL-08!
     • Oracle suggests keeping both
      •   I would only leave OPROCD
     • Metalink Note 567730.1
      •   OPROCD in 10.2.0.4

Killed by hangcheck-timer

     • Can rarely be confirmed
      •   “Hangcheck: hangcheck is restarting the machine”
     • Can set hangcheck_dump_tasks to dump state
      •   See source code...

Clusterware startup

     • Linux & UNIX inittab
      •   init.cssd
      •   init.evmd
      •   init.crsd
     • Linux & UNIX init.d
      •   init.crs
     • Windows Services

Daemons startup sequence

     [Diagram, top to bottom: third-party clusterware, CSSD, EVMD, CRSD]

     • Triggered
      •   by init.crs from init.d sequence
      •   manually

Startup in Linux & Unix
     [gorby@dime ~]$ ps -fe | grep 'init.' | grep -v grep
     root      6352      1   0 10:24 ... /bin/sh /etc/init.d/init.evmd run
     root      6353      1   0 10:24 ... /bin/sh /etc/init.d/init.cssd fatal
     root      6354      1   0 10:24 ... /bin/sh /etc/init.d/init.crsd run
     root      7356   6353   0 10:25 ... /bin/sh /etc/init.d/init.cssd oprocd
     root      7364   6353   0 10:25 ... /bin/sh /etc/init.d/init.cssd oclsomon
     root      7383   6353   0 10:25 ... /bin/sh /etc/init.d/init.cssd daemon

     [gorby@dime ~]$ tail -3 /etc/inittab
     h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
     h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
     h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null

     [gorby@dime ~]$ ls -l /etc/rc3.d/S96init.crs
     lrwxrwxrwx 1 root root 20 Aug  1 23:51 /etc/rc3.d/S96init.crs -> /etc/init.d/init.crs
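
The inittab entries shown on this slide can be listed programmatically. This is a small sketch, assuming the usual inittab layout of id:runlevels:action:process and the three wrapper names seen in the output. The function name clusterware_inittab_entries is made up for illustration.

```shell
#!/bin/sh
# Print "id runlevels action command argument" for inittab entries that
# launch the Clusterware init.* wrappers shown on the slide.
# inittab fields are colon-separated: id:runlevels:action:process
clusterware_inittab_entries() {
    inittab="${1:-/etc/inittab}"
    awk -F: '$0 !~ /^#/ && $4 ~ /init\.(cssd|evmd|crsd)/ {
        # Keep only the script path and its first argument (run, fatal, ...)
        split($4, cmd, " ")
        print $1, $2, $3, cmd[1], cmd[2]
    }' "$inittab"
}
```

A missing or commented-out entry here means init will not respawn the corresponding wrapper, which is worth ruling out before looking at the daemons themselves.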



Startup flow

     [Timeline diagram along a time axis t, built up over several steps]

     • Started from inittab
      •   init.cssd fatal
      •   init.evmd run
      •   init.crsd run
     • Files under /etc/oracle/scls_scr/{host}/root
      •   cssrun
      •   crsstart
          •   enable
          •   disable
     • Started from init.d
      •   init.crs start
      •   init.cssd autostart
     • Processes spawned
      •   init.cssd oprocd => oprocd
      •   init.cssd oclsomon => oclsomon.bin
      •   init.cssd oclsvmon => oclsvmon.bin
      •   init.cssd daemon => ocssd.bin
      •   init.evmd run => evmd.bin
      •   init.crsd run => crsd.bin

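
The crsstart file on this slide carries the enable / disable setting. Here is a minimal sketch for reading it, assuming the file holds just that single word and that {host} matches the output of hostname (it may be the short name on some systems). The function name crs_autostart_state is hypothetical.

```shell
#!/bin/sh
# Report whether Clusterware autostart is enabled, based on the crsstart
# file. The directory layout follows the slide; treating the file content
# as a single word (enable or disable) is an assumption.
crs_autostart_state() {
    scr_dir="${1:-/etc/oracle/scls_scr/$(hostname)/root}"
    if [ ! -r "$scr_dir/crsstart" ]; then
        echo "unknown"
        return 1
    fi
    # Strip whitespace so a trailing newline does not affect the comparison
    state="$(tr -d '[:space:]' < "$scr_dir/crsstart")"
    case "$state" in
        enable|disable) echo "$state" ;;
        *)              echo "unknown"; return 1 ;;
    esac
}
```

The sketch only reads the file; changing the setting is better left to the supported tools.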
DEMO
     Startup troubleshooting

     • Check processes using “ps -fe | grep init”
     • Check syslog (/var/log/messages)
      •   Can point to /tmp/crsctl.#####
     • Remember the boot sequence
     • Clusterware log files
      •   if *.bin processes are running already
     • crsctl
      •   crsctl check crs | cssd | crsd | evmd
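
The process check can be narrowed to the daemon binaries named earlier in the deck. This is a rough sketch that reads ps output on stdin, assuming the three binaries appear in the process list as ocssd.bin, evmd.bin and crsd.bin. The function name clusterware_daemon_status is made up.

```shell
#!/bin/sh
# Read `ps -fe` output on stdin and report, for each daemon binary,
# whether a matching process line was seen.
clusterware_daemon_status() {
    awk '
        /ocssd\.bin/ { seen["ocssd.bin"] = 1 }
        /evmd\.bin/  { seen["evmd.bin"]  = 1 }
        /crsd\.bin/  { seen["crsd.bin"]  = 1 }
        END {
            # Names are assembled here so that this awk command line never
            # contains a literal "<name>.bin" and cannot match itself in ps
            n = split("ocssd evmd crsd", names, " ")
            for (i = 1; i <= n; i++) {
                bin = names[i] ".bin"
                print bin, ((bin in seen) ? "running" : "missing")
            }
        }'
}
```

Typical use would be `ps -fe | clusterware_daemon_status`. If the init.* wrappers are present but a *.bin process is missing, syslog is the next place to look.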




Log files

     • log/{host}/cssd/ocssd.log

     • log/{host}/cssd/oclsomon/oclsomon.log
      •   oclsomon.ba1, oclsomon.ba2,...
     • /etc/oracle/oprocd/{host}.oprocd.log
      •   {host}.oprocd.log.{timestamp}
     • syslog
      •   Linux /var/log/messages
      •   Solaris /var/adm/messages
     • Console logs
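
The main log locations can be expanded for a given node. This is a small sketch, assuming the relative log/ paths sit under the Clusterware home ($ORA_CRS_HOME, as used on the troubleshooting slides) and that {host} is the node name. The function name clusterware_log_paths is hypothetical.

```shell
#!/bin/sh
# Print the main Clusterware log locations for one node.
# $1 = Clusterware home (defaults to $ORA_CRS_HOME), $2 = host name.
clusterware_log_paths() {
    crs_home="${1:-$ORA_CRS_HOME}"
    host="${2:-$(hostname)}"
    echo "$crs_home/log/$host/cssd/ocssd.log"
    echo "$crs_home/log/$host/crs/crsd.log"
    echo "$crs_home/log/$host/alert$host.log"
    echo "/etc/oracle/oprocd/$host.oprocd.log"
}
```

The output can be fed to a loop that tails each file that exists, which gives a quick first look at all daemons in one pass.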



Windows world

     • OPROCD = OraFenceService
     • EVMD = OracleEVMService
     • CRSD = OracleCRService
     • CSSD = OracleCSService

     • OPMD
      •   Oracle Process Manager Daemon
      •   Start trigger like init.crs in *nix
      •   Registered with Windows Service Control Manager (WSCM); delays
          start by 60 seconds




EVMD

     [Diagram: OS and the Clusterware stack: VIP, RACG, EVMD, CRSD, CSSD, OPROCD]

     • Passing clusterware events
     • Usually not a problem
     • Verify
      •   evmwatch -A
      •   evmpost -u "my message"

CRSD

     [Diagram: OS and the Clusterware stack: VIP, RACG, EVMD, CRSD, CSSD, OPROCD]

     • CRSD manages cluster resources
      •   Stop / Start
      •   Failover
      •   VIP management
      •   New resources, etc.
     • RACG helper scripts

CRSD startup

     • After CSSD and EVMD
     • Re-spawned on failure
      •   No eviction
     • Runs as root
      •   VIP control
      •   OCR management
      •   root ulimits are in place!
      •   Can run resources owned by any user
          •   owner is a property of a resource




Oracle Cluster Registry

     • Repository for all configuration data
      •   Except OCR location itself
     • OCR is accessed mostly read-only
      •   Every component reads OCR
     • OCR is written only by CRS
      •   only from a single OCR master node



### crsd.log ###

2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13:
I AM THE NEW OCR MASTER at incar 12. Node Number 1
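
The crsd.log message above shows when a node took over as OCR master. This is a rough sketch for pulling out the latest such event, assuming the message sits on a single line in the real log (the slide wraps it) and that the timestamp and "Node Number" fields always have this shape. The function name last_ocr_master is made up.

```shell
#!/bin/sh
# Print the timestamp and node number of the most recent OCR master
# takeover recorded in a crsd.log file.
last_ocr_master() {
    grep 'I AM THE NEW OCR MASTER' "$1" |
        tail -n 1 |
        # Capture "date time" before the first ": " and the digits after "Node Number"
        sed -n 's/^\([0-9-]* [0-9:.]*\): .*Node Number \([0-9]*\).*/\1 node=\2/p'
}
```

Running it against $ORA_CRS_HOME/log/{host}/crs/crsd.log on each node shows which node currently holds the master role, since only that node logs the most recent takeover.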




CRS resources

     • Standard Oracle resources
      •   ASM
      •   Listener
      •   VIP
      •   Database and Instance
      •   etc.
      •   srvctl => manages Oracle resources
     • Custom user resources
      •   crs_* => manages any resources




CRS resource internals

     • Unique name
     • Associated action script
      •   stop / start / check functions
     • Other attributes
      •   check frequency
      •   prerequisites
      •   restart retries
      •   etc.
     • All info stored in OCR
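
The stop / start / check contract of an action script can be illustrated with a toy resource. This is a minimal sketch, assuming the script is called with a single argument (start, stop or check) and that exit code 0 signals success. The "resource" here is just a marker file, and STATE_FILE and resource_action are hypothetical names, not Oracle settings.

```shell
#!/bin/sh
# Toy action script: the managed "resource" is a marker file.
# STATE_FILE is an illustrative variable, not an Oracle parameter.
STATE_FILE="${STATE_FILE:-/tmp/demo_resource.state}"

resource_action() {
    case "$1" in
        start) touch "$STATE_FILE" ;;   # bring the resource online
        stop)  rm -f "$STATE_FILE" ;;   # take the resource offline
        check) [ -f "$STATE_FILE" ] ;;  # exit 0 only if the resource is online
        *)     echo "usage: $0 {start|stop|check}" >&2; return 2 ;;
    esac
}

# Dispatch on the first argument when the script is called with one
if [ "$#" -gt 0 ]; then
    resource_action "$1"
    exit $?
fi
```

A real action script would start, stop and probe an actual process in the three branches; the check branch is the one called repeatedly at the configured check frequency.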



DEMO
     Resource profiles

     • Use crs_stat [-t] to check status
     • Use crs_stat -p to check attributes

     • crs_* vs srvctl (like srvctl config ... -a)
     • Standard action scripts
      •   racgimon
      •   racgwrap / racgmain
      •   racgvip
      •   racgons
      •   usrvip




DEMO
     OCR internals

     • ocrcheck

     • ocrconfig
      •   used during install/upgrade
      •   backup OCR
      •   recover OCR
     • ocrdump
      •   txt or xml




DEMO
     racgvip case study

     • Check the script
     • Set env. vars and simulate the call

     • Use _USR_ORA_DEBUG=1 in the script




Resources hierarchy

     [Diagram: dependency hierarchy of DB, CS (Collective Service), Service, Instance, ASM, Listener and Nodeapps (GSD, ONS, VIP); one dependency is marked "Only 10.1 and 10.2.0.1"]

     • 10.2.0.2 (?)
      •   released dependency of ASM and Instance on VIP
     • If DB registered manually with srvctl
      •   ASM dependency missing

Resources and Oracle homes

     [Diagram: the same hierarchy of DB, CS (Collective Service), Service, Instance, ASM, Listener and Nodeapps (GSD, ONS, VIP), annotated with Oracle homes; one dependency is marked "Only 10.1 and 10.2.0.1"]

     • DB Home
     • ASM Home
      •   Listener can be in ASM home
      •   ASM home can be Oracle home
     • CRS Home
     • Logs are in the appropriate home

DEMO
     troubleshooting resources

     • {home}/log/{host}/racg/{resource_name}.log

     • Old way - edit racgwrap
      •   Uncomment _USR_ORA_DEBUG=1
     • crsctl debug log res '{res_name}:{0|1}'
      •   crs_stat -p | grep DEBUG
     • Run “srvctl start ...” manually
      •   SRVM_TRACE=TRUE




Troubleshooting summary

     • crsctl check crs | crsd | cssd | evmd
     • crs_stat [-t]
     • crs_stat -p [{res_name}]
     • crsctl debug log css | crs | evm | res
     • crsctl lsmodules css | crs | evm
     • crs_stop {res_name} [-f] (force-stop a resource)
     • ocrdump
     • See scripts




Troubleshooting flow

     • Is Clusterware up?
     • Are Oracle resources up?
      •   Listener & VIP
      •   Database & ASM instance
      •   Services
     • Did any nodes get rebooted?
     • Did any resources get restarted?
      •   $ORA_CRS_HOME/log/{host}/crs/crsd.log
      •   $ORA_CRS_HOME/log/{host}/alert{host}.log
     • MOS Note 265769.1 “Troubleshooting 10g and 11.1
      Clusterware Reboots”
Enter the 11gR2 World - Grid Infrastructure

     • Oracle Clusterware Administration and Deployment Guide
     • My Oracle Support Note 1053147.1

11g Grid Infrastructure Documentation

     • Oracle Clusterware Administration and Deployment Guide
     • MOS Note 1053147.1
      •   11gR2 Clusterware and Grid Home - What You Need to Know
     • MOS Note 1050908.1
      •   How to Troubleshoot Grid Infrastructure Startup Issues
     • MOS Note 1053970.1
      •   Troubleshooting 11.2 Grid Infrastructure Installation Root.sh Issues
     • MOS Note 1050693.1
      •   Troubleshooting 11.2 Clusterware Node Evictions (Reboots)




11gR2 Node Evictions

     • Same as in 10g + member kill escalation
      •   LMON process may request CSS to remove an instance from the
          cluster via the instance eviction mechanism. If this times out, it
          could escalate to a node kill.
     • Evicting processes
      •   CSSD
      •   CSSDAGENT
      •   CSSDMONITOR




Questions?




       Thank you!
   http://www.pythian.com/

gorbachev@pythian.com

MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, Pythian

  • 1. Under the Hood of Oracle Clusterware Miracle OpenWorld 2010 15-Apr-2010 Alex Gorbachev, The Pythian Group
  • 2. Alex Gorbachev • CTO, The Pythian Group • Blogger • OakTable Network member • Oracle ACE Director • BattleAgainstAnyGuess.com • Vice-president, Oracle RAC SIG 2 © 2009/2010 Pythian
  • 3. Why Companies Trust Pythian • Recognized Leader: • Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server • Work with over 150 multinational companies such as Forbes.com, Fox Interactive media, and MDS Inc. to help manage their complex IT deployments • Expertise: • One of the world’s largest concentrations of dedicated, full-time DBA expertise. • Global Reach & Scalability: • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response 3 © 2009/2010 Pythian
  • 4. Agenda • Place of Clusterware in Oracle RAC • Node membership and evictions • Clusterware startup sequence • Oracle Cluster Registry • Resources Management and troubleshooting • 11gR2 Grid Infrastructure 4 © 2009/2010 Pythian
  • 5. Agenda High th Th e e le m ss or yo e y Need to memorize u ou ne u ed nd to ers m ta em nd or , iz e Low Shallow In-depth Understanding 4 © 2009/2010 Pythian
  • 6-7. Architecture [Diagram: three nodes, each running OS, VIP, Listener, Service, Instance, ASM and Clusterware; the nodes are linked by the interconnect and have storage access to shared storage holding the OCR and the voting disk]
  • 8-13. [Diagram build: Clusterware components on top of the OS, added one per slide: CSSD (Cluster Synchronization Services), CRSD (Cluster Ready Services), VIP and RACG (HA Framework scripts), EVMD (Event Manager), OPROCD (Oracle Process Monitor)]
  • 14-15. [Diagram: a single node running Clusterware on top of the OS, with VIP, RACG, EVMD, CRSD, CSSD and OPROCD]
  • 16-18. [Diagram build: two such nodes joined by the interconnect]
  • 19-20. [Diagram build: the same two nodes with a voting disk added]
  • 21. [Diagram: two nodes with interconnect and voting disk, captioned “Shoot The Other Node In The Head”]
  • 22-26. [Diagram build: two nodes with interconnect and voting disk; the final step shows one full node stack plus a second CSSD]
  • 27. [Diagram: one node stack, a second CSSD, interconnect and voting disk, captioned “Ask The Other Node To Reboot Itself” (c) known quote]
  • 28-30. [Diagram build: two nodes with interconnect and voting disk; OCLSOMON appears next to the Clusterware daemons]
  • 31-35. [Diagram build: node stack, interconnect and voting disk, with CSSD and then OPROCD shown a second time]
  • 36-39. [Diagram build: two nodes with interconnect and voting disk; the final step shows one full node stack plus a second CSSD]
  • 40-43. [Diagram build: two nodes with interconnect; the voting disk is shown only in the first step, and the last step shows only the two CSSD daemons and the interconnect]
  • 44-48. Evictions • Network heartbeat lost • Voting disk access lost • CSSD is not healthy • OS is not healthy • OPROCD - Unix, Windows, 11g Linux • hangcheck-timer - 10g Linux
  • 49. DEMO NHB failure • Simulate with “ifconfig eth1 down” • Both nodes notice the loss • Racing to evict each other • from voting disk => 2 equal sub-clusters • the sub-cluster with the lowest leader # survives • leader is the node with the lowest # in its sub-cluster • Winner evicts the other node • by setting a kill-block in the voting disk • CSSD and OCLSOMON race to suicide
  • 50. NHB failure symptoms • NHB failure on several nodes • ocssd.log • Evicted node can contain other traces • maybe - syslog (Linux - /var/log/messages) • maybe - oclsomon.log • almost always - console • Network is only a *possible* root cause • check syslog, ifconfig, netstat • Network engineering - switch logs
  • 51. DEMO CSSD is not healthy • Simulate using kill -STOP <cssd.bin pid> • Another node observes NHB loss • After misscount seconds => attempts eviction • but CSSD is frozen and can’t commit suicide • OCLSOMON detects CSSD timeout • Commits suicide
  • 52. OCSSD sick - symptoms • Error in oclsomon.log • OCSSD log might be clean on evicted node • syslog might contain OCLSOMON diag. err. • Console often contains diag. err. • Depending on syslogd settings • Set diagwait to more than 3 for better diagnosability • 3 seconds is reboottime • Increases risk of corruption
  • 53. DEMO host sick - CPU stalled • Simulate by pausing OPROCD • kill -STOP <oprocd pid> • sleep 1 or 2 • kill -CONT <oprocd pid> • oprocd.log • Usually nothing if node is reset • Immediate reboot • Console might contain diag msg
  • 54. Killed by OPROCD - symptoms • Hard to confirm (nothing in oprocd.log) • Console output often helps • “SysRq: resetting” could be in syslog as well • Root cause • Faulty hardware, drivers, caused by IO/network • Kernel bugs, NTP bugs • Investigate syslog messages • Margin can be tuned • diagwait and reboottime CSSD parameters
  • 55. 10g on Linux - hangcheck-timer • Replaced by OPROCD in 11g and 10.2.0.4+ • Most of the time useless and inactive! • Metalink Note 726833.1 • Updated 21-JUL-08! • Oracle suggests keeping both • I would only leave OPROCD • Metalink Note 567730.1 • OPROCD in 10.2.0.4
  • 56. Killed by hangcheck-timer • Rarely can be confirmed • “Hangcheck: hangcheck is restarting the machine” • Can set hangcheck_dump_tasks to dump state • See source code...
  • 57. Clusterware startup • Linux & UNIX inittab • init.cssd • init.evmd • init.crsd • Linux & UNIX init.d • init.crs • Windows Services
  • 58. Daemons startup sequence • Triggered • by init.crs from init.d sequence • manually [Diagram boxes: Third-party clusterware, CSSD, EVMD, CRSD]
  • 59. Startup in Linux & Unix
        [gorby@dime ~]$ ps -fe | grep 'init.' | grep -v grep
        root 6352 1 0 10:24 ... /bin/sh /etc/init.d/init.evmd run
        root 6353 1 0 10:24 ... /bin/sh /etc/init.d/init.cssd fatal
        root 6354 1 0 10:24 ... /bin/sh /etc/init.d/init.crsd run
        root 7356 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oprocd
        root 7364 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oclsomon
        root 7383 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd daemon
        [gorby@dime ~]$ tail -3 /etc/inittab
        h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
        h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
        h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
        [gorby@dime ~]$ ls -l /etc/rc3.d/S96init.crs
        lrwxrwxrwx 1 root root 20 Aug 1 23:51 /etc/rc3.d/S96init.crs -> /etc/init.d/init.crs
  • 60-67. Startup flow [Timeline diagram, built up over eight slides] • /etc/oracle/scls_scr/{host}/root/cssrun • /etc/oracle/scls_scr/{host}/root/crsstart • enable • disable • init.crs start • init.cssd autostart • init.cssd fatal • init.cssd oprocd -> oprocd • init.cssd oclsomon -> oclsomon.bin • init.cssd oclsvmon -> oclsvmon.bin • init.cssd daemon -> ocssd.bin • init.evmd run -> evmd.bin • init.crsd run -> crsd.bin
  • 68. DEMO Startup troubleshooting • Check processes using “ps -fe | grep init” • Check syslog (/var/log/messages) • Can point to /tmp/crsctl.##### • Remember boot sequence • Clusterware log files • if *.bin processes are running already • crsctl • crsctl check crs/cssd/crsd/evmd
  • 69. Log files • log/{host}/cssd/ocssd.log • log/{host}/cssd/oclsomon/oclsomon.log • oclsomon.ba1, oclsomon.ba2,... • /etc/oracle/oprocd/{host}.oprocd.log • {host}.oprocd.log.{timestamp} • syslog • Linux /var/log/messages • Solaris /var/adm/messages • Console logs
  • 70. Windows world • OPROCD = OraFenceService • EVMD = OracleEVMService • CRSD = OracleCRService • CSSD = OracleCSService • OPMD • Oracle Process Manager Daemon • Start trigger like init.crs in *nix • registered with Windows Service Control Manager (WSCM) and delays start by 60 seconds
  • 71-72. EVMD • Passing clusterware events • Usually not a problem • Verify • evmwatch -A • evmpost -u "my message" [Diagram: Clusterware component stack on the OS: VIP, RACG, EVMD, CRSD, CSSD, OPROCD]
  • 73-74. CRSD • CRSD manages cluster resources • Stop / Start • Failover • VIP management • New resources, etc. • RACG helper scripts [Diagram: Clusterware component stack on the OS: VIP, RACG, EVMD, CRSD, CSSD, OPROCD]
  • 75. CRSD startup • After CSSD and EVMD • Re-spawned on failure • No eviction • Runs as root • VIP control • OCR management • root ulimits are in place! • Can run resources owned by any user • owner is the property of a resource
  • 76. Oracle Cluster Registry • Repository for all configuration data • Except OCR location itself • OCR is accessed mostly read-only • Every component reads OCR • OCR is written only by CRS • only from a single OCR master node • ### crsd.log ### 2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13: I AM THE NEW OCR MASTER at incar 12. Node Number 1
  • 77. CRS resources • Standard Oracle resources • ASM • Listener • VIP • Database and Instance • etc. • srvctl => manages Oracle resources • Custom user resources • crs_% => manages any resources
  • 78. CRS resource internals • Unique name • Associated action script • stop / start / check functions • Other attributes • check frequency • pre-requisites • restart retries • etc... • All info stored in OCR
  • 79. DEMO Resource profiles • Use crs_stat [-t] to check status • Use crs_stat -p to check attributes • crs_* vs srvctl (like srvctl config ... -a) • Standard action scripts • racgimon • racgwrap / racgmain • racgvip • racgons • usrvip
  • 80. DEMO OCR internals • ocrcheck • ocrconfig • used during install/upgrade • backup OCR • recover OCR • ocrdump • txt or xml
  • 81. DEMO racgvip case study • Check the script • Set env. vars and simulate the call • Use _USR_ORA_DEBUG=1 in the script
  • 82. Resources hierarchy • 10.2.0.2 (?) • released dependency of ASM and Instance on VIP • If DB registered manually with srvctl • ASM dependency missing [Diagram: resource dependency hierarchy with DB, CS (Collective Service), Service, Instance, ASM, Listener and Nodeapps (GSD, ONS, VIP); one dependency is marked “Only 10.1 and 10.2.0.1”]
  • 83. Resources and Oracle homes • Listener can be in ASM home • ASM home can be Oracle home • Logs are in appropriate home [Diagram: the same resource hierarchy mapped onto DB Home, ASM Home and CRS Home; one dependency is marked “Only 10.1 and 10.2.0.1”]
  • 84. DEMO troubleshooting resources • {home}/log/{host}/racg/{resource_name}.log • Old way - edit racgwrap • Uncomment _USR_ORA_DEBUG=1 • crsctl debug log res '{res_name}:{0|1}' • crs_stat -p | grep DEBUG • Run “srvctl start ...” manually • SRVM_TRACE=TRUE
  • 85. Troubleshooting summary • crsctl check crs | crsd | cssd | evmd • crs_stat [-t] • crs_stat -p [{res_name}] • crsctl debug log css | crs | evm | res • crsctl lsmodules css | crs | evm • crs_stop {res_name} [-f] (force stop of a resource) • ocrdump • See scripts
  • 86. Troubleshooting flow • Is Clusterware up? • Are Oracle resources up? • Listener & VIP • Database & ASM instance • Services • Did any nodes get rebooted? • Were any resources restarted? • $ORA_CRS_HOME/log/{host}/crs/crsd.log • $ORA_CRS_HOME/log/{host}/alert{host}.log • MOS Note 265769.1 “Troubleshooting 10g and 11.1 Clusterware Reboots”
  • 87-89. Enter the 11gR2 World - Grid Infrastructure • Oracle Clusterware Administration and Deployment Guide • My Oracle Support Note 1053147.1
  • 90. 11g Grid Infrastructure Documentation • Oracle Clusterware Administration and Deployment Guide • MOS Note 1053147.1 • 11gR2 Clusterware and Grid Home - What You Need to Know • MOS Note 1050908.1 • How to Troubleshoot Grid Infrastructure Startup Issues • MOS Note 1053970.1 • Troubleshooting 11.2 Grid Infrastructure Installation Root.sh Issues • MOS Note 1050693.1 • Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
  • 91. 11gR2 Node Evictions • Same as in 10g + member kill escalation • LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out it could escalate to a node kill. • Processes evicting • CSSD • CSSDAGENT • CSSDMONITOR
  • 92. Questions? Thank you! http://www.pythian.com/ gorbachev@pythian.com © 2009/2010 Pythian
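
The tie-break described on the “DEMO NHB failure” slide can be sketched in a few lines of Python. This is only an illustration of the rule as the slide states it (two equal sub-clusters, the one with the lowest leader number survives, and the leader is the lowest-numbered node of its sub-cluster); it is not Oracle's implementation. The function name `surviving_subcluster` and the representation of sub-clusters as sets of node numbers are assumptions made for the example.

```python
def surviving_subcluster(subclusters):
    """Return the sub-cluster that survives an equal-size split.

    Each sub-cluster is a set of node numbers. Its leader is the member
    with the lowest node number, and the sub-cluster whose leader has
    the lowest number survives (rule taken from the slide).
    """
    if not subclusters or any(len(s) == 0 for s in subclusters):
        raise ValueError("need at least one non-empty sub-cluster")
    # min(s) is the leader of sub-cluster s; pick the lowest leader.
    return min(subclusters, key=min)


# Two-node cluster after the interconnect is cut: each node is alone
# in its own sub-cluster (hypothetical node numbers 1 and 2).
split = [{2}, {1}]
survivor = surviving_subcluster(split)
evicted = [n for s in split if s is not survivor for n in s]
print(sorted(survivor), evicted)  # prints: [1] [2]
```

In the two-node demo this means node 1 always wins the race and node 2 is the one asked to reboot, regardless of which node's network interface was taken down.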

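The inittab entries shown on the “Startup in Linux & Unix” slide follow the usual `id:runlevels:action:process` layout, so a quick way to see which Clusterware wrappers init will respawn is to split each line on its first three colons. The sketch below works on the three sample lines from the slide; the names `InittabEntry` and `parse_inittab_line` are made up for this example, and on a real node the text would come from /etc/inittab instead of the embedded sample.

```python
from typing import NamedTuple


class InittabEntry(NamedTuple):
    ident: str
    runlevels: str
    action: str
    process: str


def parse_inittab_line(line):
    """Split one inittab line into id, runlevels, action and process.

    Returns None for blank lines and comments.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # The process field may itself contain ':', so split at most 3 times.
    parts = stripped.split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"not an inittab entry: {line!r}")
    return InittabEntry(*parts)


# Sample lines copied from the slide.
SAMPLE = """\
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
"""

entries = [e for e in map(parse_inittab_line, SAMPLE.splitlines()) if e]
for e in entries:
    script, mode = e.process.split()[:2]
    print(e.ident, e.runlevels, e.action, script, mode)
```

All three entries use the `respawn` action in runlevels 3 and 5, which is why init restarts the wrapper scripts whenever they exit.
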
Editor's Notes

  1. - Successful growing business for more than 10 years - Served many customers with complex requirements/infrastructure just like yours. - Operate globally for 24 x 7 “always awake” services
  2-5. Clusterware is generic with customizations for Oracle resources. Only Clusterware accesses OCR and the voting disk (VD). Only DB instances access shared database files. OCR is accessed by almost every Clusterware component - configuration is read from OCR. VIP is part of Oracle Clusterware. Emphasize shared access to data!!!
  6-15. OPROCD - pre 10.2.0.4 - hangcheck-timer
  16-17. Node membership and group membership for instances, ASM diskgroups
  18-24. CSSD cannot talk to each other -> operations are not synchronized -> shared data access -> corruption
  25-48. In addition to NHB (network heartbeat), Oracle introduced DHB (disk heartbeat). IO fencing is needed on split brain to keep the evicted node from doing any further IOs. Oracle doesn’t rely on any hardware - it needs compatibility with all platforms/hardware.
  49-61. Oracle can’t shoot another node without remote control and can’t rely on one type of IO fencing (HBA/SCSI reservations). What’s left - beg the other node: please shoot yourself!
  62-63. What if CSSD is not healthy? It’s very possible that it’s not a network problem but CSSD just doesn’t reply for some reason. OCLSOMON comes onto the scene.
  64-69. Worse yet, the whole node is sick and even OCLSOMON can’t function properly, e.g. CPU execution is stalled.
  70-72. Losing access to voting disks - CSSD commits suicide. Why? Cluster must have two communication paths + VD is the medium for IO fencing.
  73-77. All nodes can reboot if voting disk is lost. Good time to discuss voting disk redundancy? 1 vs 2 vs 3
  78. diagwait -> not set by default (assumed 0); reboottime -> 3 seconds; margin = reboottime - diagwait. See init.cssd for more details.
  79. When Clusterware autostart is disabled (crsstart -&gt; disable) then &amp;#x201C;init.cssd autostart&amp;#x201D; doesn&amp;#x2019;t do anything. In this case a DBA can initiate the start later using &amp;#x201C;init.crs start&amp;#x201D; (10.1+) or crsctl start crs (10.2+).
  80. When Clusterware autostart is disabled (crsstart -&gt; disable) then &amp;#x201C;init.cssd autostart&amp;#x201D; doesn&amp;#x2019;t do anything. In this case a DBA can initiate the start later using &amp;#x201C;init.crs start&amp;#x201D; (10.1+) or crsctl start crs (10.2+).
  121. Configuration data - voting disks, ports, resource profiles (ASM, instances, listeners, VIPs, etc.).
  122. DEMO - existing dependencies
  123. DB is in CRS Home. Log files would be in the appropriate Oracle home: {home}/log/{host}/racg/{resource_name}.log. DEMO - log files and action script home match! DEMO - IMON logs.
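As a minimal sketch of the log location pattern in this note, the helper below fills in the {home}/log/{host}/racg/{resource_name}.log template. The function name is hypothetical, and the home, host and resource name in the usage line are made-up example values, not taken from the presentation.

```python
from pathlib import PurePosixPath


def racg_log_path(home, host, resource_name):
    """Build {home}/log/{host}/racg/{resource_name}.log as a POSIX path.

    `home` is the Oracle home that owns the resource's action script.
    """
    # Append ".log" to the resource name and join the fixed directory parts.
    return PurePosixPath(home) / "log" / host / "racg" / f"{resource_name}.log"


# Hypothetical values, for illustration only.
example = racg_log_path("/u01/app/oracle/product/10.2.0/db_1",
                        "node1", "ora.orcl.orcl1.inst")
```

Since the note points out that the log files and the action script home match, passing the wrong home (for example the CRS home for a database resource) would produce a path where no log exists.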
  124. DEMO - stop DB + rename spfile + start DB the old way; if there is time, with a .cap file
  125. DEMO - lsmodules