Oracle Clusterware and Private Network Considerations: Practical Performance Management for Oracle RAC
Guenadi Nedkov Jilevski
November 12, 2009
Agenda
  • Oracle RAC fundamentals and infrastructure
  • Analysis of Cache Fusion impact on RAC
  • Private interconnect considerations
  • Aggregation
  • Common problems and symptoms, from Cache Fusion wait events and statistics
  • Diagnostics and problem troubleshooting
  • Q and A
Oracle RAC Fundamentals and Infrastructure - Oracle RAC Architecture
Oracle RAC Fundamentals and Infrastructure - Function and Processes of Global Enqueue Services (GES) and Global Cache Services (GCS)
Oracle RAC Fundamentals and Infrastructure - Global Buffer Cache
Analyzing Cache Fusion Impact on RAC
The cost of block access and cache coherency is represented by:
  • Global Cache Services statistics
  • Global Cache Services wait events
The response time for Cache Fusion transfers is determined by:
  • Overhead of the physical interconnect components
  • IPC protocol
  • GCS protocol
The response time is generally not affected by disk I/O factors, except for the occasional log write done when sending a dirty buffer to another instance in a write-read or write-write situation.
Analyzing Cache Fusion Impact on RAC - Typical Latencies for RAC Operations
  • Current block request time = pin time + flush time + send time
  • Latencies from V$SYSSTAT
  • Other latencies may be seen in V$SEGMENT_STATISTICS
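
The latency breakdown above can be turned into a small calculation. The following is a minimal sketch, not the presenter's method: it assumes the statistic names follow the "gc ..." naming used in V$SYSSTAT and that the time statistics are reported in centiseconds (hence the factor of 10 to get milliseconds). The sample values are made up for illustration.

```python
# Sketch: average Cache Fusion latencies from V$SYSSTAT-style counters.
# Assumption: time statistics are in centiseconds, so multiply by 10 for ms.

def avg_receive_ms(stats, kind):
    """Average block receive latency in ms; kind is 'cr' or 'current'."""
    blocks = stats[f"gc {kind} blocks received"]
    if blocks == 0:
        return 0.0
    return 10.0 * stats[f"gc {kind} block receive time"] / blocks


def avg_current_service_ms(stats):
    """Average current block service time in ms: pin + flush + send."""
    served = stats["gc current blocks served"]
    if served == 0:
        return 0.0
    total = (stats["gc current block pin time"]
             + stats["gc current block flush time"]
             + stats["gc current block send time"])
    return 10.0 * total / served


# Hypothetical counter values, not taken from a real system.
sample = {
    "gc cr blocks received": 20000,
    "gc cr block receive time": 3000,
    "gc current blocks received": 10000,
    "gc current block receive time": 2000,
    "gc current blocks served": 8000,
    "gc current block pin time": 40,
    "gc current block flush time": 200,
    "gc current block send time": 160,
}

print(avg_receive_ms(sample, "cr"))        # average CR receive latency
print(avg_receive_ms(sample, "current"))   # average current receive latency
print(avg_current_service_ms(sample))      # pin + flush + send per block served
```

In practice the counters would be sampled twice and the deltas used, since V$SYSSTAT values are cumulative since instance startup.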
Analyzing Cache Fusion Impact on RAC - Wait Events Views
Analyzing Cache Fusion Impact on RAC - Global Cache Wait Events: Overview
Analyzing Cache Fusion Impact on RAC - 2-way Block Request: Example
Analyzing Cache Fusion Impact on RAC - 3-way Block Request: Example
Analyzing Cache Fusion Impact on RAC - 2-way Grant: Example
Analyzing Cache Fusion Impact on RAC - Global Enqueue Waits: Overview
  • Enqueues are synchronous.
  • Enqueues are global resources in RAC.
  • The most frequent waits are for:
      TX - row lock waits or ITL waits
      TM - table manipulation enqueue
      TA - transaction recovery enqueue
      SQ - sequence generation enqueue
      HW - high-water mark enqueue
      US - undo segment enqueue, which manages undo segment extensions
  • These waits may constitute serious serialization points.
Analyzing Cache Fusion Impact on RAC - Session and System Statistics
  • Use V$SYSSTAT to characterize the workload.
  • Use V$SESSTAT to monitor important sessions.
  • V$SEGMENT_STATISTICS includes RAC statistics.
  • RAC-relevant statistic groups are:
      Global Cache Service statistics
      Global Enqueue Service statistics
      Statistics for messages sent
  • V$ENQUEUE_STATISTICS identifies the enqueues with the highest impact.
  • V$INSTANCE_CACHE_TRANSFER breaks down GCS statistics into block classes.
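
To illustrate what "the enqueue with the highest impact" can mean, here is a minimal sketch that ranks enqueue types by cumulative wait time. The row layout (keys eq_type, total_waits, cum_wait_ms) is a hypothetical simplification of what a query against V$ENQUEUE_STATISTICS might return, and the numbers are invented.

```python
# Sketch: rank enqueue types by cumulative wait time.
# The dictionary keys are hypothetical names chosen for this example.

def rank_enqueues(rows, top=3):
    """Return (type, cumulative wait ms, average ms per wait), worst first."""
    waited = [r for r in rows if r["total_waits"] > 0]
    ranked = sorted(waited, key=lambda r: r["cum_wait_ms"], reverse=True)
    return [
        (r["eq_type"], r["cum_wait_ms"], r["cum_wait_ms"] / r["total_waits"])
        for r in ranked[:top]
    ]


# Invented sample rows for illustration only.
rows = [
    {"eq_type": "TX", "total_waits": 500, "cum_wait_ms": 12000},
    {"eq_type": "SQ", "total_waits": 2000, "cum_wait_ms": 30000},
    {"eq_type": "HW", "total_waits": 100, "cum_wait_ms": 4000},
    {"eq_type": "TM", "total_waits": 0, "cum_wait_ms": 0},
]

for eq_type, cum_ms, avg_ms in rank_enqueues(rows, top=2):
    print(eq_type, cum_ms, avg_ms)
```

Ranking by cumulative time rather than by wait count is a design choice: a rarely hit enqueue with long waits can serialize the workload more than a frequently hit one with short waits.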
Private Interconnect Considerations - IPC Configuration
Private Interconnect Considerations - Infrastructure: Network Packet Processing
Private Interconnect Considerations - Network Packet Processing: Layers, Queues and Buffers
Private Interconnect Considerations - Infrastructure: Private Interconnect
  • The network between the nodes of a RAC cluster must be private.
  • Each interconnect NIC must have the same name on all nodes in the RAC cluster.
  • Supported links: GbE, IB (InfiniBand)
  • Supported transport protocols: UDP, RDS
  • Use multiple or dual-ported NICs for redundancy (HA), load balancing and load spreading, and to increase bandwidth with NIC bonding/aggregation.
  • Large (jumbo) frames are recommended for GbE if the global cache workload requires them.
  • Bandwidth requirements depend on several factors (e.g. buffer cache size, number of CPUs per node, access patterns) and cannot be predicted precisely for every application.
  • For OLTP, 1 Gb/sec is usually sufficient for performance and scalability.
  • DSS/DW systems should be designed with more than 1 Gb/sec capacity.
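
Although bandwidth needs cannot be predicted precisely, observed traffic can be estimated from block and message counts. The sketch below is a rough approximation under stated assumptions: every transferred block is one database block in size, each GCS/GES message averages about 200 bytes (an assumed figure), and protocol overhead is ignored. All input numbers are hypothetical.

```python
# Sketch: estimate interconnect traffic and link utilization.
# Assumptions: fixed block size, ~200 bytes per message, no protocol overhead.

def interconnect_mbit_per_sec(blocks, messages, block_size, seconds,
                              msg_bytes=200):
    """Estimated traffic in Mbit/s over an interval of `seconds`."""
    total_bytes = blocks * block_size + messages * msg_bytes
    return total_bytes * 8 / seconds / 1_000_000


def link_utilization(mbit_per_sec, link_mbit=1000):
    """Fraction of the link capacity in use (1000 Mbit/s for GbE)."""
    return mbit_per_sec / link_mbit


# Hypothetical 10-minute interval: blocks received plus served, and
# messages sent plus received, summed for one instance.
rate = interconnect_mbit_per_sec(
    blocks=3_000_000, messages=6_000_000, block_size=8192, seconds=600)
print(round(rate, 2), "Mbit/s")
print(round(link_utilization(rate) * 100, 1), "% of a 1 Gb/s link")
```

A sustained figure approaching the link capacity would suggest the "> 1 Gb/sec" design guidance applies, whereas a low figure with high latencies points away from bandwidth as the cause.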
Private Interconnect Considerations - Infrastructure: IPC Configuration and Operating System
Important settings:
  • Negotiated top bit rate and full duplex mode
  • NIC ring buffers
  • Ethernet flow control settings
  • CPU(s) receiving network interrupts
Verify your setup:
  • CVU does checking.
  • Load testing helps eliminate potential problems.
  • AWR and ADDM give estimates of link utilization.
Buffer overflows, congested links and flow control can have severe consequences for performance.
Block access latencies increase when CPUs are busy and run queues are long.
  • Immediate LMS scheduling is critical for predictable block access latencies when CPU is more than 80% busy.
Fewer and busier LMS processes may be more efficient.
  • Monitor their CPU utilization.
  • Caveat: a single LMS process can be good for runtime performance but may impact cluster reconfiguration and instance recovery time.
  • The default is good for most requirements.
  • The gcs_server_processes initialization parameter overrides the default.
Higher priority for LMS is the default.
  • The implementation is platform-specific.
Private Interconnect Considerations - The Interconnects, VLANs and NIC Settings
  • The interconnect should be a dedicated, non-routable subnet mapped to a single dedicated, non-shared VLAN.
  • If VLANs are 'trunked', the interconnect VLAN traffic should not extend beyond the access switch layer.
  • Minimize the impact of Spanning Tree events.
  • Monitor the switch(es) for congestion.
  • Avoid QoS definitions that may negatively impact interconnect performance.
NIC settings are driver dependent; the defaults are generally satisfactory.
  • Confirm flow control: rx=on, tx=off
  • Confirm full bit rate (1000) for the NICs
  • Confirm full duplex auto-negotiation
  • Ensure NIC names/slots are identical on all nodes
  • Configure interconnect NICs on the fastest PCI bus
  • Ensure compatible switch settings:
      802.3ad on NICs = 802.3ad on switch ports
      MTU=9000 on NICs = MTU=9000 on switch ports
Failure to configure the NICs and switches correctly will result in severe performance degradation and node fencing.
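
The checklist above lends itself to an automated consistency check. This is a minimal sketch under assumptions: the per-node settings have already been collected into dictionaries by some other means, the key names are hypothetical, and jumbo frames (MTU 9000) are assumed to be in use. It does not query any real NIC or switch.

```python
# Sketch: compare collected NIC settings against the slide's checklist.
# Key names and the EXPECTED values are assumptions for this example.

EXPECTED = {
    "speed_mbit": 1000,   # full bit rate
    "duplex": "full",
    "mtu": 9000,          # assumes jumbo frames are configured end to end
    "flow_rx": True,      # flow control rx=on
    "flow_tx": False,     # flow control tx=off
}


def check_nics(nodes):
    """nodes maps node name -> settings dict; returns a list of problems."""
    problems = []
    names = {cfg["nic_name"] for cfg in nodes.values()}
    if len(names) > 1:
        problems.append(f"NIC names differ across nodes: {sorted(names)}")
    for node, cfg in sorted(nodes.items()):
        for key, want in EXPECTED.items():
            got = cfg.get(key)
            if got != want:
                problems.append(f"{node}: {key}={got!r}, expected {want!r}")
    return problems


# Hypothetical two-node cluster; node2 has a different NIC name and MTU.
good = {"nic_name": "eth1", **EXPECTED}
nodes = {
    "node1": dict(good),
    "node2": {**good, "nic_name": "eth2", "mtu": 1500},
}

for problem in check_nics(nodes):
    print(problem)
```

An empty result means every node matches the expected values; the switch-port side would need the same comparison from the switch configuration.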
Private Interconnect Considerations
Aggregation
  • Cisco EtherChannel, based on 802.3ad
  • AIX EtherChannel
  • HP-UX Auto Port Aggregation
  • Sun Trunking, IPMP, GLD
  • Linux bonding (only certain modes)
  • Windows NIC teaming
Aggregation methods:
  • Load balance/failover/load spreading: spread on sends, serialize on receives
  • Active/standby
Oracle interconnect requirements:
  • Both send-side and receive-side load balancing
  • NIC and switch port failure detection
Common Problems and Symptoms - Wait Events Worth Investigating
  • gc [current/cr] block lost: This event indicates blocks lost during transfers. High values indicate IPC or downstream network problems. The 'request retry' event is likely to be seen as well.
  • global cache blocks corrupt: This statistic shows whether any blocks were corrupted during transfers. High values probably indicate an IPC, network or hardware problem.
  • global cache open s and global cache open x: The initial access of a particular data block by an instance generates these events. The wait should be short, and its completion is most likely followed by a read from disk. The wait occurs because the requested blocks are not cached in any instance of the cluster database. Pre-load heavily used tables into the buffer caches.
  • global cache null to s and global cache null to x: These events are generated by inter-instance block pinging across the network, in which two instances pass the same block back and forth. Reduce the number of rows per block to reduce the need for block swapping between two instances in the RAC cluster.
  • global cache cr request: This event is generated when an instance has requested a consistent-read data block and the block has not yet arrived at the requesting instance. It is a placeholder event; look for other gc events.
  • gc buffer busy: This event can be associated with disk I/O contention, for example slow disk I/O due to a rogue query. Slow concurrent scans can cause buffer cache contention. Note, however, that there can be multiple symptoms of the same cause. It can be seen together with the 'db file scattered read' event. Global cache access and serialization contribute to this event. Serialization is likely to be due to log flush time on another node or to immediate block transfers.
Common Problems and Symptoms - Wait Events Worth Investigating
  • congested: Events that contain 'congested' suggest CPU or LMS saturation, long-running queries, swapping, or network configuration issues. Maintain a global view and remember that symptom and cause can be on different instances.
  • busy: Events that contain 'busy' indicate contention. Investigate by drilling down into either the SQL with the highest cluster wait time or the segment statistics with the highest block transfers. Also look at objects with the highest number of block transfers and global serialization.
  • gc [current/cr] [2/3]-way: Increase the private interconnect bandwidth and decrease the private interconnect latency.
  • gc [current/cr] grant 2-way: Increase the private interconnect bandwidth and decrease the private interconnect latency.
  • gc [current/cr] [block/grant] congested: The block or grant was eventually received, but with a delay caused by intensive CPU consumption, lack of memory, LMS overload from too much queued work, paging or swapping. This is worth investigating because it offers room for improvement. We will look at it later.
  • gc [current/cr] block busy: The block was received but was not sent immediately because of high concurrency or contention. The block can be busy for a variety of reasons; it simply could not be sent immediately for reasons internal to Oracle.
  • gc current grant busy: The grant was received, but with a delay due to many shared block images or load.
  • gc [current/cr] [failure/retry]: Failure means that the block image cannot be received; retry means that the problem recovers and the block image is ultimately received after a retry. Investigate IPC or downstream network problems.
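
The naming pattern in these wait events can be used as a first-pass triage. The following is a minimal sketch that maps a gc event name to the area the slides suggest investigating. The matching is by substring only, the function name and return strings are made up for this example, and it is no substitute for reading the AWR or ASH data.

```python
# Sketch: first-pass triage of "gc ..." wait events by name.
# Categories follow the slide text; the wording of the hints is illustrative.

def classify_gc_event(event):
    """Return a short hint on where to look for a given wait event name."""
    name = event.lower()
    if not name.startswith("gc "):
        return "not a gc event"
    if "lost" in name or "failure" in name or "retry" in name:
        return "check IPC and downstream network"
    if "congested" in name:
        return "check CPU, LMS load, memory, paging/swapping"
    if "busy" in name:
        return "contention: drill into SQL and segment statistics"
    if "2-way" in name or "3-way" in name:
        return "interconnect bandwidth and latency"
    return "unclassified"


for event in ("gc cr block lost", "gc current block congested",
              "gc buffer busy", "gc cr block 2-way"):
    print(event, "->", classify_gc_event(event))
```

Because symptom and cause can be on different instances, the hint identifies what to check, not which node to check it on.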
Diagnostics and Problem Determination
  • Tune for a single instance first.
  • Tune for RAC:
      Instance recovery
      Interconnect traffic
      Points of serialization can be exacerbated
  • RAC reactive tuning tools:
      Specific wait events
      System and enqueue statistics
      Enterprise Manager performance pages
      AWR and ASH reports
  • RAC proactive tuning tools:
      AWR snapshots
      ADDM reports
Diagnostics and Problem Determination - Most Common RAC Tuning Tips
  • Application tuning is often the most beneficial.
  • Resize and tune the buffer cache.
  • Reduce long full-table scans in OLTP systems.
  • Use Automatic Segment Space Management.
  • Increase sequence caches.
  • Use partitioning to reduce inter-instance traffic.
  • Avoid unnecessary parsing.
  • Minimize locking usage.
  • Remove unselective indexes.
  • Configure the interconnect properly.

Diagnostics and Problem Determination
Questions & Answers