New availability features in oracle rac 12c release 2 anair ss
- 2. Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Oracle RAC 12c Release 2
Anil Nair
Sr. Principal Product Manager,
Oracle Real Application Clusters (RAC)
@RACMasterPM, @OracleRACpm
http://www.linkedin.com/in/anil-nair-01960b6
http://www.slideshare.net/AnilNair27/
New Availability Features
- 3.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
- 4.
Program Agenda
1. Introduction
2. Database Availability Features
3. Storage & System Availability Features
4. Management and Efficiency Availability Features
- 5.
Program Agenda
1. Introduction
- 6.
Let’s Discuss Availability
• Wikipedia defines availability as the probability that a system will
work as required.
• The industry uses the term “five nines” (99.999%) to measure the
availability of a system
• Provisions are needed for planned and unplanned downtime
• Availability of hardware components
– MTBF for CPUs is high
– Network bonding protects against interface failures
– RAID protects against individual disk failures
• Most server components are hot-pluggable
Availability (%) | Downtime per Year
99.8 | 17.52 hrs
99.9 | 8.76 hrs
99.99 | 52.56 min
99.999 | 5.26 min
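The downtime figures follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (8,760 hours); the function name `downtime_per_year` is illustrative and does not come from the slides:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8,760 hours


def downtime_per_year(availability_pct: float) -> float:
    """Return the allowed downtime in hours per year for an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for pct in (99.8, 99.9, 99.99, 99.999):
    hours = downtime_per_year(pct)
    # Report in hours when at least one hour, otherwise in minutes.
    if hours >= 1:
        print(f"{pct}% -> {hours:.2f} hrs")
    else:
        print(f"{pct}% -> {hours * 60:.2f} min")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why moving from 99.99% to 99.999% leaves only minutes per year for both planned and unplanned outages.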
- 7.
Oracle Maximum Availability Architecture (MAA)
25+ Years of Lessons Learned in High Availability, Data Protection and Disaster Recovery
Best Practices Blueprints & White Papers
Service-level Oriented Reference Architectures
Solutions to Prevent Business Outages
Protect Oracle Data at various levels
Feedback into Product Features
[Figure: MAA building blocks around a production database and its replicated copy: RAC, ASM, Flashback, RMAN, OSB, Data Guard, Active Data Guard, GoldenGate, Enterprise Manager, GDS, Application Continuity, Online Redefinition, Edition-Based Redefinition; On-Premises, Cloud, Hybrid]
http://oracle.com/goto/maa
- 8.
What Isn’t Obvious?
Availability and scalability requirements are intertwined
- 9.
• The Oracle RAC family of products provides
availability and scalability to the database,
application, and client tiers
• No code changes are required
• Consistent behavior on-premises and in
public, private, or hybrid clouds
• Optimized for Engineered Systems
Other solutions require architecture and design changes
- 10.
Available & Scalable without Application Code Change(s)
[Chart: SAP certified SD Benchmark results; x-axis: # of cores across RAC nodes (2, 3, 4, and 5 nodes); y-axis: users, 0 to 40000]
# of Cores | Users
4 | 2035
8 | 4010
32 | 15520
48 | 22416
64 | 30016
80 | 37040
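One way to read the benchmark numbers is users supported per core as cores are added. A rough sketch, assuming the six data labels on the chart pair in order with the six core counts on the x-axis (that pairing is an inference from the chart residue, not something the slide states); `users_per_core` is an illustrative helper:

```python
# Assumption: the chart's six data labels pair, in order, with its six core counts.
results = {4: 2035, 8: 4010, 32: 15520, 48: 22416, 64: 30016, 80: 37040}


def users_per_core(cores: int, users: int) -> float:
    """Average number of benchmark users supported per core."""
    return users / cores


baseline = users_per_core(4, results[4])
for cores, users in results.items():
    per_core = users_per_core(cores, users)
    print(f"{cores:>2} cores: {per_core:6.1f} users/core, "
          f"{per_core / baseline:.0%} of the 4-core rate")
```

Under that pairing, the per-core rate at 80 cores stays within roughly 10% of the 4-core rate, which is the near-linear scaling the slide title refers to.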
- 11.
Seamless Integration without Application Code Change(s)
Multitenant, In-Memory, Data Guard
- 12.
Oracle Real Application Clusters Family of Solutions
• Integrated set of tools that work
cohesively to provide High Availability
and Scalability
• Functionality provided by Oracle RAC
Family of Solutions can be used by
licensed Oracle RAC or Oracle RAC One
Node customers without any additional
charge
- 13.
Program Agenda
2. Database Availability Features
- 14.
Cache Fusion
A Brief Refresher
• Maximum 3-way communication
• Dynamic Resource Management
(DRM) attempts to optimize
down to 2-way communication by
moving the master to the
instance where the resource is
frequently accessed
- 15.
Oracle RAC Always Chooses the “Right Path”
• RAC determines the optimal path
to serve blocks – network or disk
• SSDs and NVMe storage technology
continue to drive down latency
• e.g. flash storage may provide better
access times to data than the private
network under high load
• RAC takes those statistics into account
[Figure: a query requests a block; the block is sent over the private network]
- 16.
Oracle RAC Always Chooses the “Right Path”
• RAC “Cache Fusion” maintains
statistics on Network latency and
Disk Latency
• Private network or Disk is
dynamically chosen during runtime
• AWR report contains detailed
statistics
[Figure: a query requests a block; (1) network congestion occurs; (2) the block is read from disk]
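The idea of choosing a path from latency statistics can be sketched as follows. This is not Oracle's algorithm, which the slides do not describe; it is a minimal illustration that assumes a sliding-window average per path, and `LatencyTracker`, `choose_path`, and the sample latencies are all hypothetical:

```python
from collections import deque
from statistics import mean


class LatencyTracker:
    """Keeps a sliding window of recent latency samples (milliseconds)."""

    def __init__(self, window: int = 8) -> None:
        self.samples: deque = deque(maxlen=window)

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def average(self) -> float:
        # With no samples yet, treat the path as unknown (infinitely slow).
        return mean(self.samples) if self.samples else float("inf")


def choose_path(network: LatencyTracker, disk: LatencyTracker) -> str:
    """Pick the path with the lower recent average latency; ties go to the network."""
    return "network" if network.average() <= disk.average() else "disk"


network, disk = LatencyTracker(), LatencyTracker()
for ms in (0.2, 0.3, 0.2):
    network.record(ms)
for ms in (0.5, 0.4, 0.5):
    disk.record(ms)
print(choose_path(network, disk))  # the interconnect is faster here

# Simulated congestion on the private network fills the window with slow samples.
for ms in (2.0, 3.5, 4.0, 3.0, 2.5, 3.0, 4.0, 3.5):
    network.record(ms)
print(choose_path(network, disk))  # storage is now the faster path
```

A bounded window lets the decision recover once congestion clears, since old samples age out on their own.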
- 17.
Optimized Singleton Workload Scaling
Pluggable Database Isolation (new in 12.2)
• Using Oracle Multitenant, PDBs can be opened
as singletons (in one database instance only), in
a subset of instances, or in all instances at once.
• If certain PDBs are only opened on some
instances, Pluggable Database Isolation
– improves performance by
• Reducing DLM operations for
PDBs not open in all instances.
• Optimizing block operations based
on in-memory block separation.
– improves availability by
• Ensuring that failures of instances
hosting only singleton PDBs do not impact
other instances of the same RAC-based CDB.
- 18.
Optimized Singleton Workload Scaling
Service-oriented Buffer Cache Access
• Service-oriented Buffer Cache Access determines,
over time, which data (at the database object
level) a service accesses. This information
– Is persisted in the database.
– Is used to improve data access performance
(e.g. data of a service is not managed in an
instance that does not host the service).
- 19.
Service-Oriented Buffer Cache Access
• Cache Fusion maintains a service-to-buffer-cache
relationship
– Tracks which service causes row(s)
to be read into the buffer cache
• This statistic is used to
– Master the resource only on those
nodes where the service is active
• Optimized “Resource Master” dispersion
– Pre-warm the cache when a service
fails over during planned downtime
[Figure: NodeA and NodeB, each running Oracle GI and Oracle RAC, labeled cons_1 and cons_2]
- 20.
Performance Outliers
Hard to find the cause
- 21.
Introducing LMS CR Slaves
Helps mitigate performance outliers
• In previous releases, LMS worked on incoming consistent read (CR) requests
sequentially
• Sessions requesting consistent blocks that require applying a lot of undo
could keep LMS busy
• Starting with Oracle RAC 12c Release 2, LMS offloads work to “CR slaves”
if the amount of undo to be applied exceeds a certain dynamic
threshold
• The default is 1 slave; additional slaves are spawned as needed
- 22.
Introducing Undo Header Hash Table
• OLTP sessions require remote
undo header lookups
– To find out whether a transaction has committed
– For block cleanouts
• To reduce remote lookups, each instance
maintains a hash table of recent
transactions (active & committed)
• The Undo Header Hash Table improves
scalability by eliminating remote lookups
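The benefit of a local table of transaction states can be illustrated with a simple lookup cache. This is a hypothetical sketch and does not reflect Oracle's internal structures: `UndoHeaderCache`, the `remote_lookup` callback, and the `'active'`/`'committed'` strings are assumptions made for illustration.

```python
from typing import Callable, Dict


class UndoHeaderCache:
    """Per-instance table of recently seen transaction states (hypothetical)."""

    def __init__(self, remote_lookup: Callable[[str], str]) -> None:
        self._states: Dict[str, str] = {}
        self._remote_lookup = remote_lookup  # stands in for a remote undo header read
        self.remote_calls = 0

    def record(self, xid: str, state: str) -> None:
        """Store a transaction state ('active' or 'committed') learned locally."""
        self._states[xid] = state

    def is_committed(self, xid: str) -> bool:
        state = self._states.get(xid)
        if state is None:
            # Miss: fall back to the expensive remote lookup.
            self.remote_calls += 1
            state = self._remote_lookup(xid)
            if state == "committed":
                # A committed state is final, so it is safe to cache.
                self._states[xid] = state
        return state == "committed"


remote_undo_headers = {"tx1": "committed", "tx2": "active"}
cache = UndoHeaderCache(remote_undo_headers.__getitem__)
cache.is_committed("tx1")  # first check goes remote
cache.is_committed("tx1")  # second check is answered locally
print(cache.remote_calls)
```

Only the first check of `tx1` pays for a remote lookup; repeated checks are served from the local table, which is the scalability effect the slide describes.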
- 23.
Data Guard Standby Redo Apply
• In a typical RAC primary and RAC standby configuration, only one node of the standby
can apply redo
• The other nodes of the RAC standby typically wait,
even if the apply is CPU bound.
• Another instance takes over redo apply only if the instance applying redo
crashes
- 24.
Data Guard Standby Redo Apply
- 25.
Multi-Instance Redo Apply
• Parallel, multi-instance recovery means “the standby DB will keep up”
– Standby recovery - utilizes CPU and I/O across all nodes of RAC standby
– Up to 3500MB+/sec apply rate on an 8 node RAC
• Multi-Instance Apply runs on all MOUNTED instances or all OPEN Instances
• Exposed in the Broker with the ‘ApplyInstances’ property on standby
Utilize all RAC nodes on standby to apply Redo
recover managed standby database disconnect using instances 4;
- 26.
Multi-Instance Redo Apply
- 27.
Near Zero Downtime Reconfiguration with Recovery Buddy
Up to 4x faster! *
• Recovery Buddy
• Tracks block changes on the buddy
instance
• Quickly identifies blocks requiring
recovery during reconfiguration
• Allows rapid processing of new
transactions
* Up to 4 times faster with Recovery
Buddy and Optimized (Singleton)
Reconfiguration Time
- 28.
High Level Reconfiguration Stages
• Detect node/instance hang or death
• Evict the dead or hung instance/node
• Elect a Recovery Master (RM)
– The SMON process of one of the surviving instances
acquires the lock and is elected RM
• The RM will then
– read the redo of the evicted instance
– apply recovery
– signal completion
[Figure: stages in sequence: Detect, Evict, Elect Recovery, Read Redo, Apply Recovery]
- 29.
Reduced Reconfiguration Time with “Recovery Buddy”
• The Recovery Buddy feature optimizes
reconfiguration
– Buddy instances eliminate the
“Elect Recovery Master” phase
– Reading redo is optimized
via memory reads
– Applying recovery is optimized because
switching between reads and
writes is no longer required
[Figure: stages in sequence: Detect, Evict, Elect Recovery, Read Redo, Apply Recovery]
- 30.
Buddy Instances – Under the Hood
1. Buddy instance mapping is simple
(1 → 2, 2 → 3, 3 → 4, 4 → 1)
2. Recovery buddies are set
during instance startup
3. RMS0 on each recovery buddy
instance maintains an in-memory
area of redo log changes
4. The in-memory area is used during
recovery, eliminating the
need to physically read the redo
Example:
1. Inst1 is the recovery buddy for Inst2
2. Inst2 is the recovery buddy for Inst3, and so on
3. The recovery buddy mapping changes as
instances join or leave
For example, if Inst3 crashes, a new recovery
buddy is assigned to Inst4
[Figure: MyCluster with Inst 1 to Inst 4; labels: Recovery Buddy 2, Recovery Buddy 3, Recovery Buddy 4, Recovery Buddy 1]
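The buddy mapping described on this slide behaves like a ring. A small sketch, assuming instances are ordered by instance number and each one acts as recovery buddy for the next (the slides do not spell out how the ring is rebuilt, so the re-mapping rule here is an assumption); `recovery_buddies` is an illustrative name:

```python
from typing import Dict, Iterable


def recovery_buddies(instances: Iterable[int]) -> Dict[int, int]:
    """Map each instance to the instance it acts as recovery buddy for.

    Assumption: instances form a ring ordered by instance number, so each
    instance covers the next one and the last covers the first.
    """
    ring = sorted(instances)
    if len(ring) < 2:
        return {}  # a single instance has no buddy
    return {inst: ring[(i + 1) % len(ring)] for i, inst in enumerate(ring)}


print(recovery_buddies([1, 2, 3, 4]))  # {1: 2, 2: 3, 3: 4, 4: 1}
print(recovery_buddies([1, 2, 4]))     # Inst3 gone: {1: 2, 2: 4, 4: 1}
```

In the second call Inst2 becomes the recovery buddy for Inst4, matching the slide's example of a new buddy being assigned to Inst4 after Inst3 crashes.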
- 31.
Overlooked and Underestimated – Hang Manager
Why is a Hang Manager required?
• Customers experience database hangs for a variety of reasons
– High system load, workload contention, network congestion or errors
• Before Hang Manager was introduced with Oracle RAC 11.2.0.2
– Oracle required information to troubleshoot a hang, e.g.:
• System state dumps
• For RAC: global system state dumps
– Customers usually had to reproduce the hang with additional events set
- 32.
Hang Manager - Workings
• Always on: enabled by default
• Reliably detects database hangs
• Attempts to resolve them
• Considers QoS policies during hang
resolution
• Logs all detected hangs and their
resolutions
• New SQL interface to configure sensitivity
(Normal/High)
[Figure: Hang Manager cycle: Detect, Analyze, Evaluate (session hung?), Hang Resolution]
- 33.
Hang Manager Optimizations
• Hang Manager auto-tunes itself by
periodically collecting instance- and
cluster-wide hang statistics
• Metrics such as cluster health and instance
health are tracked as a moving average
• This moving average is considered during
resolution
• Holders waiting on SQL*Net break/reset
are fast-tracked
- 34.
DBMS_HANG_MANAGER.Sensitivity
• Early warnings are exposed via a V$ view
• Sensitivity can be set higher if the user
feels the default level is too conservative.
• Hang Manager behavior can be further
fine-tuned by setting appropriate QoS
policies
Hang Sensitivity Level | Description | Note
NORMAL | Hang Manager uses its default internal operating parameters to try to meet typical requirements for any environment. | Default
HIGH | Hang Manager is more alert to sessions waiting in a chain than when sensitivity is at the NORMAL level. |
- 35.
Program Agenda
3. Storage & System Availability Features
- 36.
Read-Only Instances on Leaf Nodes
• Starting with Oracle RAC 12c Rel. 2,
it is possible to run read-only
workloads on instances running on
Leaf Nodes (Reader Nodes)
• This requires that Leaf Nodes be
connected to storage ***
• A Reader Node failure does not impact
the overall database activity, making it
easy to scale to hundreds of nodes.
[Figure: DSS services on Reader Nodes; OLTP services on Hub Nodes]
- 37.
Optionally Configure a Local Temporary Tablespace
• Create
– CREATE LOCAL TEMPORARY TABLESPACE FOR RIM
temp_rim TEMPFILE '/loc/temp_file_rim'
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M AUTOEXTEND ON;
– One bigfile per tablespace
• Configure
– ALTER USER scott LOCAL TEMPORARY TABLESPACE
temp_rim;
• Result
– The local temporary tablespace is used when the user is connected
to a Reader Node instance
– The shared temporary tablespace is used when the user is
connected to a read-write instance
[Figure: flowchart of the temporary tablespace search order for a session. Read-write instance: user shared temp, user local temp, DB shared temp, DB local temp. Read-only instance: user local temp, DB local temp, user shared temp, DB shared temp. Each “N” branch moves on to the next candidate; otherwise SQL processing continues.]
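The search order in the flowchart can be expressed as a lookup over an ordered list of candidates. A minimal sketch, assuming each “N” branch means the candidate tablespace is not configured; the slot names, `pick_temp_tablespace`, and the default tablespace name `TEMP` are illustrative and not taken from the slides:

```python
from typing import Dict, Optional

# Search order per instance type, as read from the slide's flowchart.
SEARCH_ORDER = {
    "read_write": ["user_shared", "user_local", "db_shared", "db_local"],
    "read_only": ["user_local", "db_local", "user_shared", "db_shared"],
}


def pick_temp_tablespace(instance_type: str,
                         configured: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the first configured temporary tablespace in the search order."""
    for slot in SEARCH_ORDER[instance_type]:
        name = configured.get(slot)
        if name:
            return name
    return None  # nothing configured for this session


# Hypothetical setup: scott has the shared TEMP plus the local temp_rim tablespace.
configured = {"user_shared": "TEMP", "user_local": "temp_rim"}
print(pick_temp_tablespace("read_only", configured))   # temp_rim
print(pick_temp_tablespace("read_write", configured))  # TEMP
```

With both tablespaces assigned, a session on a Reader Node instance resolves to the local tablespace and a session on a read-write instance resolves to the shared one, as the Result bullets state.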
- 38.
Enhanced HTAP Capabilities
Hybrid Transactional and Analytical Processing
• Services can be configured to redirect
OLTP/RW sessions to Hub Nodes while
DSS/RO sessions are redirected to
Leaf Nodes
• Read-mostly instances can benefit from
parallel queries, which can be distributed
across Leaf Nodes for massive scalability
• RAC Reader Nodes can be used with
– In-Memory Database
• Exadata can cache EHCC tables in Flash
Cache in pure columnar format
[Figure: Parallel Query Coordinator and Parallel Query Processes]
- 39.
ASM Flex Disk Groups
Database-oriented Storage Management for additional flexibility and availability
[Figure: in a conventional disk group, the files of DB1, DB2, and DB3 are intermixed; in a flex disk group, each database's files are organized into its own file group (DB1: Files 1-3, DB2: Files 1-4, DB3: Files 1-3)]
- 40.
ASM Flex Disk Groups
Database-oriented Storage Management for more flexibility and availability
12.2 Flex Disk Group Organization
[Figure: flex disk group with file groups for DB1 (Files 1-3), DB2 (Files 1-4), and DB3 (Files 1-3), plus a quota]
• Flex Diskgroups enable
– Quota Management: limit the space
databases can allocate in a diskgroup and
thereby improve the customers’ ability to
consolidate databases into fewer DGs
– Redundancy Change: utilize lower
redundancy for less critical databases
– Shadow Copies (“split mirrors”) to easily
and dynamically create database clones
for test/dev or production databases
- 41.
Node Eviction Basics
• Pre-12.2, node eviction follows
a rather “ignorant” pattern
– Example in a 2-node cluster: the node
with the lowest node number survives.
• Customers must not base their
application logic on which node
survives the split brain.
– This may(!) change in future releases
[Figure: two-node cluster (nodes 1 and 2) with the surviving node marked ✔]
- 42.
Node Weighting in Oracle RAC 12c Release 2
Idea: Everything equal, let the majority of work survive
• Node Weighting is a new feature that considers
the workload hosted in the cluster during fencing
• The idea is to let the majority of work survive,
if everything else is equal
– Example: In a 2-node cluster, the node hosting the
majority of services (at fencing time) is meant to survive
[Figure: two-node cluster (nodes 1 and 2) with the surviving node marked ✔]
- 43.
Let’s Define “Equal”
A three-node cluster
will benefit from “Node Weighting” if
three equally sized sub-clusters are
built as a result of the failure, since two
differently sized sub-clusters are not
equal.
Secondary failure considerations can
influence which node survives.
Secondary failure considerations will
be enhanced successively.
A fallback scheme
is applied if considerations do not
lead to an actionable outcome.
[Figure: cluster with a public network card failure; the surviving node is marked ✔. Label: “Conflict”.]
- 44.
CSS_CRITICAL – Fencing with Manual Override
CSS_CRITICAL
can be set on various levels /
components to mark them as
“critical” so that the cluster will try to
preserve them in case of a failure.
CSS_CRITICAL will be honored
if no other technical reason prohibits
survival of the node which has at
least one critical component at the
time of failure.
A fallback scheme is applied if
CSS_CRITICAL settings do not lead to
an actionable outcome.
crsctl set server css_critical {YES|NO}
+ server restart
srvctl modify database -help | grep critical
…
-css_critical {YES | NO}
Define whether the database
or service is CSS critical
[Figure: a node is evicted despite hosting the workload (WL), and the WL fails over; the surviving node is marked ✔. Label: “Conflict”.]
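The survival rules from the node eviction, Node Weighting, and CSS_CRITICAL slides can be combined into one decision function. This is an illustrative sketch only: the slides do not give the exact precedence Oracle Clusterware applies, so the ordering below (sub-cluster size, then CSS_CRITICAL, then service count, then lowest node number as the fallback) is an assumption, and `SubCluster` and `survivor` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubCluster:
    nodes: List[int]        # node numbers in this sub-cluster
    services: int = 0       # services hosted at fencing time
    critical: bool = False  # at least one CSS_CRITICAL component present


def survivor(a: SubCluster, b: SubCluster) -> SubCluster:
    """Pick the surviving sub-cluster after a split brain (assumed ordering)."""
    # 1. Differently sized sub-clusters are not "equal": the larger one survives.
    if len(a.nodes) != len(b.nodes):
        return a if len(a.nodes) > len(b.nodes) else b
    # 2. Manual override: a sub-cluster marked critical is preferred.
    if a.critical != b.critical:
        return a if a.critical else b
    # 3. Node weighting: let the majority of work survive.
    if a.services != b.services:
        return a if a.services > b.services else b
    # 4. Assumed fallback: the sub-cluster with the lowest node number survives.
    return a if min(a.nodes) < min(b.nodes) else b


node1 = SubCluster(nodes=[1], services=1)
node2 = SubCluster(nodes=[2], services=3)
print(survivor(node1, node2).nodes)  # [2]: hosts the majority of services

node1.critical = True
print(survivor(node1, node2).nodes)  # [1]: the critical marking overrides workload
```

The second call mirrors the slide's “node eviction despite WL” case: the node with more workload is evicted because the other node carries a critical component, and the workload fails over.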
- 45.
Domain Services Cluster
[Figure: a Cluster Domain in which the Domain Services Cluster (DSC) hosts the Management Repository (GIMR) Service, the Trace File Analyzer (TFA) Service, the Rapid Home Provisioning (RHP) Service, and additional optional services (ASM Service, ASM IO Service, ACFS Services) on shared ASM. Four member clusters are shown: a Database Member Cluster using local ASM, a Database Member Cluster using the ASM Service, a Database Member Cluster using the ASM IO Service of the DSC, and an Application Member Cluster running GI only. Connections: private network, ASM network, SAN storage, network storage.]
- 46.
Program Agenda
4. Management and Efficiency Availability Features
- 47.
Efficiency and Management
Recurring tasks
– Software Installation
– Storage management
– Tuning and Diagnosis
What if these activities were performed only
once and could then be reused multiple times,
saving you many hours of
tiresome work?
- 48.
Rapid Home Provisioning, Scaling, Patching and Upgrade
• Provision and Patch 11.2, 12.1, 12.2
Grid Infrastructure & Databases
• Can perform addNode and deleteNode
operations
• Standardizes Customer software installs
via the use of Gold Images
• Provisions Grid Infrastructure, RAC, RAC
One, Single Instance and Applications
- 49.
Autonomous Health Framework
Powered by Machine Learning
• Integrates existing and new tools
& runs them as components - 24/7
• Speeds up issue diagnosis
and recovery
• Discovers potential issues and
notifies or takes corrective actions
- 50.
The DSC Management Service
Applied Machine Learning for Database Diagnostics
• Efficient diagnosis using Machine Learning
• Automatically performs corrective actions to
prevent possible issues
• Provides simple alerts & recommendations for
issues that require manual intervention
[Figure: machine-learning pipeline: ASH data, ML knowledge extraction, model generation, and application-optimized models, with subject matter expert input, human supervision, and feedback]