UCP, GridLink, TAF, AC, TAC, FAN… Configuring Oracle drivers for application high availability is not an easy job. Developers often care only about the minimal working configuration, while DBAs are busy with operations. In this session I will try to demystify the application server’s connectivity to the database and point the way toward the highest availability, using Real Application Clusters and new Oracle features like TAC and CMAN TDM.
2. Ludovico Caldara
• Principal consultant @ Trivadis Lausanne
• Two decades of DBA experience (Not only Oracle)
• ITOUG co-founder
• Active blogger and speaker
• Italian living in Switzerland
• Oracle ACE Director
@ludodba ludovicocaldara.net
12. Disclaimer
●Some oversimplifications
●A very complex topic
●Requires DBA and developer skills
●Assume you know some basic concepts
– High availability and failover concepts
– Connections to database
– Basic NET configurations
(SCAN, Listener, Services, TNS)
●Assume you have recent DB and client (>=12.2)
13. "Failure happens all the time.
It happens every day in practice.
What makes you better
is how you react to it."
― Mia Hamm
15. Factors that influence HA
Too many!
●Network topology
●OS type and configuration
●DB version and service configuration
●Client version and type
●Application design / exception handling
Our mission today
Good white-paper:
Oracle Client Failover - Under the Hood
By Robert Bialek (Trivadis)
18. Database Services
Virtual name for a database endpoint
[Diagram: services HR_SVC, CRM_SVC and REP_SVC, registered with the listener, on Real Application Clusters / Data Guard]
21. Database Services
The DBA can create services with:
● srvctl add service
● dbms_service.create_service() PL/SQL procedure.
Both methods have parameters for HA
●Hint: HA settings at service level are of no use if the client is not configured properly
Did you know? Parameter service_names is deprecated!
32. Planned Maintenance
●Service relocation: new sessions go to instance 2
●Problem: what about existing sessions?
[Diagram: CRM_SVC on Real Application Clusters / Data Guard]
34. How to drain sessions
●You need to know that the service is being relocated
●Use Fast Application Notification (FAN)!
[Diagram, built up over four slides: the client registers with ONS and connects to CRM_SVC on Real Application Clusters / Data Guard; CRM_SVC is stopped on one instance and started on the other; ONS sends a notification; the client disconnects when the transaction is over and reconnects]
38. FAN at database side
●ONS is there by default with Grid Infrastructure
●Default remote port 6200
●18c: in-band notifications
●FAN-enabled service:
srvctl add service -db orcl -service hr_svc
-rlbgoal [SERVICE_TIME | THROUGHPUT] # for load balancing advisory
-notification TRUE # for OCI/ODP.NET connections
srvctl relocate service -db orcl -service hr_svc
-oldinst orcl1 -newinst orcl2
-drain_timeout 10 # give sessions some time to drain
# -force is not specified, so sessions are not killed
39. FAN at client side
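import java.util.Properties;
import oracle.simplefan.FanEventListener;
import oracle.simplefan.FanManager;
import oracle.simplefan.FanSubscription;
import oracle.simplefan.ServiceDownEvent;
[...]
// ONS daemons to receive the events from
Properties onsProps = new Properties();
onsProps.setProperty("onsNodes", "node1:6200,node2:6200");
FanManager fanMngr = FanManager.getInstance();
fanMngr.configure(onsProps);
// Subscribe to the events of one service (the service name is an example)
Properties props = new Properties();
props.setProperty("serviceName", "hr_svc");
FanSubscription sub = fanMngr.subscribe(props);
sub.addListener(new FanEventListener() {
public void handleEvent(ServiceDownEvent event) {
System.out.println("Service down event");
System.out.println(event.getReason());
// handle the event
}
// FanEventListener also declares handleEvent for NodeDownEvent and
// LoadAdvisoryEvent: they must be implemented as well
});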
41. Fast Connection Failover (FCF)
●Pre-configured FAN integration
●Works with connection pools
●The application must be pool aware
– (borrow/release)
●The connection pool leverages FAN events to:
– Quickly remove dead connections on a DOWN event
– (opt.) Redistribute the load on an UP event
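What the pool does with a DOWN event can be sketched with a toy model in plain Java. This is only an illustration of the idea, not how UCP or any real pool is implemented: ToyFanPool, Conn and the method names are all hypothetical, and a connection is reduced to the name of the instance it is connected to.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Toy model of a FAN-aware pool; every name here is hypothetical. */
class ToyFanPool {
    /** A pooled connection, reduced to the instance it is connected to. */
    static final class Conn {
        final String instance;
        boolean closed;
        Conn(String instance) { this.instance = instance; }
    }

    private final List<Conn> idle = new ArrayList<>();
    private final Set<Conn> borrowed = new HashSet<>();
    private final Set<String> downInstances = new HashSet<>();

    void add(Conn c) { idle.add(c); }

    /** Hands out an idle connection, or null if none is left. */
    Conn borrow() {
        if (idle.isEmpty()) return null;
        Conn c = idle.remove(0);
        borrowed.add(c);
        return c;
    }

    /** On release, a connection to a stopped instance is closed instead of reused. */
    void release(Conn c) {
        borrowed.remove(c);
        if (downInstances.contains(c.instance)) {
            c.closed = true;
        } else {
            idle.add(c);
        }
    }

    /** DOWN event: close idle connections now, let borrowed ones finish their work. */
    void onServiceDown(String instance) {
        downInstances.add(instance);
        for (Iterator<Conn> it = idle.iterator(); it.hasNext();) {
            Conn c = it.next();
            if (c.instance.equals(instance)) {
                c.closed = true;
                it.remove();
            }
        }
    }

    /** UP event: the instance accepts connections again. */
    void onServiceUp(String instance) { downInstances.remove(instance); }

    int idleCount() { return idle.size(); }
}
```

Borrowed connections are not touched by the event itself: they are closed when the application releases them, which is why the application must be pool aware (borrow/release).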
42. Fast Connection Failover (FCF)
●UCP (Universal Connection Pool, ucp.jar) and WebLogic Active GridLink
handle FAN out of the box.
No code changes! Just enable FastConnectionFailoverEnabled.
●Third-party connection pools can implement FCF
– If JDBC driver version >= 12.2
– simplefan.jar and ons.jar in CLASSPATH
– Connection validation options are set in pool properties
– Connection pool can plug javax.sql.ConnectionPoolDataSource
– Connection pool checks connections at borrow/release
44. Fast Connection Failover (FCF)
●OCI Connection Pool handles FAN events as well
– Need to configure oraaccess.xml properly in TNS_ADMIN
– Python’s cx_Oracle, PHP oci8, etc. have native options
●ODP.NET: just set "HA events = true;pooling=true"
45. Session Draining in 18c
●Database invalidates connection at:
–Standard connection tests for connection validity
(conn.isValid(), CheckConStatus, OCI_ATTR_SERVER_STATUS)
–Custom SQL tests for validity (DBA_CONNECTION_TESTS)
– SELECT 1 FROM DUAL
– SELECT COUNT(*) FROM DUAL
– SELECT 1
– BEGIN NULL;END
– Add new:
execute dbms_app_cont_admin.add_sql_connection_test(
'select * from dual', service_name);
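For a pool without FAN support, draining therefore depends on a connection test being run when a connection is borrowed. A minimal sketch of such a check, using only the standard java.sql API; BorrowCheck and borrowValid are hypothetical names, and a real pool would do much more (sizing, timeouts, thread safety):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Deque;

class BorrowCheck {
    /**
     * Returns the first idle connection that passes isValid(), closing and
     * dropping the ones that fail. Returns null when none is usable, so the
     * caller can open a new connection.
     */
    static Connection borrowValid(Deque<Connection> idle, int timeoutSeconds) {
        Connection c;
        while ((c = idle.pollFirst()) != null) {
            try {
                if (c.isValid(timeoutSeconds)) {
                    return c;
                }
            } catch (SQLException e) {
                // isValid() only throws for a negative timeout; treat as unusable.
            }
            try {
                c.close();
            } catch (SQLException ignored) {
                // The connection is being discarded anyway.
            }
        }
        return null;
    }
}
```

When the service is being drained, the test fails for the sessions on the old instance, so the pool discards them between two units of work rather than in the middle of one.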
46. “Have we implemented FAN/FCF correctly?”
●TEST, TEST, TEST
●Relocate services as part of your CI/CD
●Application ready for planned maintenance
=> happy DBA, Dev, DevOps
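One way to make relocation part of a pipeline is to let a test step build and run the srvctl command shown earlier. A minimal sketch, assuming the pipeline runs on a host where srvctl is available; RelocateCommand is a hypothetical helper, and only the srvctl flags come from the slides:

```java
import java.util.Arrays;
import java.util.List;

class RelocateCommand {
    /** Builds the argument list for "srvctl relocate service" with a drain timeout. */
    static List<String> build(String db, String service,
                              String oldInst, String newInst,
                              int drainTimeoutSeconds) {
        return Arrays.asList(
            "srvctl", "relocate", "service",
            "-db", db,
            "-service", service,
            "-oldinst", oldInst,
            "-newinst", newInst,
            "-drain_timeout", Integer.toString(drainTimeoutSeconds));
    }
}
```

The list can be passed to new ProcessBuilder(...) while the application test suite is running; the test passes if no request fails during the relocation.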
47. Why draining?
Best solution for hiding planned maintenance
No draining => killing persisting sessions => unplanned from the application perspective
51. Unplanned Maintenance (failover)
●CRM sessions exist on instance 1
●The instance crashes. What about running sessions/transactions?
●(Also any maintenance that terminates sessions non-transactionally)
[Diagram: CRM_SVC on Real Application Clusters / Data Guard]
56. Transparent Application Failover (TAF)
●For OCI drivers only
●Automates reconnect
●Allows resumable queries (session state restored in 12.2)
●Transactions and PL/SQL calls not resumed (rollback)
[Diagram, built up over five slides: a query result crossing Oracle Net, with rows labelled Fetched, Lost, Discarded, Fetched]
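The Fetched / Lost / Discarded labels describe how a query is resumed: after the reconnect the query is executed again, and the rows that the client had already received are read and thrown away. A toy sketch of that idea in plain Java; no Oracle API is involved, and TafSelectResume and resume are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class TafSelectResume {
    /**
     * Given the result of the re-executed query, returns only the rows the
     * client has not seen yet: the first alreadyFetched rows are discarded.
     */
    static <T> List<T> resume(List<T> reExecutedResult, int alreadyFetched) {
        if (alreadyFetched > reExecutedResult.size()) {
            throw new IllegalStateException("result set changed, cannot resume");
        }
        return new ArrayList<>(
            reExecutedResult.subList(alreadyFetched, reExecutedResult.size()));
    }
}
```

This only works for queries: an open transaction has no result set to re-read, which is why it is rolled back instead.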
58. Fast Connection Failover and FAN
●Like for planned maintenance, but…
– Connection pool recycles dead connections
– Application must handle all the exceptions
●FAN avoids TCP timeouts!
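"Handle all the exceptions" usually means catching the error raised when the connection is lost and running the unit of work again on a fresh connection. A minimal sketch using only java.sql, assuming the driver reports a lost connection as SQLRecoverableException; RecoverableRetry, SqlWork and withRetry are hypothetical names:

```java
import java.sql.SQLException;
import java.sql.SQLRecoverableException;

class RecoverableRetry {
    /** Unit of work that borrows a connection, runs, and releases it. */
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    /**
     * Runs the work, retrying when a recoverable error (typically a lost
     * connection) is reported. Other SQLExceptions propagate at once.
     */
    static <T> T withRetry(SqlWork<T> work, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLRecoverableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLRecoverableException e) {
                last = e; // connection lost: the next attempt borrows a fresh one
            }
        }
        throw last;
    }
}
```

A blind retry like this is only safe for work that can be repeated: if the connection was lost during a commit, the application cannot tell whether the commit succeeded.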
59. Application Continuity (AC)
●Server-side Transaction Guard (included in EE)
–Transaction state is recorded upon request
●Client-side Replay Driver
–Keeps journal of transactions
–Replays transactions upon reconnect
JDBC thin 12.1, OCI 12.2
60. Application Continuity (AC)
• AC with UCP: no code change
PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
pds.setConnectionFactoryClassName("oracle.jdbc.replay.OracleDataSourceImpl");
...
Connection conn = pds.getConnection(); // Implicit database request begin
// calls protected by Application Continuity
conn.close(); // Implicit database request end
• AC without connection pool: code change
OracleDataSourceImpl ods = new OracleDataSourceImpl();
Connection conn = ods.getConnection();
...
((ReplayableConnection)conn).beginRequest(); // Explicit database request begin
// calls protected by Application Continuity
((ReplayableConnection)conn).endRequest(); // Explicit database request end
61. Application Continuity (AC)
Service definition:
srvctl add service -db orcl -service hr
-failovertype TRANSACTION # enable Application Continuity
-commit_outcome TRUE # enable Transaction Guard
-failover_restore LEVEL1 # restore session state before replay
-retention 86400 # commit outcome retained for 1 day
-replay_init_time 900 # replay is not initiated after 900 seconds
-notification true
Special configuration to retain mutable values at replay:
GRANT KEEP SEQUENCE ON <SEQUENCE> TO <USER>;
GRANT KEEP DATE TIME TO <USER>;
GRANT KEEP SYSGUID TO <USER>;
62. Transparent Application Continuity (TAC)
●“New” in 18c for JDBC thin, 19c for OCI
●Records session and transaction state server-side
●No application change
●Replayable transactions are replayed
●Non-replayable transactions raise exception
●Good driver coverage but check the doc!
●Side effects are never replayed
63. Transparent Application Continuity (TAC)
Service definition:
srvctl add service -db orcl -service hr
-failover_restore AUTO # enable Transparent Application Continuity
-failovertype AUTO # enable Transparent Application Continuity
-commit_outcome TRUE # enable Transaction Guard
-retention 86400 # commit outcome retained for 1 day
-replay_init_time 900 # replay is not initiated after 900 seconds
-notification true
Special configuration to retain mutable values at replay:
GRANT KEEP SEQUENCE ON <SEQUENCE> TO <USER>;
GRANT KEEP DATE TIME TO <USER>;
GRANT KEEP SYSGUID TO <USER>;
64. Still not clear?
●Fast Application Notification to drain sessions
●Application Continuity for full control
(code change)
●Transparent Application Continuity for good HA
(no code change)
71. Session Failover with TDM
[Diagram: CLIENT, cman, two CDBA instances, PDB1]
• Client connects to cman:1521/pdb1
• Cman opens a connection to pdb1
• Upon PDB/service relocation, cman detects the stop and closes the connections at transaction boundaries
• The next request is executed on the surviving instance
• The client-cman connection stays intact: the client does not experience a disconnection