Introduction:
Starting from 12c TAF can failover DMLs to the available nodes on a RAC environment in case of any interruption happens to it including node/service/network failure. This feature is called Application Continuity or Transaction Guard.
Pre-requisites: [For 19c]
The applications should use Oracle Client 19c. Although the Oracle note Doc ID 2011697.1 mentioning that any 12.1+ Oracle Client will work, it didn't work for me while testing on a 19c DB.
Application Continuity works on
the assumption that the applications are well written in terms of connection
usage:
- Borrowing connections
and returning them to the connection pool instead of pinning
connections.
- If a statement cache at
the application server level is enabled it must be disabled when the replay is used. or use JDBC statement cache, which is compatible with Application Continuity.
- Additional CPU utilization will happen on the client-side in order to handle garbage collections.
Implementation:
# srvctl add service -database
sprint -service pssfztest_rpt -preferred sprint1,sprint2 -tafpolicy BASIC
-failovertype TRANSACTION -commit_outcome TRUE -failovermethod BASIC
-failoverretry 100 -stopoption IMMEDIATE -session_state DYNAMIC -role PRIMARY
-policy AUTOMATIC -clbgoal long -verbose -failover_restore LEVEL1
-replay_init_time 3600
Notes:
- Don't use replay_init_time
along with failoverdelay or you will get this error when trying to start the
service:
CRS-2632: There are no more
servers to try to place resource 'ora.sprint.pssfztest_gg.svc' on that would
satisfy its placement policy
- In order for the applications to
use the DML failover feature (Application Continuity) application users should
be granted execute permission on DBMS_APP_CONT: [For simplicity I'm granting it
for everyone but some applications don't work properly with this feature, so it's recommended to test your application and grant this permission to the users
of the applications that support Application Continuity]
SQL> grant execute on DBMS_APP_CONT to public;
- clbgoal=short is less stable
than clbgoal=long where failover retries can be exhausted before reaching its max
limit.
- clbgoal=short balance the
sessions between RAC nodes based on response time, while clbgoal=long balance
the sessions based on the total number of sessions on each node.
- The PRECONNECT option for
-tafpolicy parameter is deprecated in 19c.
- If you set -failovertype =
TRANSACTION, then you must set -commit_outcome to TRUE.
- For -session_state Oracle
recommends to set it to DYNAMIC for most applications, to use the default session
settings (NLS settings, optimizer preferences,..) after the session fails over.
- replay_init_time: Specifies
the time in seconds after which replay (failover) will not happen. [It's set to 3600sec =1 hour above]
The following activities if happened can cause the transaction to failover without being disrupted [transaction will hang for a few seconds till the connectivity get restored on the available nodes]:
- instance crash.
- Partial Network disruption.
- OS kill -9 of the instance main processes (PMON/SMON).
- OS kill -STOP followed by kill -9 of the same session.
- shutdown immediate (from SQLPLUS console).
- shu abort (from SQLPLUS console).
- srvctl stop service -d sprint -i sprint1
- srvctl stop service -d sprint -i sprint1 -force
- srvctl stop instance -d sprint -i sprint1 -failover
- ALTER SYSTEM KILL SESSION command.
- ALTER SYSTEM DISCONNECT SESSION command.
The following activities will terminate the DML transaction WITHOUT failing them over but the session itself will re-connect automatically: [If performed on the node where the session is connected]
- OS kill -9 of the session PID.
- ALTER SYSTEM CANCEL SQL '<SID>,<SERIAL#>';
- srvctl stop instance -d sprint -i sprint1 -force
- crsctl stop cluster
- crsctl stop crs
Conclusion:
- Application Continuity feature can let you carry out activities like software patching, hardware/network maintenance with real ZERO downtime.
- Before using Application Continuity feature, you have to make sure that your applications are compatible with this feature by testing all the scenarios you may go through. It will be wise if you consult your application vendor before implementing this feature.
Using this feature blindly without proper testing may result in unexpected application behavior.
- Applications should not use the default database service (which has the same name as DB_NAME/DB_UNIQUE_NAME of the database) as this service is not supported by the high availability features.
References:
This comment has been removed by the author.
ReplyDeleteVery useful
ReplyDeleteThankyou Mahmmoud that's a great explanation on this HA feature. Also for your warning about correct service name usage.
ReplyDelete