Wednesday, September 11, 2019

RAC Database Fails To Start Due To "terminating the instance due to error 119"

Problem:
On Linux 7, after a system reboot, an 11.2.0.3 RAC database failed to start with this error in the alert log:

USER (ospid: 13324): terminating the instance due to error 119
Instance terminated by USER, pid = 13324
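
For context, you can translate the bare error number with the oerr utility; 119 corresponds to ORA-00119, which already points to an initialization parameter problem (output abridged):

$ oerr ora 119
00119, 00000, "invalid specification for system parameter %s"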

Cause:
While investigating, I found that the SCAN name "rac1-scan", which is used in the "remote_listener" initialization parameter, was not resolvable:
$ ping rac1-scan
ping: unknown host rac1-scan

When I checked the /etc/resolv.conf file, I found that the DNS server entry had been removed:
$ cat /etc/resolv.conf
# Generated by NetworkManager
search preprod.mycompany.com


# No nameservers found; try putting DNS servers into your
# ifcfg files in /etc/sysconfig/network-scripts like so:
#
# DNS1=xxx.xxx.xxx.xxx
# DNS2=xxx.xxx.xxx.xxx
# DOMAIN=lab.foo.com bar.foo.com
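
To confirm the failure is DNS-related, check how the SCAN is registered in the clusterware and what the instance's remote_listener points to; a quick sketch using standard commands:

# Show the SCAN configuration known to the clusterware:
$ srvctl config scan

# Try resolving the SCAN name directly against DNS:
$ nslookup rac1-scan

# Check the parameter value (from a surviving instance or after startup):
SQL> show parameter remote_listener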

Solution:
Luckily, I always run a script called configuration_baseline.sh to keep a backup of the critical system file entries in one log file; it helps to restore the entries later whenever a bad change happens to these files.

I restored the original entries of /etc/resolv.conf, which holds the right DNS server name in my environment, and then managed to start the RAC DB successfully.

$ cat /etc/resolv.conf
# Generated by NetworkManager
search preprod.mycompany.com
nameserver 10.100.22.10

Conclusion:
The "terminating the instance due to error 119" error is mainly related to the "remote_listener" initialization parameter setting; if this setting is broken (e.g., the SCAN name it references cannot be resolved), it can prevent the RAC instance from starting up.

Recommendation:
Always keep a copy of /etc/resolv.conf, or protect the file from being reset by the system after a reboot by making it immutable with the following command:
chattr +i /etc/resolv.conf
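
Keep in mind the immutable flag also blocks legitimate edits, even by root, so verify it and remove it again whenever you need to change the file:

# Verify the flag is set (look for the 'i' attribute):
$ lsattr /etc/resolv.conf

# Remove the flag when you need to edit the file again:
# chattr -i /etc/resolv.conf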

You can use this script to help you keep the entries of your critical Linux & Oracle files saved somewhere, in case you need to restore them later:
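
The actual script ships with the DBA Bundle; as a rough illustration of the idea only, here is a minimal sketch (the file list and log location are my own assumptions, not the bundle's contents):

#!/bin/bash
# Minimal baseline sketch: append the current contents of critical
# files to a dated log so they can be restored manually if needed.
LOG=/var/log/configuration_baseline_$(date +%F).log
for f in /etc/resolv.conf /etc/hosts /etc/fstab /etc/oratab; do
    echo "===== $f =====" >> "$LOG"
    cat "$f" >> "$LOG" 2>/dev/null
done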

Tuesday, September 10, 2019

OPATCHAUTO-72083: Performing bootstrap operations failed OPATCHAUTO-72146: Failed to load patch

Problem:
While patching a 19.3 GRID_HOME with the July 2019 RU patch (19.4), I got this error:

[root@fzppon06vs1n bin]#  export PATCH_LOCATION=/u01/oracle/RU_PATCHES/29708769
[root@fzppon06vs1n bin]# $GRID_HOME/OPatch/opatchauto apply $PATCH_LOCATION -oh $GRID_HOME

OPatchauto session is initiated at Mon Sep  9 11:00:57 2019
OPATCHAUTO-72083: Performing bootstrap operations failed.
OPATCHAUTO-72083: The bootstrap execution failed because OPATCHAUTO-72146: Failed to load patch
OPATCHAUTO-72146: Failed while collecting patch information for patch /u01/oracle/RU_PATCHES/29708769.
OPATCHAUTO-72146: Check the log for more information..
OPATCHAUTO-72083: Fix the reported problem and re-run opatchauto.

OPatchauto session completed at Mon Sep  9 11:01:02 2019
Time taken to complete the session 0 minute, 5 seconds

opatchauto bootstrapping failed with error code 255.

Solution:
The problem was with the RU patch binaries; it looks like the files were not copied properly. Extracting the binaries again from the original zip file fixed the problem:
unzip -d /u01/oracle/RU_PATCHES  p29708769_190000_Linux-x86-64.zip

Initially, I had copied the already-extracted RU binary files from another node; it seems they were incomplete.
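
To avoid this in the first place, test the archive and compare its checksum against the value shown on the My Oracle Support download page before extracting:

# Test the zip file for corruption:
$ unzip -t p29708769_190000_Linux-x86-64.zip

# Compare with the checksum listed on the MOS download page:
$ sha256sum p29708769_190000_Linux-x86-64.zip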

Prerequisite check "CheckActiveFilesAndExecutables" failed

Problem:
While patching the Grid Infrastructure home with the July 2019 RU patch, I got this error:

[root@fzppon06vs1n 29708769]# $GRID_HOME/OPatch/opatchauto apply $PATCH_LOCATION -oh $GRID_HOME

OPatchauto session is initiated at Mon Sep  9 11:14:23 2019

System initialization log file is /u01/grid/12.2.0.3/cfgtoollogs/opatchautodb/systemconfig2019-09-09_11-14-29AM.log.

Session log file is /u01/grid/12.2.0.3/cfgtoollogs/opatchauto/opatchauto2019-09-09_11-15-43AM.log
The id for this session is TV9U

Executing OPatch prereq operations to verify patch applicability on home /u01/grid/12.2.0.3
Patch applicability verified successfully on home /u01/grid/12.2.0.3


Bringing down CRS service on home /u01/grid/12.2.0.3
CRS service brought down successfully on home /u01/grid/12.2.0.3


Start applying binary patch on home /u01/grid/12.2.0.3
Failed while applying binary patches on home /u01/grid/12.2.0.3

Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : fzppon06vs1n->/u01/grid/12.2.0.3 Type[crs]
Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/grid/12.2.0.3, host: fzppon06vs1n.
Command failed:  /u01/grid/12.2.0.3/OPatch/opatchauto  apply /u01/oracle/RU_PATCHES/29708769 -oh /u01/grid/12.2.0.3 -target_type cluster -binary -invPtrLoc /u01/grid/12.2.0.3/oraInst.loc -jre /u01/grid/12.2.0.3/OPatch/jre -persistresult /u01/grid/12.2.0.3/OPatch/auto/dbsessioninfo/sessionresult_fzppon06vs1n_crs.ser -analyzedresult /u01/grid/12.2.0.3/OPatch/auto/dbsessioninfo/sessionresult_analyze_fzppon06vs1n_crs.ser
Command failure output: 
==Following patches FAILED in apply:

Patch: /u01/oracle/RU_PATCHES/29708769/29834717
Log: /u01/grid/12.2.0.3/cfgtoollogs/opatchauto/core/opatch/opatch2019-09-09_11-19-51AM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Prerequisite check "CheckActiveFilesAndExecutables" failed. 

After fixing the cause of failure Run opatchauto resume

]
OPATCHAUTO-68061: The orchestration engine failed.
OPATCHAUTO-68061: The orchestration engine failed with return code 1
OPATCHAUTO-68061: Check the log for more details.
OPatchAuto failed.

OPatchauto session completed at Mon Sep  9 11:20:28 2019
Time taken to complete the session 6 minutes, 6 seconds

 opatchauto failed with error code 42

The environment is a two-node 19c RAC; while the patching worked fine on node1, it failed with the above error on node2.
Neither restarting the Clusterware nor killing the process IDs returned by the following commands helped:
fuser /u01/grid/12.2.0.3/lib/libsqlplus.so
fuser /u01/grid/12.2.0.3/lib/libasmclntsh19.so
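
For completeness, a generic way to enumerate every process holding files under the home, in case the blockers are not limited to those two libraries (in this case killing the holders still did not help):

# List all processes with files open anywhere under the lib directory:
# lsof +D /u01/grid/12.2.0.3/lib

# Or show verbose owner/PID details for one library:
# fuser -v /u01/grid/12.2.0.3/lib/libsqlplus.so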

Solution:
Apply each patch included in the RU manually using opatch instead of opatchauto:

Manual patching of RU patch on GRID HOME:
Note: the patch has been extracted under /u01/oracle/RU_PATCHES/; make sure the PATCH_LOCATION variable refers to the location where you extracted the RU binaries.

[As ROOT]
Run the pre-patching script:
# export GRID_HOME=/u01/grid/12.2.0.3
# $GRID_HOME/crs/install/rootcrs.sh -prepatch

Stop the clusterware on the local node:
# crsctl stop cluster

[As GRID OWNER]
Start patching:
$ export GRID_HOME=/u01/grid/12.2.0.3
$ export PATCH_LOCATION=/u01/oracle/RU_PATCHES/29708769
$ $GRID_HOME/OPatch/opatch apply -oh $GRID_HOME -local $PATCH_LOCATION/29850993
$ $GRID_HOME/OPatch/opatch apply -oh $GRID_HOME -local $PATCH_LOCATION/29851014
$ $GRID_HOME/OPatch/opatch apply -oh $GRID_HOME -local $PATCH_LOCATION/29834717
$ $GRID_HOME/OPatch/opatch apply -oh $GRID_HOME -local $PATCH_LOCATION/29401763

[As ROOT]
Run the post-patching script:
# $GRID_HOME/crs/install/rootcrs.sh -postpatch
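
As a sanity check once the post-patching script completes, you can confirm the patches are recorded in the inventory with a standard opatch command:

$ $GRID_HOME/OPatch/opatch lspatches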

Manual patching of RU patch on ORACLE HOME:
[As ORACLE OWNER]
Run the pre-patching script:
$ export ORACLE_HOME=/u01/oracle/12.2.0.3
$ export PATCH_LOCATION=/u01/oracle/RU_PATCHES/29708769
$ $PATCH_LOCATION/29850993/custom/scripts/prepatch.sh -dbhome $ORACLE_HOME

Stop all instances that run from ORACLE_HOME on the local node:
$ srvctl stop instance -d xxx -i xxx

Start patching:
$ $ORACLE_HOME/OPatch/opatch apply -oh $ORACLE_HOME -local $PATCH_LOCATION/29850993
$ $ORACLE_HOME/OPatch/opatch apply -oh $ORACLE_HOME -local $PATCH_LOCATION/29834717

Run the post-patching script:
$ $PATCH_LOCATION/29850993/custom/scripts/postpatch.sh -dbhome $ORACLE_HOME

Start back all instances on the local node:
$ srvctl start instance -d xxx -i xxx
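
Note that for database homes the RU README also calls for running datapatch against each database once the instances are back up, to apply the SQL portion of the patch, and the same inventory check applies:

# Apply the SQL changes of the RU inside the database:
$ $ORACLE_HOME/OPatch/datapatch -verbose

# Confirm the binary patches are recorded in the inventory:
$ $ORACLE_HOME/OPatch/opatch lspatches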

Wednesday, September 4, 2019

DBA Bundle 5.4 Is Ready

DBA Bundle V5.4 is now available in this link:
https://www.dropbox.com/s/k96rl0f4g39ukih/DBA_BUNDLE5.tar?dl=0

The following are the new features:
- Excluding the +APX DB (>=12.2) from the DB list when the bundle scripts get invoked.
- Defaulting the RMAN_full.sh & schedule_rman_full_bkp.sh scripts to create a Full backup instead of an Incremental level 0, to avoid interfering with any existing backup policies.
- Added a hard-parse command following the completion of the tuning task in sql_id_details.sh.
- The all-DB-users DDL script generated by configuration_baseline.sh can now be executed directly from SQL*Plus when needed.

- Bug fixes applied to the following scripts:
Table rebuild online script: rebuild_table.sh
Aliases setup script: aliases_DBA_BUNDLE.sh
Database/Server Monitoring script: dbalarm.sh
Health Check Report script: dbdailychk.sh
Checking Active Sessions script: active_sessions.sh

If you are new to the DBA BUNDLE please read the full post:
http://dba-tips.blogspot.com/2014/02/oracle-database-administration-scripts.html

Tuesday, September 3, 2019

Datafile Rename with Minimal Downtime

Case:
Let's suppose you created a new datafile and missed the .dbf extension or made a typo in the name, and you want to fix it.

Let's suppose the bad datafile name is /data01/STFBKUAT01/datafile/users012 and it should be renamed to /data01/STFBKUAT01/datafile/users02.dbf

Prerequisites:
- The database should be in ARCHIVELOG mode.
- It's always recommended to take a full backup before and after changing the database physical structure.
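
You can confirm the archiving mode up front from SQL*Plus:

SQL> archive log list
-- or:
SQL> select log_mode from v$database;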

Solution:
If you are on 12c or later, lucky you: you can do the job with a single command and NO DOWNTIME, but with more overhead:

SQL> alter database move datafile '/data01/STFBKUAT01/datafile/users012' to '/data01/STFBKUAT01/datafile/users02.dbf';

Database altered.

This command copies the datafile under the new name, then switches to the new datafile, then deletes the old datafile with the old name. As it involves a physical copy of the datafile, it generates overhead that depends on the datafile size and the system speed.

If you are on 11g or earlier, you can rename the datafile with minimal downtime, and with less overhead compared to the 12c method:

-- Bring the datafile back offline: [Downtime start]
SQL> alter database datafile '/data01/STFBKUAT01/datafile/users012' offline;

Database altered.

-- Rename the datafile on the OS side:
SQL> host mv /data01/STFBKUAT01/datafile/users012  /data01/STFBKUAT01/datafile/users02.dbf

-- Rename the datafile on the DB side:
SQL> alter database rename file '/data01/STFBKUAT01/datafile/users012' to '/data01/STFBKUAT01/datafile/users02.dbf';

Database altered.

-- Recover the datafile:
SQL> recover datafile '/data01/STFBKUAT01/datafile/users02.dbf';
Media recovery complete.

-- Bring the datafile back online: [Downtime end]
SQL> alter database datafile '/data01/STFBKUAT01/datafile/users02.dbf' online;

Database altered.
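
Either way, you can verify the result from the data dictionary; the file should show up ONLINE under its new name:

SQL> select file_name, online_status from dba_data_files where file_name like '%users02%';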