Sunday, March 27, 2022

Shutdown of RAC DB instance is taking very long time showing error: kponepms: encountered ORA-12535

 Problem:

While trying to shut down a 19c RAC DB instance using srvctl command with IMMEDIATE shutdown option, the instance shutdown was taking a significantly long time, at the same time the DB alertlog was showing this error frequently:

kponepms: encountered ORA-12535 for location net8://(ADDRESS=(PROTOCOL=tcp)(HOST=10.1.1.105)(PORT=10809))?PR=0 subname "SYS"."ALERT_QUE":"HAE_SUB" after 7200000 milisecs 

Using ABORT option in srvctl command, still developing same symptoms:

# srvctl stop instance -d dbx -i dbx1 -o abort -force

I decided not to act as per the book anymore, and end up this embarrassment by issuing "SHUTDOWN ABORT" command from a new SYSDBA SQLPlus session!

Analysis:

I used to disable AQ & Jobs 15 minutes before Shutting down the DB to avoid the time jobs will consume for rolling back transactions.

Solution:

Make sure to set AQ and Jobs parameters to at least =1, rather than setting them to 0 before the shutdown of the DB:

SQL> alter system set aq_tm_processes=1;
SQL> alter system set job_queue_processes=1;

If you still face the same issue, then you may need to set the following hidden parameter (you have to consult Oracle Support before setting it):

SQL> alter system set "_client_enable_auto_unregister"=true scope=BOTH sid='*';

 

Saturday, March 26, 2022

RMAN Duplicate hang on error: WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!

Problem:

I was trying to duplicate a 19c database from an RMAN backup, but it was hanging at this final stage of opening the cloned DB in resetlogs mode:

Analysis:

When checked the alertlog I found the following messages in the alertlog:

2022-03-21T13:14:26.588816+00:00
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=58
System State dumped to trace file /oracle/19.5/diag/rdbms/uat3/uat3/trace/UAT3_p00e_5888.trc
2022-03-21T13:14:27.941400+00:00
TT03 (PID:5858): Sleep 160 seconds and then try to clear SRLs in 8 time(s)
2022-03-21T13:17:07.941846+00:00
TT03 (PID:5858): Sleep 320 seconds and then try to clear SRLs in 9 time(s)
2022-03-21T13:18:01.355711+00:00
********************  WARNING **************************
The errors during server control file autobackup are not
fatal, as it is attempted after sucessful completion of
the command. However, it is recomended to take an RMAN
control file backup as soon as possible because the
autobackup failed with the following error:
ORA-00237: snapshot operation disallowed: control file newly created

********************  END OF WARNING *******************

 After a long search, I figured out that it's Bug 30931981 which impacts 19c and was fixed in 21c, also I figured that I'm having more patches installed on the target cloned target DB ORACLE_HOME compared to the ones installed on the source DB ORACLE_HOME which I'm cloning from.

Solution:

Permanent: Apply patch 30931981

Workaround: Make sure the target "cloned DB" ORACLE HOME is having same patches installed as similar to the source DB you are cloning from.


Friday, March 25, 2022

Duplicate of a Standby DB from an RMAN backup throws RMAN-05657: There are multiple database IDs present

 Problem:

The RMAN duplicate command for creating a standby database from a backup throws the following error, just right after allocating auxiliary channels:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 03/24/2022 13:32:59
RMAN-05501: aborting duplication of target database
RMAN-05657: There are multiple database IDs present.

Solution:

Try to set the same DBID of the primary DB in the duplicate script of the standby DB:

Get the DBID from the primary DB:

SQL> select dbid from v$database;

      DBID
----------
1352250453

 

Or from the RMAN backup logfile:

Then set the DBID in the duplicate script like this:

set dbid 1352250453
run {
allocate auxiliary channel aux1 type disk;
...
duplicate target database for standby backup location '/backup/RMANBKP' nofilenamecheck
spfile
...
}

Good luck!


Wednesday, March 16, 2022

Following active executables are not used by opatch process : bin/oracle

 Problem:

When applying an interim patch on a 19c ORACLE_HOME using "opatch apply" command, I get this error:

 Prerequisite check "CheckActiveFilesAndExecutables" failed.
The details are:

Following active executables are not used by opatch process :
/u01/oracle/19.5/bin/oracle


Following active executables are used by opatch process :

UtilSession failed: Prerequisite check "CheckActiveFilesAndExecutables" failed.
Log file location: /u01/oracle/19.5/cfgtoollogs/opatch/opatch2022-03-03_15-39-23PM_1.log

OPatch failed with error code 73

Analysis:

Oracle binary file is being used by a process or session. Which indicates that there are services under ORACLE_HOME are still up and running.


Solution:

1- Make sure you properly shut down all Oracle services running from the ORACLE_HOME that is being patched.

2- Kill the processes that remain accessing ORACLE_HOME which will be patched: 

Caution: fuser command will not show only the process accessing ORACLE_HOME but will show all processes accessing the filesystem from which ORACLE_HOME is running,  i.e. if you have ORACLE Home and GRID Home running from same filesystem, below command will show all processes accessing both ORACLE_HOME and GRID_HOME.

# fuser -v -m $ORACLE_HOME

Then validate the list and kill the PIDs running from ORACLE_HOME.

Monday, March 14, 2022

INS-40980 Installer has detected that alias names is specified for the following nodes

 Problem:

Getting the following error when installing a new 19c Grid Infrastructure installation on two RAC nodes:

[INS-40980] Installer has detected that alias names is specified for the following nodes:










 


Analysis:

Regardless of the grammar mistake in the error message; where "is" should be replaced with "are" 😄 Most probably, this issue happens due to wrong configuration of user equivalency file settings under ~/.ssh directory on any of the cluster nodes.


Solution:

[On both Nodes: As the Grid Infrastructure OS user owner]

1- Cleanup the files under ~/.ssh 

# rm -f ~/.ssh/*

[On one Node only: As the Grid Infrastructure OS user owner]

2- Set up the user equivalency between RAC nodes using sshUserSetup.sh under the extracted Grid Infrastructure home, specifying node name (in green):

# /u01/grid/19c/oui/prov/resources/scripts/sshUserSetup.sh -user grid  -hosts "node1 node2" -noPromptPassphrase -confirm -advanced

It will verify the connectivity between the specified nodes, then will prompt you to Enter the Grid Infrastructure OS user owner password for each node. In my case, I've to enter it two times.

3- You may either need to close and open the Installer again or restart the RAC candidate nodes.


Friday, March 11, 2022

What Happen If The Grid Infrastructure Installation GUI Crashed before Reaching Finish Page

Question:

 What will happen if the Grid Infrastructure Installer crash before reaching the "Finish" page and after running root.sh on all RAC nodes?

Answer:

First, check the installation log file:

oraInventory/logs/GridSetupActions*

If you don't find errors in the logfile, and it ends by the following records, then your installation probably finished successfully:

Second, to be certainly sure that nothing wrong happened during the installation, verify GI setup by running runcluvfy tool, replacing node1,node2 with your actual name of your cluster nodes:

# $GRID_HOME/runcluvfy.sh stage  -post hwos  -n node1,node2  -verbose

All verifications should be flagged with "...PASSED"

In case some verifications are flagged with "FAILED" then rather than troubleshooting them one by one, it's better to save the time and cleanup the installation following below post and re-do the installation again:


Thursday, March 10, 2022

ORA-01503 ORA-17503 When Creating a Controlfile

Problem:

Control file creation fail when checking REDOLOG files:

CREATE CONTROLFILE REUSE DATABASE "RDQ" NORESETLOGS NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01565: error in identifying file '+REDO1/RDQ/ONLINELOG/group_1.1'
ORA-17503: ksfdopn:2 Failed to open file +REDO1/RDQ/ONLINELOG/group_1.1
ORA-15173: entry 'ONLINELOG' does not exist in directory 'RDQ'


Analysis:

When creating a control file with " NORESETLOGS" option the database will expect that the REDOLOG files are already exist in the specified location, so it will use them to completely recover the database when coming up from mount to open mode, otherwise it will fail.

Solution:

1- If you really intended to create the control file with "NORESETLOGS" option, make sure you pass the right REDOLOG files path to the CREATE CONTROLFILE command.

2- If you are re-creating the control file after restoring the database and doing "in-complete" point in time recovery, then the database have to create a fresh REDOLOG files, and thus, you have to replace the "NORESETLOGS" parameter with "RESETLOGS" so the database will create the REDOLOG files in the specified location even if they are not already exist there.


Wednesday, March 9, 2022

Continue the Duplicate after RMAN-06054: media recovery requesting unknown archived log

 Problem:

During the execution of Duplicate RMAN command to create a new 19c database from an RMAN backup, the duplicate processes terminated with RMAN-06054 before the completion of the duplicate process:

Analysis:

The backup I was restoring from was missing some critical archivelogs to bring all datafiles in a consistent state. Luckily, the datafile which was inconsistent belonged to goldengate tablespace which I don't need it in the new database.

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 68 needs more recovery to be consistent
ORA-01110: data file 68: '+DATA/SPP/DATAFILE/ggtbs.339.1098850765'

Solution:

1- Backup the current controlfile [In case things may go wrong, and you want to restore it back]

SQL> ALTER DATABASE BACKUP CONTROLFILE TO '/home/oracle/controlfile.bkp';

2- Generate the controlfile script with the RESETLOGS option:

-- Generate the controlfile script:

SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '/home/oracle/controlfile.trc' RESETLOGS;

-- Edit the controlfile trace script adding "SET" keyword: [If you will change the database name, make sure to modify the parameter db_name in the SPFILE to the same as well]

CREATE CONTROLFILE REUSE SET DATABASE "SPP" RESETLOGS
...

CHARACTER SET AL32UTF8
;

 

Delete the rest of lines after CHARACTER SET XXX ;

3- Shutdown the database and recreate the controlfile using the edited trace script:

SQL> shu abort
SQL> @/home/oracle/controlfile.trc

4- Set the inconsistent datafile in OFFLINE mode:

SQL> alter database datafile '+DATA/SPP/DATAFILE/ggtbs.339.1098850765' offline;

5- Open the database with RESETLOGS mode:

 SQL> ALTER DATABASE OPEN RESETLOGS;

6- Add TEMP FILES to the TEMPORARY TABLESPACE:

SQL> ALTER TABLESPACE TEMP ADD TEMPFILE;

Note: Setting parameter "_allow_resetlogs_corruption"=TRUE scope=spfile; may be an option here if you cannot set the datafiles in OFFLINE mode for any reason, but first you need to consult Oracle support before doing so; as the database will not be in a consistent state and most probably you will hit ora-00600 errors frequently!