Monday, April 10, 2023

Recovering the Standby Database from MRP0: Detected orphaned datafiles after Opening it in Snapshot mode

Problem:

I ran into the following situation on a 19c STANDBY DB:

1- Converted the Standby DB to SNAPSHOT mode for testing purposes.
2- During the test, the /u01 filesystem where ORACLE is installed got full and the following errors started to be reported in the alertlog:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '+REDO2/SPR/ONLINELOG/group_2.260.1040915087'
ORA-17503: ksfdopn:2 Failed to open file +REDO2/SPR/ONLINELOG/group_2.260.1040915087
ORA-15045: ASM file name 'ORA-09925: Unable to create audit trail file
Linux-' is not in reference form
ORA-15120: ASM file name 'ORA-09925: Unable to create audit trail file
Linux-' does not begin with the ASM prefix character
Linux-x86_64 Error: 28: No space left on device

2- Converted back the DB to Physical Standby.
3- Startup the recovery, it started but not recovering anything, and there are no errors in the alertlog as well!

Analysis:

When converting the database back from SNAPSHOT to PHYSICAL STANDBY mode, I can see in the DB alertlog the flashback has happened so quick within few seconds while the DB was already opened in SNAPSHOT mode for a long time and tons of transactions been performed on it requires long time to Flashback.

Also, when I checked the database incarnation using "list incarnation" RMAN command, it was still pointing to the incarnation when the DB was converted to SNAPSHOT STANDBY which is wrong:

Note: Each time you convert the PHYSICAL STANDBY to SNAPSHOT mode, the DB will be assigned a new incarnation number.

Solution:

1- Set the DB incarnation to the right BASE incarnation when the DB was in PHYSICAL STANDBY mode:

RMAN> RESET DATABASE TO INCARNATION 1920977;

RMAN> LIST INCARNATION;

2- Flashback the database to a time older than the time you converted the DB to SNAPSHOT mode while it was still in PHYSICAL STANDBY mode: [Luckily, the Flashback feature was turned on before the conversion to SNAPSHOT mode]

SQL> select timestamp_to_scn(to_timestamp('08/04/2023 00:01:00','DD/MM/YYYY HH24:MI:SS')) as scn from dual;

6683184418651

SQL> Flashback database to SCN 6683184418651;

Start the Recovery process:

SQL> RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE NODELAY DISCONNECT; 

Monitor the alertlog to make sure that the recovery processes is now recovering from the archivelogs until it be in-sync with the primary.

SQL> select value "Time LAG between Primary and Standby" from v$dataguard_stats where name='apply lag';

Done. The database is supposed to be a functional PHYSICAL STANDBY now.

Conclusion:

- Having ORACLE HOME filesystem full during the SNAPSHOT STANDBY mode has triggered this weird issue.

- If the Standby database is unable to recover from archivelogs after converting it back from SNAPSHOT mode or if it's throwing ORA-19909 in the alertlog, try first to check if the database is on the right base incarnation which was CURRENT before the conversion to SNAPSHOT mode using "list incarnation" RMAN command.

- If the recovery is still unable to start, you may need to consider flashing back the Standby database to an older time before the conversion to SNAPSHOT mode.

- Enabling FLASHBACK feature on the STANDBY DB can save you from many catastrophic scenarios, one of them; is to rebuild the Standby if you reach to a dead end.


No comments:

Post a Comment