Thursday, June 11, 2020

Re-execution of Duplicate command on 19c DB fails with RMAN-06054: media recovery requesting unknown archived log

Problem:

On a 19.5 database, I was trying to duplicate a database from an RMAN backup, first time the duplicate failed after restoring 90% of the datafiles due to space limitation on the underlying datafiles location, I re-ran the same duplicate command for the second time without deleting the already restored datafiles to see if the 19c RMAN will be able to continue the previous duplicate without problems using this command:

I've restarted the auxiliary DB again in mount mode and re-ran the duplicate command:
run {
ALLOCATE AUXILIARY CHANNEL ch1 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL ch2 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL ch3 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL ch4 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL ch5 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL ch6 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL ch7 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL ch8 DEVICE TYPE DISK;
duplicate database to "AWSNFR"  backup location '/bkp' nofilenamecheck
UNTIL TIME "TO_DATE('31/05/2020 15:20:52', 'DD/MM/YYYY HH24:MI:SS')";
}


But the second run of the duplicate command failed with the following error after restoring the rest of datafiles, requesting a very old archivelog for recovering the database:

.....
Starting restore at 07-Jun-2020 11:04:26
creating datafile file number=19 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.283.1039812437
creating datafile file number=20 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.284.1039812457
creating datafile file number=21 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.285.1039812475
creating datafile file number=24 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.288.1039812529
creating datafile file number=25 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.289.1039812549
creating datafile file number=27 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.291.1039812587
creating datafile file number=28 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.292.1039812605
creating datafile file number=42 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.306.1039812873
creating datafile file number=43 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.307.1039812891
creating datafile file number=57 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.321.1039813155
creating datafile file number=58 name=/awsnfr/AWSNFRDB1/datafiles/p_fz_data.322.1039813173

......

Finished restore at 07-Jun-2020 12:05:28
.....
starting media recovery

unable to find archived log
archived log thread=1 sequence=7


RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 06/07/2020 12:05:31
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 7 and starting SCN of 2470350


Analysis:

You can notice from the above log, when duplicate command ran again, it started with re-creating (not restoring) some blank datafiles (in green color).

After the failure of the duplicate command, I ran the following command and figured out that those datafiles (in green color) are the only ones that having very low checkpoint_time and checkpoint_change# than the other datafiles:

col name for a60
col CREATION_TIME for a20
col checkpoint_time for a20
select FILE#||'|'||name name,fuzzy, creation_change#,
to_char(creation_time, 'DD-MON-YYYY HH24:MI:SS') as creation_time,
to_char(checkpoint_time, 'DD-MON-YYYY HH24:MI:SS') as checkpoint_time,
to_char(checkpoint_change#, '999999999999999999999') as checkpoint_change#
from v$datafile_header;


Solution:

I decided to restore those datafiles manually from RMAN:

RMAN> restore datafile 19,20,21,24,25,27,28,42,43,57,58;

RMAN> recover database;
This will recover the database starting from the right sequence till it fail when no more archivelogs are available inside the backup.

Then, because I was running an RMAN duplicate on a different server, I need to complete the rest of in-complete duplicate steps manually:

1- Generate a create controlfile script:
RMAN> alter database backup controlfile to trace as '/home/oracle/control.trc';
 

2- Edit the controlfile trace by replacing the NORESETLOGS keyword with RESETLOGS then delete the rest of script contents :

# vi /home/oracle/control.trc

STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "
AWSNFR" RESETLOGS 
....
CHARACTER SET AL32UTF8
;


Then remove all the lines below CHARACTER SET AL32UTF8 ; lines:

Make sure that all datafiles, redo logs are pointing to the right exist paths on the target server.

3- Shutdown the database and execute the controlfile creation script "which we modified above" to overwrite the current clone controlfile:
RMAN> SHUTDOWN IMMEDIATE
RMAN> @/home/oracle/control.trc


4- Open the database with resetlogs option:  
RMAN> alter database open resetlogs;

Voilà! Database opened.


Add a tempfile to the Temporary tablespace:

RMAN> alter tablespace temp add tempfile;


Conclusion:
In Oracle 19c, RMAN can continue the duplicate from where it stopped, it will catch from where it stopped in the previous execution, but the catch here, it will not be able to re-restore the datafiles that were in the middle restoration during the last duplicate execution, so it will try to create them as blank datafiles, during the recovery it will try to recover these datafiles from the time they were created on the source DB! Of course most probably those archivelogs are since the creation of the source database itself, and they are not available anymore. The DBA has to correct this behavior by restore those datafiles manually from the same RMAN backup the duplicate was using. Still, the re-execution of the duplicate operation on 19c is not a fail proof!

No comments:

Post a Comment