Sunday, July 25, 2021

ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired during impdp import in 19c

Problem:

While importing a big dump file into a 19c database using Data Pump (impdp), I got the following error repeatedly:

Failing sql is:
ALTER INDEX "WM"."IDX_FK_PO2LU_NATIONALITY" LOGGING
10-JUN-21 15:10:54.278: ORA-39083: Object type INDEX:"WM"."VEN2PERSONORG2_FK" failed to create with error:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired


Analysis:

When you use the TRANSFORM=DISABLE_ARCHIVE_LOGGING:Y option during the import, it's expected to hit ORA-00054. This happens because the import process tries to re-enable the LOGGING mode on each object while the object is still locked by the Data Pump process. This looks like a bug, but Oracle Support keeps saying it's normal behavior!

I tried setting the DDL_LOCK_TIMEOUT parameter instance-wide to a big value before executing the import, to give those ALTER statements enough time to wait for the lock to be released, without any luck!
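
For reference, a sketch of that attempt (1,000,000 seconds is the documented maximum for this parameter):

SQL> ALTER SYSTEM SET ddl_lock_timeout=1000000;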

Workaround:

There are three workarounds I can think of:

1- Execute all the failing ALTER statements manually after the import; just grep all the failing ALTER statements from the import log, terminate each statement with a semicolon ";", and then execute them from SQL*Plus (see the sketch after this list):
e.g.
# grep "ALTER" import.log

2- If you don't care about enabling the LOGGING mode on the underlying objects (i.e., you are importing the data into a non-production database that doesn't need to be restored to a point in time if it fails), then simply ignore the errors and leave those objects in NOLOGGING mode.

3- If you can sustain switching the target DB to NOARCHIVELOG mode during the import time window: restart the DB and alter it into NOARCHIVELOG mode, then start the import without using the parameter transform=DISABLE_ARCHIVE_LOGGING:Y. Once the import is done, restart the DB and set it back to ARCHIVELOG mode, as sketched after this list. [A Stone Age workaround, I know 𓀗]
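
For workaround 1, a minimal sketch of collecting and replaying the failed statements (the file name fix_logging.sql is just an example):

# grep "ALTER" import.log | sed 's/$/;/' > fix_logging.sql
# sqlplus / as sysdba @fix_logging.sql

For workaround 3, the usual sequence for toggling archiving around the import, run from SQL*Plus as SYSDBA, would look like this:

SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> ALTER DATABASE NOARCHIVELOG;
SQL> ALTER DATABASE OPEN;

Then run the import, and once it's done:

SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> ALTER DATABASE ARCHIVELOG;
SQL> ALTER DATABASE OPEN;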

Monday, July 12, 2021

Step By Step How To Add a New Node To a 19c Cluster

Some bad news before we start: in 19c versions older than 19.8 you will not be able to add a new node to the cluster using this procedure, due to bug 30195027 [Doc ID 30195027.8].
The workaround is to upgrade your GRID_HOME to 19.8 or above using an RU patch. I'll shed some light on a similar GRID_HOME upgrade task using the zero-downtime technique in a future post.


In this demo:

The cluster already has one node available, named clsn1, and we are adding one extra node named clsn2.
orcl   refers to the cluster database name
orcl2 refers to the new instance name on the new node.
clsn2-vip refers to the virtual name of the new node.

Step 1: Prerequisites: [On the new node]

Note: Please follow the prerequisite instructions strictly; failing to do so will lead to mind-blowing strange errors during the process of adding the node to the cluster!

- Install the same OS and kernel version as the existing nodes in the cluster.
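
A quick way to compare versions (run on an existing node and on the new node; the release file name below assumes Oracle Linux):

  # cat /etc/oracle-release
  # uname -r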

- Install Oracle's required packages: [Applicable for Linux 7]
  # wget http://public-yum.oracle.com/public-yum-ol7.repo
  # yum install -y oracle-database-preinstall-19c gcc gcc-c++ glibc-devel glibc-headers elfutils-libelf-devel gcc gcc-c++ kmod-libs kmod unixODBC unixODBC-devel dtrace-modules-headers
  # yum install -y fontconfig-devel libXrender-devel librdmacm-devel python-configshell targetcli compat-libstdc++-33
  # yum install -y oracleasm-support

- Provision to the new node the same ASM disks that are already provisioned on the other nodes of the cluster.

- Create the same Groups & Oracle users with the same IDs as the existing nodes in the cluster:
  # groupadd -g 54322 dba
    groupadd -g 54324 backupdba
    groupadd -g 54325 dgdba
    groupadd -g 54326 kmdba
    groupadd -g 54327 asmdba
    groupadd -g 54328 asmoper
    groupadd -g 54329 asmadmin
    groupadd -g 54330 racdba

Note: The example below will add or modify the oracle user. I'm doing this for the oracle user only because I'm using the same OS user "oracle" as the owner of both the GRID and ORACLE DB homes.

  # useradd oracle -u 54321 -g oinstall -G dba,oper,asmdba,backupdba,dgdba,kmdba,racdba,asmadmin,asmoper
  # usermod oracle -u 54321 -g oinstall -G dba,oper,asmdba,backupdba,dgdba,kmdba,racdba,asmadmin,asmoper


- Copy the oracle user's .bash_profile from any of the other nodes in the cluster, replacing the node name and instance name with the right values accordingly.
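
For example (run as root on the new node; password authentication is fine at this point since passwordless SSH is configured later):

  # scp oracle@clsn1:/home/oracle/.bash_profile /home/oracle/
  # chown oracle:oinstall /home/oracle/.bash_profile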

- Set up the same network configuration, comparing these files with the other nodes in the cluster: /etc/hosts, /etc/resolv.conf
- Protect the resolv.conf file from getting changed: # chattr +i /etc/resolv.conf

- Adjust the MTU for the loopback device by adding MTU=16436 to /etc/sysconfig/network-scripts/ifcfg-lo, then execute this command: # ifconfig lo mtu 16436

- Set the NOZEROCONF parameter in /etc/sysconfig/network by adding this line: NOZEROCONF=yes

- Set up the same OS settings as the other nodes in these files: /etc/sysctl.conf, /etc/security/limits.conf
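
A quick way to spot differences, using clsn1 as the reference node (the temporary file names are just examples):

  # scp oracle@clsn1:/etc/sysctl.conf /tmp/sysctl.conf.clsn1
  # diff /tmp/sysctl.conf.clsn1 /etc/sysctl.conf
  # scp oracle@clsn1:/etc/security/limits.conf /tmp/limits.conf.clsn1
  # diff /tmp/limits.conf.clsn1 /etc/security/limits.conf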

- Stop/Disable Avahi-daemon:
# systemctl stop avahi-daemon; systemctl disable avahi-daemon; systemctl status avahi-daemon

- Disable SELinux & the firewall:
Add the parameter SELINUX=disabled to /etc/selinux/config, then:
# systemctl stop firewalld; systemctl disable firewalld; systemctl status firewalld

- Set the server timezone to match the other nodes in the cluster: check /etc/localtime
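
For example, compare the output of this command on the new node and on an existing node:

  # timedatectl | grep "Time zone"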

- The NTP configuration should be the same as on the other nodes [if it's being used]: /etc/chrony.conf

- Configure passwordless SSH:

  Generate a new key on the new node: # ssh-keygen -t rsa
  Copy the key to the other nodes in the cluster: e.g. # ssh-copy-id oracle@clsn1
  Test the connectivity from the new node to the other nodes in the cluster: # ssh oracle@clsn1
  Copy the keys from each active node in the cluster to the new node: # ssh-copy-id oracle@clsn2
    (or manually: # cat ~/.ssh/id_rsa.pub | ssh oracle@clsn2 "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys")
  Test the connectivity from the other nodes to the new node: # ssh oracle@clsn2

- Change the ownership of the filesystem where Oracle installation files will be installed:

  # chown oracle:oinstall /u01
  # chmod 774 /u01


- Scan ASM disks: [As root on New node]
Note: The same disks provisioned on the other cluster nodes should be provisioned to the new node as well before doing the scan.

  # oracleasm scandisks
  # oracleasm listdisks


Note: Make sure all ASM disks are listed, comparing the result with the other existing nodes.

- Compare the configuration between an existing node and the new node: [As GRID Owner - On any of the existing nodes in the cluster]

  # $GRID_HOME/bin/cluvfy comp peer -refnode <exist_node_in_the_cluster> -n <new_node> -orainv orainventory_group -osdba osdba_group -verbose
e.g.
  # $GRID_HOME/bin/cluvfy comp peer -refnode clsn1 -n clsn2 -orainv oinstall -osdba dba -verbose

- Work around the INS-06006 passwordless SSH bug: [As grid owner]

  # echo "export SSH_AUTH_SOCK=0" >> ~/.bashrc
  # export SSH_AUTH_SOCK=0


Step 2: Clone GRID_HOME to the new node and Add it to the cluster: [As GRID Owner - On any of the existing nodes in the cluster]

# export IGNORE_PREADDNODE_CHECKS=Y
# $GRID_HOME/addnode/addnode.sh -silent "CLUSTER_NEW_NODES={clsn2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={clsn2-vip}" -ignorePrereq -ignoreSysPrereqs

   Monitor the operation's log for errors:
    # tail -f /u01/oraInventory/logs/addNodeActions`date -u +"%Y-%m-%d_%H-%M"`*.log


At the end, execute root.sh: [As root - On the New Node]
# $GRID_HOME/root.sh

    Troubleshooting:
    In case of Error: scp: /u01/grid/12.2.0.3/gpnp/profiles/peer/profile.xml: No such file or directory
    Solution: Copy profile.xml on the RAC node you issued addnode.sh from: cp -p $GRID_HOME/gpnp/clsn1/profiles/peer/profile.xml  $GRID_HOME/gpnp/profiles/peer/profile.xml
         Then run root.sh again on the new node

Step 3: Clone the ORACLE HOME to the new node: [As oracle - On any of the existing nodes in the cluster]

# $ORACLE_HOME/addnode/addnode.sh -silent "CLUSTER_NEW_NODES={clsn2}" -ignorePrereqFailure -ignoreSysPrereqs

At the end, execute root.sh: [As root - On the New Node]
# $ORACLE_HOME/root.sh

Step 4: Start ACFS, if it exists: [As root - on the New Node]

# $GRID_HOME/bin/srvctl start filesystem -device <volume_device_name> -node clsn2

Step 5: Check the cluster integrity:

# cluvfy stage -post nodeadd -n clsn2 -verbose

Step 6: Add the DB instance to the New Node: [As oracle - on the New Node]

# dbca -silent -ignorePrereqFailure -addInstance -nodeName clsn2 -gdbName orcl -instanceName orcl2 -sysDBAUserName sys -sysDBAPassword oracle#123#

-gdbName            Provide the same value as DB_UNIQUE_NAME if that parameter is set.
-instanceName     The name of the instance on the new node.


Note: This will add a new instance on the new node, along with an extra REDOLOG thread and one more UNDO tablespace dedicated to the new instance.
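
A quick sanity check from SQL*Plus after dbca completes (just a sketch; the exact output depends on your configuration):

SQL> SELECT inst_id, instance_name, status FROM gv$instance;
SQL> SELECT thread#, status, enabled FROM v$thread;
SQL> SELECT tablespace_name FROM dba_tablespaces WHERE contents = 'UNDO';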

Add the new instance as a preferred instance to the DB Services: [Do this for each service]

# srvctl modify service -d orcl -s reporting_svc -n -i orcl1,orcl2

References: 

https://docs.oracle.com/en/database/oracle/oracle-database/19/cwadd/adding-and-deleting-cluster-nodes.html#GUID-929C0CD9-9B67-45D6-B864-5ED3B47FE458

Step By Step How To delete a RAC Node from 19c Cluster

Scope: In this post, I'll walk you through the steps of removing a node from a 19c RAC cluster.
In this demo:
clsn2 will refer to the node to be deleted from the cluster.
orcl   will refer to the cluster database name.
orcl2 will refer to the DB instance name on the node to be deleted.

Step 1: Shutdown all DB Instances running on the node to be deleted:

# srvctl stop instance -db orcl -i orcl2 -o abort
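
You can confirm the instance is down before dropping it:

# srvctl status database -db orcl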

Step 2: Delete all DB instances on the node to be deleted:

Note: This will delete the instance, remove its REDOLOG thread, and remove its UNDO tablespace as well:

# dbca -silent -ignorePrereqFailure -deleteInstance -gdbName orcl  -instanceName orcl2   -sysDBAUserName sys -sysDBAPassword oracle#123#

-gdbName            Provide the same value as DB_UNIQUE_NAME
-instanceName     The name of the instance on the node to be deleted


Step 3: Unpin the Node to be deleted:

- Check if the node you want to remove is pinned: [As root - On the node to be deleted]

# olsnodes -s -t

- If the node is pinned, then unpin it: [As root - On the node to be deleted]

# crsctl unpin css -n clsn2

Step 4: Deinstall GRID Home: [As GRID HOME owner - On the node to be deleted]

# $GRID_HOME/deinstall/deinstall

Step 5: Remove the Node from the cluster: [As root - On any other node in the cluster]

# $GRID_HOME/bin/crsctl delete node -n clsn2

- Verify the Node deletion: [As GRID HOME owner - On any other node in the cluster]

# $GRID_HOME/bin/cluvfy stage -post nodedel -n node_list -verbose

    Verifying Node Removal ...
      Verifying CRS Integrity ...PASSED
      Verifying Clusterware Version Consistency ...PASSED
    Verifying Node Removal ...PASSED


Post-check for node removal was successful.

Step 6: Delete the VIP Name:

- Get the VIP name of the deleted Node: [As GRID owner - On any other node in the cluster]

# $GRID_HOME/bin/srvctl config vip -node clsn2 | grep 'Name' 

VIP Name: clsn2-vip

- Delete the VIP name of the deleted node, using the VIP name output from the previous command: [As root - On any other node in the cluster]

# $GRID_HOME/bin/srvctl stop   vip -vip clsn2-vip
# $GRID_HOME/bin/srvctl remove vip -vip clsn2-vip -noprompt

References:

https://docs.oracle.com/en/database/oracle/oracle-database/19/cwadd/adding-and-deleting-cluster-nodes.html#GUID-8ADA9667-EC27-4EF9-9F34-C8F65A757F2A

Tuesday, July 6, 2021

CLSRSC-147: Failed to get parameter values from GPnP profile '.../profile.xml'

Problem:

When adding an extra node to a 19c cluster using the $GRID_HOME/addnode/addnode.sh script, and while running the root.sh script on the new node, I got the following error:

...

2021/07/06 18:27:39 CLSRSC-594: Executing installation step 6 of 19: 'CheckCRSConfig'.
oracle.ops.mgmt.cluster.ClusterException: scp: /u01/grid/12.2.0.3/gpnp/profiles/peer/profile.xml: No such file or directory

2021/07/06 18:27:42 CLSRSC-180: An error occurred while executing the command '/u01/grid/12.2.0.3/bin/cluutil -copy -sourcefile /u01/grid/12.2.0.3/gpnp/profiles/peer/profile.xml -fromnodesfile /tmp/gacVdvDVIe -destfile /u01/grid/12.2.0.3/gpnp/profiles/peer/profile.xml -nodelist ftestn2'
2021/07/06 18:27:42 CLSRSC-147: Failed to get parameter values from GPnP profile '/u01/grid/12.2.0.3/gpnp/profiles/peer/profile.xml'
2021/07/06 18:27:42 CLSRSC-147: Failed to get parameter values from GPnP profile '/u01/grid/12.2.0.3/gpnp/profiles/peer/profile.xml'
2021/07/06 18:27:43 CLSRSC-509: invalid ASM mode
Died at /u01/grid/12.2.0.3/crs/install/crsinstall.pm line 2214.

Solution:

Simply copy the missing file to the same location that appears in the error message: [run the copy command once on the active node in the cluster where you ran addnode.sh from, and once on the new node as well]

Note: In the command below, replace $GRID_HOME with your actual GRID HOME path; the source directory under gpnp (fzppon05vs1n here) is named after the node's hostname.

# cp -p $GRID_HOME/gpnp/fzppon05vs1n/profiles/peer/profile.xml  $GRID_HOME/gpnp/profiles/peer/profile.xml

Then re-execute the root.sh script as the root user on the new node only:

# $GRID_HOME/root.sh

Sunday, July 4, 2021

[INS-06006] Passwordless SSH connectivity not set up between the following node(s) when adding a new RAC node in 19c

Problem:

When trying to add an extra node to a 19c cluster by running $GRID_HOME/gridSetup.sh from an active node in the cluster, I kept getting the INS-06006 error, although the passwordless SSH configuration was already working fine between the current cluster nodes and the new node.

Analysis:

After hours of searching and trying lots of solutions, I figured out that the value of the SSH_AUTH_SOCK environment variable was triggering that bug.
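
You can check what it is currently set to with:

# echo $SSH_AUTH_SOCK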

Solution:

Set the SSH_AUTH_SOCK environment variable to zero:

# echo "export SSH_AUTH_SOCK=0" >> ~/.bashrc

# export SSH_AUTH_SOCK=0


Bingo! I managed to get past that bug.