Friday, December 30, 2022

RAC DB Crash by ORA-00600: internal error code, arguments: [ksm_mga_pseg_cbk_attach:map_null]

Problem:

While implementing a change that required restarting the RAC nodes one by one on a two-node 19.5 RAC DB, and while one node in the cluster (node2) was being restarted, the other node (node1) started to report the error below and went into a hung state; we were able neither to abort the DB instance nor to restart the clusterware or the Linux OS:

ORA-00600: internal error code, arguments: [ksm_mga_pseg_cbk_attach:map_null]

Analysis:

The above-mentioned error didn't come alone; it was accompanied by a bunch of other ORA-00600 errors:

ORA 600 [ipc_recreate_que_2]
ORA 600 [ORA-00600: internal error code, arguments: [ipc_re
ORA 600 [IPC Assert]
ORA 603
ORA 600 [17090]

Along with the only clear ORA errors, which explained what was actually going on:

ORA-27300: OS system dependent operation:open failed with status: 23
ORA-27301: OS failure message: Too many open files in system
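
If you face a similar situation, a quick OS-level check can confirm this kind of system-wide file handle exhaustion on Linux (a generic sketch, not output captured from this incident): compare the number of allocated file handles with the kernel limit.

# cat /proc/sys/fs/file-nr
# cat /proc/sys/fs/file-max

The first number reported by file-nr is the count of allocated handles; once it approaches file-max, the OS starts refusing new open() calls, which is exactly what the ORA-27301 message is reporting.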

We decided to forcefully reboot the hung node (node1) from the hardware console, as the OS itself was hung. To be honest, this is one of the toughest decisions a DBA can take on a cluster node, as it is known to corrupt the clusterware files, especially when fsck runs at startup.

What made the issue worse is that the clusterware took too much time to come up on node2 (the node which had been restarted gracefully), as it was having trouble reading the votedisks -- remember the other node (node1), which went hung and was forcefully rebooted? That explains it!

You may think that everything went fine after the startup of node2, the hero managed to kill the beast, and the movie ended? NO.
Ten minutes later, the node1 DB instance joined the cluster back, and node2 crashed with the same above-mentioned bunch of ORA-600 errors.
This scenario kept repeating (one node joins the cluster, the other node crashes) until I started up both DB instances with the pfile.


Solution:

Oracle support referred us to an unpublished Bug 30851951. This bug is well known to hit 19c RAC DBs.

In short, there is something called MGA (Managed Global Area) memory, which is shared between the processes that access the PGA memory by mapping it under /dev/shm (if I'm not mistaken!). The bug causes the MGA to keep opening an endless number of files under /dev/shm, causing the system to reach its maximum open-files limit and hang.
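
To get a rough idea of how many of these MGA segment files the instance has piled up, you can simply count them on the OS (the grep pattern below is an assumption, as the exact file-name prefix varies by version; adjust it to whatever you actually see under /dev/shm):

# ls /dev/shm | grep -ci ora
# lsof /dev/shm | wc -l

The first command counts the shared memory files themselves; the second gives a rough count of the open file descriptors pointing at them across all processes.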

This means you have to pick one of the following solutions:

1- Apply the bug fix patch 30851951, which is available for 19.5.
2- Apply the 19.8 RU patch or higher.
3- Workaround: set a value for the pga_aggregate_limit parameter.
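
For reference, the workaround in option 3 is a single command; the value below is purely illustrative and must be sized for your own workload rather than copied as-is:

SQL> ALTER SYSTEM SET pga_aggregate_limit=16G SCOPE=BOTH SID='*';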

The best solution I can see here is to apply the bug fix patch 30851951, whereas applying the 19.8 RU patch is a major change to the system, and setting a value for the pga_aggregate_limit parameter is well known to make the system prone to more PGA-related bugs on 19c RAC!

Honestly speaking, throughout the years and after witnessing similar RAC-related bug incidents in the past -- since RAC 11g -- I have developed a strong feeling that RAC technology itself (because of its bugs) contributes more to SLA breaches than to the stability of the system!

Thursday, December 15, 2022

Add Disks with Different Sizes to ASM Diskgroup ONLINE

Scenario:

All of a sudden, I came to know that an AWS EC2 instance has a limitation of a maximum of 27 EBS volumes attached to it. One of my EC2 DB servers already had 26 volumes attached, most of them 500 GB in size. I planned to replace all the 500 GB LUNs with 1 TB LUNs to free up room for adding more LUNs in the future.

Facts:

- My DB is using ASM.
- My ASM instance version is 19c.
- My ASM Diskgroup redundancy is EXTERNAL, which means I have no ASM-level redundancy, so losing one disk can bring the whole diskgroup offline.
- An ASM diskgroup can contain disks with different sizes at the same time (this is what allows me to perform the disk replacement with bigger sizes online). It's not recommended from the performance perspective, but it's OK to have temporarily for disk-replacement purposes.
- ASM balances the data between the disks of the same diskgroup proportionally to their sizes, which means if the diskgroup is 80% full, each disk inside it will be around 80% full regardless of its size (a quick check is shown right after this list).
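
A quick way to verify this proportional fill on an existing diskgroup (PSS_RECO here is the diskgroup used later in this post; replace it with yours):

SQL> SELECT d.name disk_name, d.total_mb/1024 total_gb, d.free_mb/1024 free_gb,
            ROUND((1 - d.free_mb/d.total_mb)*100, 1) pct_used
     FROM v$asm_disk d JOIN v$asm_diskgroup g ON g.group_number = d.group_number
     WHERE g.name = 'PSS_RECO'
     ORDER BY d.name;

All disks in the group should report roughly the same PCT_USED even when their sizes differ.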

Technical Steps:

To keep this post brief, I'll show here an example of replacing two 516 GB disks with one 1 TB disk in the RECO diskgroup.

1- A 1 TB volume is attached to the machine.

2- I partitioned the disk into one primary partition using the fdisk command:
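
A minimal sketch of the fdisk session, assuming the new volume shows up as /dev/nvme17n1 (the device labeled in the next step):

# fdisk /dev/nvme17n1

Inside fdisk, the key sequence is: n (new partition), p (primary), partition number 1, accept the default first and last sectors, then w to write the partition table and exit. If the new partition doesn't show up immediately, partprobe re-reads the partition table:

# partprobe /dev/nvme17n1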



3- Labeled the 1TB disk as an ASM disk:

# oracleasm createdisk PSS_RECO_DISK01 /dev/nvme17n1p1
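
Optionally, confirm the disk is stamped correctly before moving on:

# oracleasm listdisks
# oracleasm querydisk PSS_RECO_DISK01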

4- Log in to the ASM instance with / as sysasm:

# export ORACLE_SID=+ASM
# $GRID_HOME/bin/sqlplus '/ as sysasm'
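
A quick sanity check that you actually landed on the ASM instance and it's up:

SQL> SELECT instance_name, status FROM v$instance;

It should return +ASM (or +ASMn on RAC) with a STARTED status.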

5- Add the new 1 TB disk to the diskgroup and remove the two 516 GB disks from the same diskgroup at the same time:

Note: You don't have to wait between adding the new disk and dropping the old ones; run the ADD DISK command followed by the DROP DISK commands in sequence as soon as you get the prompt back. However, you shouldn't physically remove the old disks from the machine until the REBALANCE completes (remember: my diskgroup redundancy is EXTERNAL, which is why I must wait for the REBALANCE activity to complete).

-- Increase the rebalance power to speed up the replacement of the disks:
SQL>  ALTER DISKGROUP PSS_RECO REBALANCE POWER 11;

-- Add the 1TB disk:
SQL> ALTER DISKGROUP PSS_RECO ADD DISK '/dev/oracleasm/disks/PSS_RECO_DISK01';

-- Drop the two 516 GB disks from the same diskgroup:
SQL> ALTER DISKGROUP PSS_RECO DROP DISK PSS_RECO_0000;
SQL> ALTER DISKGROUP PSS_RECO DROP DISK PSS_RECO_0001;
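
As a side note, the ADD and DROP can also be combined into a single ALTER DISKGROUP statement, which kicks off only one rebalance for the whole operation (equivalent to the commands above, just written as one statement):

SQL> ALTER DISKGROUP PSS_RECO ADD DISK '/dev/oracleasm/disks/PSS_RECO_DISK01'
     DROP DISK PSS_RECO_0000, PSS_RECO_0001
     REBALANCE POWER 11;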


6- Wait until the REBALANCE activity completes:

-- Keep checking the status until it returns no rows:
SQL> select * from V$ASM_OPERATION;
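
If you want an idea of the progress and the estimated remaining time rather than just waiting for an empty result, you can list the standard V$ASM_OPERATION columns explicitly (the estimates fluctuate while the rebalance runs, so take them as a rough guide):

SQL> SELECT group_number, operation, state, power, sofar, est_work, est_rate, est_minutes
     FROM v$asm_operation;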

-- You can also use this query to check the space distribution across the disks while the rebalance is running:
SQL>
col DISK_PATH       for a45
col DISKGROUP_NAME  for a15
col FAILGROUP       for a17
col DISK_NAME       for a17
col TOTAL_GB        for 9999999.99
col FREE_GB         for 9999999.99
SELECT NVL(a.name, '[CANDIDATE]') diskgroup_name, b.path disk_path, b.name disk_name,
b.failgroup, b.total_MB/1024 TOTAL_GB, b.free_MB/1024 FREE_GB, b.state, b.MODE_STATUS
FROM v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
ORDER BY a.name, b.path;


-- After approximately 49 minutes, it completed:



-- The dropped disks should now show as candidates:

SQL> SELECT NVL(a.name, '[CANDIDATE]') diskgroup_name, b.path disk_path, b.name disk_name,
b.failgroup, b.total_MB/1024 TOTAL_GB, b.free_MB/1024 FREE_GB, b.state, b.MODE_STATUS
FROM v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
ORDER BY a.name, b.path;


7- After the dropped disks show up as candidates, you can un-provision them from the server:

# oracleasm deletedisk PSS_RECO_0000
# oracleasm deletedisk PSS_RECO_0001

# oracleasm scandisks
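
Back to the original goal of freeing EBS attachment slots: once oracleasm no longer knows about the disks, the corresponding EBS volumes can be detached from the instance, e.g. with the AWS CLI (the volume ID below is a placeholder; map each OS device to its EBS volume ID before detaching anything):

# aws ec2 detach-volume --volume-id vol-xxxxxxxxxxxxxxxxx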

8- Don't forget to return the REBALANCE power on the same diskgroup to its default:

SQL>  ALTER DISKGROUP PSS_RECO REBALANCE POWER 1;