Wednesday, September 11, 2019

RAC Database Fails To Start Due To "terminating the instance due to error 119"

Problem:
On a Linux 7 server, an 11.2.0.3 RAC database failed to start after a system reboot, with this error in the alert log:

USER (ospid: 13324): terminating the instance due to error 119
Instance terminated by USER, pid = 13324

Cause:
While investigating, I found that the SCAN name "rac1-scan", which is used in the "remote_listener" initialization parameter, could not be resolved:
$ ping rac1-scan
ping: unknown host rac1-scan

When I checked the /etc/resolv.conf file, I found that the DNS server entry had been removed:
$ cat /etc/resolv.conf
# Generated by NetworkManager
search preprod.mycompany.com


# No nameservers found; try putting DNS servers into your
# ifcfg files in /etc/sysconfig/network-scripts like so:
#
# DNS1=xxx.xxx.xxx.xxx
# DNS2=xxx.xxx.xxx.xxx
# DOMAIN=lab.foo.com bar.foo.com
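
The check above can be turned into a small helper. This is only a sketch: the function names count_nameservers and check_resolv_conf are made up for illustration, and it simply counts the uncommented "nameserver" lines in a resolv.conf-style file.

```shell
#!/usr/bin/env bash

# Count active (uncommented) "nameserver" lines in a resolv.conf-style file.
# Hypothetical helper name; the path defaults to /etc/resolv.conf.
count_nameservers() {
    local file="${1:-/etc/resolv.conf}"
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -cE '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$file" || true
}

# Print a one-line verdict and return non-zero when no DNS server is configured.
check_resolv_conf() {
    local file="${1:-/etc/resolv.conf}"
    local count
    count="$(count_nameservers "$file")"
    if [ "${count:-0}" -gt 0 ]; then
        echo "OK: $count nameserver(s) configured in $file"
    else
        echo "FAIL: no nameserver configured in $file"
        return 1
    fi
}
```

Running check_resolv_conf with no argument inspects /etc/resolv.conf; against the file shown above it would report the FAIL line, because the only DNS lines left are commented-out hints.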

Solution:
Luckily, I always run a script called configuration_baseline.sh that keeps a backup of the entries of critical system files in a single log file. It helps me restore those entries later whenever a bad change is made to these files.

I restored the original entries of /etc/resolv.conf, which hold the correct DNS server address for my environment, and was then able to start the RAC DB successfully.

$ cat /etc/resolv.conf
# Generated by NetworkManager
search preprod.mycompany.com
nameserver 10.100.22.10
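
To illustrate the idea of keeping the entries of critical files inside one log file, here is a minimal sketch. It is not the configuration_baseline.sh script itself; the function name snapshot_files and the log layout are assumptions made for this example.

```shell
#!/usr/bin/env bash

# Append a labelled, timestamped copy of each given file to one baseline log.
# Hypothetical helper: usage is snapshot_files LOGFILE FILE [FILE...]
snapshot_files() {
    local log="$1"
    shift
    local f
    for f in "$@"; do
        {
            printf '===== %s (%s) =====\n' "$f" "$(date '+%Y-%m-%d %H:%M:%S')"
            if [ -r "$f" ]; then
                cat "$f"
            else
                echo "(not readable)"
            fi
            echo
        } >> "$log"
    done
}
```

For example, snapshot_files /root/baseline.log /etc/resolv.conf /etc/hosts would append both files to /root/baseline.log (an illustrative path), so the last known-good "nameserver" line can be copied back after a bad change.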

Conclusion:
The "terminating the instance due to error 119" error is mainly related to the "remote_listener" initialization parameter. If this setting is wrong, or the name it references cannot be resolved, the RAC instance will not start.
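
A quick way to test this before starting the instance is to resolve the host named in "remote_listener". The sketch below assumes the parameter has the simple host:port form; the port 1521 used in the example call is only an illustrative value, and the function names are hypothetical. It relies on getent, which resolves names through the system resolver (both /etc/hosts and DNS).

```shell
#!/usr/bin/env bash

# Extract the host part from a remote_listener value of the form host:port.
# Assumption: the value is a single host:port pair, not a TNS alias or a list.
listener_host() {
    local value="$1"
    printf '%s\n' "${value%%:*}"
}

# Report whether the host in a remote_listener value resolves on this server.
check_remote_listener() {
    local host
    host="$(listener_host "$1")"
    if getent hosts "$host" >/dev/null 2>&1; then
        echo "OK: $host resolves"
    else
        echo "FAIL: $host does not resolve"
        return 1
    fi
}
```

For example, check_remote_listener "rac1-scan:1521" would print the FAIL line while the DNS server entry is missing from /etc/resolv.conf.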

Recommendation:
Always keep a copy of /etc/resolv.conf, or make the file immutable so the system cannot reset it after a reboot, by running the following command as root:
chattr +i /etc/resolv.conf
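
To confirm that the immutable flag is actually set, the file attributes can be read back with lsattr. This is a small sketch with a hypothetical function name, is_immutable; it only looks for the "i" flag in the attribute column of the lsattr output.

```shell
#!/usr/bin/env bash

# Return success if the given file carries the immutable ("i") attribute.
# Hypothetical helper; treats any lsattr failure as "not immutable".
is_immutable() {
    local attrs
    attrs="$(lsattr -d "$1" 2>/dev/null | awk '{print $1}' || true)"
    case "$attrs" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

A call such as is_immutable /etc/resolv.conf && echo "protected" prints "protected" only when the flag is set. Keep in mind that an immutable file rejects legitimate edits too, so the flag has to be removed with chattr -i before the DNS settings are changed on purpose.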

You can use this script to keep the entries of your critical Linux and Oracle files saved somewhere, in case you need to restore them later:
