Friday, February 14, 2014

Oracle Database Monitoring Script


The backbone monitoring solution for Oracle databases in my environment is dbalarm.sh script. I've developed it many years back and it proved itself successful in each complex/critical environment I've used on.

This script [Ver 5.0] will report/monitor the following:
  > Report ORA- errors and TNS- errors in ALL databases ALERT LOGs.
  > Report ORA- errors and TNS- errors in ALL Listeners ALERT LOGs.
  > Monitor CPU utilization. (along with detailed analysis of top Oracle sessions).
  > Monitor TABLESPACES utilization.
  > Monitor FLASH RECOVERY AREA (FRA) utilization.
  > Monitor ASM Disk Groups utilization.
  > Monitor mounted FILESYSTEMs.
  > Monitor database BLOCKING LOCKS.
  > Report ERRORs in GOLDENGATE logfile. [Disabled by default]
  > Monitor specific named SERVICES. [Service name need to be provided to SERVICEMON variable]
When the Paranoid mode is set to ON:
  > Report Startup/Shutdown of the instance.
  > Report ALTER SYSTEM commands.
  > Report ALTER DATABASE commands.
  > Report EXPORT/IMPORT operations.

This script is coded to send only the new errors that appear since the last script execution,
in other words, it will not report already reported errors unless they get logged again in the log files, and this is one of the key strengths of this script.


How dbalarm script works?

This script is very smart and very easy to use, just follow these three steps:

Step 1:
Download the script from this link:

Step 2:
Open the script and change the E-mail address to your email address in the line# 22
MAIL_LIST="youremail@yourcompany.com"

Note: sendmail service should be configured on your server to allow emails to be sent out from the machine.

Step 3:
By Oracle user:
In the crontab, schedule the script to run at least every 5 minutes:
# crontab -e
#Add this line:
*/5 * * * * /home/oracle/dbalarm.sh
Note: /home/oracle/dbalarm.sh is the full path that points to dbalarm script where /home/oracle is the Oracle user home directory.

In case you will schedule this script to run from root user crontab:
# crontab -e
#Add this line to schedule the run of dbalarm.sh script every 5minutes:
*/5 * * * * su - oracle -c /home/oracle/dbalarm.sh
Now the only thing remaining is to set back and relax and the script will report you all errors and all breached predefined threshold.

One thing more, you can adjust the threshold inside the script as per your preferences by altering the below red colored values in the THRESHOLDS section inside the script:

# #########################
# THRESHOLDS:
# #########################
# Modify the THRESHOLDS to the value you prefer:

FSTHRESHOLD=95 # THRESHOLD FOR FILESYSTEM %USED [OS]
CPUTHRESHOLD=95 # THRESHOLD FOR CPU %UTILIZATION [OS]
TBSTHRESHOLD=95 # THRESHOLD FOR TABLESPACE %USED [DB]
FRATHRESHOLD=95 # THRESHOLD FOR FRA %USED [DB]
ASMTHRESHOLD=95 # THRESHOLD FOR ASM DISK GROUPS [DB]
BLOCKTHRESHOLD=1         # THRESHOLD FOR BLOCKED SESSIONS#[DB]
CHKLISTENER=Y           # Enable/Disable Checking Listeners: [Default Enabled]
CHKOFFLINEDB=Y          # Enable/Disable Database Down Alert: [Default Enabled]
CHKGOLDENGATE=N         # Enable/Disable Goldengate Alert: [Default Disabled]
CPUDIGMORE=Y     # Break down to DB Active sessions when CPU hit the threshold: [RECOMMENDED TO SET =N on VERY BUSY environments]
SERVICEMON=""       # Monitor Specific Named Services. e.g. SERVICEMON="'ORCL_RO','ERP_SRVC','SAP_SERVICE'"
PARANOIDMODE=N    # Paranoid mode will report more events like export/import, instance shutdown/startup. [Default Disabled]

You can control script features like checking listener/offline databases/goldengate alert/show DB active sessions when CPU is high/monitor specific services by using Y for enabling the feature or N to disable it.
You can enable the script to do more checks like (reporting instance startup/shutdown, export/import operations, alter system/database commands and other major DB activities) when activating the Paranoid mode by setting its threshold PARANOIDMODE=Y

Also the script gives you the option to exclude specific database, tablespace, ASM Diskgroup, filesystm from having the script to run against:
You can do so by editing the following parameters:

# #######################################
# Excluded INSTANCES:
# #######################################
# Here you can mention the instances dbalarm will IGNORE and will NOT run against:
# Use pipe "|" as a separator between each instance name.
# e.g. Excluding: -MGMTDB, ASM instances:

EXL_DB="\-MGMTDB|ASM"                                                 #Excluded INSTANCES [Will not get reported offline].

# #########################
# Excluded TABLESPACES:
# #########################
# Here you can exclude one or more tablespace if you don't want to be alerted when they hit the threshold:
# e.g. to exclude "UNDOTBS1" modify the following variable in this fashion without removing "donotremove" value:
# EXL_TBS="donotremove|UNDOTBS1"
EXL_TBS="donotremove"

# #########################
# Excluded ASM Diskgroups:
# #########################
# Here you can exclude one or more ASM Disk Groups if you don't want to be alerted when they hit the threshold:
# e.g. to exclude "FRA" DISKGROUP modify the following variable in this fashion without removing "donotremove" value:
# EXL_DISK_GROUP="donotremove|FRA"
EXL_DISK_GROUP="donotremove"

# ################################
# Excluded FILESYSTEM/MOUNT POINTS:
# ################################
# Here you can exclude specific filesystems/mount points from being reported by dbalarm:
# e.g. Excluding: /dev/mapper, /dev/asm mount points:

EXL_FS="\/dev\/mapper\/|\/dev\/asm\/"                          #Excluded mount points [Will be skipped during the check].


Moreover, you can exclude specific ORA- TNS- errors in case you are OK with ignoring them, I've already excluded some of minor frequent errors to not get bothered about them:

# #########################
# Excluded ERRORS:
# #########################
# Here you can exclude the errors that you don't want to be alerted when they appear in the logs:
# Use pipe "|" between each error.

EXL_ALERT_ERR="ORA-2396|TNS-00507|TNS-12502|TNS-12560|TNS-12537|TNS-00505"              #Excluded ALERTLOG ERRORS [Will not get reported].
EXL_LSNR_ERR="TNS-00507|TNS-12502|TNS-12560|TNS-12537|TNS-00505"                        #Excluded LISTENER ERRORS [Will not get reported].
EXL_GG_ERR="donotremove"             #Excluded GoldenGate ERRORS [Will not get reported].


This script has been tested on Linux and SUN environments.

DISCLAIMER: THIS SCRIPT IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL, BUT WITHOUT ANY WARRANTY. IT IS PROVIDED "AS IS".

If you're looking for a script to check the database health check on a daily basis, please follow this link: 
http://dba-tips.blogspot.ae/2015/05/oracle-database-health-check-script.html

More and more of smart and "easy to use" scripts for database administration tasks can be found in the DBA Bundle:
http://dba-tips.blogspot.ae/2014/02/oracle-database-administration-scripts.html

In case the download link is not working, please find the full code:



4 comments:

hari said...

Hi GIThub,
I think this script DBAALARM is for Linux only.Is it compatible with AIX also?
Hari

Balajee said...

Excellent work. This is working fine

Thanks,
Balajee

Anonymous said...

Hi, I'm using your script. But ORA errors that get through the mail can not read.For example: ORA-01110: ТЮИК ДЮММШУ 228.

Mahmmoud ADEL said...

Hi Hari,
Actually I didn't test this script on an AIX environment, so I cannot give you an answer, but I've a doubt that all the functions inside the script will work there.