Monday, May 14, 2018

Linux Shell Script To Monitor and Report GoldenGate Lag

This script monitors GoldenGate lag and reports it whenever it reaches the pre-defined LAG threshold set inside the script.
It's highly recommended to deploy this script on all replication servers (source and destination) in order to detect lag on all processes (Extract, Pump, and Replicat).

This script is not designed to monitor the replicated data inside the tables; it relies entirely on the native GoldenGate GGSCI console.

This script should be executed or scheduled by the OS user that owns the GoldenGate installation.
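
As a rough illustration of the ownership requirement, a wrapper could refuse to run when the current OS user does not own the GoldenGate home. The function name is_gg_owner is hypothetical and not part of the downloadable script; the GG_HOME value, the crontab path, and the 5-minute interval in the comments are assumptions to adjust for your environment.

```shell
#!/bin/bash
# Hypothetical guard, not part of goldengate_lag_mon.sh.
# Succeeds only when the directory given as $1 exists and is owned by the
# effective user running this shell.
is_gg_owner() {
    [ -d "$1" ] && [ -O "$1" ]
}

# Example use at the top of a wrapper (GG_HOME value is an assumption):
#   GG_HOME=/goldengate/gghome
#   is_gg_owner "$GG_HOME" || { echo "Run as the GoldenGate owner" >&2; exit 1; }
#
# Example crontab entry for that user (path and interval are assumptions):
#   */5 * * * * /home/oracle/goldengate_lag_mon.sh
```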

How it works:

First, download the script:
https://www.dropbox.com/s/l4dqzicviuaawt6/goldengate_lag_mon.sh?dl=0

Second, adjust the following parameters:

MAIL_LIST="youremail@yourcompany.com"
Replace the "youremail@yourcompany.com" placeholder with your e-mail address.


# ###########################################
# Mandatory Parameters To Be Set By The User:
# ###########################################
ORACLE_HOME= # ORACLE_HOME path of the database where GoldenGate is running against.
GG_HOME=           # GoldenGate Installation Home path. e.g. GG_HOME=/goldengate/gghome

Please note that ORACLE_HOME and GG_HOME must be set by you. If you leave them unset, the script will try to guess the right values automatically, but the guess will often be inaccurate.
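
Since a wrong guess for either path is hard to notice, it can help to validate both values before relying on them. The following is only a sketch: check_gg_env is a hypothetical helper, not a function from the downloadable script, and it assumes the GGSCI binary is named ggsci and sits directly under GG_HOME (as in the $GG_HOME/ggsci call shown in the comments).

```shell
#!/bin/bash
# Hypothetical helper, not part of goldengate_lag_mon.sh.
# $1 = ORACLE_HOME candidate, $2 = GG_HOME candidate.
# Returns 0 when both look usable, 1 otherwise (with a message on stderr).
check_gg_env() {
    local oracle_home="$1" gg_home="$2"
    if [ -z "$oracle_home" ] || [ ! -d "$oracle_home" ]; then
        echo "ORACLE_HOME is not set or is not a directory: '$oracle_home'" >&2
        return 1
    fi
    if [ -z "$gg_home" ] || [ ! -x "$gg_home/ggsci" ]; then
        echo "ggsci not found or not executable under GG_HOME: '$gg_home'" >&2
        return 1
    fi
    return 0
}

# Example use (values are placeholders):
#   check_gg_env "$ORACLE_HOME" "$GG_HOME" || exit 1
```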


# ################
# Script Settings:
# ################
# LAG THRESHOLD in minutes: [If reached an e-mail alert will be sent. Default 10 minutes]
LAG_IN_MINUTES=10

Here you define the LAG threshold in minutes (10 minutes by default). If the lag reaches 10 minutes, the script will send you an e-mail.
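
To make the threshold comparison concrete, here is a minimal sketch that converts a lag value in HH:MM:SS form (the format GGSCI prints in "info all") into whole minutes and compares it with the threshold. The function names lag_to_minutes and lag_reached are hypothetical, and the downloadable script may perform this comparison differently.

```shell
#!/bin/bash
# Hypothetical helpers, not part of goldengate_lag_mon.sh.

# Convert an HH:MM:SS value to whole minutes (seconds are dropped).
# awk treats "08" as decimal 8, so leading zeros are safe.
lag_to_minutes() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Succeeds when the lag in $1 (HH:MM:SS) is at or above $2 minutes.
lag_reached() {
    [ "$(lag_to_minutes "$1")" -ge "$2" ]
}

# Example use:
#   LAG_IN_MINUTES=10
#   lag_reached "00:12:30" "$LAG_IN_MINUTES" && echo "lag threshold reached"
```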


# Excluded Specific PROCESSES NAME:
# e.g. If you want to exclude two replicate processes with names REP_11 and REP_12 from being reported then add them to below parameter as shown:
# EXL_PROC_NAME="DONOTREMOVE|REP_11|REP_12"
EXL_PROC_NAME="DONOTREMOVE"

If you want to exclude specific (Extract, Pump, or Replicat) processes, for example a process you use only for testing the replication, add them to the above parameter as shown in the commented example.
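
The exclusion list is a set of names joined by "|", which is the alternation syntax of extended regular expressions. As a sketch of how such a list can filter GGSCI output, the function below drops every line matching the pattern; the modified copy of the script posted in the comments does the same thing with egrep -v. The function name and the sample lines are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not part of goldengate_lag_mon.sh.
# Reads "info all"-style lines on stdin and drops every line that matches
# the exclusion pattern given as $1, e.g. "DONOTREMOVE|REP_11|REP_12".
filter_excluded() {
    grep -Ev "$1"
}

# Example use with made-up process lines:
#   printf '%s\n' \
#     'REPLICAT    RUNNING     REP_10      00:00:00      00:00:05' \
#     'REPLICAT    RUNNING     REP_11      00:20:00      00:00:05' |
#     filter_excluded "DONOTREMOVE|REP_11|REP_12"
```

Because the pattern is matched anywhere in the line, a short name such as REP_1 would also exclude REP_11; adding grep's -w option is one way to restrict matches to whole words.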

DISCLAIMER: THIS SCRIPT IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL BUT WITHOUT ANY WARRANTY. IT IS PROVIDED "AS IS".


8 comments:

  1. I have two GoldenGate installations on the same server. Does it work for both installations, i.e. two different GoldenGate homes?

  2. It looks like I missed your comment, apologies for that.
    Actually, I don't have a test environment matching your scenario, but as an easy approach I recommend scheduling two copies of the script in the crontab, each pointing to a different GoldenGate home.

  3. Assalam valaikum Abdel, I tried your script but it is not working.

    1. Would you mind posting the error/problem you are receiving?

  4. The script is running fine on Linux but not on Solaris. Please advise. On Solaris I am not receiving an e-mail when the Replicat is stopped.

  5. Actually, the script is designed for Linux; I have never tested it on other platforms.

  6. Hi Mohammoud, I am not good at scripting. I tried your script and it is working on Linux, but the problem is that I am getting 3 mails at the same time and the exclude parameter is not working. By default it sends details of all the processes. I want to remove some process names, and the lag should be greater than 30 minutes. Can you check what I am missing in this script? Suggestion: adding a header for the process and status columns would make the output easier to understand.
    =======

    The script I modified from your script:
    ===================================
    [oracle] taxqn1pporadb08:cat gglag.sh
    #!/bin/bash
    set -x
    MAIL_LIST="svedachalamsundaram@corelogic.com"
    SERVER_NAME=`uname -n`
    export SERVER_NAME

    # ###########################################
    # Mandatory Parameters To Be Set By The User:
    # ###########################################
    ORACLE_HOME=/apps/oracle/product/11.2.0.4/db_1 # ORACLE_HOME path of the database where GoldenGate is running against.
    GG_HOME=/ora_backup/ggate/12.2 # GoldenGate Installation Home path. e.g. GG_HOME=/goldengate/gghome


    # ################
    # Script Settings:
    # ################
    # LAG THRESHOLD in minutes: [If reached an e-mail alert will be sent. Default 10 minutes]
    LAG_IN_MINUTES=2

    # Excluded Specific PROCESSES NAME:
    # e.g. If you want to exclude two replicate processes with names REP_11 and REP_12 from being reported then add them to below parameter as shown:
    # EXL_PROC_NAME="DONOTREMOVE|REP_11|REP_12"
    EXL_PROC_NAME="DONOTREMOVE|EPASAUD|RPASAUD1|RPASAUD2|RPASAUD3|RPASAUD4|RPASAUD5|RPASAUD6|RPASAUD7|RPASAUD8|RPASAUD9"
    #EXL_PROC_NAME="DONOTREMOVE|RCLGL|RLASP1|RLASP2"


    # ###############
    # VARIABLES:
    # ###############
    LOG_DIRECTORY=/export/home/oracle/dbascripts/dba # Log Location

    LAG=$LAG_IN_MINUTES
    #LAG=$((LAG_IN_MINUTES * 100))
    export LAG
    export EXL_PROC_NAME=$EXL_PROC_NAME
    export LD_LIBRARY_PATH=${ORACLE_HOME}/lib
    echo LD_LIBRARY_PATH is: $LD_LIBRARY_PATH

    # ################################################
    # Checking the LAG status from Goldengate Console:
    # ################################################
    for GREP_SERVICE in EXTRACT REPLICAT
    do
    export GREP_SERVICE

    export LOG_DIR=${LOG_DIRECTORY}
    export LOG_FILE=${LOG_DIR}/${GREP_SERVICE}_lag_mon.log

    # Identify lagging operation name:
    case ${GREP_SERVICE} in
    "REPLICAT") LAST_COL_OPNAME="RECEIVING"
    export LAST_COL_OPNAME
    BFR_LAST_COL_OPNAME="APPLYING"
    export BFR_LAST_COL_OPNAME
    ;;
    "EXTRACT") LAST_COL_OPNAME="SENDING"

    export LAST_COL_OPNAME
    BFR_LAST_COL_OPNAME="EXTRACTING"
    export BFR_LAST_COL_OPNAME
    ;;
    esac


    $GG_HOME/ggsci << EOF |grep "${GREP_SERVICE}" > ${LOG_FILE}
    info all
    exit
    EOF

    # ################################
    # Email Notification if LAG Found:
    # ################################

    for i in `cat ${LOG_FILE}|egrep -v ${EXL_PROC_NAME}|awk '{print $NF}'|sed -e 's/://g'`
    do
    if [ $i -ge ${LAG} ]
    then
    mail -s "Goldengate LAG detected in ${LAST_COL_OPNAME} TRAIL FILES on Server [${SERVER_NAME}]" ${MAIL_LIST} < ${LOG_FILE}
    #echo "Goldengate LAG detected in ${LAST_COL_OPNAME} TRAIL FILES on Server [${SERVER_NAME}]"
    fi
    done

    done

    # #############
    # END OF SCRIPT
    ###############




  7. For a lag greater than 30 minutes, set this parameter:

    LAG_IN_MINUTES=31

    For excluding processes from being reported, I can see you are doing it right, but remember: they will not trigger the alarm if they are lagging or OFF, yet they will still be seen in the e-mail body:

    EXL_PROC_NAME="DONOTREMOVE|EPASAUD|RPASAUD1|RPASAUD2|RPASAUD3|RPASAUD4|RPASAUD5|RPASAUD6|RPASAUD7|RPASAUD8|RPASAUD9"

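
Two details in the modified copy posted in comment 6 may explain the symptoms described there. First, mail is called inside the per-process loop, so every process at or above the threshold triggers its own message. Second, the scaling line LAG=$((LAG_IN_MINUTES * 100)) is commented out, so the last column with its colons stripped (HHMMSS) is compared against 2 rather than 200. As a hedged sketch of one alternative, the function below collects only the non-excluded lines whose last column is at or above the threshold, so that a single message containing just those lines can be sent per run. collect_lagging is a hypothetical name, and the sketch reads the last column only because the posted copy does.

```shell
#!/bin/bash
# Hypothetical helper, not part of goldengate_lag_mon.sh.
# Reads "info all"-style lines on stdin.
# $1 = threshold in minutes, $2 = exclusion pattern ("NAME1|NAME2|...").
# Prints the non-excluded lines whose last column (HH:MM:SS) is at or above
# the threshold.
collect_lagging() {
    grep -Ev "$2" | awk -v limit="$1" '{
        split($NF, t, ":")                       # t[1]=hours, t[2]=minutes
        if (t[1] * 60 + t[2] >= limit) print
    }'
}

# Example use (one mail per run, only when something is lagging):
#   lagging=$(grep "REPLICAT" "$LOG_FILE" | collect_lagging 31 "$EXL_PROC_NAME")
#   [ -n "$lagging" ] && echo "$lagging" | mail -s "GoldenGate lag" "$MAIL_LIST"
```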