Tuesday, August 3, 2010

Nagios - Customize Nagios Email Notifications

I finally got tired of Nagios notifications that didn't include all the info I wanted. Specifically, I was either guessing whether a problem had been acknowledged based on the notification interval (how long since I received the last notification), or logging in to Nagios to check.
I decided to customize the Nagios email notifications by editing the 'host-notify-by-email' command definition, which is the notification command I've been using.

Before, the 'host-notify-by-email' command definition looked like this:

command[host-notify-by-email]=/bin/echo -e "***** Nagios *****\n\nHost "$HOSTALIAS$" is $HOSTSTATE$!\n\n$HOSTADDRESS$\n\nDate/Time: $DATETIME$\n" | /bin/mail -s 'Host $HOSTNAME$ is $HOSTSTATE$!' $CONTACTEMAIL$

which produced a notification that simply included the host alias, the state, the address, and the date/time. I hopped on over to the Nagios Macros page at Sourceforge and grabbed what I wanted, then edited the 'host-notify-by-email' command definition as follows:

# 'host-notify-by-email' command definition
define command{

     command_name host-notify-by-email
     command_line /bin/echo -e "***** Nagios *****\n\nThis is a $NOTIFICATIONTYPE$ notice that "$HOSTALIAS$" is $HOSTSTATE$!\n\nHost IP is $HOSTADDRESS$\n\nDuration is $HOSTDURATION$\n\nDate/Time: $LONGDATETIME$\n\nhttp://nagios.mydomain.com/nagios/\n\n\nNagios Summary\nTotal Unhandled Host Problems:$TOTALHOSTPROBLEMSUNHANDLED$\nTotal Unhandled Service Problems:$TOTALSERVICEPROBLEMSUNHANDLED$" | /bin/mail -s 'Host $HOSTNAME$ is $HOSTSTATE$!' $CONTACTEMAIL$
     }


Now my Nagios email notifications look like this:

***** Nagios *****

This is a PROBLEM notice that DevServer is DOWN!

Host IP is 192.168.1.100

Duration is 0d 0h 0m 0s

Date/Time: Tue Aug 3 13:19:25 PDT 2010

http://nagios.mydomain.com/nagios/

Nagios Summary
Total Unhandled Host Problems:1
Total Unhandled Service Problems:0



The "This is a PROBLEM notice" part is created by the $NOTIFICATIONTYPE$ macro. This is nice because it tells you what kind of notification it is ("PROBLEM", "RECOVERY", "ACKNOWLEDGEMENT", "FLAPPINGSTART", "FLAPPINGSTOP", "FLAPPINGDISABLED", "DOWNTIMESTART", "DOWNTIMEEND", or "DOWNTIMECANCELLED"), which is much more useful than just getting another DOWN notice when a problem is acknowledged.
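Since the notification type is the part I care about most, it could also go in the subject line. Here is a minimal sketch of that idea as a small shell helper. The function names `host_subject` and `host_body` and their argument order are my own assumptions, not anything Nagios provides; the arguments are simply the expanded macro values passed in by position.

```shell
#!/bin/sh
# Hypothetical helpers; the function names and argument order are assumptions.

# Build a subject line with the notification type up front.
# $1 = $NOTIFICATIONTYPE$, $2 = $HOSTNAME$, $3 = $HOSTSTATE$
host_subject() {
    printf '%s: Host %s is %s!\n' "$1" "$2" "$3"
}

# Build the first part of the message body shown in the post.
# $1 = $NOTIFICATIONTYPE$, $2 = $HOSTALIAS$, $3 = $HOSTSTATE$,
# $4 = $HOSTADDRESS$, $5 = $HOSTDURATION$, $6 = $LONGDATETIME$
host_body() {
    printf '***** Nagios *****\n\n'
    printf 'This is a %s notice that %s is %s!\n\n' "$1" "$2" "$3"
    printf 'Host IP is %s\n\n' "$4"
    printf 'Duration is %s\n\n' "$5"
    printf 'Date/Time: %s\n' "$6"
}
```

Passing the values through `%s` rather than embedding them in the format string means a backslash or percent sign in a macro value is printed literally instead of being interpreted. The output of `host_body` could then be piped to `/bin/mail -s` with the result of `host_subject` as the subject.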

4 comments:

  1. Great idea for a post, as the canned alerts from Nagios aren't always the best. Also, I like the summary bit at the bottom.

    For Host notifications I also include the $HOSTOUTPUT$ and $SHORTDATETIME$ macros. For example:

    command_line /usr/bin/printf "%b" "Notify Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\nDate: $SHORTDATETIME$

    For Service notifications, I appreciate the $SERVICEOUTPUT$ macro, which spits out the results of the check. For example:

    command_line /usr/bin/printf "%b" "Alert#$SERVICENOTIFICATIONNUMBER$\n$NOTIFICATIONCOMMENT$\n$SERVICEOUTPUT$\n$SHORTDATETIME$\ncmd:service-notify-by-epager" | /bin/mail -s "$NOTIFICATIONTYPE$ Service $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ for $SERVICEDURATION$" $CONTACTPAGER$

  2. Thanks Chris, I love the $HOSTOUTPUT$ macro! I updated as follows:
    command_line /bin/echo -e "***** Nagios *****\n\nThis is a $NOTIFICATIONTYPE$ notice that "$HOSTALIAS$" is $HOSTSTATE$!\n\nHost IP is $HOSTADDRESS$\n\nPerformance Data: $HOSTOUTPUT$\n\nDuration is $HOSTDURATION$\n\nDate/Time: $LONGDATETIME$\n\nhttps://nagios.mydomain.com/nagios/\n\n\n\nNagios Summary\nTotal Unhandled Host Problems:$TOTALHOSTPROBLEMSUNHANDLED$\nTotal Unhandled Service Problems:$TOTALSERVICEPROBLEMSUNHANDLED$" | /bin/mail -s 'Host $HOSTNAME$ is $HOSTSTATE$!' $CONTACTEMAIL$
    }

    And now my notifications look like:

    ***** Nagios *****

    This is a RECOVERY notice that DevServer is UP!

    Host IP is 192.168.1.100

    Performance Data: PING OK - Packet loss = 0%, RTA = 0.63 ms

    Duration is 0d 18h 18m 23s

    Date/Time: Thu Aug 5 11:07:45 PDT 2010

    https://nagios.mydomain.com/nagios/



    Nagios Summary
    Total Unhandled Host Problems:0
    Total Unhandled Service Problems:0

  3. I added $HOSTACKAUTHORNAME$ and $HOSTACKCOMMENT$ so that ACKNOWLEDGEMENT type notices show the person who ack'd a problem and the comment they left.

  4. I think you can also use $NOTIFICATIONCOMMENT$ and $NOTIFICATIONAUTHORNAME$, which extend to scheduled outages, service notifications, etc.

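To make the acknowledgement idea from comments 3 and 4 concrete, here is a rough sketch of a helper that prints an "acknowledged by" line only for ACKNOWLEDGEMENT notices and prints nothing for every other type. The function name `ack_line`, the wording of the line, and the argument order are all assumptions for illustration; the arguments are the expanded values of $NOTIFICATIONTYPE$, $HOSTACKAUTHORNAME$, and $HOSTACKCOMMENT$ (or the $NOTIFICATION...$ equivalents).

```shell
#!/bin/sh
# Hypothetical helper; the name, wording, and argument order are assumptions.
# $1 = $NOTIFICATIONTYPE$, $2 = $HOSTACKAUTHORNAME$, $3 = $HOSTACKCOMMENT$
ack_line() {
    case "$1" in
        ACKNOWLEDGEMENT)
            # Only acknowledgement notices get the author and comment line.
            printf 'Acknowledged by %s: %s\n' "$2" "$3"
            ;;
        *)
            # Every other notification type prints nothing.
            ;;
    esac
}
```

Checking the type explicitly keeps a dangling "Acknowledged by :" line out of PROBLEM and RECOVERY notices, whatever the acknowledgement macros happen to expand to there.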

 