Mysterious load average alerts on Nagios XI 2011R3.2

nagiosadmin42
Posts: 96
Joined: Sat Feb 11, 2012 2:16 pm

Mysterious load average alerts on Nagios XI 2011R3.2

Post by nagiosadmin42 »

We're running Nagios XI 2011R3.2 and are receiving intermittent load average alerts like the following:

Code:

***** Nagios *****

Notification Type: PROBLEM

Service: Current Load
Host: localhost
Address: 127.0.0.1
State: CRITICAL

Date/Time: Tue Jul 3 17:04:18 PDT 2012

Additional Info:

CRITICAL - load average: 25.89, 25.82, 26.40

The mystery is what causes the alert: when I receive the notification, I immediately check the System Status page, and it shows that everything is fine:

Code:

Load
1-min	0.16	
5-min	0.13	
15-min	0.09	


The service definition "Current Load" uses the command:

Code:

$USER1$/check_load -w $ARG1$ -c $ARG2$ 
with
$ARG1$ = 5.0,4.0,3.0
$ARG2$ = 10.0,6.0,4.0

I've scanned /usr/local/nagios/var/nagios.log and didn't find any entries with high load average values.

Any ideas on where I should look for the cause of these alerts?
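
To make the thresholds above concrete, here is a minimal Python sketch of how a 1/5/15-minute load triplet could be compared against the warning and critical triplets. It is not check_load's actual source; the function names are hypothetical, and it assumes a value must be strictly greater than its threshold to trip a state.

```python
from typing import Sequence, Tuple

Triplet = Tuple[float, float, float]


def parse_triplet(arg: str) -> Triplet:
    """Parse a '1min,5min,15min' threshold string such as '5.0,4.0,3.0'."""
    parts = [float(p) for p in arg.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated values, got {arg!r}")
    return parts[0], parts[1], parts[2]


def classify_load(load: Sequence[float], warn: Triplet, crit: Triplet) -> str:
    """Return 'OK', 'WARNING' or 'CRITICAL' for a load average triplet.

    Assumption: each average is compared with the threshold for the same
    interval, and a strictly greater value trips that state.
    """
    state = "OK"
    for value, w, c in zip(load, warn, crit):
        if value > c:
            return "CRITICAL"  # any interval over its critical limit wins
        if value > w:
            state = "WARNING"
    return state


warn = parse_triplet("5.0,4.0,3.0")   # $ARG1$ from the post
crit = parse_triplet("10.0,6.0,4.0")  # $ARG2$ from the post

print(classify_load((25.89, 25.82, 26.40), warn, crit))  # values from the alert
print(classify_load((0.16, 0.13, 0.09), warn, crit))     # values from System Status
```

With these thresholds, the values in the alert are far over the critical limits while the values on the System Status page are well under the warning limits, so the two readings cannot plausibly describe the same machine at the same moment.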
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Post by scottwilkerson »

By any chance, do you have more than one Nagios server floating around (backup or development)? Does the alert indicate the name of the Nagios server that sent it?
Former Nagios employee
nagiosadmin42
Posts: 96
Joined: Sat Feb 11, 2012 2:16 pm

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Post by nagiosadmin42 »

Unfortunately, as shown in my original post, the alert says "localhost" with IP "127.0.0.1" so it's very hard to know exactly which system is sending it.

That was a good idea about checking for multiple Nagios servers; however, we have only one production Nagios XI server. We were initially using the virtual machine image to try out Nagios XI, and that instance is shut down.
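
When the alert body only says "localhost", the mail headers of the notification may still reveal the sending machine. The following is a rough sketch, assuming the notification arrives as an ordinary email whose Received headers were added by each relay; the sample message and its example.com host names are hypothetical, not taken from this thread.

```python
from email import message_from_string
from typing import List


def relay_hops(raw_message: str) -> List[str]:
    """Return the Received headers, oldest hop first.

    Relays normally prepend their Received header, so the last one in the
    message is usually the hop closest to the machine that sent the mail.
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    # Collapse folded header lines into single-spaced strings.
    return [" ".join(header.split()) for header in reversed(received)]


# Hypothetical sample message; every host name and address is made up.
SAMPLE = (
    "Received: from mail.example.com (mail.example.com [192.0.2.10])\n"
    "\tby mx.example.com; Tue, 3 Jul 2012 17:04:20 -0700\n"
    "Received: from nagios-dev.example.com (nagios-dev.example.com [192.0.2.25])\n"
    "\tby mail.example.com; Tue, 3 Jul 2012 17:04:19 -0700\n"
    "Subject: (hypothetical alert subject)\n"
    "\n"
    "***** Nagios *****\n"
)

for hop in relay_hops(SAMPLE):
    print(hop)
```

Received headers can be missing or rewritten by intermediate relays, so treat the result as a hint about where to look rather than proof of the origin.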
nagiosadmin42
Posts: 96
Joined: Sat Feb 11, 2012 2:16 pm

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Post by nagiosadmin42 »

OK, this is embarrassing... good catch on that multiple Nagios servers question. There WAS another Nagios Core dev system installed long ago, and we forgot all about it when we began investigating Nagios XI. We just logged on to that server, and it is experiencing high load due to some Hadoop testing going on there. THANK YOU!
nagiosadmin42
Posts: 96
Joined: Sat Feb 11, 2012 2:16 pm

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Post by nagiosadmin42 »

And the "***** Nagios *****" header should have been the clue to all this... our production server alerts say "***** Nagios XI Alert *****".

d'oh!
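
The banner check described in this post can be automated. Below is a small sketch, assuming the two banners quoted in this thread are the only ones in use; the function names are hypothetical, and installs with customized notification templates would need their own mapping.

```python
import re
from typing import Optional

# Matches a line such as "***** Nagios *****" and captures the text inside.
BANNER = re.compile(r"^\*{5}\s*(.+?)\s*\*{5}\s*$")


def banner_of(body: str) -> Optional[str]:
    """Return the text of the first '***** ... *****' banner line, if any."""
    for line in body.splitlines():
        match = BANNER.match(line.strip())
        if match:
            return match.group(1)
    return None


def alert_source(body: str) -> str:
    """Guess which kind of install sent an alert, based on its banner.

    Assumption: the mapping follows this thread, where the Nagios Core dev
    box used "Nagios" and the production XI server used "Nagios XI Alert".
    The comparison ignores case.
    """
    banner = banner_of(body)
    if banner is None:
        return "unknown"
    key = banner.lower()
    if key == "nagios xi alert":
        return "Nagios XI"
    if key == "nagios":
        return "Nagios Core"
    return "unknown"


print(alert_source("***** Nagios *****\n\nNotification Type: PROBLEM\n"))
print(alert_source("***** Nagios XI Alert *****\n\nNotification Type: PROBLEM\n"))
```

A check like this could run as a mail filter rule so that alerts from a forgotten install stand out immediately.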
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Mysterious load average alerts on Nagios XI 2011R3.2

Post by scottwilkerson »

It's bound to happen sooner or later if you have several Nagios installs...