Changed nagios.cfg as requested, and it seems to have reduced the load by half. Load averages are now between 20 and 25.
We are running 86 hosts and 1588 services. The vast majority were configured by the wizard and, I believe, are actively checked.
Top 3 processes are mysqld (user mysqld), nagios (user nagios), and check_esx3.pl (user nagios).
Performance graphs stopped updating
-
- Posts: 4380
- Joined: Mon Jun 14, 2010 10:21 am
Re: Performance graphs stopped updating
Hmm, something is not right with that plugin if it's consistently in the top 3, because it shouldn't be running for more than a couple of seconds. If it's hung or stuck in an endless loop, it could eat a lot of system resources. You don't have so many checks running that it should be eating that much processor power. Try restarting Nagios:
Code:
# send SIGTERM to every process named nagios
killall nagios
# start Nagios again through its init script
service nagios start
Then kill that check plugin as well. Nagios should launch it again the next time the check needs to run. Keep an eye on it, though. If the graphs still aren't showing up, try restarting the server.
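If the plugin is hung, it should stand out in ps with an elapsed time far longer than a couple of seconds. A minimal sketch for spotting such processes, assuming procps ps and awk; the function name long_running_checks and its 60-second default are hypothetical. Linux truncates process names to 15 characters (which is why top shows check_ifopersta), so it only matches on the check_ prefix:

```shell
# Read "pid etime comm" lines (as printed by: ps -eo pid=,etime=,comm=) on stdin
# and print every check_* plugin that has been running longer than $1 seconds.
long_running_checks() {
  limit=${1:-60}   # hypothetical default: most plugins should finish well under a minute
  awk -v limit="$limit" '
    $3 ~ /^check_/ {
      # etime format is [[dd-]hh:]mm:ss; convert it to seconds
      t = $2; days = 0
      if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
      n = split(t, p, ":"); secs = 0
      for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
      secs += days * 86400
      if (secs > limit) print $1, secs, $3
    }'
}
```

For example, `ps -eo pid=,etime=,comm= | long_running_checks 60` prints PID, elapsed seconds, and name for each candidate; a PID whose elapsed time keeps growing can then be stopped with `kill PID`.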
-
- Posts: 93
- Joined: Mon Feb 01, 2010 12:09 pm
Re: Performance graphs stopped updating
The graphs are updating, but not keeping up to date: the data in the graphs tends to be 1-3 hours behind the current time. I have tried the killall command and it helps, but the graphs never fully catch up. This morning the monitoring engine needed to be restarted, as it showed that it had stopped. This spiked the load averages up to 53, 34, and 25 for the 1-, 5-, and 15-minute intervals respectively.
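Graphs that update but lag hours behind could mean performance data is queuing up before it reaches the RRD files. Assuming your install hands perfdata off through a spool directory (the default path /usr/local/nagios/var/spool/perfdata below is an assumption; adjust it to your setup), a quick check is to count the queued files and see how old the oldest one is. The function name spool_backlog is hypothetical:

```shell
# Report how many files are waiting in a perfdata spool directory and how old
# (in seconds) the oldest one is. The default path is an assumption.
spool_backlog() {
  spool=${1:-/usr/local/nagios/var/spool/perfdata}
  count=$(find "$spool" -type f 2>/dev/null | wc -l | tr -d ' ')
  # %T@ = modification time in seconds since the epoch (GNU find)
  oldest=$(find "$spool" -type f -printf '%T@\n' 2>/dev/null | sort -n | head -n 1)
  if [ -z "$oldest" ]; then
    echo "files=0 oldest_age=0"
    return 0
  fi
  now=$(date +%s)
  # strip any fractional seconds before doing integer arithmetic
  echo "files=$count oldest_age=$(( now - ${oldest%.*} ))"
}
```

If oldest_age sits in the thousands of seconds, roughly matching the 1-3 hour lag, that would point at the perfdata processing side falling behind rather than at the graphs themselves.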
mysqld is consistently among the top processes, regularly in the top 2.
Any other thoughts?
-
- Posts: 4380
- Joined: Mon Jun 14, 2010 10:21 am
Re: Performance graphs stopped updating
Can you copy and paste the output from running top? I'd like to see the full output to get a better sense of where things stand.
Also, can you copy and paste the output from running: /usr/local/nagios/bin/nagiostats
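Interactive top output is awkward to paste. A minimal sketch that grabs a single batch-mode snapshot (`top -b -n 1`) together with the nagiostats output into one file; the function name capture_diagnostics and the /tmp/nagios-diag.txt default are hypothetical:

```shell
# Write one non-interactive top snapshot plus nagiostats output into a single file.
capture_diagnostics() {
  out=${1:-/tmp/nagios-diag.txt}
  nagiostats_bin=${2:-/usr/local/nagios/bin/nagiostats}
  {
    echo "=== top ($(date)) ==="
    # -b = batch mode (plain text), -n 1 = a single iteration
    top -b -n 1 2>&1 || echo "(top failed)"
    echo "=== nagiostats ==="
    "$nagiostats_bin" 2>&1 || echo "(nagiostats failed: $nagiostats_bin)"
  } > "$out"
  echo "$out"   # print the file name so it is easy to open and paste
}
```

Running `capture_diagnostics` with no arguments writes to /tmp/nagios-diag.txt, whose contents can then be pasted into a reply.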
-
- Posts: 93
- Joined: Mon Feb 01, 2010 12:09 pm
Re: Performance graphs stopped updating
From top:
Code:
top - 12:01:55 up 1 day, 22:32, 1 user, load average: 42.93, 39.12, 39.31
Tasks: 263 total, 16 running, 239 sleeping, 0 stopped, 8 zombie
Cpu(s): 10.7%us, 88.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 3107560k total, 2295504k used, 812056k free, 283800k buffers
Swap: 1048568k total, 0k used, 1048568k free, 1297388k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13982 nagios 24 0 0 0 0 R 11.4 0.0 0:00.40 check_ifopersta
14010 nagios 25 0 27100 9072 4880 R 11.4 0.3 0:00.40 php
14056 nagios 25 0 11380 7364 1836 R 10.8 0.2 0:00.38 check_ifopersta
26999 nagios 25 0 15776 3820 880 R 10.5 0.1 4:59.05 nagios
3123 mysql 15 0 154m 34m 4924 S 8.0 1.1 291:15.60 mysqld
14235 nagios 25 0 6872 2792 1608 R 3.7 0.1 0:00.13 check_ifopersta
13918 nagios 18 0 4572 1100 948 S 2.0 0.0 0:00.07 check_rrdtraf
13964 nagios 18 0 4572 1104 948 S 2.0 0.0 0:00.07 check_rrdtraf
10427 nagios 18 0 29536 14m 5888 S 1.4 0.5 0:01.17 php
13827 nagios 18 0 4572 1104 948 S 1.4 0.0 0:00.07 check_rrdtraf
13925 nagios 18 0 4572 1104 948 S 1.4 0.0 0:00.06 check_rrdtraf
632 root 11 -5 0 0 0 S 1.1 0.0 18:38.41 kjournald
6781 nagios 15 0 15928 4080 1112 S 1.1 0.1 81:31.55 nagios
14277 nagios 25 0 4 4 0 R 1.1 0.0 0:00.04 sh
14292 nagios 25 0 4012 576 492 R 1.1 0.0 0:00.04 check_icmp
14339 nagios 25 0 4544 468 392 R 1.1 0.0 0:00.04 check_rrdtraf
25384 root 15 0 0 0 0 S 1.1 0.0 0:45.02 pdflush
From nagiostats:
Code:
Nagios Stats 3.2.3
Copyright (c) 2003-2008 Ethan Galstad (http://www.nagios.org)
Last Modified: 10-03-2010
License: GPL
CURRENT STATUS DATA
------------------------------------------------------
Status File: /usr/local/nagios/var/status.dat
Status File Age: 0d 0h 0m 0s
Status File Version: 3.2.3
Program Running Time: 0d 2h 2m 31s
Nagios PID: 26999
Used/High/Total Command Buffers: 0 / 0 / 4096
Total Services: 1591
Services Checked: 1591
Services Scheduled: 1591
Services Actively Checked: 1591
Services Passively Checked: 0
Total Service State Change: 0.000 / 19.470 / 0.035 %
Active Service Latency: 0.007 / 1202.996 / 28.594 sec
Active Service Execution Time: 0.073 / 27.438 / 4.825 sec
Active Service State Change: 0.000 / 19.470 / 0.035 %
Active Services Last 1/5/15/60 min: 88 / 351 / 834 / 1522
Passive Service Latency: 0.000 / 0.000 / 0.000 sec
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 1554 / 3 / 0 / 34
Services Flapping: 2
Services In Downtime: 0
Total Hosts: 87
Hosts Checked: 87
Hosts Scheduled: 87
Hosts Actively Checked: 87
Host Passively Checked: 0
Total Host State Change: 0.000 / 0.000 / 0.000 %
Active Host Latency: 0.000 / 162.067 / 13.217 sec
Active Host Execution Time: 0.106 / 8.173 / 1.096 sec
Active Host State Change: 0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min: 0 / 8 / 48 / 85
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 87 / 0 / 0
Hosts Flapping: 0
Hosts In Downtime: 0
Active Host Checks Last 1/5/15 min: 3 / 36 / 132
   Scheduled: 0 / 27 / 106
   On-demand: 3 / 9 / 26
   Parallel: 0 / 27 / 106
   Serial: 0 / 0 / 0
   Cached: 3 / 9 / 26
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 193 / 759 / 2280
   Scheduled: 193 / 759 / 2280
   On-demand: 0 / 0 / 0
   Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0
External Commands Last 1/5/15 min: 0 / 0 / 0
-
- Posts: 4380
- Joined: Mon Jun 14, 2010 10:21 am
Re: Performance graphs stopped updating
The problem with the performance graphs is almost certainly coming from the CPU load issues. The real question will be determining what is causing that level of performance drain on your system.
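One place the load shows up in the nagiostats output above is the Active Service Latency line, which nagiostats lists as min / max / average: here 0.007 / 1202.996 / 28.594 sec. A minimal sketch that pulls that line out of nagiostats text output, assuming the format shown above; the function name check_service_latency and its 10-second threshold are hypothetical, not Nagios defaults:

```shell
# Read nagiostats output on stdin, print the Active Service Latency figures
# (min / max / average, in seconds), and return 2 if the average exceeds $1.
check_service_latency() {
  limit=${1:-10}
  awk -v limit="$limit" '
    /^[ \t]*Active Service Latency:/ {
      sub(/^[ \t]*Active Service Latency:[ \t]*/, "")
      split($0, v, " / ")   # v[1]=min, v[2]=max, v[3]="avg sec"
      found = 1
    }
    END {
      if (!found) { print "no Active Service Latency line found"; exit 1 }
      avg = v[3] + 0        # numeric conversion drops the trailing " sec"
      printf "min=%.3f max=%.3f avg=%.3f\n", v[1], v[2], avg
      if (avg > limit) exit 2
      exit 0
    }'
}
```

For example, `/usr/local/nagios/bin/nagiostats | check_service_latency 10; echo $?` prints the three figures followed by 2 while the average stays above the threshold.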
Can you give us your system specs:
Hardware setup? (CPU Cores, RAM, Hard Disk setup)
Linux Distro? 32 or 64bit?
VM or Physical Box?
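Most of these answers can be read from inside the guest. A minimal sketch, assuming a Linux guest with /proc and, for the distro line, a Red Hat style /etc/redhat-release (as on CentOS); the function name system_specs is hypothetical. It cannot tell whether the box is a VM or describe the physical disks behind it; those answers have to come from the ESX side:

```shell
# Print the basic specs requested above, read from inside the (virtual) machine.
system_specs() {
  # Logical CPUs the kernel sees (for a VM, the vCPUs assigned to it)
  echo "CPU cores: $(grep -c '^processor' /proc/cpuinfo)"
  # MemTotal and SwapTotal are reported in kB in /proc/meminfo
  awk '/^MemTotal:/  { printf "RAM: %d MB\n", $2 / 1024 }
       /^SwapTotal:/ { printf "Swap: %d MB\n", $2 / 1024 }' /proc/meminfo
  # CentOS/RHEL release string; other distros fall back to "unknown"
  echo "Distro: $(cat /etc/redhat-release 2>/dev/null || echo unknown)"
  # Kernel machine type (e.g. i686 vs x86_64) and userland word size (32 or 64)
  echo "Architecture: $(uname -m), $(getconf LONG_BIT)-bit"
}
```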
-
- Posts: 93
- Joined: Mon Feb 01, 2010 12:09 pm
Re: Performance graphs stopped updating
Hardware setup - VM downloaded originally from Nagios. I believe it is a 32-bit setup running CentOS.
It runs on an ESX server, with 4 CPUs and 4 GB of RAM assigned to the VM. The hard disk is 16 GB.
-
- Posts: 1128
- Joined: Wed Mar 03, 2010 12:38 pm
- Location: St. Paul, MN, USA
Re: Performance graphs stopped updating
He already gave us some of those, Mike.
Hardware setup? (CPU Cores, RAM, Hard Disk setup)
4 virtual CPU cores, roughly 2GHz each
3GB RAM
1GB swap
Linux Distro? 32 or 64bit?
CentOS 5, 32-bit
VM or Physical Box?
VMware on ESX.
I'd still like to know the type / number / configuration of disks, since that could be a bottleneck.

Quote: "Top 3 processes are mysqld (user mysqld), nagios (user nagios), and check_esx3.pl (user nagios)."
I totally missed this before: how many check_esx3.pl services do you have? I remember in testing that it used WAY more resources than any normal plugin, since it has to load the entire VMware Perl API separately for each and every check, but I don't see it in the sample top output you included.
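One rough way to answer the check_esx3.pl question from the config side is to count check_command lines in the object configuration that mention ESX. This sketch assumes the objects live in *.cfg files under /usr/local/nagios/etc and that the wizard's command names contain "esx"; both are assumptions, and the function name count_check_commands is hypothetical. It counts host and service definitions alike:

```shell
# Count check_command lines that mention a pattern ($2, default "esx") across
# all *.cfg files under an object config directory ($1, assumed default path).
count_check_commands() {
  cfgdir=${1:-/usr/local/nagios/etc}
  pattern=${2:-esx}
  find "$cfgdir" -type f -name '*.cfg' -exec cat {} + 2>/dev/null \
    | grep -i '^[[:space:]]*check_command' \
    | grep -ci -- "$pattern" || true   # grep -c prints 0 but exits 1 on no match
}
```

To see how many copies are running at a given moment, `ps -eo args= | grep -c '[c]heck_esx3'` works; the brackets keep grep from counting its own process.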
-
- Posts: 93
- Joined: Mon Feb 01, 2010 12:09 pm
Re: Performance graphs stopped updating
Do you want the configuration of the physical ESX server (RAID 5 on 7.2k RPM disks) or additional information from the VM?
I reinstalled the check_esx plugin based on the first conversations, and it has not shown up in the top processes whenever I've glanced at them since.
-
- Posts: 93
- Joined: Mon Feb 01, 2010 12:09 pm
Re: Performance graphs stopped updating
Any other suggestions? We are still having this issue.