Problems with Host Performance graphs and Bandwidth
-
- Posts: 74
- Joined: Thu Jan 17, 2013 8:44 am
- Location: Spain
Problems with Host Performance graphs and Bandwidth
Hi,
We have always had problems with the Host Performance graphs. Sometimes there are gaps in them. This happens in the host graphs but not in the ping service graphs.
Also, we monitor many routers and firewalls, and a lot of information is missing from their Bandwidth graphs. They are not drawn correctly and have many gaps too: the graphs are never continuous. There may be 5 minutes of data, followed by 15-20 minutes with nothing.
I've tried to improve the performance of our Nagios XI server by following Nagios XI documents such as "Using_rrdcached_with_Nagios_XI" and "Maximizing_XI_Performance", but we still get the same result.
Has anybody had similar problems, and does anyone know how to fix this?
Thank you!
-
- Red Shirt
- Posts: 8334
- Joined: Thu Nov 15, 2012 1:20 pm
Re: Problems with Host Performance graphs and Bandwidth
What version of XI are you running?
Could you post a tail of the following logs?
Code: Select all
tail /usr/local/nagios/var/perfdata.log
tail /usr/local/nagios/var/npcd.log
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
- Posts: 74
- Joined: Thu Jan 17, 2013 8:44 am
- Location: Spain
Re: Problems with Host Performance graphs and Bandwidth
The version is 2011R3.3
Here is the tail of those logs (posted as attachments):
-
- Red Shirt
- Posts: 8334
- Joined: Thu Nov 15, 2012 1:20 pm
Re: Problems with Host Performance graphs and Bandwidth
It looks like you are hitting the max load and timeout settings for performance data processing:
In the file: /usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:
Code: Select all
TIMEOUT = 5
to:
Code: Select all
TIMEOUT = 10
In the file: /usr/local/nagios/etc/pnp/npcd.cfg
Change:
Code: Select all
load_threshold = 10.0
to:
Code: Select all
load_threshold = 30.0
Restart npcd:
Code: Select all
service npcd restart
Wait 15 minutes after making the changes, then recheck the logs and verify that perfdata is being recorded as expected.
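For reference, the same two edits can be scripted. This is a minimal sketch, assuming GNU sed and that each setting sits on its own uncommented `KEY = value` line, as in the config files pasted later in this thread (the TIMEOUT file's header reads `process_perfdata.cfg-sample.in`). `bump_setting` is a hypothetical name, and the function keeps a `.bak` copy of each file before editing it.

```shell
# Hypothetical helper: set "KEY = VALUE" in a PNP-style config file.
# Only uncommented lines that start with "KEY =" are rewritten; commented
# lines such as "# load_threshold = <float value>" are left alone.
# The original file is kept as FILE.bak (sed -i.bak).
bump_setting() {
    file=$1
    key=$2
    value=$3
    sed -i.bak "s/^${key}[[:space:]]*=.*/${key} = ${value}/" "$file"
}

# Usage (paths and values from this thread; run as root):
#   bump_setting /usr/local/nagios/etc/pnp/process_perfdata.cfg TIMEOUT 10
#   bump_setting /usr/local/nagios/etc/pnp/npcd.cfg load_threshold 30.0
#   service npcd restart
```

Anchoring the pattern at the start of the line is deliberate: it avoids touching the commented documentation lines and also keeps `use_load_threshold` from matching `load_threshold`.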
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
- Posts: 74
- Joined: Thu Jan 17, 2013 8:44 am
- Location: Spain
Re: Problems with Host Performance graphs and Bandwidth
I made those changes and restarted npcd. Here are the relevant parts of npcd.cfg and process_perfdata.cfg:
npcd_max_threads = 5
# sleep_time - how many seconds should npcd wait between dirscans
#
# sleep_time = 15 (default)
sleep_time = 15
# EXPERIMENTAL
#
# use_load_threshold - enables/disables load watching
#
# use_load_threshold = <0 / 1> (default: 0)
#
#use_load_threshold = 0
# EXPERIMENTAL
#
# load_threshold - npcd won't start new threads
# if your system load is over this threshold
#
# load_threshold = <float value> (default: 10.0)
#
# Hint: Do not use "," as decimal delimeter
#
load_threshold = 30.0
#
# Config File for process_perfdata.pl
#
# $Id: process_perfdata.cfg-sample.in 520 2008-09-16 12:50:10Z pitchfork $
#
# process_perfdata.pl Timout
#
TIMEOUT = 10
#
# Use RRDs Perl Module
#
USE_RRDs = 1
#
#
#
RRDPATH = /usr/local/nagios/share/perfdata
#
#
#
RRDTOOL = /usr/bin/rrdtool
#
#
#
CFG_DIR = /usr/local/nagios/etc/pnp
#
#
#
RRD_HEARTBEAT = 8460
#
#
#
RRA_CFG = /usr/local/nagios/etc/pnp/rra.cfg
#
#
#
RRA_STEP = 60
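One way to double-check which settings are actually in force is to strip comments and blank lines from each file. In the npcd.cfg excerpt above, `use_load_threshold` is still commented out, and the file's own comments give its default as 0 (load watching disabled), so `load_threshold` may not be consulted at all unless `use_load_threshold = 1` is set. A minimal sketch; `effective_settings` is a hypothetical name.

```shell
# Hypothetical helper: print only the active "key = value" lines of a
# PNP/npcd config file, dropping comment lines and blank lines.
effective_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage:
#   effective_settings /usr/local/nagios/etc/pnp/npcd.cfg
#   effective_settings /usr/local/nagios/etc/pnp/process_perfdata.cfg
```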
But I still get this:
tail /usr/local/nagios/var/perfdata.log
2013-01-21 19:42:19 [5341] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-21 19:42:19 [5341] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793701-PID-5341 deleted
2013-01-21 19:42:19 [5341] [0] *** Timeout while processing Host: "NY-ARIES" Service: "my_mem_check"
2013-01-21 19:42:19 [5341] [0] *** process_perfdata.pl terminated on signal ALRM
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Timeout after 10 secs. ***
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793715-PID-5344 deleted
2013-01-21 19:42:19 [5344] [0] *** Timeout while processing Host: "NYC-RLDCEX" Service: "my_mem_check"
2013-01-21 19:42:19 [5344] [0] *** process_perfdata.pl terminated on signal ALRM
tail /usr/local/nagios/var/npcd.log
[01-21-2013 19:40:18] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:40:18] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//host-perfdata.1358793595'
[01-21-2013 19:40:46] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:40:46] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793625'
[01-21-2013 19:41:52] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:41:52] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793687'
[01-21-2013 19:42:19] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:42:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793701'
[01-21-2013 19:42:19] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:42:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793715'
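The numeric suffix on each spool file name (for example `service-perfdata.1358793701`) appears to be a Unix timestamp, presumably recording when the file was moved into the spool directory. Converting it to a date and comparing it with the time of the matching TIMEOUT line gives a rough idea of how long a file waited before processing. This is a minimal sketch, assuming GNU `date` (for `-d @EPOCH`). `spool_file_time` is a hypothetical name.

```shell
# Hypothetical helper: print the time encoded in a perfdata spool file name.
# Accepts names like ".../service-perfdata.1358793701" or
# ".../service-perfdata.1358793701-PID-5341"; prints the date in UTC.
spool_file_time() {
    epoch=$(basename "$1" | sed -n 's/^[a-z]*-perfdata\.\([0-9][0-9]*\).*/\1/p')
    [ -n "$epoch" ] || return 1
    date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC'
}

# Usage (GNU date):
#   spool_file_time /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793701
```

For the first timeout above, the file name decodes to 2013-01-21 18:41:41 UTC, while the log entry is stamped 19:42:19 in the server's local time. Whether that represents a wait of under a minute or of about an hour depends on the server's time zone, which the thread does not state.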
I've just upgraded Nagios XI to 2012R1.4, and since then I don't get any bandwidth graphs at all...
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Problems with Host Performance graphs and Bandwidth
Can you post the output of:
Code: Select all
ll /usr/local/nagios/var/spool/perfdata|wc -l
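Note that `ll` is usually a shell alias for `ls -l` and may not exist in every shell. `ls -l` also prints a leading "total" line, so the count is one higher than the number of entries; if the alias includes `-a`, the `.` and `..` entries are counted as well. A sketch that counts only regular files, assuming `find` supports `-maxdepth` (GNU and BSD find do). `count_spool_files` is a hypothetical name.

```shell
# Hypothetical helper: count regular files directly inside a spool directory
# (no "total" line, no subdirectories; hidden files are included).
count_spool_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Usage:
#   count_spool_files /usr/local/nagios/var/spool/perfdata
```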
-
- Posts: 74
- Joined: Thu Jan 17, 2013 8:44 am
- Location: Spain
Re: Problems with Host Performance graphs and Bandwidth
ll /usr/local/nagios/var/spool/perfdata|wc -l
3
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Problems with Host Performance graphs and Bandwidth
Are you still seeing new errors in the log?
-
- Posts: 74
- Joined: Thu Jan 17, 2013 8:44 am
- Location: Spain
Re: Problems with Host Performance graphs and Bandwidth
Yes, still the same:
tail /usr/local/nagios/var/perfdata.log
2013-01-22 15:40:59 [28824] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-22 15:40:59 [28824] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865633-PID-28824 deleted
2013-01-22 15:40:59 [28824] [0] *** Timeout while processing Host: "SRVADDASDC01001" Service: "VMware_Storage_Array_Datastore_DSPISDC01006PRODSAS72_Usage"
2013-01-22 15:40:59 [28824] [0] *** process_perfdata.pl terminated on signal ALRM
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Timeout after 10 secs. ***
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358866503-PID-466 deleted
2013-01-22 15:55:35 [466] [0] *** Timeout while processing Host: "www.ryanlabs.com" Service: "DNS_IP_Match"
2013-01-22 15:55:35 [466] [0] *** process_perfdata.pl terminated on signal ALRM
tail /usr/local/nagios/var/npcd.log
[01-22-2013 15:30:22] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:30:22] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865003'
[01-22-2013 15:36:37] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:36:37] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865378'
[01-22-2013 15:36:37] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:36:37] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865363'
[01-22-2013 15:40:59] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:40:59] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865633'
[01-22-2013 15:55:35] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:55:35] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358866503'
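To see whether the timeouts cluster on particular checks or are spread across all of them, the "Timeout while processing" lines can be tallied per host and service. This is a minimal awk sketch, assuming the line format shown in the perfdata.log excerpt above. `timeouts_by_service` is a hypothetical name.

```shell
# Hypothetical helper: count "Timeout while processing" entries per
# host/service pair in a perfdata.log, most frequent first.
# Splitting on '"' puts the host name in $2 and the service name in $4.
timeouts_by_service() {
    awk -F'"' '/Timeout while processing Host:/ { print $2 " / " $4 }' "$1" |
        sort | uniq -c | sort -rn
}

# Usage:
#   timeouts_by_service /usr/local/nagios/var/perfdata.log
```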
-
- Posts: 4380
- Joined: Mon Jun 14, 2010 10:21 am
Re: Problems with Host Performance graphs and Bandwidth
Just to make sure we rule it out, can you also run these commands and post the output:
Code: Select all
ll /usr/local/nagios/var/spool/xdpe|wc -l
Code: Select all
ll /usr/local/nagios/var