Problems with Host Performance graphs and Bandwidth

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
David.adder
Posts: 74
Joined: Thu Jan 17, 2013 8:44 am
Location: Spain

Problems with Host Performance graphs and Bandwidth

Post by David.adder »

Hi,

We have always had problems with the Host Performance graphs: sometimes they contain gaps. This happens in the host graphs but not in the ping service graphs.

We also monitor many routers and firewalls, and the Bandwidth graphs are missing a lot of data. They are not drawn correctly and have many gaps too: a graph is never continuous. It may draw for about 5 minutes, and then there are 15-20 minutes with no data.
Captura.JPG
I've been trying to improve the performance of our Nagios XI server by following some of the Nagios XI manuals, such as "Using_rrdcached_with_Nagios_XI" and "Maximizing_XI_Performance", but we still get the same result.

Has anybody had similar problems and knows how to fix this?

Thank you!
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Problems with Host Performance graphs and Bandwidth

Post by abrist »

What version of XI are you running?

Could you post a tail of the following logs?

Code:

tail /usr/local/nagios/var/perfdata.log
tail /usr/local/nagios/var/npcd.log
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
David.adder

Re: Problems with Host Performance graphs and Bandwidth

Post by David.adder »

The version is 2011R3.3

The output of those commands is:
Captura1.JPG
Captura.JPG
abrist

Re: Problems with Host Performance graphs and Bandwidth

Post by abrist »

It looks like you are hitting the max load and timeout settings for performance data processing:

In the file /usr/local/nagios/etc/pnp/process_perfdata.cfg, change:

Code:

TIMEOUT = 5
to:

Code:

TIMEOUT = 10
In the file /usr/local/nagios/etc/pnp/npcd.cfg, change:

Code:

load_threshold = 10.0
to:

Code:

load_threshold = 30.0
Restart npcd:

Code:

service npcd restart
Wait 15 minutes after making the changes, then recheck the logs and verify that perfdata is being recorded as expected.
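
When rechecking, it can help to see which hosts and services the timeouts affect. The following is a minimal sketch, assuming perfdata.log reports each timeout on a line of the form `*** Timeout while processing Host: "..." Service: "..."`. The function name `timeouts_by_service` is made up for illustration and is not part of Nagios XI or PNP.

```shell
# Hypothetical helper (not shipped with Nagios XI or PNP): counts timeout
# entries in perfdata.log per host/service pair, most frequent first.
timeouts_by_service() {
    # Log path defaults to the location used in this thread.
    log="${1:-/usr/local/nagios/var/perfdata.log}"
    # Keep only the lines that name the host and service that timed out,
    # strip everything up to and including "Timeout while processing ",
    # then count identical entries and sort by count, highest first.
    grep 'Timeout while processing Host:' "$log" \
        | sed 's/^.*Timeout while processing //' \
        | sort | uniq -c | sort -rn
}
```

Running `timeouts_by_service /usr/local/nagios/var/perfdata.log` prints one line per host/service pair with the number of timeouts in front, which may show whether the timeouts are concentrated on a few checks or spread across all of them.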
David.adder

Re: Problems with Host Performance graphs and Bandwidth

Post by David.adder »

I made those changes and restarted npcd. These are the relevant parts of both files:

npcd_max_threads = 5

# sleep_time - how many seconds should npcd wait between dirscans
#
# sleep_time = 15 (default)

sleep_time = 15


# EXPERIMENTAL
#
# use_load_threshold - enables/disables load watching
#
# use_load_threshold = <0 / 1> (default: 0)
#

#use_load_threshold = 0


# EXPERIMENTAL
#
# load_threshold - npcd won't start new threads
# if your system load is over this threshold
#
# load_threshold = <float value> (default: 10.0)
#
# Hint: Do not use "," as decimal delimeter
#

load_threshold = 30.0

#
# Config File for process_perfdata.pl
#
# $Id: process_perfdata.cfg-sample.in 520 2008-09-16 12:50:10Z pitchfork $
#
# process_perfdata.pl Timout
#
TIMEOUT = 10
#
# Use RRDs Perl Module
#
USE_RRDs = 1
#
#
#
RRDPATH = /usr/local/nagios/share/perfdata
#
#
#
RRDTOOL = /usr/bin/rrdtool
#
#
#
CFG_DIR = /usr/local/nagios/etc/pnp
#
#
#
RRD_HEARTBEAT = 8460
#
#
#
RRA_CFG = /usr/local/nagios/etc/pnp/rra.cfg
#
#
#
RRA_STEP = 60

But I still get this:

tail /usr/local/nagios/var/perfdata.log
2013-01-21 19:42:19 [5341] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-21 19:42:19 [5341] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793701-PID-5341 deleted
2013-01-21 19:42:19 [5341] [0] *** Timeout while processing Host: "NY-ARIES" Service: "my_mem_check"
2013-01-21 19:42:19 [5341] [0] *** process_perfdata.pl terminated on signal ALRM
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Timeout after 10 secs. ***
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793715-PID-5344 deleted
2013-01-21 19:42:19 [5344] [0] *** Timeout while processing Host: "NYC-RLDCEX" Service: "my_mem_check"
2013-01-21 19:42:19 [5344] [0] *** process_perfdata.pl terminated on signal ALRM

tail /usr/local/nagios/var/npcd.log
[01-21-2013 19:40:18] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:40:18] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//host-perfdata.1358793595'
[01-21-2013 19:40:46] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:40:46] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793625'
[01-21-2013 19:41:52] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:41:52] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793687'
[01-21-2013 19:42:19] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:42:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793701'
[01-21-2013 19:42:19] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:42:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793715'

I've just upgraded Nagios XI to 2012R1.4, and since then I don't get any bandwidth graphs at all...
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Problems with Host Performance graphs and Bandwidth

Post by scottwilkerson »

Can you post the output of this command?

Code:

ll /usr/local/nagios/var/spool/perfdata|wc -l
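
Note that `ll` is usually a shell alias for `ls -l`, whose output for a directory starts with a `total` line, so the number reported by the command above is typically one higher than the number of entries. A small sketch that counts only the regular files, assuming a `find` that supports `-maxdepth` (GNU and BSD versions do); the function name `count_spool_files` is illustrative only:

```shell
# Hypothetical helper: counts regular files directly inside the spool
# directory, without the extra "total" line that `ls -l` prints.
count_spool_files() {
    # Directory defaults to the npcd spool path used in this thread.
    dir="${1:-/usr/local/nagios/var/spool/perfdata}"
    # -maxdepth 1 keeps find from descending into subdirectories;
    # tr removes the padding some wc implementations add.
    find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

A count that stays low suggests npcd is keeping up with the spool, while a count that keeps growing suggests files are arriving faster than they are processed.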
Former Nagios employee
David.adder

Re: Problems with Host Performance graphs and Bandwidth

Post by David.adder »

ll /usr/local/nagios/var/spool/perfdata|wc -l
3
scottwilkerson

Re: Problems with Host Performance graphs and Bandwidth

Post by scottwilkerson »

Are you still seeing new errors in the log?
David.adder

Re: Problems with Host Performance graphs and Bandwidth

Post by David.adder »

Yes, still the same:

tail /usr/local/nagios/var/perfdata.log
2013-01-22 15:40:59 [28824] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-22 15:40:59 [28824] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865633-PID-28824 deleted
2013-01-22 15:40:59 [28824] [0] *** Timeout while processing Host: "SRVADDASDC01001" Service: "VMware_Storage_Array_Datastore_DSPISDC01006PRODSAS72_Usage"
2013-01-22 15:40:59 [28824] [0] *** process_perfdata.pl terminated on signal ALRM
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Timeout after 10 secs. ***
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358866503-PID-466 deleted
2013-01-22 15:55:35 [466] [0] *** Timeout while processing Host: "www.ryanlabs.com" Service: "DNS_IP_Match"
2013-01-22 15:55:35 [466] [0] *** process_perfdata.pl terminated on signal ALRM

tail /usr/local/nagios/var/npcd.log
[01-22-2013 15:30:22] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:30:22] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865003'
[01-22-2013 15:36:37] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:36:37] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865378'
[01-22-2013 15:36:37] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:36:37] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865363'
[01-22-2013 15:40:59] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:40:59] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865633'
[01-22-2013 15:55:35] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:55:35] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358866503'
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Problems with Host Performance graphs and Bandwidth

Post by mguthrie »

Just to make sure we rule it out, can you also run these commands and post the output:

Code:

ll /usr/local/nagios/var/spool/xdpe|wc -l

Code:

ll /usr/local/nagios/var