no performance data visible for last couple weeks

fao
Posts: 99
Joined: Thu Feb 03, 2011 3:05 am

Post by fao »

centos 6, 2011R1.7

In my performance graphs, I only see data from several weeks ago. Unlike what some others have reported, the graphs do show up. Unfortunately, they are blank for the recent period.

In nagios/share/perfdata/ I see folders for all my hosts. Within those folders, I see the xml and rrd files.

In /var/lib/mrtg/ I only see mrtg.ok. The permissions on that folder are 775 apache:nagios.

Here is some strange output that I see in npcd.log

[10-19-2011 14:57:09] NPCD: File 'service-perfdata.1319028965-PID-26441' is an already in process PNP file. Leaving it untouched.
[10-19-2011 14:57:09] NPCD: DEBUG: load 1.170000/10.000000
[10-19-2011 14:57:09] NPCD: ThreadCounter 1/5 File is service-perfdata.1319028980-PID-26756
[10-19-2011 14:57:09] NPCD: File 'service-perfdata.1319028980-PID-26756' is an already in process PNP file. Leaving it untouched.
[10-19-2011 14:57:09] NPCD: DEBUG: load 1.170000/10.000000
[10-19-2011 14:57:09] NPCD: ThreadCounter 1/5 File is service-perfdata.1319028995-PID-27101
[10-19-2011 14:57:09] NPCD: File 'service-perfdata.1319028995-PID-27101' is an already in process PNP file. Leaving it untouched.
[10-19-2011 14:57:09] NPCD: DEBUG: load 1.170000/10.000000
[10-19-2011 14:57:09] NPCD: ThreadCounter 1/5 File is service-perfdata.1319029010-PID-27482
[10-19-2011 14:57:09] NPCD: File 'service-perfdata.1319029010-PID-27482' is an already in process PNP file. Leaving it untouched.
[10-19-2011 14:57:09] NPCD: DEBUG: load 1.170000/10.000000
[10-19-2011 14:57:09] NPCD: ThreadCounter 1/5 File is service-perfdata.1319029025
[10-19-2011 14:57:09] NPCD: Regular File: service-perfdata.1319029025
[10-19-2011 14:57:09] NPCD: A thread was started on thread_counter = 1
[10-19-2011 14:57:09] NPCD: Have to wait: Filecounter = 481 - thread_counter = 2
[10-19-2011 14:57:09] NPCD: Processing file service-perfdata.1319029025 with ID -1226388624 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1319029025
[10-19-2011 14:57:09] NPCD: Processing file 'service-perfdata.1319029025'
[10-19-2011 14:57:09] NPCD: ERROR: Executed command exits with return code '6'
[10-19-2011 14:57:09] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//host-perfdata.1319029025'
[10-19-2011 14:57:09] NPCD: ERROR: Executed command exits with return code '6'
[10-19-2011 14:57:09] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1319029025'
[10-19-2011 14:57:09] NPCD: No more files to process... waiting for 15 seconds

any ideas?
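
A quick way to gauge how far processing has fallen behind is to count what is sitting in the spool directory. The following is only a sketch: `spool_summary` is a hypothetical helper name, the default path is the one shown in the npcd.log lines above, and it assumes that the `-PID-` suffix marks files a process_perfdata.pl run picked up but never finished, which is what the "already in process" messages suggest.

```shell
#!/bin/sh
# Hypothetical helper: summarise an NPCD spool directory.
# The default path is copied from the npcd.log excerpt; adjust as needed.
spool_summary() {
    dir="${1:-/usr/local/nagios/var/spool/perfdata}"
    # Files still waiting to be processed (no -PID- suffix).
    waiting=$(find "$dir" -maxdepth 1 -type f ! -name '*-PID-*' | wc -l)
    # Files left behind with a -PID- suffix (assumed to be abandoned runs).
    stuck=$(find "$dir" -maxdepth 1 -type f -name '*-PID-*' | wc -l)
    # Arithmetic expansion strips any padding that wc adds.
    echo "waiting=$((waiting)) stuck=$((stuck))"
}
```

Calling `spool_summary` with no argument checks the default path. A "waiting" count that keeps growing between runs would be consistent with the rising Filecounter values in npcd.log.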
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: no performance data visible for last couple weeks

Post by mguthrie »

This might be a permissions issue, can you run the following procedure and see if it resolves the issue:

There will be some missing files and error output from this script, which is normal.
http://library.nagios.com/library/produ ... -nagios-xi
fao
Posts: 99
Joined: Thu Feb 03, 2011 3:05 am

Re: no performance data visible for last couple weeks

Post by fao »

I ran that script before and it broke everything. Took me three days to fix.

For instance, that script sets numerous directories to use the group "users", a group which has no members. Please keep in mind that I am running the VM that you guys provided, which makes it even stranger.

Please let me know why it sets the group to "users".
nscott
Posts: 1040
Joined: Wed May 11, 2011 8:54 am

Re: no performance data visible for last couple weeks

Post by nscott »

fao,

We've had a bit of a problem with this, but I do believe I have a quick and comprehensive fix for you. First, let's make sure that the problem is what I think it is. In the following example, <rrd file> is some rrd file that you know to exist (preferably one from the /var/lib/mrtg/ directory). Type the following:


/usr/local/nagios/libexec/check_rrdtraf -f '/var/lib/mrtg/<rrd file>' -w 1 -c 2
That should return a completely clean value. If it says ANYTHING about errors, then that is definitely the issue. Thankfully, the fix is simple:


yum install bc
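
Before installing, it may be worth confirming whether `bc` is actually missing. This is a minimal sketch; `have_cmd` is a hypothetical helper name, not part of Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo present
    else
        echo missing
    fi
}

# Check for the calculator that the fix above installs.
have_cmd bc
```

On an RPM-based system such as CentOS, `rpm -q bc` gives the same answer at the package level.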
Nicholas Scott
Former Nagios employee
fao
Posts: 99
Joined: Thu Feb 03, 2011 3:05 am

Re: no performance data visible for last couple weeks

Post by fao »

hey mguthrie, you are correct that the package 'bc' was missing

I installed it 2 hours ago then restarted nagiosxi, nagios, and npcd

Unfortunately, I still see nothing in the /var/lib/mrtg folder

[root@nagios ~]# ls -al /var/lib/mrtg/
total 8
drwxrwxr-x. 2 apache nagios 4096 Aug 29 15:25 .
drwxr-xr-x. 24 root root 4096 Oct 19 10:45 ..
-rw-r--r--. 1 apache nagios 0 Oct 21 17:00 mrtg.ok

Nothing there, unfortunately.

I look at npcd.log and I see a lot of this:


[10-21-2011 17:12:54] NPCD: Processing file service-perfdata.1319209972 with ID -1250956432 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1319209972
[10-21-2011 17:12:54] NPCD: Processing file 'service-perfdata.1319209972'
[10-21-2011 17:12:54] NPCD: Processing file service-perfdata.1319209957 with ID -1240466576 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1319209957
[10-21-2011 17:12:54] NPCD: Processing file 'service-perfdata.1319209957'
[10-21-2011 17:12:54] NPCD: ERROR: Executed command exits with return code '6'
[10-21-2011 17:12:54] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1319209972'
[10-21-2011 17:12:54] NPCD: ERROR: Executed command exits with return code '6'
[10-21-2011 17:12:54] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1319209957'
[10-21-2011 17:12:54] NPCD: No more files to process... waiting for 15 seconds

and in perfdata.log

2011-10-13 15:01:46 [15247] [0] *** ERROR: /usr/local/nagios/var/stats is not writable or does not exist

But it is writable and does exist

[root@nagios2 var]# ls -al stats/
total 12
drwxrwxr-x. 2 apache nagios 4096 Oct 21 17:13 .
drwxrwxr-x. 6 apache nagios 4096 Oct 21 17:14 ..
-rw-rw-rw- 1 nagios nagios 32 Oct 21 17:13 21986833


Any ideas?
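
A listing made as root shows the mode bits, but not whether the account that runs process_perfdata.pl can actually write to the directory. The sketch below tests writability from the point of view of whoever runs it. It assumes the processing runs as the nagios user, which the thread does not state, and `check_writable` is a hypothetical helper name. Note also that the perfdata.log entry quoted above is dated 2011-10-13, so it may predate the current state of the directory.

```shell
#!/bin/sh
# Hypothetical helper: print whether the *current* user can write to a directory.
check_writable() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo writable
    else
        echo "not writable"
    fi
}

# Example (path taken from the perfdata.log error above):
#   check_writable /usr/local/nagios/var/stats
```

To run the same test as the nagios user, `sudo -u nagios test -w /usr/local/nagios/var/stats && echo writable` is one option. Running it as root is not informative, because root passes the write test regardless of the mode bits.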
nscott
Posts: 1040
Joined: Wed May 11, 2011 8:54 am

Re: no performance data visible for last couple weeks

Post by nscott »

Your /var/lib/mrtg/ should be owned and grouped with root.

chown -R root:root /var/lib/mrtg

The ownership on the stats directory should be nagios:nagios.

What are the permissions in your nagios/share/spool/perfdata ?
Nicholas Scott
Former Nagios employee
fao
Posts: 99
Joined: Thu Feb 03, 2011 3:05 am

Re: no performance data visible for last couple weeks

Post by fao »

tks

I do not have a share/spool/perfdata directory. Should I?

The ownership of nagios/var/spool/perfdata is apache.nagios.

Within perfdata/, some files are owned by nagios.users and others by nagios.nagios.

I still don't understand why the "users" group keeps popping up when no one belongs to that group.
fao
Posts: 99
Joined: Thu Feb 03, 2011 3:05 am

Re: no performance data visible for last couple weeks

Post by fao »

here is most recent output of npcd.log

[10-24-2011 10:48:32] NPCD: DEBUG: load 0.560000/10.000000
[10-24-2011 10:48:32] NPCD: ThreadCounter 1/5 File is service-perfdata.1319446041-PID-27709
[10-24-2011 10:48:32] NPCD: File 'service-perfdata.1319446041-PID-27709' is an already in process PNP file. Leaving it untouched.
[10-24-2011 10:48:32] NPCD: DEBUG: load 0.560000/10.000000
[10-24-2011 10:48:32] NPCD: ThreadCounter 1/5 File is service-perfdata.1319446056-PID-27708
[10-24-2011 10:48:32] NPCD: File 'service-perfdata.1319446056-PID-27708' is an already in process PNP file. Leaving it untouched.
[10-24-2011 10:48:32] NPCD: DEBUG: load 0.560000/10.000000
[10-24-2011 10:48:32] NPCD: ThreadCounter 1/5 File is service-perfdata.1319446071-PID-28083
[10-24-2011 10:48:32] NPCD: File 'service-perfdata.1319446071-PID-28083' is an already in process PNP file. Leaving it untouched.
[10-24-2011 10:48:32] NPCD: DEBUG: load 0.560000/10.000000
[10-24-2011 10:48:32] NPCD: ThreadCounter 1/5 File is service-perfdata.1319446086-PID-28380
[10-24-2011 10:48:32] NPCD: File 'service-perfdata.1319446086-PID-28380' is an already in process PNP file. Leaving it untouched.
[10-24-2011 10:48:32] NPCD: DEBUG: load 0.560000/10.000000
[10-24-2011 10:48:32] NPCD: ThreadCounter 1/5 File is service-perfdata.1319446101
[10-24-2011 10:48:32] NPCD: Regular File: service-perfdata.1319446101
[10-24-2011 10:48:32] NPCD: A thread was started on thread_counter = 1
[10-24-2011 10:48:32] NPCD: Have to wait: Filecounter = 49387 - thread_counter = 2
[10-24-2011 10:48:32] NPCD: Processing file service-perfdata.1319446101 with ID -1227990160 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1319446101
[10-24-2011 10:48:32] NPCD: Processing file 'service-perfdata.1319446101'
[10-24-2011 10:48:32] NPCD: ERROR: Executed command exits with return code '6'
[10-24-2011 10:48:32] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1319446101'
[10-24-2011 10:48:32] NPCD: No more files to process... waiting for 15 seconds


my perfdata.log is full of timeout errors


2011-09-30 17:11:15 [26026] [0] *** TIMEOUT: Timeout after 5 secs. ***
2011-09-30 17:11:15 [26026] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2011-09-30 17:11:15 [26026] [0] *** TIMEOUT: Please check your npcd.cfg
2011-09-30 17:11:15 [26026] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1317395460-PID-26026 deleted
2011-09-30 17:11:15 [26026] [0] *** Timeout while processing Host: "AFIT-INFRA05" Service: "CPU_Usage"
2011-09-30 17:11:15 [26026] [0] *** process_perfdata.pl terminated on signal ALRM
2011-10-05 05:00:43 [25184] [0] *** TIMEOUT: Timeout after 5 secs. ***
2011-10-05 05:00:43 [25184] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2011-10-05 05:00:43 [25184] [0] *** TIMEOUT: Please check your npcd.cfg
2011-10-05 05:00:43 [25184] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1317783623-PID-25184 deleted
2011-10-05 05:00:43 [25184] [0] *** Timeout while processing Host: "HQLQAPIRES1" Service: "__Disk_Usage"
2011-10-05 05:00:43 [25184] [0] *** process_perfdata.pl terminated on signal ALRM
2011-10-05 17:37:18 [3285] [0] *** TIMEOUT: Timeout after 5 secs. ***
2011-10-05 17:37:18 [3285] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2011-10-05 17:37:18 [3285] [0] *** TIMEOUT: Please check your npcd.cfg
2011-10-05 17:37:18 [3285] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//host-perfdata.1317829028-PID-3285 deleted
2011-10-05 17:37:18 [3285] [0] *** Timeout while processing Host: "LPRAPP04" Service: "_HOST_"
2011-10-05 17:37:18 [3285] [0] *** process_perfdata.pl terminated on signal ALRM
2011-10-05 17:37:18 [3287] [0] *** TIMEOUT: Timeout after 5 secs. ***
2011-10-05 17:37:18 [3287] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2011-10-05 17:37:18 [3287] [0] *** TIMEOUT: Please check your npcd.cfg
2011-10-05 17:37:18 [3287] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//host-perfdata.1317829002-PID-3287 deleted
2011-10-05 17:37:18 [3287] [0] *** Timeout while processing Host: "HQLPRTOMC02" Service: "_HOST_"
2011-10-05 17:37:18 [3287] [0] *** process_perfdata.pl terminated on signal ALRM
2011-10-05 22:31:01 [19906] [0] *** TIMEOUT: Timeout after 5 secs. ***
2011-10-05 22:31:01 [19906] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2011-10-05 22:31:01 [19906] [0] *** TIMEOUT: Please check your npcd.cfg
2011-10-05 22:31:03 [19906] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1317846647-PID-19906 deleted
2011-10-05 22:31:03 [19906] [0] *** Timeout while processing Host: "HQLPQTOMC02" Service: "CPU_Stats"
2011-10-05 22:31:03 [19906] [0] *** process_perfdata.pl terminated on signal ALRM
2011-10-13 15:01:46 [15247] [0] *** ERROR: /usr/local/nagios/var/stats is not writable or does not exist
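
To see whether the timeouts cluster on particular hosts or services, the "Timeout while processing" lines can be tallied. This is a sketch only: `timeout_summary` is a hypothetical helper name, and the log path must be passed in because the thread does not give the location of perfdata.log.

```shell
#!/bin/sh
# Hypothetical helper: count timeout entries per host/service pair.
# Expects lines of the form shown in the perfdata.log excerpt above:
#   ... *** Timeout while processing Host: "NAME" Service: "NAME"
timeout_summary() {
    grep 'Timeout while processing' "$1" |
        sed 's/.*Host: "\([^"]*\)" Service: "\([^"]*\)".*/\1 \2/' |
        sort | uniq -c | sort -rn
}

# Example (the path is an assumption, not taken from the thread):
#   timeout_summary /usr/local/nagios/var/perfdata.log
```

Each output line is a count followed by the host and service, most frequent first. The 5-second limit in the messages appears to correspond to the TIMEOUT setting in PNP4Nagios's process_perfdata.cfg; treat that as an assumption to verify against the installed configuration.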
fao
Posts: 99
Joined: Thu Feb 03, 2011 3:05 am

Re: no performance data visible for last couple weeks

Post by fao »

alright, now I am getting some basic graphs

chmod -R g+x nagios/share/perfdata

seems to have done the trick
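
The likely reason this helps is that a directory without the execute (search) bit cannot be traversed, so group members could not reach the rrd and xml files inside it even when the files themselves were readable. The sketch below lists directories that lack the group execute bit; `dirs_missing_group_x` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list directories under a tree that lack
# the group execute (search) bit. Octal 010 is the group-execute bit.
dirs_missing_group_x() {
    find "$1" -type d ! -perm -010
}

# Example (relative path as used in the post above):
#   dirs_missing_group_x nagios/share/perfdata
```

A narrower variant of the fix is `chmod -R g+X nagios/share/perfdata` with a capital X, which adds the bit to directories and to files that are already executable by someone, rather than marking every rrd and xml file executable.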
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: no performance data visible for last couple weeks

Post by mguthrie »

Ok, thanks for the update!