graphs Stopped after Password Change

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Gavin
Posts: 58
Joined: Mon Dec 24, 2012 4:56 am

graphs Stopped after Password Change

Post by Gavin »

Hi,

We're running 2012R1.4.

Someone inadvertently reset the 'nagiosadmin' password via the normal user management interface. At the same time, we deleted one of our admin users. I have a feeling that this user was created by cloning the 'nagiosadmin' user (this won't happen again), and I remember seeing a bug where cloning items actually moves certain parameters rather than copying them. Either way, graphing has not worked since.

I've since reset the security tokens via the GUI, and the nagiosadmin user now has an alphanumeric password.

I've restarted every Nagios service, and there are still no graphs. I've included an excerpt of some logs at the bottom of this ticket. My digging also led me to find that the 'backend_ticket' for the nagiosadmin user (in the PostgreSQL db) is only 8 characters long, while every other user's is 64 characters. I was also surprised to see that the nagiosadmin user has a user_id of 18. Is that normal?

Any help would be appreciated...

Thanks,

Gavin

--------------------

Log sample taken at 12:41

/usr/local/nagios/var/perfdata.log

Code: Select all

2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_runtime.rrd 1359022731:1.772237
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_rows.rrd 1359022731:497
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_errors.rrd 1359022731:1
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_invalid.rrd 1359022731:0
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_skipped.rrd 1359022731:5
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_update.rrd 1359022731:491
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_create.rrd 1359022731:0
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:07 [24778] [1] PNP exiting (runtime 0.00019s) ...
/usr/local/nagios/var/npcd.log

Code: Select all

[01-24-2013 12:40:36] NPCD: No more files to process... waiting for 15 seconds
[01-24-2013 12:40:51] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[01-24-2013 12:40:51] NPCD: DEBUG: load 2.790000/40.000000
[01-24-2013 12:40:51] NPCD: ThreadCounter 0/5 File is .
[01-24-2013 12:40:51] NPCD: DEBUG: load 2.790000/40.000000
[01-24-2013 12:40:51] NPCD: ThreadCounter 0/5 File is ..
[01-24-2013 12:40:51] NPCD: No more files to process... waiting for 15 seconds
[01-24-2013 12:41:06] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[01-24-2013 12:41:06] NPCD: DEBUG: load 2.530000/40.000000
[01-24-2013 12:41:06] NPCD: ThreadCounter 0/5 File is .
[01-24-2013 12:41:06] NPCD: DEBUG: load 2.530000/40.000000
[01-24-2013 12:41:06] NPCD: ThreadCounter 0/5 File is ..
[01-24-2013 12:41:06] NPCD: No more files to process... waiting for 15 seconds
[01-24-2013 12:41:21] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[01-24-2013 12:41:21] NPCD: DEBUG: load 20.070000/40.000000
[01-24-2013 12:41:21] NPCD: ThreadCounter 0/5 File is .
[01-24-2013 12:41:21] NPCD: DEBUG: load 20.070000/40.000000
[01-24-2013 12:41:21] NPCD: ThreadCounter 0/5 File is ..
[01-24-2013 12:41:21] NPCD: No more files to process... waiting for 15 seconds
[01-24-2013 12:41:36] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[01-24-2013 12:41:36] NPCD: DEBUG: load 15.830000/40.000000
[01-24-2013 12:41:36] NPCD: ThreadCounter 0/5 File is .
[01-24-2013 12:41:36] NPCD: DEBUG: load 15.830000/40.000000
[01-24-2013 12:41:36] NPCD: ThreadCounter 0/5 File is ..
[01-24-2013 12:41:36] NPCD: No more files to process... waiting for 15 seconds
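
A note on reading the npcd.log excerpt above: the "Found 2 files" lines are followed by "File is ." and "File is ..", so the two entries being counted are just the directory's own `.` and `..` entries, and the spool directory holds no perfdata files to process. A minimal sketch for confirming that from a script is shown below. The function name `pending_spool_files` is hypothetical, and the demo runs against a temporary directory rather than the real spool path.

```python
import os
import tempfile


def pending_spool_files(spool_dir):
    """Return the names of regular files waiting in spool_dir, sorted."""
    # os.listdir() never returns "." or "..", so an empty list here
    # corresponds to the "Found 2 files" situation in the log.
    return sorted(
        name
        for name in os.listdir(spool_dir)
        if os.path.isfile(os.path.join(spool_dir, name))
    )


if __name__ == "__main__":
    # Demo on a throwaway directory. On a real system, pass the spool path
    # named in the log (/var/nagiosramdisk/spool/perfdata/) instead.
    with tempfile.TemporaryDirectory() as demo:
        print(pending_spool_files(demo))
        # Hypothetical file name, used only for the demo.
        open(os.path.join(demo, "service-perfdata.1359022731"), "w").close()
        print(pending_spool_files(demo))
```

If the list stays empty across several check intervals, nothing is being moved into the spool directory, which points at the commands that feed it rather than at NPCD itself.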
yancy
Posts: 523
Joined: Thu Oct 06, 2011 10:12 am

Re: graphs Stopped after Password Change

Post by yancy »

Gavin,

Can you double-check the file permissions on process_perfdata.pl?

Code: Select all

  ll /usr/local/nagios/libexec/process_perfdata.pl 
I'll have to do some digging to answer your other questions, or someone more knowledgeable can chime in.


Regards,

-Yancy
Gavin
Posts: 58
Joined: Mon Dec 24, 2012 4:56 am

Re: graphs Stopped after Password Change

Post by Gavin »

Hi Yancy,

We ran the permissions reset. Permissions on that file are as follows:

Code: Select all

-rwxr-xr-x 1 nagios nagios 42K Dec 17 11:17 /usr/local/nagios/libexec/process_perfdata.pl*
Thanks,

Gavin
yancy
Posts: 523
Joined: Thu Oct 06, 2011 10:12 am

Re: graphs Stopped after Password Change

Post by yancy »

Gavin,

How about your perfdata directory?

Code: Select all

 ll /usr/local/nagios/share/perfdata 
If that checks out, change into that directory and run check_rrdtraf against one of the RRD files, for example:

Code: Select all

  /usr/local/nagios/libexec/check_rrdtraf -f nrpe_diskspace.rrd -w 1 -c 2 
regards,

-Yancy
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: graphs Stopped after Password Change

Post by chrisp »

Hi Yancy,

I sit next to Gavin in our office. He's gone home but I am still here, trying to get this fixed ASAP.

There are plenty of files in there: -

Code: Select all

% ll /usr/local/nagios/share/perfdata | wc -l
136
and here's the check_rrdtraf test on my test server: -

Code: Select all

% /usr/local/nagios/libexec/check_rrdtraf -f rainbow-it.net/Check_HTTP_-_Port_80.rrd -w 1 -c 2 
OK - Current BW in: 16.00bps Out: 0bps|in=16.000000b/s;1;2 out=0b/s;1;2
Definitely no data showing on that graph: -
rainbow-it.net_HTTP_80.png
yancy
Posts: 523
Joined: Thu Oct 06, 2011 10:12 am

Re: graphs Stopped after Password Change

Post by yancy »

chrisp,

Can you check the file permissions on that directory?

Code: Select all

 ll /usr/local/nagios/share/perfdata 

The permissions should be as follows:
drwxrwxrwx 2 nagios nagios
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: graphs Stopped after Password Change

Post by chrisp »

Current permissions are 775 nagios:nagios, not 777!
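
For anyone comparing modes like 775 and 777 across several directories, here is a small sketch that reads the permission bits directly instead of parsing `ll` output. The helper names `octal_mode` and `is_world_writable` are made up for illustration, and the demo uses a temporary directory, not the real perfdata path.

```python
import os
import stat
import tempfile


def octal_mode(path):
    """Return the permission bits of path as an octal string, e.g. '775'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "03o")


def is_world_writable(path):
    """True if users outside the owner and group may write to path."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


if __name__ == "__main__":
    # Demo on a throwaway directory set to the mode reported in this post.
    with tempfile.TemporaryDirectory() as demo:
        os.chmod(demo, 0o775)
        print(octal_mode(demo), is_world_writable(demo))
```

The only difference between 775 and 777 is the world-write bit, so it matters only if something other than the nagios user or group has to write into the directory.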
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: graphs Stopped after Password Change

Post by scottwilkerson »

Can you post the definitions of the following commands?

Code: Select all

process-service-perfdata-file-bulk
process-host-perfdata-file-bulk
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: graphs Stopped after Password Change

Post by chrisp »

Right, this is extremely helpful, but also utterly confounding and frustrating. This is how it looks right now:

Code: Select all

% grep -A1 bulk /usr/local/nagios/etc/commands.cfg 
       command_name                  		process-host-perfdata-file-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
--
       command_name                  		process-host-perfdata-file-pnp-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
--
       command_name                  		process-service-perfdata-file-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
--
       command_name                  		process-service-perfdata-file-pnp-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
But this is what it looked like last night: -

Code: Select all

% grep -A1 bulk /usr/local/nagios/etc/commands.cfg
       command_name                         process-host-perfdata-file-bulk
       command_line                         /bin/mv /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
--
       command_name                         process-host-perfdata-file-pnp-bulk
       command_line                         /bin/mv /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/perfdata/host-perfdata.$TIMET$
--
       command_name                         process-service-perfdata-file-bulk
       command_line                         /bin/mv /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
--
       command_name                         process-service-perfdata-file-pnp-bulk
       command_line                         /bin/mv /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/perfdata/service-perfdata.$TIMET$
Can you shed any light on how these values might have reverted to a broken state? This concerns me greatly, as I can't think of anything we deliberately did that would have altered the file like this.

After putting the file back to the working state and restarting the services, we have graph data again:
whyarethecommandschanging.png
It's probably worth noting that rrdcached gets very upset and "service rrdcached restart" fails to properly kill the old processes (there are 2 when it's in the upset state), so I had to do "killall rrdcached" before I could get it back on track.
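
Since the two listings above differ only in the paths passed to /bin/mv, a check like the following could flag the reverted state. This is a rough sketch, not an official Nagios XI tool: the function name `perfdata_commands_off_ramdisk` is hypothetical, and it assumes (as on this system) that every bulk perfdata command should move files under /var/nagiosramdisk/.

```python
RAMDISK = "/var/nagiosramdisk/"  # assumption: prefix taken from the working config


def perfdata_commands_off_ramdisk(cfg_text, prefix=RAMDISK):
    """Map each bulk perfdata command name to the /bin/mv paths that do
    not start with prefix. An empty dict means every path is on the ramdisk."""
    problems = {}
    name = None
    for line in cfg_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "command_name":
            name = fields[1]
        elif len(fields) >= 4 and fields[0] == "command_line" and name:
            is_bulk = "perfdata" in name and name.endswith("-bulk")
            if is_bulk and fields[1] == "/bin/mv":
                # fields[2] is the mv source, fields[3] the destination.
                bad = [p for p in fields[2:4] if not p.startswith(prefix)]
                if bad:
                    problems[name] = bad
            name = None
    return problems


if __name__ == "__main__":
    # One command copied from the reverted listing in this thread.
    sample = """
       command_name    process-host-perfdata-file-bulk
       command_line    /bin/mv /usr/local/nagios/var/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
    """
    for cmd, paths in perfdata_commands_off_ramdisk(sample).items():
        print(cmd, "->", ", ".join(paths))
```

To run it against a live system, read /usr/local/nagios/etc/commands.cfg and pass the text in. Doing that after a reboot or a config apply would show when the paths change back.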
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: graphs Stopped after Password Change

Post by chrisp »

I just rebooted and the commands.cfg has changed again: -

Code: Select all

# grep -A1 bulk /usr/local/nagios/etc/commands.cfg            
       command_name                  		process-host-perfdata-file-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
--
       command_name                  		process-host-perfdata-file-pnp-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
--
       command_name                  		process-service-perfdata-file-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
--
       command_name                  		process-service-perfdata-file-pnp-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$