Nagios xi seems to be reporting without agent

cognitiaclaeves
Posts: 7
Joined: Thu Oct 28, 2010 11:51 am


Post by cognitiaclaeves »

There is a Nagios xi installation that appears to be working correctly, installed by IT at my company. It is monitoring Windows servers.

I was asked to look into installing Nagios for the Debian servers that we have here. I chose a method that appeared to be 'Debian appropriate' over one that appeared to use Ubuntu repos, being unaware of the difference between xi and what was available in the repos. When I was told that xi was commercial, I elected to uninstall what I had installed and attempt to install the way the PDF file instructed.

However, after uninstalling nagios with:
sudo apt-get remove nagios-nrpe-server nagios-plugins-basic

... I discovered that Nagiosxi is still getting information about the server:
FTP-TEST-Template Current Load Ok 1d 3h 36m 59s 1/4 2010-10-28 12:08:18 OK - load average: 0.07, 0.15, 0.21
Current Users Ok 1d 3h 36m 20s 1/4 2010-10-28 12:08:58 USERS OK - 3 users currently logged in
Home Partition Ok 1d 2h 5m 14s 1/4 2010-10-28 12:10:03 DISK OK - free space: / 2430 MB (37% inode=95%):
PING Ok 38m 57s 1/4 2010-10-28 12:06:20 PING OK - Packet loss = 0%, RTA = 3.95 ms
Root Partition Ok 1d 3h 35m 4s 1/4 2010-10-28 12:05:13 DISK OK - free space: / 2431 MB (37% inode=95%):
SSH Critical 1d 3h 34m 25s 4/4 2010-10-28 12:08:52 CRITICAL - Socket timeout after 10 seconds
Swap Usage Ok 1d 3h 33m 47s 1/4 2010-10-28 12:07:22 SWAP OK - 100% free (1023 MB out of 1023 MB)
Total Processes Ok 1d 3h 38m 15s 1/4 2010-10-28 12:09:48 PROCS OK: 53 processes with STATE = RSZDT
Some odd things about this are the labels that are being used ( I can't find them anywhere ), and the fact that disk space is supposedly being reported without an agent present on FTP-TEST-Template (running Lenny).

The IP address displayed matched, but I took the server down anyway. While the server was down, monitoring reported it as down. When I booted it back up, Nagiosxi went back to reporting all of the above.

sudo ps aux reveals no processes that have nrpe, nagios, or even agent in the names.

Since this server is a vmware guest, I stopped vmware tools with sudo /etc/init.d/vmware-tools stop. ESX shows that tools isn't running; Nagiosxi continues to report.

... what is going on??

There is a possibility that some other version of nagios was installed that I've been unable to track down, but, if so, it's doing a great job of not showing up in the ps.
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: Nagios xi seems to be reporting without agent

Post by tonyyarusso »

Well, I can answer some of this at least.

The service labels are defined in xi, not on your machine, so that's why you don't see them in your configs.

The PPA package does use Ubuntu resources (Launchpad) for itself, but should work fine on Debian, properly pulling the dependencies from Debian repositories.

One thing I would suggest for the service info is to look at the "Last Check Time", and see if it's recent (after you tried removing NRPE). It's possible it just hasn't run another check yet.

Also check the two commands keith4 mentioned:


tyarusso@ubuntu-desktop:~$ sudo lsof | grep 5666
nrpe        877     nagios    4u     IPv4       4310       0t0        TCP *:5666 (LISTEN)
tyarusso@ubuntu-desktop:~$ sudo netstat -an | grep 5666
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN 
Tony Yarusso
Technical Services
___
TIES
Web: http://ties.k12.mn.us/
cognitiaclaeves
Posts: 7
Joined: Thu Oct 28, 2010 11:51 am

Re: Nagios xi seems to be reporting without agent

Post by cognitiaclaeves »

The times in the output, about "2010-10-28 12:05:13" were pretty current to the time the question was posted. ( Fortunately, I did think to make sure that I was at least looking at recent date-stamps... I like to try to do *some* work before pinging others. )

I also ran the commands that were suggested, but didn't mention it because they turned up nothing. The exception is nmap: I haven't installed it, so I can't say what it would report.

Here is the recent data ( still clocking along, well after removal, and then rebooting (twice, I think)):
FTP-TEST-Template Current Load Ok 1d 7h 10m 7s 1/4 2010-10-28 15:38:18 OK - load average: 0.35, 0.31, 0.23
Current Users Ok 1d 7h 9m 28s 1/4 2010-10-28 15:38:58 USERS OK - 1 users currently logged in
Home Partition Ok 1d 5h 38m 22s 1/4 2010-10-28 15:40:03 DISK OK - free space: / 2424 MB (37% inode=95%):
PING Ok 4h 12m 5s 1/4 2010-10-28 15:41:20 PING OK - Packet loss = 0%, RTA = 3.58 ms
Root Partition Ok 1d 7h 8m 12s 1/4 2010-10-28 15:40:13 DISK OK - free space: / 2424 MB (37% inode=95%):
SSH Critical 1d 7h 7m 33s 4/4 2010-10-28 15:38:52 CRITICAL - Socket timeout after 10 seconds
Swap Usage Ok 1d 7h 6m 55s 1/4 2010-10-28 15:42:22 SWAP OK - 100% free (1023 MB out of 1023 MB)
Total Processes Ok 1d 7h 11m 23s 1/4 2010-10-28 15:39:48 PROCS OK: 53 processes with STATE = RSZDT
and:
jae@ftpxlenny0:~$ sudo lsof | grep 5666
jae@ftpxlenny0:~$ sudo netstat -an | grep 5666
jae@ftpxlenny0:~$ sudo netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:36244 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:21 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
udp 0 0 0.0.0.0:902 0.0.0.0:*
udp 0 0 0.0.0.0:111 0.0.0.0:*
udp 0 0 0.0.0.0:41212 0.0.0.0:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 5 [ ] DGRAM 5173 /dev/log
unix 2 [ ] DGRAM 2556 @/org/kernel/udev/udevd
unix 2 [ ACC ] STREAM LISTENING 5190 /var/run/acpid.socket
unix 2 [ ACC ] STREAM LISTENING 5989 /opt/sbin/proftpd132b/var/proftpd/proftpd.sock
unix 3 [ ] STREAM CONNECTED 7543
unix 3 [ ] STREAM CONNECTED 7542
unix 2 [ ] DGRAM 7541
unix 2 [ ] DGRAM 6175
unix 2 [ ] DGRAM 5192
and just for kicks:
jae@ftpxlenny0:~$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.2 2104 688 ? Ss 11:26 0:02 init [2]
root 2 0.0 0.0 0 0 ? S< 11:26 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S< 11:26 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S< 11:26 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< 11:26 0:00 [watchdog/0]
root 6 0.0 0.0 0 0 ? S< 11:26 0:00 [events/0]
root 7 0.0 0.0 0 0 ? S< 11:26 0:00 [khelper]
root 39 0.0 0.0 0 0 ? S< 11:26 0:00 [kblockd/0]
root 41 0.0 0.0 0 0 ? S< 11:26 0:00 [kacpid]
root 42 0.0 0.0 0 0 ? S< 11:26 0:00 [kacpi_notify]
root 170 0.0 0.0 0 0 ? S< 11:26 0:00 [kseriod]
root 207 0.0 0.0 0 0 ? S 11:26 0:00 [pdflush]
root 208 0.0 0.0 0 0 ? S 11:26 0:00 [pdflush]
root 209 0.0 0.0 0 0 ? S< 11:26 0:00 [kswapd0]
root 210 0.0 0.0 0 0 ? S< 11:26 0:00 [aio/0]
root 754 0.0 0.0 0 0 ? S< 11:26 0:00 [ata/0]
root 755 0.0 0.0 0 0 ? S< 11:26 0:00 [ata_aux]
root 934 0.0 0.0 0 0 ? S< 11:26 0:00 [scsi_eh_0]
root 1067 0.0 0.0 0 0 ? S< 11:26 0:00 [kjournald]
root 1139 0.0 0.3 2288 776 ? S<s 11:26 0:00 udevd --daemon
root 1669 0.0 0.0 0 0 ? S< 11:26 0:00 [kpsmoused]
root 1935 0.0 0.0 0 0 ? S< 11:26 0:00 [kjournald]
daemon 1987 0.0 0.1 1896 508 ? Ss 11:26 0:00 /sbin/portmap
statd 1998 0.0 0.2 1960 720 ? Ss 11:26 0:00 /sbin/rpc.statd
root 2191 0.0 0.6 27536 1548 ? Sl 11:26 0:00 /usr/sbin/rsyslogd -c3
root 2202 0.0 0.2 1768 572 ? Ss 11:26 0:00 /usr/sbin/acpid
root 2279 0.0 0.4 5416 1040 ? S<s 11:26 0:00 /usr/sbin/sshd
101 2745 0.0 0.3 6272 924 ? Ss 11:26 0:00 /usr/sbin/exim4 -bd -q30m
nobody 2771 0.0 0.4 3108 1160 ? Ss 11:26 0:00 proftpd: (accepting connections)
daemon 2776 0.0 0.1 2048 440 ? Ss 11:26 0:00 /usr/sbin/atd
root 2796 0.0 0.3 3428 788 ? Ss 11:26 0:00 /usr/sbin/cron
root 2813 0.0 0.4 2628 1188 tty1 Ss 11:26 0:00 /bin/login --
root 2815 0.0 0.1 1768 508 tty2 Ss+ 11:26 0:00 /sbin/getty 38400 tty2
root 2817 0.0 0.1 1768 504 tty3 Ss+ 11:26 0:00 /sbin/getty 38400 tty3
root 2819 0.0 0.1 1768 504 tty4 Ss+ 11:26 0:00 /sbin/getty 38400 tty4
root 2821 0.0 0.1 1768 504 tty5 Ss+ 11:26 0:00 /sbin/getty 38400 tty5
root 2823 0.0 0.1 1768 508 tty6 Ss+ 11:26 0:00 /sbin/getty 38400 tty6
jae 2837 0.0 1.1 5556 2860 tty1 S+ 11:27 0:00 -bash
root 3151 0.0 1.0 8160 2608 ? S<s 15:44 0:00 sshd: jae [priv]
jae 3156 0.0 0.5 8160 1444 ? S< 15:44 0:00 sshd: jae@pts/0
jae 3157 0.0 1.1 5552 2832 pts/0 S<s 15:44 0:00 -bash
jae 3181 0.0 0.4 3716 1032 pts/0 R<+ 15:49 0:00 ps aux
... any thoughts?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios xi seems to be reporting without agent

Post by mguthrie »

check to see if the following packages are installed:

nagios-nrpe-plugin
nagios-plugins

They may have also been installed as dependencies.


Otherwise do a locate or find for nrpe and see what shows up.
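
For the "locate or find" suggestion, a minimal sketch along these lines may help. The helper name find_agent_files and the name patterns are assumptions made for illustration, not anything from the Nagios documentation, and the directory list is only a guess at common apt and source install locations.

```shell
#!/bin/sh
# Sketch: search the given directories for files whose names suggest an
# NRPE or NSCA agent. The function name and patterns are illustrative.
find_agent_files() {
    # -xdev stops find from descending into other filesystems mounted below
    # the starting directories; errors (e.g. missing directories) are hidden.
    find "$@" -xdev \( -name '*nrpe*' -o -name '*nsca*' \) 2>/dev/null
}

# Assumed candidate locations; directories that do not exist are skipped quietly.
find_agent_files /etc /usr/sbin /usr/lib/nagios /usr/local/nagios
```

Passing / as the only argument searches the whole root filesystem, which takes longer but does not depend on guessing the install prefix.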
cognitiaclaeves
Posts: 7
Joined: Thu Oct 28, 2010 11:51 am

Re: Nagios xi seems to be reporting without agent

Post by cognitiaclaeves »

$ dpkg -l | egrep -e 'nrpe' -e 'nagios' -e 'plugin'
rc nagios-nrpe-server 2.12-1 Nagios Remote Plugin Executor Server
rc nagios-plugins-basic 1.4.15-1~bpo50+1 Plugins for the nagios network monitoring and management syste
jae@ftpxlenny0:~$ dpkg-query -S "nrpe"
nagios-nrpe-server: /etc/nagios/nrpe_local.cfg
nagios-nrpe-server: /etc/init.d/nagios-nrpe-server
nagios-nrpe-server: /etc/default/nagios-nrpe-server

nagios-nrpe-server: /etc/nagios/nrpe.cfg
jae@ftpxlenny0:~$ dpkg-query -S "nagios"
nagios-nrpe-server: /etc/nagios/nrpe_local.cfg
nagios-plugins-basic: /etc/nagios-plugins
nagios-nrpe-server: /etc/init.d/nagios-nrpe-server
nagios-nrpe-server: /etc/default/nagios-nrpe-server
nagios-nrpe-server: /etc/nagios/nrpe.cfg
nagios-nrpe-server: /etc/nagios
nagios-plugins-basic: /etc/nagios-plugins/config
jae@ftpxlenny0:~$ dpkg-query -S "plugin"
nagios-plugins-basic: /etc/nagios-plugins
libkrb53: /usr/lib/krb5/plugins
linux-headers-2.6.26-2-686: /usr/src/linux-headers-2.6.26-2-686/include/config/snd/pcm/oss/plugins.h
nagios-plugins-basic: /etc/nagios-plugins/config
libkrb53: /usr/lib/krb5/plugins/krb5
jae@ftpxlenny0:~$ sudo apt-get remove nagios-nrpe-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package nagios-nrpe-server is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
... So, it looks like a mismatched dependency issue.

sudo apt-get install nagios-nrpe-server --no-install-recommends
sudo apt-get remove nagios-nrpe-server

... leaves the machine in the same state. I have no idea what installed to /etc/default .

But at least we can rule out omniscience now.
( And I have some idea of where to look for the rabbit trails. )
cognitiaclaeves
Posts: 7
Joined: Thu Oct 28, 2010 11:51 am

Re: Nagios xi seems to be reporting without agent

Post by cognitiaclaeves »

... Can I just delete the files that are listed in dpkg-query -S ?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios xi seems to be reporting without agent

Post by mguthrie »

I'm not totally clear on what install steps you took, but this might be handy to know. The directory locations for nagios related packages will be different if you install from source, or apt, or yum.

If you use "apt-get autoremove" instead of "remove", it will also remove any dependencies that were installed automatically along with a package.

You *should* be ok to just delete those files manually and then reinstall using our documentation; *should* being the operative word ;)
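
A note on the dpkg -l output posted above: both packages are flagged "rc", which in dpkg's state codes means the package has been removed but its configuration files remain. That would explain why apt-get reports the package as not installed while dpkg-query -S still lists files under /etc. The following is a minimal sketch, assuming that purging those leftovers is what you want; the helper name rc_packages is made up for illustration, and the script only prints the purge commands instead of running them.

```shell
#!/bin/sh
# Sketch: pick out packages that dpkg lists as "rc" (removed, config files
# left behind). Reads `dpkg -l` output on stdin. The function name is illustrative.
rc_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Sample lines copied from the dpkg -l output posted earlier in the thread.
sample='rc nagios-nrpe-server 2.12-1 Nagios Remote Plugin Executor Server
rc nagios-plugins-basic 1.4.15-1~bpo50+1 Plugins for the nagios network monitoring'

# Print, rather than run, the purge command for each leftover package.
printf '%s\n' "$sample" | rc_packages | while read -r pkg; do
    echo "sudo apt-get purge $pkg"
done
```

On the real machine the input would come from `dpkg -l | rc_packages`. A purge is not expected to touch files dpkg does not know about, so hand-made copies such as editor backup files would likely still need deleting by hand.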
cognitiaclaeves
Posts: 7
Joined: Thu Oct 28, 2010 11:51 am

Re: Nagios xi seems to be reporting without agent

Post by cognitiaclaeves »

Ok, I thought for sure that manually uninstalling the files would fix my issue, but no. I continue to see entries in Nagiosxi without an active Nagios agent on the box ( as far as I can tell. ) The times in the Nagiosxi panel are current to the last few minutes. I've resorted to locate ( mlocate ):

sudo updatedb

sudo mlocate nagios
/etc/nagios
/etc/nagios-plugins
/etc/apt/sources.list.d/nagiosinc.list
/etc/nagios/nrpe.cfg
/etc/nagios/nrpe.cfg~
/etc/nagios/nrpe_local.cfg
/etc/nagios/test_cfg
/etc/nagios/test_cfg/nrpe.cfg
/etc/nagios/test_cfg/nrpe.cfg~
/etc/nagios/test_cfg/nrpe_local.cfg
/etc/nagios-plugins/config
/etc/nagios-plugins/config/apt.cfg
/etc/nagios-plugins/config/dhcp.cfg
/etc/nagios-plugins/config/disk.cfg
/etc/nagios-plugins/config/dummy.cfg
/etc/nagios-plugins/config/ftp.cfg
/etc/nagios-plugins/config/http.cfg
/etc/nagios-plugins/config/load.cfg
/etc/nagios-plugins/config/mail.cfg
/etc/nagios-plugins/config/news.cfg
/etc/nagios-plugins/config/ntp.cfg
/etc/nagios-plugins/config/ping.cfg
/etc/nagios-plugins/config/procs.cfg
/etc/nagios-plugins/config/real.cfg
/etc/nagios-plugins/config/ssh.cfg
/etc/nagios-plugins/config/tcp_udp.cfg
/etc/nagios-plugins/config/telnet.cfg
/etc/nagios-plugins/config/users.cfg
/var/cache/apt/archives/nagios-nrpe-server_2.12-1_i386.deb
/var/cache/apt/archives/nagios-plugins-basic_1.4.15-1~bpo50+1_i386.deb
/var/lib/apt/lists/ppa.launchpad.net_nagiosinc_ppa_ubuntu_dists_lucid_Release
/var/lib/apt/lists/ppa.launchpad.net_nagiosinc_ppa_ubuntu_dists_lucid_Release.gpg
/var/lib/apt/lists/ppa.launchpad.net_nagiosinc_ppa_ubuntu_dists_lucid_main_binary-i386_Packages
/var/lib/dpkg/info/nagios-nrpe-server.list
/var/lib/dpkg/info/nagios-nrpe-server.postrm
/var/lib/dpkg/info/nagios-plugins-basic.list
/var/lib/dpkg/info/nagios-plugins-basic.postrm
I see two deb packages ( which, as far as I know, can't be used to run the agent while still packaged ), a number of config files, and a few directories, but no executables. I also see something under 'info', which I would expect would be 'information' related, and not executable.

sudo mlocate nrpe
/etc/nagios/nrpe.cfg
/etc/nagios/nrpe.cfg~
/etc/nagios/nrpe_local.cfg
/etc/nagios/test_cfg/nrpe.cfg
/etc/nagios/test_cfg/nrpe.cfg~
/etc/nagios/test_cfg/nrpe_local.cfg
/var/cache/apt/archives/nagios-nrpe-server_2.12-1_i386.deb
/var/lib/dpkg/info/nagios-nrpe-server.list
/var/lib/dpkg/info/nagios-nrpe-server.postrm
About the same goes for the list above...

Here's the output listed in Nagiosxi:
FTP-TEST-Template Current Load Ok 5d 2h 47m 48s 1/4 2010-11-01 11:18:18 OK - load average: 0.70, 0.41, 0.38
Current Users Ok 5d 2h 47m 9s 1/4 2010-11-01 11:19:02 USERS OK - 1 users currently logged in
Home Partition Ok 5d 1h 16m 3s 1/4 2010-11-01 11:20:03 DISK OK - free space: / 2376 MB (36% inode=95%):
PING Ok 3d 23h 49m 46s 1/4 2010-11-01 11:16:20 PING OK - Packet loss = 0%, RTA = 3.82 ms
Root Partition Ok 5d 2h 45m 53s 1/4 2010-11-01 11:20:13 DISK OK - free space: / 2376 MB (36% inode=95%):
SSH Critical 5d 2h 45m 14s 4/4 2010-11-01 11:18:52 CRITICAL - Socket timeout after 10 seconds
Swap Usage Ok 5d 2h 44m 36s 1/4 2010-11-01 11:17:22 SWAP OK - 100% free (1023 MB out of 1023 MB)
Total Processes Ok 5d 2h 49m 4s 1/4 2010-11-01 11:19:53 PROCS OK: 51 processes with STATE = RSZDT
It looks as if the plugins cfg files may match the output, but who is doing the reporting :?:
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios xi seems to be reporting without agent

Post by mguthrie »

The two binaries I know of that send passive checks are nrpe and send_nsca. You don't happen to have anything in the /usr/local/nagios/bin directory do you?

What install steps did you take to set up the nrpe checks initially? The directory locations are different depending what you used (apt, source, etc).
cognitiaclaeves
Posts: 7
Joined: Thu Oct 28, 2010 11:51 am

Re: Nagios xi seems to be reporting without agent

Post by cognitiaclaeves »

No, there is no directory under /usr/local/ named nagios or anything similar to nagios or nrpe.

...

I left out the exact install steps because I didn't want to give the illusion that I knew all the steps that went into the install. I know what I did when I attempted to set up the nagios agent on the target box (it is an FTP server), but it's possible someone else from IT did something afterward as well. ( I could ask, but they are Windows admins, so I would anticipate that I should not take what they think they did exactly at face value. )

My part was issuing apt-get install nagios-server, balking at the number of dependencies it wanted to install, and then hitting #nagios to ask if there was a more reasonable way. ... and, after chatting with keith4, installing with:

sudo apt-get install nagios-plugins-basic nagios-nrpe-server

... and then futzing with it after being told that I should use backports. ( I didn't get the backports version installed correctly until about the third time. )

... and then preparing documentation on how IT should install Nagios on the Lenny boxes, if they ran across an instance where they needed it.

( I settled with:
sudo apt-get -t lenny-backports install nagios-plugins-basic nagios-nrpe-server --no-install-recommends )

Unfortunately, the next email that I received seemed to indicate that someone on IT had found another way to install Nagios, but that it didn't work anyway. ( So I don't know what went into making that assessment on their part. )

IT then reported that they were unable to change the drives that were being monitored, which led me to discovering that changing the config file for the agent did not result in any changes reflected on xi, and then, that uninstalling the agent from the target server ( not xi ) resulted in no change in what xi reported to be currently up and running on the target server.

I've run several commands, all documented here, where I've attempted to track what could be sending information out on behalf of this server. ... Did I miss something in the commands that I used to try to track it down?

Keith4 mentioned that there could be an incorrect config on xi that made it look like it was communicating with the expected server, but reporting stats on itself instead. He also stated that taking down the server where I installed the agent, and observing that the status on xi reflected that the server was down, was a potentially inconclusive test. Keith4 suggested that perhaps the target server was being accessed through another service ( such as SSH** ), implying that the agent wouldn't need to be installed on the target server in order to observe the data points referred to above.

** Shutting down the SSH service had no effect on what was reported on xi for the target server.
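
One way to probe Keith4's theory that xi might be measuring some other machine is to collect the same figures locally on the target and compare them with the xi panel. The sketch below assumes a Linux /proc filesystem; the function name local_snapshot is hypothetical, and the numbers are only rough equivalents of what the Nagios plugins compute, so small differences are expected.

```shell
#!/bin/sh
# Sketch: print local values for the metrics shown in the xi panel so they
# can be compared by eye. Assumes Linux (/proc); the function name is illustrative.
local_snapshot() {
    # 1, 5 and 15 minute load averages
    echo "load average: $(cut -d' ' -f1-3 /proc/loadavg)"
    # Number of login sessions reported by who
    echo "users logged in: $(who | wc -l)"
    # Available space on the root filesystem, in MB
    echo "root free MB: $(df -Pm / | awk 'NR == 2 { print $4 }')"
    # Total swap in MB (SwapTotal is reported in kB)
    echo "swap total MB: $(awk '/^SwapTotal:/ { print int($2 / 1024) }' /proc/meminfo)"
    # Approximate process count: one numeric directory per PID under /proc
    echo "processes: $(ls -d /proc/[0-9]* | wc -l)"
}

local_snapshot
```

If the swap size, free space, or process count differ clearly from what xi displays for this host, that would point toward xi reporting on a different machine, possibly itself.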

By the way, executing "nmap -p1-10000 localhost" on the target server resulted in this:
Starting Nmap 4.62 ( http://nmap.org ) at 2010-11-01 13:19 CDT
Interesting ports on localhost (127.0.0.1):
Not shown: 9996 closed ports
PORT STATE SERVICE
21/tcp open ftp
22/tcp open ssh
25/tcp open smtp
111/tcp open rpcbind

Nmap done: 1 IP address (1 host up) scanned in 0.299 seconds