Wizard "Network Switch / Router" - check_xi_service_mrtgtraf

Post by **srrhd** » Fri Jun 17, 2011 2:38 am

Hi,

I tried to use the "Network Switch / Router" wizard, but it does not seem able to report bandwidth information. It gives the correct interface status, but the IN and OUT bandwidth stay at 0Mbps.

Looking at the configuration of the check_xi_service_mrtgtraf command, it relies on check_rrdtraf. But this plugin seems to only read the RRD files, so I guess the problem is that those files aren't being updated. So my question is: which process is supposed to poll the equipment and write the data into those files?
Did I miss something? Should I have installed something else?

Aurelien.

PS: I don't know if it is related, but in /var/log/httpd/error_log I got:

[Fri Jun 17 11:00:45 2011] [error] [client 192.168.80.10] PHP Notice:  Undefined variable: longKey in /usr/local/nagiosxi/html/includes/configwizards/switch/switch.inc.php on line 805, referer: http://192.168.80.188/nagiosxi/config/monitoringwizard.php?update=1&nextstep=2&nsp=c940937936c559176c2af89242872a13&wizard=switch

Post by **srrhd** » Fri Jun 17, 2011 5:43 am

It may be related to this post:
http://support.nagios.com/forum/viewtop ... =16&t=2435
If so, you are already working on it, but just for information: the community we use for polling has RW rights.

Aurelien.

Post by **nscott** » Fri Jun 17, 2011 11:31 am

MRTG is in charge of updating those RRDs, and you shouldn't have to install anything else; everything should already be installed. While tracking down the bug, I worked out the following method for finding where the disconnect is happening. I'll be super verbose about what I did because this problem is a bit of a phantom.

You'll need the name of the device being monitored with MRTG. In most cases that's just <switchIP>. Go to the /var/lib/mrtg/ directory; the first thing to check is

./rrdtool fetch <switchIP>_1.rrd AVERAGE -1h

This will return everything that's been written to the database for the last hour. If these are all NaN, then there is an issue with MRTG; if they are all zero, then either the device is not functioning properly or the bandwidth IS actually zero. One of the entries for the test switch was

1308320400: 2.671294333e+04 8.398140000e+03

So that means at that time MRTG logged 26,712 bytes down and 8,398 bytes up (26.7KB/s down, 8.4KB/s up). These numbers correlate with that testing environment, so I don't think the MRTG logging is the issue. However, if you're getting zeros or NaNs here, then we'll need to delve deeper into MRTG and the SNMP monitoring setup.
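As a rough aid for reading longer fetch output, the NaN / zero / real-data distinction described above could be automated along these lines. This is only a sketch: `classify_fetch` is a hypothetical helper name, not part of rrdtool or Nagios XI, and it assumes data rows have the `timestamp: value value` shape shown in the sample entry.

```shell
#!/bin/sh
# classify_fetch (hypothetical helper): reads `rrdtool fetch` output on stdin
# and prints one of: "no rows", "has data", "all zero", "zero and NaN", "all NaN".
classify_fetch() {
    awk '
        # Data rows look like "1308320400: 2.671294333e+04 8.398140000e+03";
        # header and blank lines are skipped because $1 is not "<digits>:".
        $1 ~ /^[0-9]+:$/ {
            rows++
            for (i = 2; i <= NF; i++) {
                v = tolower($i)
                if (v ~ /nan/)        nan++    # unknown sample
                else if (v + 0 == 0)  zero++   # recorded, but zero traffic
                else                  data++   # real non-zero sample
            }
        }
        END {
            if (rows == 0)                   print "no rows"
            else if (data > 0)               print "has data"
            else if (zero > 0 && nan == 0)   print "all zero"
            else if (zero > 0)               print "zero and NaN"
            else                             print "all NaN"
        }
    '
}
```

It would be used by piping the fetch output into it, for example `rrdtool fetch <switchIP>_1.rrd AVERAGE | classify_fetch`.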

Next, check the check command that Nagios uses to read those very RRDs we just verified were good (hopefully!). Go to /usr/local/nagios/libexec/ and run

./check_rrdtraf -f /var/lib/mrtg/<rrdfile_you_want_to_check> -w 2,2 -c 5,5

This SHOULD pull the same values you got from the rrdtool fetch command above. If it doesn't, then there is a disconnect. As soon as I ran check_rrdtraf by hand, bandwidth started showing up on the box where I had reproduced the problem, or at least the next check showed bandwidth. Can you see if this changes anything? Also, keep in mind that if the switch is getting some traffic, but not very much, Nagios XI's default check displays it in Mbps, which WILL truncate low bandwidths and show 0Mbps.
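To see why low traffic can show up as 0Mbps, here is the conversion worked through for the sample row quoted earlier. It is a sketch under two assumptions: the RRD values are bytes per second (as described above) and 1 Mbps means 1,000,000 bits per second. `to_mbps` is a hypothetical helper name.

```shell
#!/bin/sh
# to_mbps (hypothetical helper): convert a rate in bytes/second to
# megabits/second, assuming 1 Mbps = 1,000,000 bits/s.
to_mbps() {
    awk -v rate="$1" 'BEGIN { printf "%.3f\n", rate * 8 / 1000000 }'
}

to_mbps 26712.94   # inbound value from the sample row
to_mbps 8398.14    # outbound value from the sample row
```

Both results are well below 1 Mbps, so a display that drops the fractional part would show 0Mbps even though traffic is flowing.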

Another question, in the Switch wizard, when it polls the data from the switch, is the max bandwidth on each port showing up correctly?

Give that a shot, let me know how it goes!

Post by **srrhd** » Mon Jun 20, 2011 1:03 am

OK, so I guess we might have found the problem. If I have understood your procedure correctly, the executable "rrdtool" should exist in /var/lib/mrtg/

Unfortunately it doesn't!

When listing /var/lib/mrtg/ I get all my "DEVICEIP_INTERFACENUMBER.rrd" files and a "mrtg.ok", but nothing else.

I found the executable under /usr/bin/; here is what I got:

[root@localhost mrtg]# /usr/bin/rrdtool fetch 10.100.201.253_7008.rrd AVERAGE -1h
ERROR: unknown option '-1'
[root@localhost mrtg]# /usr/bin/rrdtool fetch 10.100.201.253_7008.rrd AVERAGE | more
ds0 ds1

1308465000: nan nan
1308465300: nan nan
1308465600: nan nan
1308465900: nan nan
1308466200: nan nan
1308466500: nan nan
1308466800: nan nan
1308467100: nan nan
[...]
1308549300: nan nan
1308549600: nan nan
1308549900: nan nan
1308550200: nan nan
1308550500: nan nan
1308550800: nan nan
1308551100: nan nan
1308551400: nan nan
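A note on the "unknown option '-1'" error above: rrdtool fetch expects the start time to be passed with -s (long form --start), so a bare -1h is parsed as an option. A minimal sketch of the one-hour query, using the file name from this post and running the command only if rrdtool and the file are actually present:

```shell
#!/bin/sh
# Sketch only: RRD_FILE is the file from the post above; adjust as needed.
RRD_FILE="10.100.201.253_7008.rrd"

# -s (long form: --start) takes the start time; "-1h" means one hour ago.
FETCH_CMD="rrdtool fetch $RRD_FILE AVERAGE -s -1h"
echo "$FETCH_CMD"

# Run it only when rrdtool and the RRD file exist in this environment.
if command -v rrdtool >/dev/null 2>&1 && [ -f "$RRD_FILE" ]; then
    $FETCH_CMD
fi
```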

So I guess we'll need to delve deeper into MRTG.

And to answer your last question, yes the wizard gave the right max bandwidth for each interface.

Aurelien.

Post by **nscott** » Mon Jun 20, 2011 12:08 pm

Hmm, well, you're getting NaN, and that seems to correlate with the httpd error log entry suggesting that the switch wizard failed. Have you tried recreating the switch in the wizard? I think that's the root cause of the issue.

Post by **srrhd** » Wed Jun 22, 2011 3:36 am

I created the host using the wizard, so I guess it won't be useful to recreate it the same way.
I have the impression that it is the large number of services that causes the problem, but it's only an impression.

Yesterday morning, for no apparent reason, all the affected services were OK; then a few of them went back to UNKNOWN status.
Looking in /var/lib/mrtg/, all the RRD files are owned by root:root; I guess the wizard creates them this way. I did a "chown nagios:user" on them as a test, but the problem persists.

Post by **nscott** » Wed Jun 22, 2011 3:09 pm

srrhd,

I just found a bug that is causing the switch wizard to create cfgmaker output that uses 2c as the version, which MRTG won't accept. Can you check your /var/spool/mail/root for warnings from MRTG saying that it doesn't know what to do with the c?
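One way to look for the same symptom directly in the MRTG config is sketched below. It rests on an assumption not confirmed in this thread: that the cfgmaker-generated Target lines carry the SNMP version as their last colon-separated field (for example `...:::::2`), so the bug would show up as a trailing `2c`. `find_2c_targets` is a hypothetical helper name.

```shell
#!/bin/sh
# find_2c_targets (hypothetical helper): print, with line numbers, every
# Target[...] line in the given MRTG config whose last colon-separated
# field is "2c". Assumes the SNMP version is that last field.
find_2c_targets() {
    grep -nE '^[[:space:]]*Target\[[^]]*\]:.*:2c[[:space:]]*$' "$1"
}
```

For example, `find_2c_targets /etc/mrtg/mrtg.cfg` would list the affected targets, and print nothing if there are none.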

Post by **srrhd** » Mon Jun 27, 2011 5:36 am

Hi Nicholas,

I had plenty until the 1st of June, but not a single one since. Might an update have solved this already?

Post by **nscott** » Mon Jun 27, 2011 9:48 am

It's possible this may have been thrown on the heap of updates. If it's working for you now, then that's great! But if it's not:

Add this line to the global section of your /etc/mrtg/mrtg.cfg, under the ThreshDir line

WorkDir: /var/lib/mrtg

Then delete the config for the switch (in this very same mrtg.cfg) that is not writing to RRD files properly. Then recreate it in Nagios using SNMP v2, not v2c.
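Before and after editing, it can help to confirm whether the WorkDir line is actually present. A small sketch, where `has_workdir` is a hypothetical helper name and the match is deliberately strict about the exact spelling given above:

```shell
#!/bin/sh
# has_workdir (hypothetical helper): succeeds if the given MRTG config
# contains a line setting WorkDir to /var/lib/mrtg.
has_workdir() {
    grep -qE '^[[:space:]]*WorkDir:[[:space:]]*/var/lib/mrtg/?[[:space:]]*$' "$1"
}
```

Usage would look like `has_workdir /etc/mrtg/mrtg.cfg && echo "WorkDir set" || echo "WorkDir missing"`.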

Post by **GldRush98** » Mon Jun 27, 2011 10:22 am

I am not sure this is the same issue, but I feel it might be.

I updated to 2011R1.5 this morning and added on a new switch.
After adding the switch and reviewing its services, all of the bandwidth statuses are in an error condition with:
"/var/lib/mrtg/[IP Address]_[index].rrd does not exist."

I never had a problem before 2011R1.5, and now I'm regretting applying the update.

Observe:

yetanothernagiosissue.JPG

I tried

Add this line to the global section of your /etc/mrtg/mrtg.cfg, under the ThreshDir line

WorkDir: /var/lib/mrtg

Then delete the config for the switch (in this very same mrtg.cfg) that is not writing to RRD files properly. Then recreate it in Nagios using SNMP v2, not v2c.

But that didn't change anything.
