I tried to use the "Network Switch / Router" wizard, but it doesn't seem able to report bandwidth information. It gives the correct interface status, but the IN and OUT bandwidth stay at 0 Mbps.
Looking at the configuration of the check_xi_service_mrtgtraf command, I see it relies on check_rrdtraf. But this plugin seems to only read the RRD files, so I guess the problem is that those files aren't being updated. So my question is: which process is supposed to poll the equipment and write the data into those files?
Did I miss something? Should I have installed something else?
Aurelien.
PS: I don't know if it is related, but in /var/log/httpd/error_log I got:
[Fri Jun 17 11:00:45 2011] [error] [client 192.168.80.10] PHP Notice: Undefined variable: longKey in /usr/local/nagiosxi/html/includes/configwizards/switch/switch.inc.php on line 805, referer: http://192.168.80.188/nagiosxi/config/monitoringwizard.php?update=1&nextstep=2&nsp=c940937936c559176c2af89242872a13&wizard=switch
Configuration: Nagios xi 2014R2.5
manually upgraded on 64bits CentOS 6 with nothing extra, no gnome, no proxy, no SSL
Add-ons: Hypermap, Minemap, Ping Action, Traceroute Action, Network Replay, Graph Explorer, Latest Alert
MRTG is in charge of updating those RRDs, and you shouldn't have to install anything else; everything should already be installed. While tracking down the bug, I worked out the following way to find where the disconnect is happening. I'll be very verbose about what I did because this problem is a bit of a phantom.
You'll need the name of the device being monitored with MRTG. In most cases that's just <switchIP>. Then go to the /var/lib/mrtg/ directory. The first thing to check is:
./rrdtool fetch <switchIP>_1.rrd AVERAGE -1h
This will return everything that's been written to the database for the last hour. If these are all NaN, then there is an issue with MRTG. If it's all zero, then either the device is not functioning properly or the bandwidth IS actually zero. One of the entries for the test switch was:
1308320400: 2.671294333e+04 8.398140000e+03
So that means that at that time MRTG logged 26,712 bytes per second down and 8,398 bytes per second up (about 26.7 KB/s down, 8.4 KB/s up). These numbers match that testing environment, so I don't think the MRTG logging is the issue. However, if you're getting zeros or NaNs here, then we'll need to delve deeper into MRTG and the SNMP monitoring setup.
Next is to check the check command that Nagios uses to reference those very RRDs we just verified were good (hopefully!) so we'll need to go to /usr/local/nagios/libexec/ and put in
This SHOULD pull the same values you got from the rrdtool fetch command above. If it doesn't, then there is a disconnect. As soon as I issued the rrdtraf command, bandwidth started showing up on the box where I had reproduced the problem, or at least the next check showed bandwidth. Can you see if this changes anything? Also, something to keep in mind: if the switch is getting some traffic, but not very much, Nagios XI's default check displays it in Mbps, which WILL truncate low bandwidths and show 0 Mbps.
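To put a number on the truncation point, here is a rough conversion sketch. It assumes the values printed by rrdtool fetch are bytes per second (MRTG's usual unit) and that 1 Mbps means 1,000,000 bits per second; to_mbps is just a helper name I made up for illustration, it is not part of Nagios XI or MRTG.

```shell
#!/bin/sh
# Convert a rate in bytes per second (as printed by `rrdtool fetch`) to Mbps.
# to_mbps is a hypothetical helper name, not an existing tool.
# Assumes 1 Mbps = 1,000,000 bits per second.
to_mbps() {
    awk -v bps="$1" 'BEGIN { printf "%.4f\n", bps * 8 / 1000000 }'
}

to_mbps 2.671294333e+04   # inbound sample from the test switch
to_mbps 8.398140000e+03   # outbound sample from the test switch
```

Both samples come out well under 1 Mbps, so a display that drops the fractional part would show them as 0 Mbps even though the RRD holds real data.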
Another question, in the Switch wizard, when it polls the data from the switch, is the max bandwidth on each port showing up correctly?
OK, so I guess we might have found the problem. If I have understood your procedure correctly, the "rrdtool" executable should exist in /var/lib/mrtg/
Unfortunately, it doesn't!
When listing /var/lib/mrtg/ I see all my "DEVICEIP_INTERFACENUMBER.rrd" files and an "mrtg.ok" file, but nothing else.
I found the executable under /usr/bin/. Here is what I got:
[root@localhost mrtg]# /usr/bin/rrdtool fetch 10.100.201.253_7008.rrd AVERAGE -1h
ERROR: unknown option '-1'
[root@localhost mrtg]# /usr/bin/rrdtool fetch 10.100.201.253_7008.rrd AVERAGE | more
ds0 ds1
1308465000: nan nan
1308465300: nan nan
1308465600: nan nan
1308465900: nan nan
1308466200: nan nan
1308466500: nan nan
1308466800: nan nan
1308467100: nan nan
[...]
1308549300: nan nan
1308549600: nan nan
1308549900: nan nan
1308550200: nan nan
1308550500: nan nan
1308550800: nan nan
1308551100: nan nan
1308551400: nan nan
So I guess we'll need to delve deeper into MRTG.
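For anyone repeating this check across many interfaces, a small sketch that classifies rrdtool fetch output instead of eyeballing it. classify_fetch is a hypothetical helper name, and the labels it prints are my own. As far as I know, rrdtool fetch takes the time offset through the -s (--start) option, which would explain the "unknown option '-1'" error above.

```shell
#!/bin/sh
# Classify `rrdtool fetch` output read from stdin.
# classify_fetch is a hypothetical helper; the labels are illustrative only.
# Intended use (not run here):
#   /usr/bin/rrdtool fetch <file>.rrd AVERAGE -s -1h | classify_fetch
classify_fetch() {
    awk '
        /^[0-9]+:/ {                      # data rows start with "<timestamp>:"
            rows++
            for (i = 2; i <= NF; i++) {
                v = tolower($i)
                if (v == "nan" || v == "-nan") nan++
                else if (v + 0 == 0) zero++
                else data++
            }
        }
        END {
            if (rows == 0) print "no-rows"
            else if (data > 0) print "has-data"
            else if (nan > 0 && zero == 0) print "all-nan"
            else if (zero > 0 && nan == 0) print "all-zero"
            else print "nan-and-zero"
        }'
}

# Demo with two rows copied from the output above.
printf 'ds0 ds1\n\n1308465000: nan nan\n1308465300: nan nan\n' | classify_fetch
```

An "all-nan" result points at MRTG not writing to the file, while "all-zero" points at the device or at genuinely idle ports.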
And to answer your last question, yes the wizard gave the right max bandwidth for each interface.
Aurelien.
Hmm, well, you're getting NaN, and that seems to correlate with the httpd log entry showing that the switch wizard failed. Have you tried recreating the switch in the wizard? I think that's the root cause of the issue.
I created the host using the wizard, so I guess it won't be useful to recreate it that way.
I have the impression that the large number of services is causing the problem, but it's only an impression.
Yesterday morning, for no apparent reason, all the affected services were OK; then a few of them went back to UNKNOWN status.
Looking in /var/lib/mrtg/, all the RRD files are owned by root:root; I guess the wizard creates them this way. I ran "chown nagios:user" on them as a test, but the problem persists.
I just found a bug that causes the switch wizard to create cfgmaker output that uses 2c as the SNMP version, which MRTG won't accept. Can you check your /var/spool/mail/root for warnings from MRTG saying that it doesn't know what to do with the "c"?
It's possible the fix has already gone out with the recent batch of updates. If it's working for you now, then that's great! But if it's not:
Add this line to the global section of your /etc/mrtg/mrtg.cfg, under the ThreshDir line:
WorkDir: /var/lib/mrtg
Then delete the config for the switch (in this very same mrtg.cfg) that is not writing to RRD files properly. Then recreate it in Nagios using SNMP v2, not v2c.
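If the config is long, a quick scan can show which entries are affected before deleting anything. This is only a sketch: it assumes the cfgmaker Target lines end with the SNMP version as the last colon-separated field (for example ":::::2c"), and find_2c_targets and has_workdir are hypothetical helper names.

```shell
#!/bin/sh
# List Target lines in an MRTG config whose last field is the SNMP version "2c".
# find_2c_targets and has_workdir are hypothetical helper names.
# Assumption: the version is the final colon-separated field of the Target line.
find_2c_targets() {
    grep -nE '^Target\[[^]]+\]:.*:2c[[:space:]]*$' "$1"
}

# Succeed only if the config has a global WorkDir line.
has_workdir() {
    grep -qE '^WorkDir:[[:space:]]*[^[:space:]]' "$1"
}

# Intended use (not run here):
#   find_2c_targets /etc/mrtg/mrtg.cfg
#   has_workdir /etc/mrtg/mrtg.cfg || echo "WorkDir line is missing"
```

Any line it prints is a candidate for the delete-and-recreate step above; no output means no Target in that file matches the assumed 2c pattern.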
I am not sure this is the same issue, but I feel it might be.
I updated to 2011R1.5 this morning and added on a new switch.
After adding the switch and reviewing its services, all of the bandwidth statuses are in an error condition with:
"/var/lib/mrtg/[IP Address]_[index].rrd does not exist."
I never had a problem before 2011R1.5, and now I'm regretting applying the update.
Observe:
[Attachment: yetanothernagiosissue.JPG]
I tried
Add this line to the global section of your /etc/mrtg/mrtg.cfg, under the ThreshDir line:
WorkDir: /var/lib/mrtg
Then delete the config for the switch (in this very same mrtg.cfg) that is not writing to RRD files properly. Then recreate it in Nagios using SNMP v2, not v2c.
But that didn't change anything.