Hello -
I have Nagios Core (4.3.4) + pnp4nagios (0.6.25) running on a RHEL 7.3 VM, and every now and then ALL the pnp4nagios graphs stop updating. It's pretty random: roughly once every two weeks, on no particular day or time. Restarting Nagios [service nagios stop | start] fixes it.
All other Nagios functions seem to be running fine when this happens.
I've set up a Nagios check to detect when it happens, using a file age check on /usr/local/pnp4nagios/var/perfdata/localhost/_HOST_.xml, which should get updated every check cycle (60 sec in this case). When this file hasn't been updated for 5 mins the check raises a CRITICAL alert, and I then find the pnp4nagios graphs aren't updating.
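For reference, a file age check like the one described could look roughly like the sketch below. It is a minimal Python illustration, not the actual check in use: the function name check_file_age is hypothetical, and the path and 5 minute threshold are simply the ones mentioned above.

```python
import os
import time

# Path and threshold as described in the post; adjust for your install.
XML_PATH = "/usr/local/pnp4nagios/var/perfdata/localhost/_HOST_.xml"
CRITICAL_AGE_SECONDS = 300  # 5 minutes


def check_file_age(path, critical_age, now=None):
    """Return (exit_code, message) using the Nagios plugin exit-code convention.

    0 = OK, 2 = CRITICAL, 3 = UNKNOWN (file missing or unreadable).
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        return 3, f"UNKNOWN - cannot stat {path}: {exc}"
    if age > critical_age:
        return 2, f"CRITICAL - {path} last updated {age:.0f}s ago"
    return 0, f"OK - {path} last updated {age:.0f}s ago"


code, message = check_file_age(XML_PATH, CRITICAL_AGE_SECONDS)
print(message)
# A real plugin would finish with sys.exit(code) so Nagios picks up the status.
```

Passing `now` explicitly is only there to make the function easy to exercise without waiting for a file to go stale.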
Any suggestions on where to go from here? Is it Nagios or pnp4nagios that updates the .xml files?
Regards
Matt
pnp4nagios graphs randomly stop updating
Re: pnp4nagios graphs randomly stop updating
agentdavidson wrote: Is it Nagios or pnp4nagios that updates the .xml files?
Nagios writes performance data to the file specified by the service_perfdata_file and host_perfdata_file directives in your main nagios.cfg file. The format in which Nagios writes data to those files is specified by the host_perfdata_file_template and service_perfdata_file_template directives in that same file.
Here's the official documentation regarding performance data:
https://assets.nagios.com/downloads/nag ... fdata.html
pnp4nagios consumes that file and stuffs it into RRDs (which are tied to the xml files you mentioned). Those RRDs are what pnp4nagios reads from when it generates graphs.
You might bump up the logging level for pnp4nagios's process_perfdata.pl script and keep an eye on the logfile. The setting lives in a config file named process_perfdata.cfg; where that file is located depends on how you set up pnp4nagios.
If you're using NPCD, you can bump up some logging for that as well.
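To make the data concrete, here is a rough Python sketch that splits a perfdata string into its fields. It assumes the usual plugin perfdata layout of label=value[UOM];warn;crit;min;max, and it is not how process_perfdata.pl itself parses anything; parse_perfdata and PERF_RE are hypothetical names. The sample string is the one that shows up in the perfdata.log excerpt in this thread.

```python
import re

# Assumed layout: 'label'=value[UOM];[warn];[crit];[min];[max], space separated.
PERF_RE = re.compile(
    r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^\s=']+))"
    r"=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;\s]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
    r"(?:;(?P<min>[^;\s]*))?(?:;(?P<max>[^;\s]*))?"
)


def parse_perfdata(text):
    """Return one dict per metric; thresholds stay as strings, missing ones are None."""
    metrics = []
    for match in PERF_RE.finditer(text):
        metrics.append({
            "label": match.group("qlabel") or match.group("label"),
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": match.group("warn") or None,
            "crit": match.group("crit") or None,
            "min": match.group("min") or None,
            "max": match.group("max") or None,
        })
    return metrics


sample = "rta=21.936001ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0"
for metric in parse_perfdata(sample):
    print(metric)
```

Thresholds are left as strings on purpose, since warn/crit can be ranges rather than plain numbers.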
Former Nagios employee
https://www.mcapra.com/
Re: pnp4nagios graphs randomly stop updating
Thanks for the info. I've found that when the graphs stop updating /usr/local/pnp4nagios/var/perfdata.log stops updating also.
For example this morning the graphs stopped updating around 3:23am and the log file also comes to a halt at that time.
[root@zpredvmnet1:/usr/local/pnp4nagios/var] tail perfdata.log
2018-02-27 03:23:33 [30855] [2] /usr/local/pnp4nagios/var/perfdata/r0020031/_HOST_.rrd updated
2018-02-27 03:23:33 [30855] [2] Processing Line 4437
2018-02-27 03:23:33 [30855] [2] Datatype set to 'HOSTPERFDATA'
2018-02-27 03:23:33 [30855] [1] Found Performance Data for r0012001 / _HOST_ (rta=21.936001ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0)
2018-02-27 03:23:33 [30855] [2] data2rrd called
2018-02-27 03:23:33 [30855] [2] RRDs::update /usr/local/pnp4nagios/var/perfdata/r0012001/_HOST_.rrd 1519654605:21.936001:0
2018-02-27 03:23:33 [30855] [2] /usr/local/pnp4nagios/var/perfdata/r0012001/_HOST_.rrd updated
2018-02-27 03:23:33 [30855] [2] Processing Line 4438
2018-02-27 03:23:33 [30855] [2] Datatype set to 'HOSTPERFDATA'
2018-02-27 03:23:33 [30855] [1] Found Performance Data for r0052501 / _HOST_ (rta=16.988001ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0)
It's like Nagios just stops calling the pnp4nagios script.
I'll bump up the logging level as suggested and see what falls out. [EDIT - seems I'm already @ level 2 (debug)]
Re: pnp4nagios graphs randomly stop updating
I am not familiar with how that version of pnp works. Does it have config files like an npcd.cfg? Does it run the npcd daemon? If so, there should be logs for that as well.
Can you share any of the configs?
I did some searching on the web and did see some similar responses but didn't see a solution. Did you ask the pnp4nagios creators?
Re: pnp4nagios graphs randomly stop updating
I'm running PNP4Nagios in Bulk Mode (without NPCD or npcdmod), and the doco on the PNP4Nagios site (http://docs.pnp4nagios.org/_detail/bulk ... 6%3Aconfig) suggests that it is the Nagios process that calls process_perfdata.pl.
The behaviour I'm seeing looks a lot like Nagios just stops calling process_perfdata.pl.
And that I can kick things back into life with a restart of Nagios also seems to point to a Nagios issue, despite all other Nagios functionality (polling, alerting, perfdata etc) working fine at the time.
I am still looking into this as time permits and will share any progress or findings.
Cheers.
Re: pnp4nagios graphs randomly stop updating
Can you post the following files so we can check the settings?
process_perfdata.cfg
nagios.cfg
When the graphs stop, do you see any errors in the nagios.log file?
Re: pnp4nagios graphs randomly stop updating
I've some further information to hand. I found that in /var/log/messages the issue seems to be preceded by increasing numbers of the following errors.
Jun 5 19:19:37 myserver nagios: Warning: fork() in my_system_r() failed for command "/usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/host-perfdata" - errno: Cannot allocate memory
Jun 5 19:19:37 myserver nagios: Warning: fork() in my_system_r() failed for command "/usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/service-perfdata" - errno: Cannot allocate memory
They start popping up at irregular intervals of 5-15 mins, but by the time the graphs stop updating they are happening every 15 secs, which aligns with the service_perfdata_file_processing_interval and host_perfdata_file_processing_interval settings in nagios.cfg.
In other words, it degrades to a point where it is balking every time it runs.
The other interesting aspect is that when I find the graphs have stopped updating, if I run the process_perfdata.pl commands manually the graphs get updated to the current time. It takes a good few mins, but they eventually catch up. I can then stop | start nagios and I don't get left with gaps in the graphs.
I can post process_perfdata.cfg and nagios.cfg if required, but they are pretty much straight out of the box.
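To put numbers on how the failures ramp up, something like the following could measure the gaps between fork() warnings in /var/log/messages. It is only a rough Python sketch: fork_failure_times and gaps_in_seconds are made-up helper names, the year has to be supplied because syslog timestamps omit it, and the month-name parsing assumes an English locale.

```python
from datetime import datetime

# Text of the Nagios warning quoted above.
FORK_MARKER = "fork() in my_system_r() failed"


def fork_failure_times(lines, year):
    """Return sorted, de-duplicated timestamps of fork() failure warnings.

    Host and service perfdata failures are often logged in the same second,
    so identical timestamps are collapsed into one event.
    """
    times = set()
    for line in lines:
        if FORK_MARKER not in line:
            continue
        month, day, clock = line.split(None, 3)[:3]
        stamp = f"{year} {month} {day} {clock}"
        times.add(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return sorted(times)


def gaps_in_seconds(times):
    """Seconds between consecutive failure events."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Example usage (reading /var/log/messages usually needs root):
# with open("/var/log/messages") as handle:
#     times = fork_failure_times(handle, year=...)  # year the log was written
# print(gaps_in_seconds(times))
```

If the gaps shrink steadily towards the 15 sec processing interval, that would match the degradation described above.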
Re: pnp4nagios graphs randomly stop updating
This looks like the system is either out of memory or the pnp4nagios you are using cannot allocate it properly.
This would be a question for the pnp4nagios developers. We are not the developers for pnp4nagios.
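Since the warning is fork() failing with "Cannot allocate memory", it may be worth capturing what the kernel reports about memory around the time the failures start. Below is a minimal Python sketch that pulls a few fields out of /proc/meminfo; parse_meminfo and memory_summary are hypothetical helper names, and which fields matter here is an assumption rather than a diagnosis.

```python
import os


def parse_meminfo(text):
    """Map /proc/meminfo field names to integer values (most are in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Pick out fields that may be relevant to fork() memory failures.

    Fields missing on a given kernel are reported as None.
    """
    keys = ("MemTotal", "MemAvailable", "SwapFree", "CommitLimit", "Committed_AS")
    return {key: info.get(key) for key in keys}


# Only attempt the read on systems that expose /proc/meminfo (Linux).
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        print(memory_summary(parse_meminfo(handle.read())))
```

Logging this output every minute or so would show whether available memory is actually shrinking in the run-up to the failures.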