Performance graph gaps

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
nosajche
Posts: 9
Joined: Fri Jun 12, 2020 8:43 am

Performance graph gaps

Post by nosajche »

Hello,

We are running into some issues with gaps in our performance graphs.

All services show green, but chunks of data go missing from our performance graphs for hours at a time, typically outside regular working hours. We followed the instructions in the documentation and made a few changes, but still have not pinned down what is going on.

We took the following actions:
1. Increased the log verbosity of both NPCD and perfdata.
2. Confirmed that the nagios account has not expired.
3. Noted errors about the load threshold, raised NPCD's load_threshold to 20, and restarted NPCD.

Here is the spooled file count. It doesn't reach the 20k figure cited in the article.

Code:

$ ls /usr/local/nagios/var/spool/perfdata/ | wc -l
2
$ ls /usr/local/nagios/var/spool/xidpe/ | wc -l
4707

Logging to perfdata.log stops exactly when the data goes missing in the GUI.
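
The manual spool counts above can be turned into a repeatable check. The following is a minimal sketch, not an official Nagios XI tool: the function names are hypothetical, and the 20000 default simply mirrors the 20k figure mentioned above.

```shell
# Count regular files directly inside a spool directory.
# find is used instead of ls so very large directories are handled cleanly.
count_spool_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Compare the count against a limit (default 20000, an assumed value
# taken from the "20k" figure in this thread, not an official threshold).
check_spool() {
    local dir="$1" limit="${2:-20000}" count
    if [ ! -d "$dir" ]; then
        echo "UNKNOWN: $dir is not a directory"
        return 3
    fi
    count=$(count_spool_files "$dir")
    if [ "$count" -ge "$limit" ]; then
        echo "WARNING: $dir holds $count files (limit $limit)"
        return 1
    fi
    echo "OK: $dir holds $count files (limit $limit)"
}
```

It could be called as `check_spool /usr/local/nagios/var/spool/xidpe` from cron or wrapped as a service check, so that a backlog is noticed before gaps show up in the graphs.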

From npcd.log, we are seeing the following for every check:

Code:

[10-09-2020 11:17:32] NPCD: ThreadCounter 0/5 File is 1599774829.perfdata.service-PID-15586
[10-09-2020 11:17:32] NPCD: File '1599774829.perfdata.service-PID-15586' is an already in process PNP file. Leaving it untouched.
[10-09-2020 11:17:32] NPCD: DEBUG: load 1.970000/20.000000
[10-09-2020 11:17:32] NPCD: ThreadCounter 0/5 File is 1600195788.perfdata.host-PID-20283
[10-09-2020 11:17:32] NPCD: File '1600195788.perfdata.host-PID-20283' is an already in process PNP file. Leaving it untouched.

Additionally, we saw the following errors in messages.log:

Code:

Oct  9 06:36:57 dltfanxi1 nagios: Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1602239817.perfdata.host" - errno: Cannot allocate memory
Oct  9 06:37:11 dltfanxi1 nagios: Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1602239831.perfdata.service" - errno: Cannot allocate memory
Oct  9 06:37:12 dltfanxi1 nagios: Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1602239831.perfdata.host" - errno: Cannot allocate memory
Oct  9 06:37:27 dltfanxi1 nagios: Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1602239847.perfdata.service" - errno: Cannot allocate memory
Oct  9 06:37:27 dltfanxi1 nagios: Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1602239847.perfdata.host" - errno: Cannot allocate memory


However, we have confirmed that memory on the ESXi host was not stressed. For reference, the VM has 4 CPUs and 8 GB of RAM.
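
A fork() failure with "Cannot allocate memory" is decided by the guest kernel, so it can occur even when the hypervisor reports plenty of free memory. The sketch below reads a few fields from /proc/meminfo inside the VM. The function names are hypothetical, and comparing Committed_AS with CommitLimit is only a hard limit when vm.overcommit_memory is set to 2, so treat the output as a hint rather than a diagnosis.

```shell
# Print one field (in kB) from a /proc/meminfo-style file.
# $1 = field name, $2 = file to read (defaults to /proc/meminfo).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}

# Summarise the values most relevant to a fork() ENOMEM failure.
fork_headroom() {
    local file="${1:-/proc/meminfo}" avail limit committed
    avail=$(meminfo_kb MemAvailable "$file")
    limit=$(meminfo_kb CommitLimit "$file")
    committed=$(meminfo_kb Committed_AS "$file")
    echo "MemAvailable=${avail:-0}kB CommitLimit=${limit:-0}kB Committed_AS=${committed:-0}kB"
    # Strictly enforced only when vm.overcommit_memory=2 (assumption noted above).
    if [ "${committed:-0}" -gt "${limit:-0}" ]; then
        echo "Committed_AS exceeds CommitLimit"
    fi
}
```

Running `fork_headroom` around the time the warnings appear (for example from cron during off hours) would show whether the guest itself was short on memory when nagios tried to fork.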

Any ideas where we should look to resolve this?

Thanks,

Re: Performance graph gaps

Post by nosajche »

Hello,

Did some more digging and found some PHP errors due to low memory_limit:

Code:

PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 79 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 550
PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 354 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 550

We did the following:
1. Deleted all files in the /usr/local/nagios/var/spool/perfdata folder.
2. Changed the php.ini memory_limit to 1024 MB.
3. Restarted the httpd service.
4. Restarted perfdataproc.php as the nagios user.

After these steps, data started graphing again.
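
The byte counts in the PHP errors map directly to memory_limit values, which makes it easy to confirm which limit was actually in effect when the job died. A minimal sketch with hypothetical helper names:

```shell
# Convert a byte count from a PHP "Allowed memory size" error to MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Extract the limit (in bytes) from a single PHP fatal-error line.
# Prints nothing if the line is not an "Allowed memory size" error.
php_limit_bytes() {
    printf '%s\n' "$1" |
        sed -n 's/.*Allowed memory size of \([0-9][0-9]*\) bytes.*/\1/p'
}
```

For the errors above, `bytes_to_mib 268435456` prints 256, meaning the job was running with a 256 MB limit. Assuming perfdataproc.php runs under the PHP command-line binary (its location under cron/ suggests so, but that is an assumption), `php --ini` and `php -i | grep memory_limit` show which php.ini and which limit the command-line binary picks up.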

A few follow-up questions:
- What causes the perfdata pile-up in that location, and is there a way to detect it?
- Is there general guidance on optimizing Nagios XI for deployments with a larger number of services/hosts?
- Are there any dependency settings other than PHP that need to be tweaked for the VM and intended environment size?


Thanks.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Performance graph gaps

Post by cdienger »

The process that moves files from the xidpe folder to the perfdata folder is a PHP job, so reaching a PHP limit would explain why it fails to move things out of that directory. It likely also impacts the next step, in which process_perfdata.pl processes the contents of the perfdata directory. I've attached a chart showing the flow of performance data.
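
Since the data passes through the xidpe folder and then the perfdata folder, checking each stage for files that have been sitting too long can show where the flow stalls. This is a rough sketch: the function names and the 15-minute default are illustrative assumptions, and the directory layout is taken from the paths quoted earlier in this thread.

```shell
# Count files in one stage directory that are older than $2 minutes
# (default 15, an arbitrary illustrative value).
stale_files() {
    find "$1" -maxdepth 1 -type f -mmin "+${2:-15}" | wc -l | tr -d ' '
}

# Report both spool stages under a Nagios var directory.
# $1 = var directory (defaults to /usr/local/nagios/var).
report_stages() {
    local var="${1:-/usr/local/nagios/var}" stage
    for stage in spool/xidpe spool/perfdata; do
        if [ ! -d "$var/$stage" ]; then
            echo "$stage: missing"
            continue
        fi
        echo "$stage: $(stale_files "$var/$stage") stale"
    done
}
```

Stale files in xidpe would point at the PHP job, while stale files in perfdata would point at the later processing step.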

Tweaking the PHP limits is a common recommendation. Check out https://support.nagios.com/kb/article/n ... e-611.html which covers increasing the memory limit as well as a few more settings in the php.ini.

https://assets.nagios.com/downloads/nag ... ios-XI.pdf covers some other performance tweaks for the XI system. I usually recommend at least following the steps to add a ramdisk for perfdata.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Re: Performance graph gaps

Post by nosajche »

Thanks for this info.

Everything worked fine for a few hours, but then the graphs stopped generating. This time, however, the logs do not mention any PHP errors and there are no files in the xidpe or perfdata folders.

There is only the following message in the messages log:

Code:

Oct 10 01:00:39 dltfanxi1 nagios: Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1602306038.perfdata.host" - errno: Cannot allocate memory
Oct 10 01:00:39 dltfanxi1 nagios: Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1602306039.perfdata.service" - errno: Cannot allocate memory

The httpd log does not show any errors. I increased the PHP memory limit again and restarted httpd anyway, but I am still getting the same problem:

Code:

PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 250 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 541
PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 81 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 551
PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 79 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 550
PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 79 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 550
PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 81 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 551
PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 79 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 550
PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 79 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 550
PHP Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 32 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 541
PHP Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 32 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 541
PHP Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 32 bytes) in /data/local/nagiosxi/cron/perfdataproc.php on line 541

These errors appear as soon as perfdataproc.php is restarted.
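
Grouping these errors by limit and source line makes the pattern easier to see. A minimal sketch, assuming the errors have been saved to a file; the function name is hypothetical.

```shell
# Summarise PHP out-of-memory errors in a log file by memory limit
# and by the line of the script that hit the limit.
# $1 = path to a file containing "Allowed memory size ..." lines.
summarize_php_oom() {
    sed -n 's/.*Allowed memory size of \([0-9]*\) bytes exhausted.* on line \([0-9]*\).*/\1 \2/p' "$1" |
        sort | uniq -c |
        awk '{ printf "limit=%dMiB line=%s count=%d\n", $2 / 1048576, $3, $1 }'
}
```

Applied to the ten lines above, the failures all fall on lines 541, 550, and 551 of perfdataproc.php, and they continue after the limit goes from 1024 MiB to 2048 MiB. That suggests the job's memory use grows with whatever it is processing rather than sitting just above the old limit, although the log alone does not prove it.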

Re: Performance graph gaps

Post by cdienger »

How large are the perfdata files under /usr/local/nagios/var/? Try removing them with:

Code:

systemctl stop nagios
mv /usr/local/nagios/var/host-perfdata ~
mv /usr/local/nagios/var/service-perfdata ~
systemctl start nagios
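
A slightly more defensive variant of the move step is sketched below. It is an illustration only: the function name and the backup directory argument are hypothetical, and nagios should still be stopped before and started after, as in the commands above.

```shell
# Move host-perfdata and service-perfdata out of a Nagios var directory
# into a backup directory, skipping any file that does not exist.
# $1 = var directory, $2 = backup directory (created if needed).
set_aside_perfdata() {
    local var="$1" backup="$2" name
    mkdir -p "$backup" || return 1
    for name in host-perfdata service-perfdata; do
        if [ -f "$var/$name" ]; then
            mv "$var/$name" "$backup/" && echo "moved $name"
        fi
    done
}
```

For example, `set_aside_perfdata /usr/local/nagios/var ~/perfdata-backup` keeps the old files together in one place so they can be inspected or restored later.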

Re: Performance graph gaps

Post by nosajche »

The perfdata files are pretty small:

Code:

$ ll -h | grep perfdata
-rw-r--r-- 1 nagios nagios 6.4K Oct 15 10:10 host-perfdata
-rw-rw-r-- 1 nagios nagios 5.7M Oct 10 14:04 perfdata.log
-rw-r--r-- 1 nagios nagios 116K Oct 15 10:10 service-perfdata

The same cycle keeps happening:

1. PHP runs out of memory (even though the limit has been increased to 2 GB on a VM with 8 GB of memory).
2. perfdataproc.php stops running.
3. /var/spool/xidpe/ grows and its files are never processed.

Deleting the contents of the xidpe folder and restarting the perfdataproc.php process as the nagios user works for a few hours, and then the cycle repeats.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Performance graph gaps

Post by tgriep »

A ticket has been opened for this issue, so we'll work through it there. Closing this post.
Be sure to check out our Knowledgebase for helpful articles and solutions!