Nagios Performance graphing - Memory

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
brandont
Posts: 5
Joined: Thu Oct 26, 2023 10:10 am

Post by brandont »

Greetings:

We're currently on the latest available version, 2024R1.

The issue is that *some* of our Windows servers are not plotting memory usage in performance graphing. If I change the graph range to 365 days, it appears this issue started in June (or it's a false positive).

Example below:
[Screenshot: Nagios Mem.PNG]
In Host Status Details, the service status is 'Ok'.
[Screenshot: Nagios Mem2.PNG]
I can't find any difference between the servers that plot memory correctly and the ones that don't. Some online user documentation suggests deleting the "Memory_Usage.rrd" file. Is this the correct approach here? Does this .rrd file get recreated?

The last thing I want to do is hose the monitoring on these servers so that nothing is visible.
bbahn
Posts: 317
Joined: Thu Jan 12, 2023 5:42 pm

Post by bbahn »

Hello @brandont,

You can delete the .rrd file and the accompanying .xml file, but you will lose your historical data if you do. If you go with this plan, I suggest moving or renaming the files instead so that data is preserved. This is the path I would take if your data were corrupted somehow and causing the issue, but I suspect that may not be the problem here.
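To follow the move-rather-than-delete advice, a small script can set both files aside in one step. This is a minimal Python sketch, not a Nagios-provided tool: the function name back_up_graph_files and the backup directory are hypothetical, the perfdata path /usr/local/nagios/share/perfdata/<host>/ is taken from the log output later in this thread, and it assumes the accompanying XML file sits next to the RRD as Memory_Usage.xml.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def back_up_graph_files(host_dir: str, backup_dir: str,
                        service: str = "Memory_Usage") -> list[Path]:
    """Move <service>.rrd and <service>.xml out of host_dir into backup_dir.

    Hypothetical helper, not part of Nagios XI or PNP. Each moved file gets a
    timestamp suffix (one-second resolution) so an earlier backup is kept.
    Files that do not exist are skipped. Returns the new paths.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest_root = Path(backup_dir)
    dest_root.mkdir(parents=True, exist_ok=True)
    moved: list[Path] = []
    for ext in (".rrd", ".xml"):
        src = Path(host_dir) / f"{service}{ext}"
        if not src.exists():
            continue
        dst = dest_root / f"{src.name}.{stamp}"
        # shutil.move also works across filesystems, unlike Path.rename
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

For example, back_up_graph_files("/usr/local/nagios/share/perfdata/<WebServerHost>", "/root/perfdata-backup") would move both files for the host in the log; <WebServerHost> stays a placeholder for the real host directory and /root/perfdata-backup is only an example location. Moving the files outside the perfdata directory keeps the backups from being mistaken for live graph data.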

Can you edit /usr/local/nagios/etc/pnp/process_perfdata.cfg and change

LOG_LEVEL = 0

to:

LOG_LEVEL = 2

per https://support.nagios.com/kb/article.php?id=9

Then, after some time, check /usr/local/nagios/var/perfdata.log for any errors, exits, or returns; these will likely help identify what is going on. It would also help to know which version of NCPA you're using.
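With LOG_LEVEL raised, perfdata.log gets long quickly. A small filter like this Python sketch can pull out the lines worth reading. The keyword list is an assumption based on the words "errors, exits, or returns"; it is matched case-insensitively at the start of a word, so it also catches ERROR, exits, or returned. Adjust it if the log uses other wording.

```python
import re
from typing import Iterable, Iterator, Sequence

# Words to look for, based on the advice above; an assumption, adjust as needed.
DEFAULT_KEYWORDS = ("error", "exit", "return")


def interesting_lines(lines: Iterable[str],
                      keywords: Sequence[str] = DEFAULT_KEYWORDS) -> Iterator[str]:
    """Yield log lines containing a word that starts with any keyword.

    Matching is case-insensitive and anchored at a word boundary, so "error"
    matches "ERROR" and "errors", and "exit" matches "exits".
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + ")",
        re.IGNORECASE,
    )
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")
```

To run it against the real log, open /usr/local/nagios/var/perfdata.log and iterate, e.g. with open(path) as log: for line in interesting_lines(log): print(line).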
brandont
Posts: 5
Joined: Thu Oct 26, 2023 10:10 am

Post by brandont »

Thanks for the thoughts. A quick follow-up:

Historical data is not extremely important. If I delete or move the .rrd and .xml files for the affected host(s), do these files get recreated? Is there any follow-up work needed afterward? Without knowing more, it seems like removing those two files could break monitoring of these hosts altogether and make the problem worse.
brandont
Posts: 5
Joined: Thu Oct 26, 2023 10:10 am

Post by brandont »

LOG_LEVEL is already set to 2.
The installed NCPA agent is version 2.4.0.

Here's sample output for a web server that has the issue:

2023-12-18 10:08:15 [57912] [1] Found Performance Data for <WebServerHost> / Memory_Usage ('available'=11.63GiB;;; 'total'=16.00GiB;;; 'percent'=27.30%;90;95; 'free'=11.63GiB;;; 'used'=4.37GiB;;;)
2023-12-18 10:08:15 [57912] [2] No Custom Template found for check_xi_ncpa (/usr/local/nagios/etc/pnp/check_commands/check_xi_ncpa.cfg)
2023-12-18 10:08:15 [57912] [2] RRD Datatype is GAUGE
2023-12-18 10:08:15 [57912] [2] Template is check_xi_ncpa.php
2023-12-18 10:08:15 [57912] [2] No Custom Template found for check_xi_ncpa (/usr/local/nagios/etc/pnp/check_commands/check_xi_ncpa.cfg)
2023-12-18 10:08:15 [57912] [2] RRD Datatype is GAUGE
2023-12-18 10:08:15 [57912] [2] Template is check_xi_ncpa.php
2023-12-18 10:08:15 [57912] [2] No Custom Template found for check_xi_ncpa (/usr/local/nagios/etc/pnp/check_commands/check_xi_ncpa.cfg)
2023-12-18 10:08:15 [57912] [2] RRD Datatype is GAUGE
2023-12-18 10:08:15 [57912] [2] Template is check_xi_ncpa.php
2023-12-18 10:08:15 [57912] [2] No Custom Template found for check_xi_ncpa (/usr/local/nagios/etc/pnp/check_commands/check_xi_ncpa.cfg)
2023-12-18 10:08:15 [57912] [2] RRD Datatype is GAUGE
2023-12-18 10:08:15 [57912] [2] Template is check_xi_ncpa.php
2023-12-18 10:08:15 [57912] [2] No Custom Template found for check_xi_ncpa (/usr/local/nagios/etc/pnp/check_commands/check_xi_ncpa.cfg)
2023-12-18 10:08:15 [57912] [2] RRD Datatype is GAUGE
2023-12-18 10:08:15 [57912] [2] Template is check_xi_ncpa.php
2023-12-18 10:08:15 [57912] [2] data2rrd called
2023-12-18 10:08:15 [57912] [2] RRDs::update /usr/local/nagios/share/perfdata/<WebServerHost>/Memory_Usage.rrd 1702915672:11.63:16.00:27.30:11.63:4.37
2023-12-18 10:08:15 [57912] [1] RRDs::update ERROR /usr/local/nagios/share/perfdata/<WebServerHost>/Memory_Usage.rrd: found extra data on update argument: 27.30:11.63:4.37
2023-12-18 10:08:15 [57912] [2] Processing Line 36
2023-12-18 10:08:15 [57912] [2] Datatype set to 'SERVICEPERFDATA'
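The RRDs::update ERROR line is the most telling part of this output. rrdtool was handed a timestamp plus five values (available, total, percent, free, used) and reported the last three as extra data. One reading, inferred from the message rather than confirmed in this thread, is that the existing Memory_Usage.rrd was created with only two data sources, for example by an earlier check or agent version that emitted fewer metrics. The arithmetic is simple enough to sketch in Python; infer_rrd_ds_count is a hypothetical helper, and it assumes rrdtool fills data sources from the left and reports only the leftover values.

```python
from typing import Tuple


def infer_rrd_ds_count(update_arg: str, extra: str) -> Tuple[int, int]:
    """Return (values_sent, ds_in_rrd) for a failed RRD update.

    update_arg: the argument logged by RRDs::update, 'timestamp:v1:...:vN'.
    extra:      the trailing 'vK:...:vN' that the ERROR line calls extra data.
    Assumes the values that fit were consumed first and only the rest is extra.
    """
    values_sent = len(update_arg.split(":")) - 1  # first field is the timestamp
    extra_values = len(extra.split(":"))
    return values_sent, values_sent - extra_values
```

With the values from this log, infer_rrd_ds_count("1702915672:11.63:16.00:27.30:11.63:4.37", "27.30:11.63:4.37") gives (5, 2): five metrics sent to a file that appears to hold two.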
swolf
Developer
Posts: 361
Joined: Tue Jun 06, 2017 9:48 am

Post by swolf »

If you don't need the historical data, deleting or moving the .rrd files should be fine. The other thing I'd check is the Service Status page for any service where you'd expect a graph but don't have one. Go to the "Advanced" tab and look at the "Advanced Status Details" table; at the bottom there should be an entry for "Performance Data". If that entry has text (specifically of the form label=value[unit];value;value;value...), moving the RRD file should cause it to be regenerated with the correct schema. If there is no entry there, the check will need to be reconfigured so that it produces usable performance data.
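To see exactly which metrics a check emits, the Performance Data text can be split into its items. In the Nagios plugin development guidelines each item is 'label'=value[UOM];[warn];[crit];[min];[max], which is the label=value[unit];value;value;value pattern described here. The Python sketch below is a simplified parser under that assumption; parse_perfdata and Metric are hypothetical names, it does not validate units, and it skips non-numeric values such as U.

```python
import re
from typing import List, NamedTuple

# One item: label=value[UOM];[warn];[crit];[min];[max]. The label may be
# single-quoted, with a literal quote inside written as ''.
_ITEM = re.compile(r"('(?:[^']|'')+'|[^\s'=]+)=(\S+)")
# Leading number of the value field; whatever follows is treated as the unit.
_NUMBER = re.compile(r"^([-+]?(?:\d+(?:\.\d*)?|\.\d+))(.*)$")


class Metric(NamedTuple):
    label: str
    value: float
    unit: str
    thresholds: List[str]  # raw warn, crit, min, max fields ("" when empty)


def parse_perfdata(perfdata: str) -> List[Metric]:
    """Split a Nagios performance-data string into Metric tuples."""
    metrics: List[Metric] = []
    for item in _ITEM.finditer(perfdata):
        label, rest = item.group(1), item.group(2)
        if label.startswith("'"):
            label = label[1:-1].replace("''", "'")
        fields = rest.split(";")
        number = _NUMBER.match(fields[0])
        if number is None:
            continue  # e.g. "U" (unknown); this sketch simply skips it
        metrics.append(Metric(label, float(number.group(1)),
                              number.group(2), fields[1:]))
    return metrics
```

Fed the Memory_Usage performance data from the log above, it returns five metrics (available, total, percent, free, used). As the RRDs::update line in that log shows, all five values are written into the single Memory_Usage.rrd, so that file needs one data source per metric to accept the update.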

Hopefully this helps. Let me know what you find, and feel free to ask if you have any additional questions or concerns.
-Sebastian
Developer @ Nagios 2017-05-15 thru 2024-08-06
helgaella99
Posts: 2
Joined: Tue May 21, 2024 3:08 am

Post by helgaella99 »

It sounds frustrating to deal with inconsistent memory usage plotting on your Windows Servers. Since you've noticed that changing the graphing date reveals the issue started around June, it might be related to a specific update or configuration change that occurred around that time.

One thing to check is whether any updates or patches were applied to your servers around June that could have affected the performance monitoring tools. Also, make sure all servers are running compatible, up-to-date monitoring agents, since differing agent versions can sometimes cause issues like the one you're experiencing.

If these steps don't resolve the issue, you might want to consider reaching out to the support community for the specific monitoring tool you're using, as they might have more targeted advice or patches for known issues.