Custom check with perfdata but no graph

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jvaira
Posts: 59
Joined: Tue Dec 22, 2015 7:40 pm

Custom check with perfdata but no graph

Post by jvaira »

Hello,
I have written a custom check that collects server temperature readings and everything looks to be working except for the performance graph. I am even seeing the performance data in the advanced tab of the service status detail page. Please see attached screenshots for details.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Custom check with perfdata but no graph

Post by tgriep »

First, go to the following folder

Code: Select all

/usr/local/nagios/share/perfdata/ads-lv-node-115
Delete the .rrd and .xml file with the name of the service and that should allow them to be recreated.
Wait for 15 to 30 minutes for them to update in the GUI and see if that allows the graphs to populate with data.
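
Before deleting anything, it can help to confirm which files actually exist for the service. The following is a minimal sketch, not an official Nagios tool: `list_perfdata_files` is a hypothetical helper, and it assumes the files are named after the service description with spaces replaced by underscores (other special characters may be rewritten as well). The service name `Temp_Check` is the one mentioned later in this thread.

```shell
#!/usr/bin/env bash
# List the .rrd/.xml pair for one service without deleting anything.
# Assumption: file names are the service description with spaces
# replaced by underscores.
list_perfdata_files() {
    local host_dir=$1 service=$2
    local base=${service// /_}
    local f
    for f in "$host_dir/$base.rrd" "$host_dir/$base.xml"; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}

# Example, using the folder from this post:
# list_perfdata_files /usr/local/nagios/share/perfdata/ads-lv-node-115 "Temp_Check"
```

If the function prints nothing, neither file exists yet and there is nothing to delete.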

If this does not work, let's enable debugging for performance graphing by doing the following.
Edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
Change

Code: Select all

log_level = 0
To:

Code: Select all

log_level = 2
Save it

Edit this file

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Change

Code: Select all

LOG_LEVEL = 0
to

Code: Select all

LOG_LEVEL = 2
Save out the file and restart these services by running

Code: Select all

service npcd restart
service nagios restart
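
The two log-level edits above could also be scripted. This is only a sketch under assumptions: `set_cfg_value` is a hypothetical helper, it expects plain `key = value` lines, and it keeps a `.bak` copy of the file before changing it.

```shell
#!/usr/bin/env bash
# Set "key = value" in a config file, keeping a .bak copy of the original.
# The key is matched case-sensitively and must not contain regex characters.
set_cfg_value() {
    local file=$1 key=$2 value=$3
    local tmp status=0
    cp -p "$file" "$file.bak" || return 1
    tmp=$(mktemp) || return 1
    # Rewrite via a temp file, then copy back so ownership and mode are kept.
    sed -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp" \
        && cat "$tmp" > "$file" || status=1
    rm -f "$tmp"
    return $status
}

# Example, using the files and values from this post:
# set_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg log_level 2
# set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg LOG_LEVEL 2
```

The services still need to be restarted afterwards, as shown above.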

Let the system run for 20 to 30 minutes and post the following files here so we can see what the errors are for that Service check when it tries to update the files.

Code: Select all

/usr/local/nagios/var/perfdata.log
/usr/local/nagios/var/npcd.log
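
Before posting the logs, a quick count of how often a term shows up can point to the problem. This is a rough sketch: `count_mentions` is a hypothetical helper, and the search terms are only guesses at what the logs might contain, since the exact log wording is not shown in this thread.

```shell
#!/usr/bin/env bash
# Print how many lines of a log mention a term
# (case-insensitive, matched as a fixed string).
count_mentions() {
    local log=$1 term=$2
    grep -icF -- "$term" "$log"
}

# Example, using the log files named in this post:
# for log in /usr/local/nagios/var/perfdata.log /usr/local/nagios/var/npcd.log; do
#     echo "$log: $(count_mentions "$log" timeout) lines mention timeout"
# done
```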
Be sure to check out our Knowledgebase for helpful articles and solutions!
jvaira
Posts: 59
Joined: Tue Dec 22, 2015 7:40 pm

Re: Custom check with perfdata but no graph

Post by jvaira »

Hello Tom,
The rrd and xml files did not even exist so I went ahead and just enabled the logging that you mentioned. Attached are the log files.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Custom check with perfdata but no graph

Post by tgriep »

Thanks for the log files.
I did not see any errors for the Temp_Check service but I did see that the performance data applications were timing out.

When the Nagios XI server is under heavy load, it will stop graphing performance data to keep its checks running smoothly.
Those settings can be edited to keep that from happening. To do this, edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Find the TIMEOUT setting and change it to the following, or to a higher value if it is already set that way.

Code: Select all

TIMEOUT = 30
Save the file,
then edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
Find the load_threshold setting and change it to the following, or to a higher value if it is already set that way.

Code: Select all

load_threshold = 50.0
Save out the file and restart these services by running

Code: Select all

service npcd restart
service nagios restart
Let it run for 20 to 30 minutes and check that service again.
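
To see whether the server load is actually above the configured threshold, the setting can be compared with the current load average. This is a sketch under assumptions: `get_cfg_value` and `load_above_threshold` are hypothetical helpers, and it assumes the threshold is compared against the 1-minute load average from `/proc/loadavg`.

```shell
#!/usr/bin/env bash
# Read a "key = value" setting from a config file (last match wins,
# commented-out lines are ignored).
get_cfg_value() {
    local file=$1 key=$2
    awk -F= -v k="$key" '
        { name = $1; gsub(/[[:space:]]/, "", name) }
        name == k { val = $2; gsub(/[[:space:]]/, "", val); found = val }
        END { if (found != "") print found }
    ' "$file"
}

# Exit status 0 when the load is numerically above the threshold.
load_above_threshold() {
    local load=$1 threshold=$2
    awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Example, using the file from this post:
# threshold=$(get_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold)
# load=$(cut -d' ' -f1 /proc/loadavg)
# if load_above_threshold "$load" "$threshold"; then
#     echo "load $load is above load_threshold $threshold"
# fi
```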
jvaira
Posts: 59
Joined: Tue Dec 22, 2015 7:40 pm

Re: Custom check with perfdata but no graph

Post by jvaira »

Hello Tom,
After making these changes I am still not seeing data in the graphs. One thing I did notice is that it is not limited to just this check: performance graphing for all checks seems to have stopped around Thursday of last week. I am seeing unusually high user CPU usage (screenshot 1) and an Apache process that says it is using 650% CPU (screenshot 2). I have already rebooted the machine to see if that would clear the process, but it just immediately popped back up. Any ideas?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Custom check with perfdata but no graph

Post by tgriep »

Could you post your Nagios XI System Profile so we can review it to see if we can find out why the load is so high?
To get your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and upload it to the forum post, or PM it to me if you do not want to post it.
jvaira
Posts: 59
Joined: Tue Dec 22, 2015 7:40 pm

Re: Custom check with perfdata but no graph

Post by jvaira »

Tom,
I was able to resolve the issue with the high user CPU, but the load still seems fairly high and is hovering around 9 to 10. I have sent you a PM with the system profile.

Thanks
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Custom check with perfdata but no graph

Post by tgriep »

Is it still the Apache process showing the highest consistent load?
In the Apache error log file, I saw this script running and causing errors.

Code: Select all

/tmp/123.sh
Is this what you found and fixed?

Also, something is running a curl command but the log does not show what it is.


Other than users that are connecting to the XI interface and a Fusion server, I did not see anything that stands out for Apache load.

You can increase the PHP limits outlined in this article to see if it helps.
https://support.nagios.com/kb/article/n ... e-611.html


The only big thing I see is that the I/O wait is very high. This means that the system is spending a lot of time waiting to write to disk and that causes issues and slowness.
If the system is hosted in a virtual environment, move it to a faster disk subsystem and that will help a lot.
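
To put a number on the I/O wait, two samples of the aggregate `cpu` line in `/proc/stat` can be compared. This is a rough sketch rather than a replacement for tools like `top` or `iostat`: `iowait_percent` is a hypothetical helper, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/usr/bin/env bash
# Percentage of CPU time spent in iowait between two "cpu" lines
# taken from /proc/stat. Only the first eight counters are summed,
# because the guest counters are already included in user and nice.
iowait_percent() {
    local before=$1 after=$2
    printf '%s\n%s\n' "$before" "$after" | awk '
        NR == 1 { for (i = 2; i <= 9 && i <= NF; i++) t0 += $i; w0 = $6 }
        NR == 2 { for (i = 2; i <= 9 && i <= NF; i++) t1 += $i; w1 = $6
                  if (t1 > t0) printf "%.1f\n", 100 * (w1 - w0) / (t1 - t0)
                  else print "0.0" }
    '
}

# Example: sample five seconds apart.
# before=$(grep '^cpu ' /proc/stat); sleep 5; after=$(grep '^cpu ' /proc/stat)
# iowait_percent "$before" "$after"
```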

You can add a RAMDisk to the system to move some of the disk I/O to memory to help the performance of the server. It is not a cure, but it should help speed things up.
https://assets.nagios.com/downloads/nag ... giosXI.pdf


When the Nagios XI server is under heavy load, it will stop graphing performance data to keep its checks running smoothly.
Those settings can be edited to keep that from happening. To do this, edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Find the TIMEOUT setting and change it to the following, or to a higher value if it is already set that way.

Code: Select all

TIMEOUT = 30
Save the file,
then edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
Find the load_threshold setting and change it to the following, or to a higher value if it is already set that way.

Code: Select all

load_threshold = 50.0
Save out the file and restart these services by running

Code: Select all

systemctl restart npcd
systemctl restart nagios

That should keep the graphing function from stopping on the server when it is loaded.