Some performance graphs are "randomly" missing

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
john.newman
Posts: 6
Joined: Wed Dec 21, 2011 1:26 pm

Post by john.newman »

All,

This has been an issue for a while now, and I would like to get it resolved if possible. Our XI config has the "graph explorer" page, which seems to work pretty well and is a great source of information. However, there's one odd issue: certain services under certain hosts just do not show up in it. Given that it is mostly working, I don't think it's some sort of permission problem on the monitoring server. Here is a trivialized example of what I am seeing in the graph explorer -> scalable performance graph:

Host A [this is correct]
CPU Load
Disk Usage
Mem Usage

Host B
CPU Load
Mem Usage

Host C
Disk Usage
Mem Usage

Host D
Mem Usage

Host E
CPU Load

etc


Now in the configuration, I only have a total of three services defined (again, this is a trivialized example). All three of them (CPU, Mem, Disk) have "Retain status information" = ON and "Process perf data" = ON, and these services are all simply applied to one host group, which includes Hosts A-E. Hosts B-E were originally copies of A. So how in the world would I see this inconsistency? It doesn't make a whole lot of sense to me... I could understand it if I had defined a separate service per host and left those options unchecked on some of them, but the three service objects should be distributed across every host in the group in exactly the same way.

The service detail page looks fine, and all the checks are working. It's just that this graph page seems to pick and choose whatever it wants in some odd way. :?: What am I missing?

TIA :geek:
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Some performance graphs are "randomly" missing

Post by mguthrie »

You might have the 1.0 version of the graph explorer, which had a bug like this. Here's the latest version; you can install it through the Admin->Manage Components page. See if it resolves your issue.
john.newman
Posts: 6
Joined: Wed Dec 21, 2011 1:26 pm

Re: Some performance graphs are "randomly" missing

Post by john.newman »

uh .. ok. And .. do you work for nagios? can i trust that link? Seems kind of odd to use a random download like that... is there anything on the website or release notes about this?

thanks
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Some performance graphs are "randomly" missing

Post by scottwilkerson »

john.newman wrote: "uh .. ok. And .. do you work for nagios? can i trust that link? Seems kind of odd to use a random download like that... is there anything on the website or release notes about this? thanks"

John,

It's safe.

Mike does work for Nagios. Those of us who work for Nagios have bright green names with the Nagios logo above them.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Some performance graphs are "randomly" missing

Post by mguthrie »

John,

The graph explorer component is considered a customer-only download, so if you don't have a support and maintenance contract you won't be able to download the new version. I thought it would be simpler to post it to the thread directly :)
john.newman
Posts: 6
Joined: Wed Dec 21, 2011 1:26 pm

Re: Some performance graphs are "randomly" missing

Post by john.newman »

i see.. ok thanks.

one thing, you should do an svn export instead of a checkout - you gave me the .svn folder. :D

i can figure this out ... but while i have your attention, if you don't mind, where do i put this graphexplorer directory, and do i have to run any chown / chmod/ restart services .. thanks much
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Some performance graphs are "randomly" missing

Post by scottwilkerson »

You can install the whole zip file through the Admin->Manage Components page
john.newman
Posts: 6
Joined: Wed Dec 21, 2011 1:26 pm

Re: Some performance graphs are "randomly" missing

Post by john.newman »

ok well that was easy. thanks. I'm glad I asked, I would have spent an hour or two digging through the filesystem messing around. very nice feature there. 8-)

It seems to have fixed the original problem I posted about. All of the service checks are showing up now. (at least it looks like all of them .. there's a few hundred, i'll have to take a closer look and make sure none are missing, but just a first glance looks all there.) 8-)

However, it seems to be including "retired" services. Our configuration has gone through many changes over the past several months. We used to have a "ping" service defined on all of the hosts, which is completely pointless, as just defining the __HOST__ effectively creates the ping check. So we've removed that. However, now in the graph explorer, some hosts are showing this "Ping" service and some are not.

Actually, this may not be a bug: if I roll the filter back to -365 days, there is some perf data in the graph from a very long time ago. There are some other "retired" services that show up, and there's old data for them as well. So I guess it's probably not a bug, but is there a way to stop these from showing up, or to go in and purge the old perf data for them? Any way to hide these would be nice, but this is not nearly as big of a deal as what I had in the first post.

Perhaps this was intentional, as it's historic data and until _I_ delete it, it's probably correct on your part to continue to present it. I guess it depends on how you look at it: to me the graph explorer should be a 1:1 match with the current service detail list, but perhaps you deliberately include any perf data that is still there.

Thoughts? I'm happy though as at least the original problem is fixed. Thanks for that. :!:
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Some performance graphs are "randomly" missing

Post by mguthrie »

Yeah, the tricky part with the old data is that npcd (the performance processing service) generates performance data based on hostname/service_description, while XI has unique IDs for all hosts and services, so you can rename a service without losing any historical data for it. However, upon renaming, a new set of rrd data gets created. Currently we have chosen to leave the old authorized services there in the event that someone wants to retain the performance data after a name change.

However, you can clear the expired data by removing the associated rrd and XML files for a particular service. These files are located in:
/usr/local/nagios/share/perfdata/<host_name>/<service_description>.rrd
/usr/local/nagios/share/perfdata/<host_name>/<service_description>.xml
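
For anyone who would rather script this than delete files by hand, here is a minimal Python sketch of removing the two files for one retired service. It assumes the directory layout shown above; the function names and the dry-run default are made up for illustration and are not part of Nagios XI. The file name on disk may not match the service description exactly as displayed (for example, if special characters are substituted), so check the directory listing before passing a name in.

```python
from pathlib import Path

# Base directory taken from the paths quoted in this thread.
PERFDATA_DIR = Path("/usr/local/nagios/share/perfdata")


def service_perfdata_files(host_name, service_description, base=PERFDATA_DIR):
    """Return the expected .rrd and .xml paths for one host/service pair."""
    host_dir = Path(base) / host_name
    return [host_dir / f"{service_description}{ext}" for ext in (".rrd", ".xml")]


def remove_service_perfdata(host_name, service_description,
                            base=PERFDATA_DIR, dry_run=True):
    """Delete the perfdata files for one retired service.

    With dry_run=True (the default) nothing is deleted; the function only
    reports which existing files would be removed. Returns the list of
    paths that were (or would be) removed.
    """
    affected = []
    for path in service_perfdata_files(host_name, service_description, base):
        if path.is_file():
            if not dry_run:
                path.unlink()
            affected.append(path)
    return affected
```

Calling `remove_service_perfdata("HostB", "Ping")` (hypothetical names) only lists the matching files; pass `dry_run=False` once the list looks right. Back up the files first if there is any chance the history will be wanted later.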

At some point we need to create a "garbage cleaner" type of feature that will allow you to do most of this from the UI, but currently this must be done manually.
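
Until something like that exists, one rough way to find cleanup candidates is to look for rrd files that have not been written to recently. The Python sketch below rests on the assumption that an rrd file's modification time stops advancing once its service is retired or renamed; the 30-day threshold and the function name are arbitrary choices for illustration, not anything built into XI. It only reports candidates and deletes nothing.

```python
import time
from pathlib import Path


def find_stale_rrds(base="/usr/local/nagios/share/perfdata",
                    max_age_days=30, now=None):
    """List (host_name, service_file_stem) pairs whose .rrd file has not
    been modified within the last max_age_days days.

    Assumes the <host_name>/<service_description>.rrd layout described
    in this thread. 'now' can be passed in as a Unix timestamp to make
    the result reproducible; it defaults to the current time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for rrd in sorted(Path(base).glob("*/*.rrd")):
        if rrd.stat().st_mtime < cutoff:
            stale.append((rrd.parent.name, rrd.stem))
    return stale
```

A service that is still defined but has not returned perf data for a while would show up here too, so treat the output as a list to review against the current service detail page rather than a list to delete.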

Hope that helps!