NCPA 2.4.0 agent issue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
sahilrana
Posts: 32
Joined: Sat Feb 20, 2021 6:55 am

NCPA 2.4.0 agent issue

Post by sahilrana »

We updated the NCPA agent to the latest version, 2.4.0, last week and have found an issue: the agent is not fetching RAM data from the VMs.
All other data is being fetched correctly; only the RAM data from the VMs is missing.
Is this a known issue?

When we installed the older NCPA agent (2.3.1), it started fetching the RAM data again, so there seems to be an issue with the 2.4.0 agent.
kfanselow
Posts: 247
Joined: Tue Aug 31, 2021 3:25 pm

Re: NCPA 2.4.0 agent issue

Post by kfanselow »

Hi sahilrana,

Are the checks failing entirely, or are you seeing a performance graph issue similar to this one?

https://github.com/NagiosEnterprises/ncpa/issues/845

Also, could you provide the output of a "Run Check Command", with the token redacted, as in the example below?

Navigate to Configure (top) -> Core Config Manager -> Services (left).

Find the "Memory Usage" service for one of the hosts with 2.4.0 installed and select Edit (wrench icon on the right side). Then click the Run Check Command button and confirm its follow-up prompt. The output should look like this (note that the token and IP have been redacted from the string):

[nagios@kf-centos-79 ~]$ /usr/local/nagios/libexec/check_ncpa.py -H REDACTED -t 'REDACTED' -P 5693 -M memory/virtual -u 'Gi' -w '50' -c '80'
CRITICAL: Memory usage was 88.90 % (Available: 0.40 GiB, Total: 3.65 GiB, Free: 0.13 GiB, Used: 2.85 GiB) | 'available'=0.40GiB;;; 'total'=3.65GiB;;; 'free'=0.13GiB;;; 'used'=2.85GiB;;;
Thanks and Best Regards,
Keith
sahilrana
Posts: 32
Joined: Sat Feb 20, 2021 6:55 am

Re: NCPA 2.4.0 agent issue

Post by sahilrana »

Hi Keith,

Yes, it's the same issue as the one in the link: the RAM data is missing from the performance graphs.

Here is the output.

[nagios@abc.domain.com ~]$ /usr/local/nagios/libexec/check_ncpa.py -H x.x.x.x -T 119 -t 'token' -P 5693 -M memory/virtual -u 'Gi' -w '80' -c '90'
OK: Memory usage was 26.50 % (Available: 23.53 GiB, Total: 32.00 GiB, Free: 23.53 GiB, Used: 8.47 GiB) | 'available'=23.53GiB;;; 'total'=32.00GiB;;; 'percent'=26.50%;80;90; 'free'=23.53GiB;;; 'used'=8.47GiB;;;
kfanselow
Posts: 247
Joined: Tue Aug 31, 2021 3:25 pm

Re: NCPA 2.4.0 agent issue

Post by kfanselow »

Hi sahilrana,

Thanks for confirming. I was able to replicate the difference in output, and we suspect the issue may be a mismatch in the number of inputs (data sources) between the new output and the existing round robin database (RRD). Could you confirm which version of Nagios XI you are using?
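One way to see that mismatch is to count the labels in the performance data (the part after the `|`) of each output. This is a minimal POSIX shell sketch, assuming the `'label'=value` format shown in this thread; `perf_labels` is a hypothetical helper, not part of NCPA or Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper (not part of NCPA or Nagios XI): print the perfdata
# labels, one per line, from a single line of check_ncpa.py output.
perf_labels() {
  # ${1#*\|} drops everything up to the first '|' (the human-readable text);
  # tr puts each 'label'=value item on its own line; sed keeps the label.
  printf '%s\n' "${1#*\|}" | tr -s ' ' '\n' | sed -n "s/^'\([^']*\)'=.*/\1/p"
}
```

Piping `perf_labels "$output"` into `wc -l` counts 4 labels for Keith's output above and 5 for sahilrana's, where the extra label is `percent`.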


Thanks and Best Regards,
Keith
sahilrana
Posts: 32
Joined: Sat Feb 20, 2021 6:55 am

Re: NCPA 2.4.0 agent issue

Post by sahilrana »

Hi Keith,

The Nagios XI version is 5.8.7. I think it's the latest one.
kfanselow
Posts: 247
Joined: Tue Aug 31, 2021 3:25 pm

Re: NCPA 2.4.0 agent issue

Post by kfanselow »

Hi sahilrana,

I'm filing a bug report for this issue. After discussing it with our developers, there are two options in the meantime:

1) Stay on NCPA version 2.3.1

2) Remove the rrd and xml files for the memory usage graphs. They should then be recreated with the updated number of data sources, although the existing memory graph history will be lost.

If you would like to use the second option, you can find the rrd and xml files in the host's subdirectory of perfdata on your XI server (see below; change HOSTNAME to the hostname or IP of the remote system):

/usr/local/nagios/share/perfdata/HOSTNAME
For example:

rm /usr/local/nagios/share/perfdata/10.1.2.3/Memory_Usage.rrd
rm /usr/local/nagios/share/perfdata/10.1.2.3/Memory_Usage.xml 
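If several hosts are affected, the same cleanup can be looped over the host directories. This is a minimal POSIX shell sketch, assuming every affected host's service is named "Memory Usage" (so its files are `Memory_Usage.rrd` and `Memory_Usage.xml`, as in the example above). `backup_memory_graphs` is a hypothetical helper: it renames the files with a `.bak` suffix instead of deleting them so they can be restored, and it touches every host directory under perfdata, so the memory graph history is reset on each of them.

```shell
#!/bin/sh
# Hypothetical helper: move every host's Memory_Usage.rrd/.xml aside (".bak")
# so the memory graph files can start over with the new data sources.
# Assumes the service is named "Memory Usage" (files Memory_Usage.rrd/.xml).
backup_memory_graphs() {
  local perfdata_dir rrd base
  perfdata_dir=${1:-/usr/local/nagios/share/perfdata}
  for rrd in "$perfdata_dir"/*/Memory_Usage.rrd; do
    [ -e "$rrd" ] || continue            # pattern matched no files
    base=${rrd%.rrd}                     # .../HOSTNAME/Memory_Usage
    mv -- "$rrd" "$base.rrd.bak"
    if [ -e "$base.xml" ]; then
      mv -- "$base.xml" "$base.xml.bak"
    fi
    echo "moved aside: $base.rrd (and .xml if present)"
  done
}
```

Called without arguments it uses the default perfdata path; a different perfdata directory can be passed as the first argument.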


Hope this is useful.

Thanks and Best Regards,
Keith
sahilrana
Posts: 32
Joined: Sat Feb 20, 2021 6:55 am

Re: NCPA 2.4.0 agent issue

Post by sahilrana »

Hi Keith,

I tried the second option, but I am getting an error that the files do not exist. I am assuming the hostname or IP address in the command is that of the remote server where the agent is installed.
The server I tried has the 2.4.0 agent installed. Please see the attached error.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: NCPA 2.4.0 agent issue

Post by ssax »

The directory name is based on the host name in XI. Is the host name in XI for this server the IP address, or something else?
- It's likely something else, so you'd need to use that name in place of THEHOSTNAME in the commands below:

rm /usr/local/nagios/share/perfdata/THEHOSTNAME/Memory_Usage.rrd
rm /usr/local/nagios/share/perfdata/THEHOSTNAME/Memory_Usage.xml 
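Because the directory name comes from the host name in XI, it can help to list which perfdata directories actually contain a `Memory_Usage.rrd` before running `rm`. This is a minimal POSIX shell sketch; `find_memory_rrd_hosts` is a hypothetical helper that assumes the default `/usr/local/nagios/share/perfdata` layout shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the perfdata host directories that
# contain Memory_Usage.rrd, i.e. the exact names to use in place of THEHOSTNAME.
find_memory_rrd_hosts() {
  local perfdata_dir rrd dir
  perfdata_dir=${1:-/usr/local/nagios/share/perfdata}
  for rrd in "$perfdata_dir"/*/Memory_Usage.rrd; do
    [ -e "$rrd" ] || continue            # pattern matched no files
    dir=${rrd%/Memory_Usage.rrd}         # strip the file name
    printf '%s\n' "${dir##*/}"           # keep only the last path component
  done
}
```

Piping the result through `grep -i` with part of the server's name narrows it down to the directory for that host.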
sahilrana
Posts: 32
Joined: Sat Feb 20, 2021 6:55 am

Re: NCPA 2.4.0 agent issue

Post by sahilrana »

I tried with both the IP address and the hostname.
In any case, this is not a feasible workaround, since the commands would have to be run for every host, right?

For now, I am rolling back to the previous version (2.3.1).
Is there an expected timeframe for resolving this bug in the 2.4.0 agent?
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: NCPA 2.4.0 agent issue

Post by ssax »

We're unable to give an ETA at this time, though development is aware of the issue.

Development will also be alerted to updates on the bug report:

https://github.com/NagiosEnterprises/ncpa/issues/845

This would technically fix it, but don't run it:

https://support.nagios.com/kb/article/n ... g-149.html

Because the data sources are ordered differently, the resulting data would not be correct:

2.3.x: | 'available'=0.89GiB;;; 'total'=1.80GiB;;;  'free'=0.17GiB;;; 'used'=0.65GiB;;;
2.4.0: | 'available'=0.89GiB;;; 'total'=1.80GiB;;; 'percent'=50.50%;80;90; 'free'=0.17GiB;;; 'used'=0.65GiB;;;
What it would do is append a data source to the end of the RRD, and all data from that point on would be shifted over by one: the new 'percent' values would land where the old 'free' data was, and so on, which would mess up the data.
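The shift can be illustrated by pairing the labels of the two outputs above by position, which is how the RRD data sources are matched in the explanation above. This is a minimal POSIX shell sketch; `ds_labels` and `compare_ds_positions` are hypothetical helpers, and the `dsN` numbering is only illustrative.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the positional shift described above.
# ds_labels: print the quoted perfdata labels (after the '|'), one per line.
ds_labels() {
  printf '%s\n' "${1#*\|}" | tr -s ' ' '\n' | sed -n "s/^'\([^']*\)'=.*/\1/p"
}

# compare_ds_positions OLD NEW: pair the labels of two outputs by position
# and flag every slot whose metric would change.
compare_ds_positions() {
  local old new
  old=$(ds_labels "$1" | tr '\n' ' ')
  new=$(ds_labels "$2" | tr '\n' ' ')
  awk -v old="$old" -v new="$new" 'BEGIN {
    n = split(old, o, " ")
    m = split(new, w, " ")
    max = (n > m ? n : m)
    for (i = 1; i <= max; i++) {
      a = (i <= n ? o[i] : "(none)")
      b = (i <= m ? w[i] : "(none)")
      printf "ds%d: %s -> %s%s\n", i, a, b, (a == b ? "" : "  <- different metric")
    }
  }'
}
```

Fed the 2.3.x and 2.4.0 lines above, it reports `ds1` and `ds2` unchanged, `ds3` going from `free` to `percent`, `ds4` from `used` to `free`, and a new `ds5` holding `used`.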

If you have issues, look in the .xml file; this section would usually contain an error (the sample below is from a successful update):

  <RRD>
    <RC>0</RC>
    <TXT>successful updated</TXT>
  </RRD>
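To check many hosts at once, the XML files can be scanned for an `<RRD>` section whose `<RC>` is not 0. This is a minimal POSIX shell sketch, assuming one tag per line as in the sample above and assuming a non-zero `<RC>` marks a failed update; `check_rrd_status` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: report perfdata XML files whose <RRD> section has a
# non-zero <RC>, along with its <TXT> message.
# Assumes one tag per line, as in the sample above, and that RC 0 = success.
check_rrd_status() {
  local perfdata_dir xml
  perfdata_dir=${1:-/usr/local/nagios/share/perfdata}
  for xml in "$perfdata_dir"/*/*.xml; do
    [ -e "$xml" ] || continue            # pattern matched no files
    awk -v file="$xml" '
      /<RRD>/            { in_rrd = 1 }
      /<\/RRD>/          { in_rrd = 0 }
      in_rrd && /<RC>/   { rc = $0;  gsub(/.*<RC>|<\/RC>.*/, "", rc) }
      in_rrd && /<TXT>/  { txt = $0; gsub(/.*<TXT>|<\/TXT>.*/, "", txt) }
      END { if (rc != "" && rc != "0") printf "%s: RC=%s %s\n", file, rc, txt }
    ' "$xml"
  done
}
```

Files whose section matches the successful sample above produce no output, so only the hosts worth investigating are listed.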