RRDCached issues

hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

RRDCached issues

Post by hbouma »

I am getting the following errors in our /var/log/messages related to rrdcached:


Sep 16 11:49:08 XXXXXXXXXXXXXXXXXXXX rrdcached[6547]: queue_thread_main: rrd_update_r (/usr/local/nagios/share/perfdata/XXXXXXXXXXXXXXXXXXXX /Disk_Usage_on__dev_mapper_vg00-var.rrd) failed with status -1. (/usr/local/nagios/share/perfdata/XXXXXXXXXXXXXXXXXXXX /Disk_Usage_on__dev_mapper_vg00-var.rrd: found extra data on update argument: 3.03:4.86)
Sep 16 11:49:09 XXXXXXXXXXXXXXXXXXXX rrdcached[6547]: queue_thread_main: rrd_update_r (/usr/local/nagios/share/perfdata/XXXXXXXXXXXXXXXXXXXX /Disk_Usage_on__dev_mapper_vg00-opt.rrd) failed with status -1. (/usr/local/nagios/share/perfdata/XXXXXXXXXXXXXXXXXXXX /Disk_Usage_on__dev_mapper_vg00-opt.rrd: found extra data on update argument: 0.69:0.95)
Sep 16 11:49:10 XXXXXXXXXXXXXXXXXXXX rrdcached[6547]: queue_thread_main: rrd_update_r (/usr/local/nagios/share/perfdata/XXXXXXXXXXXXXXXXXXXX /_HOST_.rrd) failed with status -1. (/usr/local/nagios/share/perfdata/XXXXXXXXXXXXXXXXXXXX /_HOST_.rrd: expected 4 data source readings (got 1) from 1600270238)
We are running Nagios XI 5.6.10 on RHEL 7 VMs, with an offloaded database and rrdcached 1.4.4.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RRDCached issues

Post by scottwilkerson »

Did you change the command for these checks so that it returns a different quantity of performance data than before? It looks like the checks are returning a different number of perfdata values than the RRDs expect.
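
The mismatch described here can be checked by counting the metrics in the plugin's performance data and comparing that count with what the RRD expects (the log above reports "expected 4 data source readings (got 1)" for one file). A minimal Python sketch, assuming the standard Nagios plugin perfdata layout 'label'=value[UOM];warn;crit;min;max; the function name perfdata_labels and the sample output string are hypothetical and not taken from this thread:

```python
import re
from typing import List

# One perfdata item looks like  'label'=value[UOM];[warn];[crit];[min];[max]
# Labels may be wrapped in single quotes when they contain spaces.
PERF_ITEM = re.compile(r"(?:'((?:[^']|'')+)'|([^\s=']+))=(\S*)")


def perfdata_labels(plugin_output: str) -> List[str]:
    """Return the metric labels found after the first '|' in plugin output."""
    _, sep, perf = plugin_output.partition("|")
    if not sep:
        return []  # the plugin returned no performance data at all
    return [quoted or bare for quoted, bare, _ in PERF_ITEM.findall(perf)]


# Hypothetical plugin output, for illustration only.
sample = ("OK: Used disk space was 3.03 % | "
          "'used'=3.03GiB;80;90; 'free'=4.86GiB; 'total'=7.89GiB;")
labels = perfdata_labels(sample)
print(len(labels), labels)
```

If the number of labels differs from the number of data sources the existing RRD was created with, updates of the kind shown in the log would be expected to fail.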
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: RRDCached issues

Post by hbouma »

This is being pulled from NCPA agent checks, using NCPA version 2.1.5. It uses the built-in disk check functionality.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RRDCached issues

Post by scottwilkerson »

hbouma wrote:This is being pulled from the NCPA agent checks using NCPA version 2.1.5. It is using the built in disk check functionality.
Can you share the command it is using?

Also, could you post a screenshot of the Advanced tab for the service, as well as a picture of the performance graph for the service?

Thanks
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: RRDCached issues

Post by hbouma »

Here is an example of one of the commands:

check_ncpa.py -H HOSTNAME -t 'TOKEN' -P 5693 -M 'disk/logical/|var|log' -w 80 -c 90
2020-09-18 07_17_59-Nagios XI.png
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RRDCached issues

Post by scottwilkerson »

scottwilkerson wrote:as well as a pic of the performance graph for the service
thanks
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: RRDCached issues

Post by hbouma »

Performance graph is blank.
2020-09-21 07_27_36-Nagios XI.png
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RRDCached issues

Post by scottwilkerson »

OK, so this performance graph has just one metric, "used", while the check command you are currently using returns three metrics: used, free, and total.

At some point the command must have changed.

The only way to rectify this is to remove the rrd for this service from


/usr/local/nagios/share/perfdata/XXXXXXXXXXXXXXXXXXXX/SERVICENAME.rrd
and let it get re-created with all the metrics
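
As a cautious variation on removing the file, the old RRD can be renamed instead, so its history is kept until the new graph is confirmed. This is a sketch of that swapped-in approach, not the exact procedure given above; the function name set_aside_rrd and the backup suffix are hypothetical. Whether rrdcached needs to be flushed or restarted afterwards is not covered in this thread.

```python
import os
import time


def set_aside_rrd(rrd_path: str) -> str:
    """Rename an RRD file so a fresh one can be created; return the new path."""
    if not rrd_path.endswith(".rrd"):
        raise ValueError("expected a path ending in .rrd: %r" % rrd_path)
    # Timestamped suffix (hypothetical naming) so repeated runs do not collide.
    backup = "%s.bak-%s" % (rrd_path, time.strftime("%Y%m%d%H%M%S"))
    os.rename(rrd_path, backup)
    return backup


# Example call, using the placeholder path from the post above:
# set_aside_rrd("/usr/local/nagios/share/perfdata/XXXXXXXXXXXXXXXXXXXX/SERVICENAME.rrd")
```

Once the recreated graph shows all of the expected metrics, the backup file can be deleted.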
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: RRDCached issues

Post by hbouma »

So, this has happened across multiple checks for multiple servers. Is it possible that something messed up the metric info on so many?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RRDCached issues

Post by scottwilkerson »

hbouma wrote:So, this has happened across multiple checks for multiple servers. Is it possible that something messed up the metric info on so many?
That does seem odd. Have you made any changes across those checks, or to the commands they use?
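
To gauge how many RRDs are affected across hosts, the rrdcached failures in /var/log/messages can be grouped by file. A minimal Python sketch, assuming the log lines follow the same shape as the ones posted at the start of this thread; the function name failures_by_rrd and the shortened HOSTDIR paths in the sample are hypothetical:

```python
import re
from collections import defaultdict

# Matches: rrd_update_r (<path>.rrd) failed with status <n>. (<path>.rrd: <reason>)
FAILURE = re.compile(
    r"rrd_update_r \((?P<path>.+?\.rrd)\) failed with status (?P<status>-?\d+)\. "
    r"\((?P=path): (?P<reason>.*)\)\s*$"
)


def failures_by_rrd(lines):
    """Map each failing RRD path to the list of reasons rrdcached logged for it."""
    grouped = defaultdict(list)
    for line in lines:
        match = FAILURE.search(line)
        if match:
            grouped[match.group("path")].append(match.group("reason"))
    return dict(grouped)


# Sample lines modeled on the log in this thread, with shortened paths.
sample_lines = [
    "Sep 16 11:49:08 host rrdcached[6547]: queue_thread_main: rrd_update_r "
    "(/perfdata/HOSTDIR/Disk.rrd) failed with status -1. "
    "(/perfdata/HOSTDIR/Disk.rrd: found extra data on update argument: 3.03:4.86)",
    "Sep 16 11:49:10 host rrdcached[6547]: queue_thread_main: rrd_update_r "
    "(/perfdata/HOSTDIR/_HOST_.rrd) failed with status -1. "
    "(/perfdata/HOSTDIR/_HOST_.rrd: expected 4 data source readings (got 1) from 1600270238)",
]

for path, reasons in sorted(failures_by_rrd(sample_lines).items()):
    print(path, len(reasons), reasons[0])

# To scan the real log instead (usually requires root):
# with open("/var/log/messages") as fh:
#     affected = failures_by_rrd(fh)
```

The resulting list of paths shows which services have an RRD whose layout no longer matches the perfdata being returned.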