Hi team,
We have an issue with the service-perfdata.out file: its disk usage is very large. Could you please suggest how to reduce it?
PNP4Nagios is also configured on our Nagios Core installation.
[root@1 nagios]# cd var/
[root@ var]# ls
archives nagios.configtest nagios.log objects.precache rw spool
host-perfdata.out nagios.lock objects.cache retention.dat service-perfdata.out status.dat
[root@1 var]# du -sh *
0 archives
35G host-perfdata.out
4.0K nagios.configtest
4.0K nagios.lock
421M nagios.log
2.7M objects.cache
2.7M objects.precache
4.2M retention.dat
40K rw
598G service-perfdata.out
8.0K spool
4.2M status.dat
[root@1 var]# pwd
/usr/local/nagios/var
service-perfdata.out got increasing huge size
-
- Posts: 222
- Joined: Thu Jul 06, 2017 8:55 am
-
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: service-perfdata.out got increasing huge size
Hello @grayloglearn,
How long has this been happening and what is the status of npcd?
Code: Select all
systemctl status npcd.service
# or
service npcd status
If it's not running, you can restart it, but with that many files backed up you'll need to remove them; otherwise the server will not be able to process them and keep up with incoming files.
It's also possible that the load on the server is too high, hitting the max threshold settings and causing the files to spool up.
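To check whether files really are backing up for npcd, counting the files in its spool directory is a quick test. The sketch below assumes the spool directory is /usr/local/pnp4nagios/var/spool (adjust it to your install); the function name `spool_backlog` is made up for illustration.

```shell
#!/bin/sh
# Sketch: count the perfdata files waiting in the NPCD spool directory.
# The default path is an assumption; pass your own spool directory as $1.
spool_backlog() {
    dir=${1:-/usr/local/pnp4nagios/var/spool}
    # -maxdepth is a GNU/BSD find extension, not POSIX.
    find "$dir" -maxdepth 1 -type f | wc -l
}

# Only report when the assumed default directory exists on this machine.
if [ -d /usr/local/pnp4nagios/var/spool ]; then
    echo "files waiting in spool: $(spool_backlog)"
fi
```

A count that keeps growing between two runs a few minutes apart would suggest npcd is not keeping up; a small, stable count suggests the spool is being drained normally.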
-
- Posts: 222
- Joined: Thu Jul 06, 2017 8:55 am
Re: service-perfdata.out got increasing huge size
Hi team,
We have restarted the npcd service, but we still see the same usage: service-perfdata.out is now 599G. We would like to know how to trim the data in this file. When we look at the first 50 lines of service-perfdata.out we see only today's entries, so we do not understand how the file became so large.
[root@XXXX var]# tail +n 50 service-perfdata.out
tail: cannot open ‘+n’ for reading: No such file or directory
tail: cannot open ‘50’ for reading: No such file or directory
==> service-perfdata.out <==
1568080719 app_fraepkdb fs_/oracle/stage OK 1 HARD 0.000 0.740 OK - 0.0% used (0.00 of 9.8 GB), (levels at 80.0/90.0%), trend: 0.00B / 24 hours /oracle/stage=1.8515625MB;8000;9000;0;10000.0
1568080719 app_fraepkdb lparstat OK 1 HARD 0.000 0.742 OK: AIX lparstat, user=2.6% sys=2.2% wait=0.1% idle=95.1% physc=0.02 app=1246867123 user=2.6%;;;; sys=2.2%;;;; wait=0.1%;;;; idle=95.1%;;;; physc=0.02;;;; entc=10.5%;;;; lbusy=0.4;;;; app=1246867123;;;;
1568080719 de-ffm-iproxy02 Number of threads OK 1 HARD 0.000 0.727 OK - 224 threads threads=224;2000.0;4000.0;0;
1568080719 app_fraepkdb fs_/tmp OK 1 HARD 0.000 0.740 OK - 25.3% used (2.03 of 8.0 GB), (levels at 80.0/90.0%), trend: +907.81B / 24 hours /tmp=2074.5703125MB;6553;7372;0;8192.0
1568080719 app_fraepkdb fs_/var OK 1 HARD 0.000 0.742 OK - 70.2% used (0.88 of 1.2 GB), (levels at 80.0/90.0%), trend: +115.89KB / 24 hours /var=898.44921875MB;1024;1152;0;1280.0
1568080719 app_fraepkdb Check_MK OK 1 HARD 0.093 0.000 OK - Agent version 1.1.10, execution time 0.1 sec execution_time=0.064
1568080719 htpmsdbtest session-usage OK 1 HARD 0.136 0.000 OK - 11.69% of session resources usedsession_usage=11.69%;80;100
1568080719 app_fraepkdb fs_/usr OK 1 HARD 0.000 0.742 OK - 79.7% used (5.53 of 6.9 GB), (levels at 80.0/90.0%), trend: +544.53KB / 24 hours /usr=5661.95703125MB;5683;6393;0;7104.0
1568080719 de-ffm-iproxy02 fs_/usr OK 1 HARD 0.000 0.732 OK - 25.9% used (2.52 of 9.7 GB), (levels at 80.0/90.0%), trend: +5.08B / 24 hours /usr=2581.40625MB;7961;8956;0;9951.3046875
1568080719 PROVISDBPROD process-usage OK 1 HARD 0.140 0.000 OK - 49.33% of process resources usedprocess_usage=49.33%;80;100
Please suggest how we can resolve this.
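A note on the command above: tail treated `+n` and `50` as file names, so what it printed was its default, the last 10 lines of the file, which is why only today's entries showed up. `head -n 50 FILE` shows the first 50 lines and `tail -n 50 FILE` the last 50. The sketch below prints the oldest and newest timestamp in a perfdata file, assuming the first field of each line is a Unix timestamp (as in the sample lines above) and that GNU date is available; `perfdata_range` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show the time span covered by a perfdata file.
# Assumes the first whitespace-separated field is a Unix timestamp
# and that lines were appended in chronological order.
perfdata_range() {
    file=$1
    first=$(head -n 1 "$file" | awk '{print $1}')
    last=$(tail -n 1 "$file" | awk '{print $1}')
    # date -d @EPOCH is a GNU date feature.
    echo "oldest: $first ($(date -u -d "@$first" '+%Y-%m-%d %H:%M:%S') UTC)"
    echo "newest: $last ($(date -u -d "@$last" '+%Y-%m-%d %H:%M:%S') UTC)"
}

# Example call (path taken from this thread):
#   perfdata_range /usr/local/nagios/var/service-perfdata.out
```

Both head and tail only read a small part of the file, so this should stay quick even on a file of several hundred gigabytes.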
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: service-perfdata.out got increasing huge size
You can truncate it by running
Code: Select all
cat /dev/null > service-perfdata.out
But finding out why it isn't getting reaped by your PNP installation is a different matter.
Can you show the output of
Code: Select all
grep perfdata /usr/local/nagios/etc/nagios.cfg
ls -al /usr/local/nagios/var|grep perfdata
Do you get graphs in PNP4Nagios? Is it set up correctly to read these files?
Have you considered Nagios XI, which has graphing already set up properly?
https://www.nagios.com/products/nagios-xi/
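On the truncation command above: redirecting into the file empties it in place, so the file keeps its inode, owner and permissions and whatever appends to it can carry on. The sketch below does the same with the shell's `:` builtin and checks the result; `empty_in_place` is a hypothetical helper name, and all existing data in the file is discarded.

```shell
#!/bin/sh
# Sketch: empty a file in place and confirm that the inode did not change.
# WARNING: this throws away everything currently in the file.
empty_in_place() {
    file=$1
    before=$(ls -i "$file" | awk '{print $1}')
    : > "$file"        # same effect as: cat /dev/null > "$file"
    after=$(ls -i "$file" | awk '{print $1}')
    # Succeeds only if the inode is unchanged and the file is now empty.
    [ "$before" = "$after" ] && [ ! -s "$file" ]
}

# Example call (path taken from this thread):
#   empty_in_place /usr/local/nagios/var/service-perfdata.out
```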
-
- Posts: 222
- Joined: Thu Jul 06, 2017 8:55 am
Re: service-perfdata.out got increasing huge size
Hi Team,
Thanks for the reply. We need some help: we have observed that the file contains data from August 2016. Could you please suggest a command to remove the data from August 2016 to December 2016?
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: service-perfdata.out got increasing huge size
I don't even know if this file is being used by any system. With it being 598G I doubt it, but if you just want to keep the last xxxx lines you could run something like this
Code: Select all
tail -n xxxx service-perfdata.out > service-perfdata.out_new
mv service-perfdata.out_new service-perfdata.out
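One thing to watch with the tail-and-mv approach: if it is run as root, the new file belongs to root, while the listing in this thread shows the .out files owned by nagios:nagios, so the nagios user may no longer be able to append. The sketch below wraps the two commands and copies owner and mode from the old file first. It assumes GNU coreutils (`--reference`) and enough free disk space for the kept lines; `keep_last_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: keep only the last N lines of a file, preserving owner and mode.
# Lines appended between the tail and the mv are lost.
keep_last_lines() {
    file=$1
    n=$2
    tmp="${file}_new"
    tail -n "$n" "$file" > "$tmp" || return 1
    # --reference is a GNU coreutils option: copy owner/group and mode.
    chown --reference="$file" "$tmp" || return 1
    chmod --reference="$file" "$tmp" || return 1
    mv "$tmp" "$file"
}

# Example call (the line count is a placeholder, not a recommendation):
#   keep_last_lines /usr/local/nagios/var/service-perfdata.out 100000
```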
-
- Posts: 222
- Joined: Thu Jul 06, 2017 8:55 am
Re: service-perfdata.out got increasing huge size
I just want to give some details; I hope they will help you figure out the problem.
# 'process-host-perfdata' command definition
define command{
    command_name process-host-perfdata
    command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
}
# 'process-service-perfdata' command definition
define command{
    command_name process-service-perfdata
    command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
}
#
# Bulk with NPCD mode
#
define command {
    command_name process-service-perfdata-file
    command_line /bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata.$TIMET$
}
define command {
    command_name process-host-perfdata-file
    command_line /bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata.$TIMET$
}
The file highlighted in red/bold in the original post (service-perfdata.out) is now about 600GB. We don't know how to resolve this.
We plan to delete the 2016 data from service-perfdata.out and then check how it behaves. For this I need a command that removes the data from January 2016 to December 2016. Could you please help with such a command? We have tried but could not find the right one.
You asked for some details that we had not provided earlier; we are providing them now, please check.
[root@XXXXX ~]# grep perfdata /usr/local/nagios/etc/nagios.cfg
perfdata_timeout=5
# host_perfdata_command (defined below) and service performance
# data will be processed using the service_perfdata_command (also
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
#host_perfdata_file=/usr/local/nagios/var/host-perfdata
#service_perfdata_file=/usr/local/nagios/var/service-perfdata
#host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
#service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
#host_perfdata_file_mode=a
#service_perfdata_file_mode=a
#host_perfdata_file_processing_interval=0
#service_perfdata_file_processing_interval=0
#host_perfdata_file_processing_command=process-host-perfdata-file
#service_perfdata_file_processing_command=process-service-perfdata-file
# These options determine wether the core will process empty perfdata
# If you don't require empty perfdata - saving some cpu cycles
#host_perfdata_process_empty_results=1
#service_perfdata_process_empty_results=1
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file
[root@XXXX~]# ls -al /usr/local/nagios/var|grep perfdata
-rw-r--r-- 1 nagios nagios 37084081895 Sep 18 09:12 host-perfdata.out
-rw-r--r-- 1 nagios nagios 647629058601 Sep 18 09:12 service-perfdata.out
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: service-perfdata.out got increasing huge size
Do you have anything on your system that is doing something with the files you are appending to?
/usr/local/nagios/var/host-perfdata.out
/usr/local/nagios/var/service-perfdata.out
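Some background to the question: in the nagios.cfg output posted earlier, `host_perfdata_command` and `service_perfdata_command` are active alongside the file-based NPCD directives, and the two commands they name append every check result to these .out files. Nothing in the posted configuration reads the .out files back. The sketch below lists the active command directives and prints a copy of the config with them commented out. Both helper names are made up, and whether the directives can safely be disabled depends on the answer to the question above. This is an assumption to verify, not a confirmed fix.

```shell
#!/bin/sh
# Sketch: find the perfdata *command* directives that are active in nagios.cfg.
# The default path is the one used in this thread.
active_perfdata_commands() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    grep -E '^(host|service)_perfdata_command=' "$cfg"
}

# Print the config to stdout with those two directives commented out.
# The original file is not modified; review the output before using it.
comment_out_perfdata_commands() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    sed -E 's/^((host|service)_perfdata_command=)/#\1/' "$cfg"
}
```

If nothing consumes the .out files, applying such a change and restarting Nagios should stop them from growing, while the bulk mode files under /usr/local/pnp4nagios/var are handled by separate directives.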
-
- Posts: 222
- Joined: Thu Jul 06, 2017 8:55 am
Re: service-perfdata.out got increasing huge size
Actually, we were also surprised when we saw the service-perfdata.out file. We are really not sure why the file keeps growing.
We were not aware that data was being appended to it either.
But when we checked the timestamps in the file, we could see that it contains data going back to 2016, so I decided to remove the 2016 data to free up some space.
Could you give us a command that removes only the January 2016 to December 2016 data from service-perfdata.out?
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: service-perfdata.out got increasing huge size
I don't have a command to do that, but you would need to process the file and drop the lines whose first field (the Unix timestamp) is less than 1483189199.
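One possible way to do that filtering is with awk, sketched below under a few assumptions: the first whitespace-separated field of every line is the timestamp, there is enough free disk space for the filtered copy, and `drop_before` is a made-up helper name. Lines that do not start with a number (for example, continuation lines of multi-line plugin output) compare as 0 and are dropped as well.

```shell
#!/bin/sh
# Sketch: print only the lines whose first field is >= the cutoff timestamp.
# The input file is not modified; redirect the output to a new file.
drop_before() {
    file=$1
    cutoff=$2
    awk -v cutoff="$cutoff" '$1 + 0 >= cutoff + 0' "$file"
}

# Example use with the cutoff mentioned above:
#   drop_before service-perfdata.out 1483189199 > service-perfdata.out_new
#   mv service-perfdata.out_new service-perfdata.out
```

With GNU date, `date -d '2017-01-01 00:00:00' +%s` prints the start of 2017 in the server's local time zone and could be used as the cutoff instead. Reading a file of this size takes a long time, and anything appended while the filter runs is lost when the new file replaces the old one.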