Nagios xi Jobs self-monitor

meganwilliford · Post by **meganwilliford** » Fri Jul 24, 2020 2:05 pm

Hello, I'm trying to find out more about the Nagios xi Jobs monitor: when to be concerned about stale jobs, and what to do to prevent them. We get warnings a couple of times a week for stale jobs, but the check usually recovers on its own pretty quickly. Are there any suggestions for remediating this, such as running a maintenance job every so often? And after how many seconds should we be worried about a stale sysstat or dbmaint?

Here is an example of the output. 787 seconds is probably the highest age we've seen before the check cleared on the next monitor run.
System Statistics (sysstat) stale (367 seconds old), Database Maintenance (dbmaint) stale (787 seconds old)
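To keep an eye on how old the stale jobs get, the check output can be parsed for the reported ages. The following is a minimal sketch, assuming the output format shown above. The function name `stale_over` and the 600-second default are hypothetical placeholders, not values recommended by Nagios.

```shell
# Print "name age" for each component whose reported age exceeds a threshold.
# ASSUMPTION: the 600-second default is an arbitrary placeholder, not a
# Nagios-recommended limit.
stale_over() {
    local output="$1" threshold="${2:-600}"
    # Extract fragments such as "(dbmaint) stale (787 seconds old)", one per line.
    printf '%s\n' "$output" \
        | grep -oE '\([a-z_]+\) stale \([0-9]+ seconds old\)' \
        | while read -r name _ age _; do
            name="${name#\(}"; name="${name%\)}"   # "(dbmaint)" -> "dbmaint"
            age="${age#\(}"                        # "(787"      -> "787"
            if [ "$age" -gt "$threshold" ]; then
                printf '%s %s\n' "$name" "$age"
            fi
        done
}

# Usage with the output quoted above; prints: dbmaint 787
stale_over 'System Statistics (sysstat) stale (367 seconds old), Database Maintenance (dbmaint) stale (787 seconds old)' 600
```

Logging these values over time would show whether the ages are creeping up or only spiking occasionally.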

ssax · Post by **ssax** » Fri Jul 24, 2020 5:11 pm

One thing to check is that your dates/times all match up and are accurate.

Please send the FULL output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is contained/stored on another server and/or you've changed the root mysql password

Code:

mysql -h 127.0.0.1 -uroot -pnagiosxi -e 'SELECT NOW(); SELECT @@GLOBAL.time_zone, @@SESSION.time_zone;'
date
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
grep "date.timezone =" /etc/php.ini

Please PM me a copy of your profile and I'll take a look to see if I can find any issues. You can download it from Admin > System Profile > Download Profile button.

Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table

This next command may fail. That's okay; not all systems have PostgreSQL:

Code:

echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi

meganwilliford · Post by **meganwilliford** » Mon Jul 27, 2020 2:44 pm

Thanks for the reply! PM sent.

ssax · Post by **ssax** » Mon Jul 27, 2020 5:12 pm

Do you have backend cache enabled in Admin > Performance Settings > Backend Cache?

Are you seeing any failures in /var/log/cron?
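One rough way to look for such failures is to filter the cron log by keyword. This is a minimal sketch: the function name `cron_failures` is hypothetical, and the keywords `error`, `fail`, and `nagios` are assumptions about what a relevant log line contains, so an empty result does not prove that cron is healthy.

```shell
# Show up to the last 20 cron log lines that mention "nagios" together with
# an error keyword.
# ASSUMPTION: the keywords below are guesses at what a failure looks like in
# the log; adjust them to match the entries on your system.
cron_failures() {
    local logfile="${1:-/var/log/cron}"
    grep -iE 'error|fail' "$logfile" | grep -i 'nagios' | tail -n 20
}
```

Run it as root with `cron_failures`, or pass a rotated log file as the first argument.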

Please run these commands and let me know if it resolves your issue:

Code:

systemctl stop httpd
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
systemctl stop mod-gearman-worker
systemctl stop gearmand
systemctl stop ramdisk
pkill -9 -u nagios
pkill -9 -u apache
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
rm -f /usr/local/nagiosxi/var/dbmaint.lock
rm -f /usr/local/nagiosxi/var/event_handler.lock
rm -f /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /usr/local/nagios/var/ndo2db.pid
rm -f /usr/local/nagios/var/ndo2db.sock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagiosxi/var/subsys/ndo2db
rm -f /var/run/nagios/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/httpd/httpd.pid
rm -f /usr/local/nagiosxi/var/subsys/npcd.pid
systemctl restart mariadb
systemctl start ramdisk
systemctl start gearmand
systemctl start mod-gearman-worker
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl start httpd
systemctl restart snmptt
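
Because `rm -f` stays silent whether or not a file exists, it can be useful to see which lock and PID files were actually left behind before deleting them. The following is a minimal sketch with a hypothetical helper name, `leftover_files`. It only reports existing paths and is meant to be run after the services have been stopped, since a running service legitimately holds its lock file.

```shell
# Print every path from the argument list that currently exists.
# ASSUMPTION: run this only after the services are stopped; a lock or PID file
# that is still present at that point is a leftover.
leftover_files() {
    local f
    for f in "$@"; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example call with a few of the paths from the commands above.
leftover_files \
    /usr/local/nagiosxi/var/dbmaint.lock \
    /usr/local/nagios/var/ndo2db.lock \
    /usr/local/nagios/var/nagios.lock
```

If the same file shows up repeatedly, that points at the job that is not cleaning up after itself.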

meganwilliford · Post by **meganwilliford** » Fri Aug 28, 2020 9:10 am

We do not have backend cache enabled. Of course, as soon as I made this post we stopped getting the Nagios xi Jobs warnings, which is why it's taken me a while to reply. I've documented the steps you provided in case it starts occurring again. Thank you! This post can be closed.

scottwilkerson · Post by **scottwilkerson** » Fri Aug 28, 2020 9:12 am

meganwilliford wrote: We do not have backend cache enabled. Of course, as soon as I made this post we stopped getting the Nagios xi Jobs warnings, which is why it's taken me a while to reply. I've documented the steps you provided in case it starts occurring again. Thank you! This post can be closed.

OK, closing thread.
