Nagios XI Jobs self-monitor

meganwilliford

Nagios XI Jobs self-monitor

Post by meganwilliford

Hello, I'm trying to find out more about the Nagios XI Jobs monitor: when we should be concerned about stale jobs, and what we can do to prevent them. We get stale-job warnings a couple of times a week, but they usually clear on their own pretty quickly. Are there any suggestions for remediating this, such as running a maintenance job every so often? And after how many seconds should we start worrying about stale sysstat and dbmaint jobs?

Here is an example of the output. The 787 seconds is probably the highest we've seen it get before it clears on the next monitor run:
System Statistics (sysstat) stale (367 seconds old), Database Maintenance (dbmaint) stale (787 seconds old)
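Nobody in this thread gives a specific age at which a stale sysstat or dbmaint job becomes a real problem, so any threshold is a judgment call. If you want to track how old the jobs get before they clear, a minimal sketch like the one below can pull the ages out of the warning text. It assumes the message format matches the example output above; the function names and any threshold you pass are hypothetical, not Nagios XI defaults.

```shell
#!/bin/sh
# Sketch: extract each job's age from a Nagios XI Jobs "stale" message and
# flag only the jobs older than a threshold you choose. The message format is
# assumed from the example output; the threshold is your own choice.

# Print "<job> <seconds>" for every "(job) stale (N seconds old)" fragment.
parse_stale_ages() {
    printf '%s\n' "$1" |
        grep -oE '\([a-z_]+\) stale \([0-9]+ seconds old\)' |
        sed -E 's/^\(([a-z_]+)\) stale \(([0-9]+) seconds old\)$/\1 \2/'
}

# Print the name of each job whose age is greater than $2 seconds.
jobs_over_threshold() {
    parse_stale_ages "$1" | while read -r job age; do
        if [ "$age" -gt "$2" ]; then
            echo "$job"
        fi
    done
}
```

For the example message above, `jobs_over_threshold "$msg" 600` prints only `dbmaint`, since 787 seconds exceeds 600 and 367 does not.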
ssax
Dreams In Code

Re: Nagios XI Jobs self-monitor

Post by ssax

One thing to check is that the dates/times reported by MySQL, the operating system, and PHP all match up and are accurate.

Please send the FULL output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options in the first command if your database is hosted on another server and/or you've changed the MySQL root password

Code:

mysql -h 127.0.0.1 -uroot -pnagiosxi -e 'SELECT NOW(); SELECT @@GLOBAL.time_zone, @@SESSION.time_zone;'
date
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
grep "date.timezone =" /etc/php.ini
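The MySQL and OS time checks above can also be rolled into a single comparison. This is a sketch, assuming GNU `date` (for `-d`) and the same connection defaults as the mysql command above; `DB_HOST`, `DB_USER`, `DB_PASS`, `MAX_DRIFT`, and the function names are hypothetical, and the 5-second tolerance is an arbitrary example rather than a Nagios XI setting. Because `NOW()` is returned in MySQL's session time zone, a time zone mismatch between MySQL and the OS shows up as a drift roughly equal to the zone offset.

```shell
#!/bin/sh
# Sketch: compare the MySQL clock with the system clock.
# Assumptions: GNU date (for -d); connection defaults copied from the
# mysql command above. Override DB_HOST/DB_USER/DB_PASS for your install.
# MAX_DRIFT (seconds) is a hypothetical tolerance, not a Nagios XI setting.

DB_HOST=${DB_HOST:-127.0.0.1}
DB_USER=${DB_USER:-root}
DB_PASS=${DB_PASS:-nagiosxi}
MAX_DRIFT=${MAX_DRIFT:-5}

# Print the absolute difference in seconds between two
# "YYYY-MM-DD HH:MM:SS" timestamps, both read in the local time zone.
time_diff_seconds() {
    a=$(date -d "$1" +%s)
    b=$(date -d "$2" +%s)
    delta=$((a - b))
    if [ "$delta" -lt 0 ]; then
        delta=$((0 - delta))
    fi
    echo "$delta"
}

# NOW() uses MySQL's session time zone, so a zone mismatch between MySQL
# and the OS appears here as a large drift.
check_db_clock() {
    db_now=$(mysql -h "$DB_HOST" -u"$DB_USER" -p"$DB_PASS" -N -B -e 'SELECT NOW();')
    sys_now=$(date '+%Y-%m-%d %H:%M:%S')
    drift=$(time_diff_seconds "$db_now" "$sys_now")
    if [ "$drift" -gt "$MAX_DRIFT" ]; then
        echo "WARNING: MySQL time ($db_now) and system time ($sys_now) differ by ${drift}s"
    else
        echo "OK: MySQL and system time agree within ${drift}s"
    fi
}
```

Run `check_db_clock` as root on the Nagios XI server. It does not look at PHP's `date.timezone`, so the `php -r` and `grep` checks above are still needed.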
Please PM me a copy of your system profile and I'll take a look to see if I can find any issues. You can download it from Admin > System Profile using the Download Profile button.

Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options in the mysql command if your database is offloaded to another server and/or you've changed the MySQL root password

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
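When reading the table-size output, the largest tables are usually the interesting ones. Below is a sketch that asks mysql for headerless, tab-separated output (`-N -B`) and keeps only tables above a size threshold. It assumes the same connection defaults as the command above; the `DB_*` variables, the function names, and the 100 MB cut-off in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: list only the Nagios XI database tables above a size threshold,
# largest first. Connection defaults are copied from the command above
# (assumptions; override DB_HOST/DB_USER/DB_PASS as needed).

DB_HOST=${DB_HOST:-127.0.0.1}
DB_USER=${DB_USER:-root}
DB_PASS=${DB_PASS:-nagiosxi}

# Emit "<table><TAB><size in MB>" rows without headers (-N) in batch mode (-B).
table_sizes_mb() {
    mysql -h "$DB_HOST" -u"$DB_USER" -p"$DB_PASS" -N -B -e "SELECT table_name, round(((data_length + index_length) / 1024 / 1024), 2) FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');"
}

# Read "<table><TAB><size>" rows on stdin; print rows larger than $1 MB,
# sorted by size in descending order.
large_tables() {
    tab=$(printf '\t')
    awk -F "$tab" -v min="$1" '$2 + 0 > min + 0 { print $1 "\t" $2 }' |
        sort -t "$tab" -k2,2nr
}
```

For example, `table_sizes_mb | large_tables 100` would list only the tables larger than 100 MB, biggest first.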
This next command may fail; that's okay, since not all systems have PostgreSQL:

Code:

echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
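Since the PostgreSQL query only applies where PostgreSQL is present, it can be guarded so it is skipped cleanly instead of failing. A minimal sketch; the function names are hypothetical, and the query and `psql nagiosxi nagiosxi` arguments are the ones above, unchanged:

```shell
#!/bin/sh
# Sketch: run the PostgreSQL table-size query only when psql is installed.

# Return success if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Same query and psql arguments as the original command.
pg_table_sizes() {
    if ! have_cmd psql; then
        echo "psql not found; skipping PostgreSQL table sizes"
        return 0
    fi
    echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
}
```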
meganwilliford

Re: Nagios XI Jobs self-monitor

Post by meganwilliford

Thanks for the reply! PM sent.
ssax
Dreams In Code

Re: Nagios XI Jobs self-monitor

Post by ssax

Do you have backend cache enabled in Admin > Performance Settings > Backend Cache?

Are you seeing any failures in /var/log/cron?
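To look for cron failures without reading the whole log, a rough filter may help. This is a heuristic sketch: it assumes the relevant cron entries mention `nagios` and that failures contain the words "error" or "fail", which will not catch every message crond can log. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: heuristic scan of a cron log for Nagios-related problems.
# Matches lines that mention "nagios" and contain "error" or "fail"
# (case-insensitive); this is not an exhaustive list of crond messages.

# Print matching lines from the given log file (default /var/log/cron).
cron_failures() {
    log=${1:-/var/log/cron}
    grep -i 'nagios' "$log" | grep -iE 'error|fail'
}
```

With no argument it reads `/var/log/cron`; pass another path to check a rotated log. Like grep, it exits non-zero when nothing matches.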

Please run these commands (as root) and let me know if they resolve your issue:

Code:

# Stop the web server, cron, and the Nagios XI services
systemctl stop httpd
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
systemctl stop mod-gearman-worker
systemctl stop gearmand
systemctl stop ramdisk
# Forcibly kill (SIGKILL) any leftover processes owned by the nagios and apache users
pkill -9 -u nagios
pkill -9 -u apache
# Remove any System V message queues owned by the nagios user
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
# Remove stale lock, PID, and socket files left behind by the stopped services
rm -f /usr/local/nagiosxi/var/dbmaint.lock
rm -f /usr/local/nagiosxi/var/event_handler.lock
rm -f /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /usr/local/nagios/var/ndo2db.pid
rm -f /usr/local/nagios/var/ndo2db.sock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagiosxi/var/subsys/ndo2db
rm -f /var/run/nagios/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/httpd/httpd.pid
rm -f /usr/local/nagiosxi/var/subsys/npcd.pid
# Restart the database, then start the services in the reverse of the order they were stopped
systemctl restart mariadb
systemctl start ramdisk
systemctl start gearmand
systemctl start mod-gearman-worker
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl start httpd
systemctl restart snmptt
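After the restart sequence, it is worth confirming that each service actually came back up. A small sketch using `systemctl is-active --quiet`; the function name is hypothetical, and the unit list in the comment is copied from the commands above, so drop any units your install does not use.

```shell
#!/bin/sh
# Sketch: report services from the restart sequence that systemd
# does not consider active.

# Print the name of each given unit that is not active.
inactive_services() {
    for svc in "$@"; do
        systemctl is-active --quiet "$svc" || echo "$svc"
    done
}

# Hypothetical usage with the units from the restart sequence:
# inactive_services mariadb ramdisk gearmand mod-gearman-worker ndo2db nagios npcd crond httpd snmptt
```

An empty result means every listed unit reported as active.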
meganwilliford

Re: Nagios XI Jobs self-monitor

Post by meganwilliford

We do not have the backend cache enabled. Of course, as soon as I made this post, the Nagios XI Jobs warnings stopped, which is why it's taken me a while to reply. I've documented the steps you provided in case it starts happening again. Thank you! This post can be closed.
scottwilkerson
DevOps Engineer
Location: Nagios Enterprises

Re: Nagios XI Jobs self-monitor

Post by scottwilkerson

OK, closing thread.
Former Nagios employee