Hello, I'm trying to find out some more info about the Nagios xi Jobs monitor and when to be concerned about stale jobs or what to do to prevent them. We get warnings a couple of times a week for stale jobs, but it usually heals itself pretty quickly. Are there any suggestions to remediate this, such as running a maintenance job every so often? And after how many seconds should we be worried about a stale sysstat or dbmaint?
Here is an example of the output. The 787 seconds is probably the highest we've seen it get before clearing on the next monitor run.
System Statistics (sysstat) stale (367 seconds old), Database Maintenance (dbmaint) stale (787 seconds old)
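For anyone who wants to track these ages over time, here is a minimal sketch that pulls the job names and ages out of a status line like the one above. The function names `parse_stale_jobs` and `jobs_over_limit` and the 600-second limit are made up for illustration; they are not Nagios XI defaults, so the limit would need to be chosen to fit your own check interval.

```shell
# Extract "job seconds" pairs from the self-monitor status text.
# Matches fragments such as "(sysstat) stale (367 seconds old)".
parse_stale_jobs() {
    printf '%s\n' "$1" |
        grep -oE '\([a-z_]+\) stale \([0-9]+ seconds old\)' |
        sed -E 's/^\(([a-z_]+)\) stale \(([0-9]+) seconds old\)$/\1 \2/'
}

# Print only the jobs older than a caller-chosen limit in seconds.
# $1 = limit (hypothetical value, not a Nagios default), $2 = status text.
jobs_over_limit() {
    parse_stale_jobs "$2" | awk -v limit="$1" '$2 > limit { print $1 }'
}

sample='System Statistics (sysstat) stale (367 seconds old), Database Maintenance (dbmaint) stale (787 seconds old)'
parse_stale_jobs "$sample"       # one "job seconds" pair per line
jobs_over_limit 600 "$sample"    # prints dbmaint
```

Logging the pairs on each warning would show whether the ages stay near one missed run or keep growing.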
Nagios xi Jobs self-monitor
- Dreams In Code
- Posts: 7682
- Joined: Wed Feb 11, 2015 12:54 pm
Re: Nagios xi Jobs self-monitor
One thing to check is to make sure your date/times all match up and are accurate.
Please PM me a copy of your profile and I'll take a look to see if I can find any issues. You can download it from Admin > System Profile using the Download Profile button.
The last command below (the psql query) may fail; that's okay, since not all systems have PostgreSQL.
Please send the FULL output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is contained/stored on another server and/or you've changed the root mysql password
Code: Select all
mysql -h 127.0.0.1 -uroot -pnagiosxi -e 'SELECT NOW(); SELECT @@GLOBAL.time_zone, @@SESSION.time_zone;'
date
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
grep "date.timezone =" /etc/php.ini
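To put a number on any mismatch those commands reveal, a small helper like the following could compare two clocks as epoch seconds. This is only a sketch: `drift_seconds` is a made-up name, and the commented-out `mysql` line reuses the host and credentials from the command above, which may differ on your system.

```shell
# Absolute difference between two epoch timestamps, in seconds.
drift_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

sys_now=$(date +%s)
# To compare against the database clock (adjust host/user/password as noted above):
# db_now=$(mysql -h 127.0.0.1 -uroot -pnagiosxi -N -s -e 'SELECT UNIX_TIMESTAMP();')
# drift_seconds "$sys_now" "$db_now"
drift_seconds "$sys_now" "$sys_now"   # prints 0
```

A drift of more than a few seconds between the system and the database would be worth fixing before looking further.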
Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
Code: Select all
echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
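If the MySQL size query returns a long list, a filter along these lines could help spot the large tables. It is a rough sketch: `tables_over_mb` is a made-up name, the 1024 MB limit is arbitrary, the sample rows are invented, and it assumes tab-separated output (for example from `mysql -N -B`) rather than the `--table` layout used above.

```shell
# Print tables whose size (second tab-separated column, in MB) exceeds a limit.
# $1 = limit in MB; rows are read from standard input.
tables_over_mb() {
    awk -F '\t' -v limit="$1" '$2 + 0 > limit + 0 { print $1 " " $2 }'
}

# Invented sample rows, for illustration only.
printf 'example_table_a\t12.50\nexample_table_b\t2048.00\n' | tables_over_mb 1024
# prints: example_table_b 2048.00
```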
- Posts: 101
- Joined: Tue Aug 06, 2019 7:49 am
Re: Nagios xi Jobs self-monitor
Thanks for the reply! PM sent.
- Dreams In Code
- Posts: 7682
- Joined: Wed Feb 11, 2015 12:54 pm
Re: Nagios xi Jobs self-monitor
Do you have backend cache enabled in Admin > Performance Settings > Backend Cache?
Are you seeing any failures in /var/log/cron?
Please run these commands and let me know if it resolves your issue:
Code: Select all
systemctl stop httpd
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
systemctl stop mod-gearman-worker
systemctl stop gearmand
systemctl stop ramdisk
pkill -9 -u nagios
pkill -9 -u apache
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -f /usr/local/nagiosxi/var/dbmaint.lock
rm -f /usr/local/nagiosxi/var/event_handler.lock
rm -f /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /usr/local/nagios/var/ndo2db.pid
rm -f /usr/local/nagios/var/ndo2db.sock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagiosxi/var/subsys/ndo2db
rm -f /var/run/nagios/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/httpd/httpd.pid
rm -f /usr/local/nagiosxi/var/subsys/npcd.pid
systemctl restart mariadb
systemctl start ramdisk
systemctl start gearmand
systemctl start mod-gearman-worker
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl start httpd
systemctl restart snmptt
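If you want to confirm the cleanup worked before the services come back up, a check like this could be run between the `rm` lines and the `systemctl start` lines. It is only a sketch: `leftover_files` is a made-up helper name, and the paths in the usage line are a subset of the ones in the list above.

```shell
# Print each given path that still exists.
# Returns 1 if any were found, 0 if all are gone.
leftover_files() {
    found=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "$f"
            found=1
        fi
    done
    return "$found"
}

# Usage: list any lock or pid files that survived the cleanup.
leftover_files \
    /usr/local/nagiosxi/var/dbmaint.lock \
    /usr/local/nagios/var/ndo2db.lock \
    /usr/local/nagios/var/nagios.lock ||
    echo "stale files remain"
```

Once the services are started again, some of these files are recreated, so the check is only meaningful while everything is stopped.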
- Posts: 101
- Joined: Tue Aug 06, 2019 7:49 am
Re: Nagios xi Jobs self-monitor
We do not have backend cache enabled. Of course, as soon as I made this post we stopped getting the Nagios xi Jobs warnings, which is why it's taken me a while to reply. I've documented the steps you provided in case it does start occurring again. Thank you! This post can be closed.
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Nagios xi Jobs self-monitor
meganwilliford wrote: We do not have backend cache enabled. Of course as soon as I made this post we stopped getting the Nagios xi Jobs warnings which is why it's taken me a while to reply. I've documented the steps you provided below if it does start occurring again. Thank you! This post can be closed.
Ok, closing thread.