Monitoring Engine crashing (Nagios XI 5.7.2)

meganwilliford
Posts: 101
Joined: Tue Aug 06, 2019 7:49 am

Monitoring Engine crashing (Nagios XI 5.7.2)

Post by meganwilliford »

Hello,

Since upgrading from 5.6.6 to 5.7.2, we've been experiencing issues with the monitoring engine on several of our Nagios XI instances.

The monitoring engine has crashed a few times. At other times, when we apply a configuration, the engine reports problems, then after a while recovers on its own and reports OK.

Is there any way to troubleshoot this?

Thanks!
jdunitz
Posts: 235
Joined: Wed Feb 05, 2020 2:50 pm

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by jdunitz »

You can look at the following logs, which may provide clues:

Code:

/var/log/messages
/var/log/mariadb/mariadb.log
/usr/local/nagios/var/nagios.log (and other logs in that same directory)

Also, the ipcs command shows the kernel message queues. Hundreds of messages queued up, as in this example, is a sign of a problem:

Code:

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xef000040 34439168   nagios     600        705536       689
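
If you would rather not read the table by eye, here is a minimal sketch that filters the output for queues with a backlog. The function name busy_queues and the default cutoff of 100 are assumptions for illustration; pick whatever threshold makes sense for your system.

```shell
# Print "msqid owner messages" for every queue whose message count is at
# or above a threshold (default 100, an arbitrary cutoff).
# Reads `ipcs -q` style output on stdin.
busy_queues() {
    awk -v limit="${1:-100}" '
        # Data rows start with a hex key such as 0x02000200 and have
        # six columns; header and separator lines are skipped.
        $1 ~ /^0x[0-9a-fA-F]+$/ && NF >= 6 && $6 + 0 >= limit {
            printf "%s %s %s\n", $2, $3, $6
        }
    '
}

# Usage on a live system (not run here):
#   ipcs -q | busy_queues 500
```

An empty result means no queue has reached the threshold.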
--Jeffrey
meganwilliford
Posts: 101
Joined: Tue Aug 06, 2019 7:49 am

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by meganwilliford »

Nothing is sticking out in the logs, but it does look like there are thousands of messages queued up. What problem could that be a sign of, and do you know how we can prevent the messages from queuing up?

Code:

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xffffffff 0          root       600        0            0
0x000004d2 65537      root       666        0            0
0xdf000200 9371650    nagios     600        0            0
0x02000200 10256387   nagios     600        3590144      3506
meganwilliford
Posts: 101
Joined: Tue Aug 06, 2019 7:49 am

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by meganwilliford »

I also wanted to mention that the message queue output I posted above is from only one of our Nagios XI instances. The others that are also having monitoring engine issues do not have any queued messages.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by ssax »

Please PM me a copy of the profile from each XI server. You can download it via Admin > System Profile > Download Profile.

XI 5.7+ should not use the kernel message queue unless you downgraded from NDO3 back to NDO2DB to resolve an issue.

What is the output of this command?
- NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options in the command if your DB is offloaded to another server and/or you've changed the root MySQL password.

Code:

mysql -uroot -pnagiosxi -h 127.0.0.1 -P 3306 nagios -e "desc nagios_hoststatus;desc nagios_servicestatus;"
If you let this tail command run for a few minutes, do you see any errors pop up? (PM me the output.)

Code:

tail -Fn0 /usr/local/nagios/var/nagios.log /usr/local/nagiosxi/var/cmdsubsys.log /usr/local/nagiosxi/var/eventman.log
Please PM your /usr/local/nagios/var/nagios.log as well.
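
Those logs can be noisy, so here is a rough sketch of a filter you could pipe the tail output through. The function name scan_for_errors and the keyword list are assumptions for illustration, not an official list of Nagios error strings; extend the pattern as needed.

```shell
# Keep only lines that look like problems, plus the "==> file <==" headers
# that tail prints when following several files, so you can still see
# which log each hit came from. Reads log text on stdin.
scan_for_errors() {
    grep -Ei '^==> |error|warning|segfault|sigsegv|unable to'
}

# Usage on a live system (not run here):
#   tail -Fn0 /usr/local/nagios/var/nagios.log \
#       /usr/local/nagiosxi/var/cmdsubsys.log \
#       /usr/local/nagiosxi/var/eventman.log | scan_for_errors
```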
meganwilliford
Posts: 101
Joined: Tue Aug 06, 2019 7:49 am

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by meganwilliford »

PM sent!
meganwilliford
Posts: 101
Joined: Tue Aug 06, 2019 7:49 am

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by meganwilliford »

This morning I was able to watch the monitoring engine crash, and it happened at the exact time our backups are scheduled (0400 PT).
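
For anyone who wants to line up engine restarts against a schedule like this, here is a minimal sketch. It assumes the usual nagios.log line shape of an epoch timestamp in square brackets followed by the message, and it assumes GNU date for the `-d @epoch` conversion. The function name engine_events and the keywords it matches are illustrative guesses, not an exhaustive list of what the engine logs.

```shell
# Print start/stop events from a Nagios Core log with readable timestamps.
# Reads nagios.log content on stdin. The body runs in a subshell, so the
# helper variables do not leak into the calling shell.
engine_events() (
    while IFS= read -r line; do
        # Expect lines shaped like: [1598266800] message text
        case "$line" in
            "["[0-9]*"] "*) ;;
            *) continue ;;
        esac
        epoch="${line#\[}"        # drop the leading "["
        epoch="${epoch%%\]*}"     # keep everything before the first "]"
        msg="${line#*\] }"        # text after the first "] "
        case "$epoch" in
            *[!0-9]*|"") continue ;;   # skip anything that is not a plain number
        esac
        # Assumed keywords for engine start and stop messages.
        case "$msg" in
            *"starting..."*|*"shutting down"*|*SIGSEGV*|*SIGTERM*)
                printf '%s  %s\n' "$(date -d "@$epoch" '+%Y-%m-%d %H:%M:%S %Z')" "$msg"
                ;;
        esac
    done
)

# Usage on a live system (not run here):
#   engine_events < /usr/local/nagios/var/nagios.log
```

Timestamps print in the server's local time zone, so they can be compared directly with the backup schedule.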
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by ssax »

I was just about to ask you that; I didn't see anything in your profiles.

Are they XI scheduled backups or third-party backups?

If they are XI backups, please send these files:

Code:

/usr/local/nagiosxi/var/components/scheduledbackups.log
/etc/php.ini
Additionally, please send the output of this command so we can check your DB tables:
- NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options in the command if your DB is offloaded to another server and/or you've changed the root MySQL password.

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
meganwilliford
Posts: 101
Joined: Tue Aug 06, 2019 7:49 am

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by meganwilliford »

PM sent!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Post by ssax »

Reply sent.