Nagios monitoring engine stopped

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jmonn
Posts: 9
Joined: Mon Dec 14, 2015 10:27 am

Nagios monitoring engine stopped

Post by jmonn »

Hi,

We upgraded Nagios XI from 5.7 to 5.8 a few weeks ago (we are now on 5.8.3). It worked like a charm at first, but the Nagios monitoring engine service now stops from time to time (a few times over the last weeks, which is a lot for a monitoring system). I haven't been able to find any cause for it; for example, there is no error in nagios.log.

Where should I look for errors explaining why this service stops?

Thanks,

Jeremy
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: Nagios monitoring engine stopped

Post by gsmith »

Hi Jeremy,

Sorry to hear about the frequent crashes. Please PM me your system profile. To do so:

Login to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file, share it in a private message, and then reply to this post to bring it up in the queue.

You can look at this document for a list of the logs and their descriptions:

https://assets.nagios.com/downloads/nagiosxi/docs/Nagios-XI-Log-Locations-And-Descriptions.pdf

How did you know the system went down?
Did the outages occur randomly or at approximately the same time of day/night?
Did any other non-Nagios systems go down at the same time?

Thanks
jmonn
Posts: 9
Joined: Mon Dec 14, 2015 10:27 am

Re: Nagios monitoring engine stopped

Post by jmonn »

Hi,

We noticed the crash because we stopped receiving emails from the monitoring service. No other system is affected (AFAIK). The crashes seem random to me; over the last 30 days they are visible on the CPU graph from Nagios itself, as you can see in the attached graph.

There are no specific errors in the logs other than "Caught SIGSEGV" followed by "Caught SIGTERM". I read that it could be a memory leak in a plugin, but that is kind of hard to debug... :-/
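
To line the crashes up against the CPU graph, it may help to pull every signal message out of nagios.log with a readable timestamp. A minimal POSIX sh sketch, assuming the default Nagios XI log path /usr/local/nagios/var/nagios.log and the usual "[epoch] message" line format; list_nagios_signals is a hypothetical name, and `date -u -d @EPOCH` assumes GNU or BusyBox date:

```shell
# list_nagios_signals [LOGFILE]  (hypothetical helper)
# Prints each "Caught SIG..." line from a Nagios log, replacing the leading
# [epoch] timestamp with a UTC date. Assumes the default Nagios XI log path
# and GNU/BusyBox `date -d @EPOCH`.
list_nagios_signals() {
    log=${1:-/usr/local/nagios/var/nagios.log}
    # Lines look like: [1609459200] Caught SIGSEGV, shutting down...
    grep -E '^\[[0-9]+] Caught SIG' "$log" | while IFS= read -r line; do
        epoch=${line#\[}         # drop the leading "["
        epoch=${epoch%%\]*}      # keep everything before the first "]"
        msg=${line#*\] }         # message text after "] "
        printf '%s %s\n' "$(date -u -d "@$epoch" '+%F %T')" "$msg"
    done
}
```

Older entries should be in the rotated logs (typically under /usr/local/nagios/var/archives/), which can be passed as the argument one file at a time.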

Regards
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios monitoring engine stopped

Post by benjaminsmith »

Hi,

Please send us the system profile and we'll review the logs for any errors. In the meantime, let's run a tail command on the database log:

tail /var/log/mariadb/mariadb.log

If there are any errors (e.g. crashed database tables), then go ahead and run the repair script as root and let us know if you notice any improvement.

/usr/local/nagiosxi/scripts/repair_databases.sh
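
If the tail output is long, the affected tables can also be listed directly. A minimal sketch, assuming the CentOS/RHEL default error log /var/log/mariadb/mariadb.log and MariaDB's standard "Table '...' is marked as crashed" wording; find_crashed_tables is a hypothetical name, and `grep -o` needs GNU or BusyBox grep:

```shell
# find_crashed_tables [LOGFILE]  (hypothetical helper)
# Lists each table that MariaDB reported as "marked as crashed", once.
# Assumes the CentOS/RHEL default error-log path; adjust for your system.
find_crashed_tables() {
    log=${1:-/var/log/mariadb/mariadb.log}
    # Keep only the "Table '<name>' is marked as crashed" fragment,
    # take the quoted name, and de-duplicate.
    grep -o "Table '[^']*' is marked as crashed" "$log" |
        cut -d "'" -f 2 |
        sort -u
}
```
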
Also, do you have a test server set up in your environment, and have you made any performance modifications to this system? If so, which ones?

Thanks, Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
jmonn
Posts: 9
Joined: Mon Dec 14, 2015 10:27 am

Re: Nagios monitoring engine stopped

Post by jmonn »

Hello,

It was indeed crashed MySQL tables, but I had to repair them with myisamchk (with MariaDB stopped). Now the question is why Nagios and MariaDB are being stopped abruptly; probably a plugin leaking memory, but that is hard to track down...
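
For reference, an offline myisamchk repair like this one might be scripted as below. This is a sketch under several assumptions: a systemd service named mariadb, a data directory such as /var/lib/mysql, MyISAM table files owned by a mysql user, and the database names passed in (for example nagios and nagiosql). The function name and its DRY_RUN switch are hypothetical. Take a backup first, and keep MariaDB stopped while myisamchk runs, since it works on the table files directly:

```shell
# repair_myisam_offline DATADIR DB...  (hypothetical helper)
# Stops MariaDB, runs `myisamchk --recover` on every MyISAM index file (.MYI)
# of the given databases, then starts MariaDB again.
# Set DRY_RUN=1 to print the commands instead of running them.
# Assumes a systemd service called "mariadb" and a "mysql" file owner; run as root.
repair_myisam_offline() {
    datadir=$1
    shift
    run=${DRY_RUN:+echo}   # "echo" in dry-run mode, empty otherwise
    $run systemctl stop mariadb || return 1
    for db in "$@"; do
        for index in "$datadir/$db"/*.MYI; do
            [ -e "$index" ] || continue   # the glob matched nothing
            $run myisamchk --recover "$index"
        done
        # In case rebuilt files were created as root, hand them back to the
        # (assumed) mysql user so the server can open them again.
        $run chown -R mysql:mysql "$datadir/$db"
    done
    $run systemctl start mariadb
}
```

Running it once with DRY_RUN=1 (for example `DRY_RUN=1 repair_myisam_offline /var/lib/mysql nagios nagiosql`) shows exactly which commands would be executed.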

Regards
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios monitoring engine stopped

Post by benjaminsmith »

Hi,

Thanks for the update. The backend database application may have stopped, causing the Nagios process to quit. You can keep tabs on the Nagios process by running the Nagios Server Wizard on this system.
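
Because a crashed engine cannot alert about its own crash, it may also be worth checking the process from outside it, for example from a cron job. A minimal sketch in the style of a Nagios plugin (exit 0 for OK, 2 for CRITICAL), assuming the Nagios XI lock file is /usr/local/nagios/var/nagios.lock and holds the engine's PID; check_nagios_lock is a hypothetical name, and the /proc test is Linux-specific:

```shell
# check_nagios_lock [LOCKFILE]  (hypothetical helper)
# Plugin-style check: prints a status line and returns 0 (OK) if the PID in
# the Nagios lock file belongs to a running process, 2 (CRITICAL) otherwise.
# Assumes the default Nagios XI lock file location; /proc is Linux-only.
check_nagios_lock() {
    lock=${1:-/usr/local/nagios/var/nagios.lock}
    if [ ! -r "$lock" ]; then
        echo "CRITICAL - lock file $lock is missing or unreadable"
        return 2
    fi
    pid=$(head -n 1 "$lock")
    case $pid in
        ''|*[!0-9]*)
            echo "CRITICAL - no valid PID in $lock"
            return 2 ;;
    esac
    # /proc/<pid> exists for any running process, regardless of its owner.
    if [ -d "/proc/$pid" ]; then
        echo "OK - Nagios engine running (PID $pid)"
        return 0
    fi
    echo "CRITICAL - PID $pid from $lock is not running"
    return 2
}
```

One limitation of this design: if the engine dies and its PID is later reused by another process, a stale lock file would still report OK, so it complements rather than replaces the wizard-based check.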

If you continue to have trouble with corrupt tables, I would recommend converting the tables to InnoDB.

We have a guide in our knowledgebase on how to do this:

Database Storage Engine and High CPU usage in Nagios XI
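
InnoDB is transactional and recovers automatically after an unclean shutdown, which is why it avoids the "marked as crashed" state that MyISAM tables can end up in. The knowledgebase article is the procedure to follow; purely as an illustration, a sketch that generates (but does not run) the conversion statements for review could look like this. myisam_to_innodb_sql is a hypothetical name, and the credentials and database name in the example comment are placeholders:

```shell
# myisam_to_innodb_sql DB  (hypothetical helper)
# Reads table names (one per line) on stdin and prints one
# ALTER TABLE ... ENGINE=InnoDB statement per table, for review before use.
# Example (credentials and database name are placeholders):
#   mysql -u root -p -N -B -e "SELECT table_name FROM information_schema.tables
#     WHERE table_schema='nagios' AND engine='MyISAM'" |
#     myisam_to_innodb_sql nagios > convert_nagios.sql
myisam_to_innodb_sql() {
    db=$1
    while IFS= read -r table; do
        [ -n "$table" ] || continue   # skip blank lines
        printf 'ALTER TABLE `%s`.`%s` ENGINE=InnoDB;\n' "$db" "$table"
    done
}
```

Generating the SQL to a file first leaves room to back up the databases and schedule the conversion, since altering large tables can take a while.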

Let us know if you need further assistance.

Benjamin