Hi guys,
We are trying to migrate from CentOS 6 to CentOS 7 while keeping the same Nagios version, 5.7.3.
Following https://support.nagios.com/forum/viewto ... 16&t=62421, we installed a new VM with CentOS 7 and Nagios 5.7.3.
We then took a backup on the CentOS 6 server and restored it on the new one (same Nagios version). Since the restore, the monitoring engine stops working from time to time, which leads to graphs not being populated. As advised in the topic linked above, we did NOT execute the restore_repair.sh script.
Can you kindly guide us on what we need to check in order to solve the issue?
Please also find the profile attached.
Best regards,
nms
Monitoring engine stops working and graphs not populating
-
- Posts: 222
- Joined: Wed Sep 28, 2016 9:35 am
-
- Posts: 903
- Joined: Tue Oct 27, 2020 1:35 pm
Re: Monitoring engine stops working and graphs not populatin
Hi nms,
Hope you are having a great day!! ...
I looked at the "profile.zip" and noticed a few things.
You lost connection with your database:
<p><pre>SQL Error [nagiosxi] : MySQL server has gone away</pre></p>
<p><pre>SQL Error [nagiosxi] : MySQL server has gone away</pre></p>
<p><pre>SQL Error [nagiosxi] : MySQL server has gone away</pre></p>
<p><pre>SQL Error [nagiosxi] : MySQL server has gone away</pre></p>
<p><pre>SQL Error [nagiosxi] : MySQL server has gone away</pre></p>
You have lots of INSERT issues:
[1625748999] NDO-3: The following query failed while MySQL appears to be connected:
[1625748999] NDO-3: INSERT INTO nagios_servicechecks (instance_id, start_time, start_time_usec, end_time, end_time_usec, service_object_id, check_type, current_check_attempt, max_check_attempts, state, state_type, timeout, early_timeout, execution_time, latency, return_code, output, long_output, perfdata, command_object_id, command_args, command_line) VALUES (1,FROM_UNIXTIME(1625748996),424887,FROM_UNIXTIME(1625748997),772627,35300,0,1,3,0,1,120,0,1.347740,5.354317,0,'NACK statistics on voicemo for VFNL-WYLS are nack_insf=0:nack_cris=0:nack_nacc=0:nack_nbty=0:nack_nrat=0:nack_wdis=0:nack_tmny=0:nack_nena=0:nack_nbill=0:','','nack_insf=0;nack_cris=0;nack_nacc=0;nack_nbty=0;nack_nrat=0;nack_wdis=0;nack_tmny=0;nack_nena=0;nack_nbill=0;',0,'','') ON DUPLICATE KEY UPDATE instance_id = VALUES(instance_id), start_time = VALUES(start_time), start_time_usec = VALUES(start_time_usec), end_time = VALUES(end_time), end_time_usec = VALUES(end_time_usec), service_object_id = VALUES(service_object_id), check_type = VALUES(check_type), current_check_attempt = VALUES(current_check_attempt), max_check_attempts = VALUES(max_check_attempts), state = VALUES(state), state_type = VALUES(state_type), timeout = VALUES(timeout), early_timeout = VALUES(early_timeout), execution_time = VALUES(execution_time), latency = VALUES(latency), return_code = VALUES(return_code), output = VALUES(output), long_output = VALUES(long_output), perfdata = VALUES(perfdata), command_object_id = VALUES(command_object_id), command_args = VALUES(command_args), command_line = VALUES(command_line)
You ran out of memory:
Jul 8 08:56:41 bru-nms-nagios-p kernel: Out of memory: Kill process 20860 (nagios) score 922 or sacrifice child
Jul 8 08:56:41 bru-nms-nagios-p kernel: Killed process 20866 (nagios) total-vm:10844kB, anon-rss:176kB, file-rss:0kB, shmem-rss:0kB
Jul 8 08:56:41 bru-nms-nagios-p kernel: Out of memory: Kill process 20860 (nagios) score 922 or sacrifice child
Jul 8 08:56:41 bru-nms-nagios-p kernel: Killed process 20872 (nagios) total-vm:10844kB, anon-rss:180kB, file-rss:0kB, shmem-rss:0kB
Jul 8 08:56:41 bru-nms-nagios-p kernel: Out of memory: Kill process 20860 (nagios) score 922 or sacrifice child
Jul 8 08:56:41 bru-nms-nagios-p kernel: Killed process 20860 (nagios) total-vm:127009036kB, anon-rss:15040676kB, file-rss:0kB, shmem-rss:0kB
Please try the below commands:
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
systemctl restart mariadb.service
systemctl stop httpd
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
pkill -9 -u nagios
pkill -9 -u apache
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
systemctl restart mariadb
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -h 127.0.0.1 -uroot -pnagiosxi nagiosxi
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
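As for the out-of-memory issue, I noticed you had a huge number of checks running at around "08:56" AM today.
You can see that in "/var/log/messages".
Please check why so many checks were running at once; I think that is what caused you to run out of memory.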
Best Regards,
Vinh
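The out-of-memory check suggested above can be sketched as a small shell helper. This is only an illustration and not part of the original reply: the function name `count_oom_kills` is hypothetical, and it assumes the kernel messages use the same "Killed process <pid> (<name>)" wording as the excerpts quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Nagios XI tool): count OOM-killer victims per
# process name in a syslog-style file.
# Assumption: lines look like
#   "... kernel: Killed process 20866 (nagios) total-vm:10844kB, ..."
count_oom_kills() {
    local logfile="${1:-/var/log/messages}"
    # Keep only the "Killed process" lines, extract the name in parentheses,
    # then count how often each name appears (most frequent first).
    grep -E 'Killed process [0-9]+ \(' "$logfile" \
        | sed -E 's/.*Killed process [0-9]+ \(([^)]*)\).*/\1/' \
        | sort | uniq -c | sort -rn
}
```

Running `count_oom_kills /var/log/messages` as root would then show which process names the OOM killer hit most often, which may help confirm whether the Nagios checks are the ones being killed.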
-
- Posts: 222
- Joined: Wed Sep 28, 2016 9:35 am
Re: Monitoring engine stops working and graphs not populatin
Hi,
We performed a fresh installation of CentOS 7 with the same version of Nagios (5.7.3) that is running on the CentOS 6 server.
We restored from a backup successfully and ran a DB repair, but ran into these issues:
1. NDO
From the Nagios log I see a lot of these messages:
Code: Select all
NDO-3: The following query failed while MySQL appears to be connected:
The database (now MariaDB) is running on the same server; there is no offloading involved here.
2. When looking at the Nagios state I saw it as down (killed). I restarted the service but I see this error:
Code: Select all
: WARNING: RLIMIT_NPROC is 63444, total max estimated processes is 159492! You should increase your limits (ulimit -u, or limits.conf)
3. Although the Nagios process restarts successfully, it gets killed from time to time, as we noticed in the GUI.
4. The measurements for bandwidth all dropped to 0. I could see from the mrtg cfg file that the rrdtool Perl library was 5.10.1, but on the new CentOS 7 installation this is version 5.16.3. I therefore updated the mrtg cfg file; however, I'm not sure if I need to restart anything here.
5. Graphs are not being populated.
Can you kindly assist in solving these issues, as we need to make sure Nagios is running fine on CentOS 7 before we move on to upgrading the other instances?
*Note that the commands you asked us to run have also been executed.
Re-attaching the profile. Rgds,
Matthew
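The RLIMIT_NPROC warning quoted above compares the process limit with the estimated number of processes Nagios may need. A minimal sketch of that comparison follows, assuming the limit is read from a file in the `/proc/<pid>/limits` format; both function names are hypothetical and are not part of Nagios.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reasoning about the RLIMIT_NPROC warning.

# Print the soft "Max processes" limit from a file in /proc/<pid>/limits format.
# Assumption: the line looks like "Max processes  63444  63444  processes".
nproc_soft_limit() {
    awk '/^Max processes/ { print $3 }' "$1"
}

# Succeed (exit 0) if the limit ($1) covers the estimated process count ($2).
# The value "unlimited" always counts as sufficient.
nproc_is_sufficient() {
    local limit="$1" required="$2"
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$required" ]
}
```

With the numbers from the warning, `nproc_is_sufficient 63444 159492` fails, which matches the message. On a live system the limit of the running daemon could be read with something like `nproc_soft_limit /proc/<pid of nagios>/limits`; raising it is done through `ulimit -u` or limits.conf, as the warning itself says.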
-
- Posts: 903
- Joined: Tue Oct 27, 2020 1:35 pm
Re: Monitoring engine stops working and graphs not populatin
Hi,
Hope you are having a great Monday!! ...
I noticed the following errors:
Code: Select all
<p><pre>SQL Error [nagiosxi] : Table 'nagiosxi.xi_commands' doesn't exist</pre></p>
<p><pre>SQL Error [nagiosxi] : Table 'nagiosxi.xi_sysstat' doesn't exist</pre></p>
<p><pre>SQL Error [nagiosxi] : MySQL server has gone away</pre></p>
I have attached the "xi_573.sql" file.
Here's how to install it, assuming that you downloaded the "xi_573.sql" file and put it under "/tmp":
Code: Select all
cd /tmp
mysql -f -uroot -pnagiosxi nagiosxi < /tmp/xi_573.sql
systemctl restart mariadb
systemctl restart nagios
I also found this post that you can take a look at for increasing the NPROC setting.
https://serverfault.com/questions/62861 ... n-centos-7
Also, please run the following command as root and post the output here.
Code: Select all
echo "SELECT table_schema as 'Database', table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES ORDER BY (data_length + index_length) DESC;" |mysql -t -u root -pnagiosxi
Also, please run the below command to check your "max_connections" setting:
Code: Select all
mysql -u root -pnagiosxi -e "show global status like '%used_connections%'; show variables like 'max_connections';"
If your max_connections value is below "151", please see the KB below on how to increase it:
https://support.nagios.com/kb/article/n ... s-513.html
I also noticed that at around "12:29", you ran out of memory and swap, based on "/var/log/messages":
Code: Select all
Jul 12 12:29:18 bru-nms-nagios-p kernel: Out of memory: Kill process 9305 (nagios) score 874 or sacrifice child
Jul 12 12:29:18 bru-nms-nagios-p kernel: Killed process 9313 (nagios) total-vm:10844kB, anon-rss:220kB, file-rss:0kB, shmem-rss:0kB
Jul 12 12:29:18 bru-nms-nagios-p kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Jul 12 12:29:18 bru-nms-nagios-p kernel: systemd-journal cpuset=/ mems_allowed=0
It seems there are lots of check_nrpe processes running; any idea why they all run at the same time?
Please check "/var/log/messages" on your Nagios XI system for more details.
Please talk to your system admin if you think you need more memory added.
Best Regards,
Vinh
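To interpret the output of the max_connections check above, the two values can be compared directly. The sketch below is only an illustration: the function name `connection_headroom` is hypothetical, and it assumes the tab-separated "name, value" lines that the `mysql` client prints when its output is piped rather than shown on a terminal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name<TAB>value" lines on stdin and report the peak
# connection count against max_connections.
# Assumption: input is mysql batch output, e.g.
#   Max_used_connections<TAB>120
#   max_connections<TAB>151
connection_headroom() {
    awk -F'\t' '
        tolower($1) == "max_used_connections" { used = $2 }
        tolower($1) == "max_connections"      { max = $2 }
        END {
            # Bail out if either value is missing or the limit is zero.
            if (used == "" || max == "" || max + 0 == 0) { print "unknown"; exit 1 }
            printf "%d/%d (%d%%)\n", used, max, used * 100 / max
        }'
}
```

The command from the post could be piped straight into it, for example `mysql -u root -p -e "show global status like '%used_connections%'; show variables like 'max_connections';" | connection_headroom`. A result close to 100% would suggest the connection limit is being reached.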
-
- Posts: 222
- Joined: Wed Sep 28, 2016 9:35 am
Re: Monitoring engine stops working and graphs not populatin
Hi,
I found that the OOM issue came down to the Nagios process being killed after reaching 100%. After downgrading NDO-3, everything went back to normal, and the graphs are being updated again.
At this stage, I think we can lock this thread.
-
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Monitoring engine stops working and graphs not populatin
Hi,
Sounds good. We'll close this out, but feel free to open another post if you have any new questions.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!