High Load Issues on Nagios XI

lciucci
Posts: 1
Joined: Fri Dec 13, 2024 11:09 am

High Load Issues on Nagios XI

Post by lciucci »

Hi everyone,

Since around September 29, we’ve seen a significant increase in the load on our Nagios XI server. According to the logs, the system load and the number of processes often exceed the configured thresholds.

We observed that snmptt was receiving numerous traps and invoking the event handler, which greatly contributed to the increased load. For now, we’ve decided to disable snmptt, and the load has decreased substantially. However, occasional spikes persist, which weren’t present before late September.

Here are some of the logs we were seeing repeatedly every few seconds when snmptt was active:

2024-12-01 23:59:58 External command [1726740285] PROCESS_SERVICE_CHECK_RESULT;0.0.0.0;SNMP Traps;1;Management processor is currently in reset. (6061): The management processor is in the process of being reset. / sysName.0 (OCTETSTR):ILOCZJD1J01KN. cpqHoTrapFlags.0 (INTEGER):8 returned error Command failed
2024-12-01 23:59:58 Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;0.0.0.0;SNMP Traps;1;...
Runtime Warning 2024-12-01 23:59:58 Warning: Passive check result was received for service 'SNMP Traps' on host '0.0.0.0', but the host could not be found!
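
The warning above says the passive result was addressed to host '0.0.0.0', which Nagios could not find, so each such trap ends in a failed external command. A minimal sketch for tallying these warnings per host and service, assuming the log lines have the format quoted above (the function name and the sample list are illustrative, not part of Nagios XI):

```python
import re
from collections import Counter

# Matches the "host could not be found" warning in the format quoted above.
ORPHAN_RE = re.compile(
    r"Passive check result was received for service '(?P<service>[^']*)' "
    r"on host '(?P<host>[^']*)', but the host could not be found"
)

def count_orphaned_results(lines):
    """Count passive results per (host, service) that matched no configured host."""
    counts = Counter()
    for line in lines:
        match = ORPHAN_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("service"))] += 1
    return counts

# Sample input: two of the log lines from this post (assumed to be read from a log excerpt).
sample = [
    "Runtime Warning 2024-12-01 23:59:58 Warning: Passive check result was received "
    "for service 'SNMP Traps' on host '0.0.0.0', but the host could not be found!",
    "2024-12-01 23:59:58 Error: External command failed -> "
    "PROCESS_SERVICE_CHECK_RESULT;0.0.0.0;SNMP Traps;1;...",
]

for (host, service), n in count_orphaned_results(sample).most_common():
    print(f"{n:6d}  host={host}  service={service}")
```

A high count for a single host value such as 0.0.0.0 would suggest that the traps are being submitted under a name that does not match any configured host, rather than coming from many different misconfigured devices.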



After disabling snmptt, the load decreased but didn’t stabilize entirely. We’re still seeing occasional spikes (4-5 times daily) in the current load:

2024-12-13 17:00:46 localhost Current Load OK HARD 1 of 4 OK - load average: 1.04, 0.87, 2.48
2024-12-13 16:50:47 localhost Current Load WARNING HARD 4 of 4 WARNING - load average: 0.32, 1.59, 3.99
2024-12-13 16:40:47 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 1.90, 7.22, 6.91
2024-12-13 15:55:47 localhost Current Load OK HARD 1 of 4 OK - load average: 1.01, 1.19, 2.91
2024-12-13 15:50:46 localhost Current Load WARNING HARD 4 of 4 WARNING - load average: 0.29, 1.40, 3.71
2024-12-13 15:36:02 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 5.65, 15.32, 8.46
2024-12-13 13:45:46 localhost Current Load OK HARD 1 of 4 OK - load average: 0.84, 1.01, 2.63
2024-12-13 13:40:47 localhost Current Load WARNING HARD 4 of 4 WARNING - load average: 0.66, 1.35, 3.35
2024-12-13 13:26:05 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 4.81, 12.89, 7.46
2024-12-13 12:34:33 localhost Current Load OK HARD 1 of 4 OK - load average: 0.40, 1.33, 2.96
2024-12-13 12:29:56 localhost Current Load WARNING HARD 4 of 4 WARNING - load average: 0.23, 2.15, 3.69
2024-12-13 12:24:56 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 1.20, 5.22, 4.95
2024-12-13 07:20:47 localhost Current Load OK HARD 1 of 4 OK - load average: 1.97, 1.17, 2.48
2024-12-13 07:10:47 localhost Current Load WARNING HARD 4 of 4 WARNING - load average: 0.95, 1.82, 3.89
2024-12-13 07:00:47 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 1.69, 6.51, 6.35
2024-12-13 06:15:47 localhost Current Load OK HARD 1 of 4 OK - load average: 1.62, 1.34, 2.59
2024-12-13 06:05:47 localhost Current Load WARNING HARD 4 of 4 WARNING - load average: 1.30, 2.18, 3.86
2024-12-13 05:51:00 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 8.02, 14.79, 7.68
2024-12-13 05:05:47 localhost Current Load OK HARD 1 of 4 OK - load average: 0.58, 1.19, 2.54
2024-12-13 05:00:49 localhost Current Load WARNING HARD 4 of 4 WARNING - load average: 0.55, 1.78, 3.13
2024-12-13 04:56:03 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 1.56, 4.10, 4.15
2024-12-13 04:05:47 localhost Current Load OK HARD 1 of 4 OK - load average: 1.71, 1.43, 2.76
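
To check whether the spikes follow a schedule (for example a cron job or a report), the entries above can be parsed and the gaps between CRITICAL events computed. This is a rough sketch that assumes the exact line format shown above; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Line format assumed from the Current Load entries quoted above.
LOAD_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ Current Load "
    r"(?P<state>OK|WARNING|CRITICAL) .*load average: "
    r"(?P<l1>[\d.]+), (?P<l5>[\d.]+), (?P<l15>[\d.]+)"
)

def parse_load_events(lines):
    """Parse state-change lines into dicts, sorted oldest first."""
    events = []
    for line in lines:
        m = LOAD_RE.match(line.strip())
        if m:
            events.append({
                "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
                "state": m.group("state"),
                "load": tuple(float(m.group(k)) for k in ("l1", "l5", "l15")),
            })
    return sorted(events, key=lambda e: e["time"])

def critical_gaps(events):
    """Return (time, minutes since the previous CRITICAL) for each CRITICAL event."""
    crit = [e["time"] for e in events if e["state"] == "CRITICAL"]
    return [
        (t, None if i == 0 else (t - crit[i - 1]).total_seconds() / 60)
        for i, t in enumerate(crit)
    ]

# Three of the entries from the log above.
sample = [
    "2024-12-13 07:00:47 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 1.69, 6.51, 6.35",
    "2024-12-13 06:15:47 localhost Current Load OK HARD 1 of 4 OK - load average: 1.62, 1.34, 2.59",
    "2024-12-13 05:51:00 localhost Current Load CRITICAL HARD 4 of 4 CRITICAL - load average: 8.02, 14.79, 7.68",
]

for when, gap in critical_gaps(parse_load_events(sample)):
    print(when.strftime("%H:%M"), "first" if gap is None else f"{gap:.0f} min after previous")
```

In every CRITICAL line above, the 1-minute average is lower than the 5-minute average, which is consistent with short bursts that had already subsided by the time the check ran. Comparing the spike times with cron schedules and report jobs may help narrow down the cause.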


Another issue we’ve noticed is that the Nagios XI interface often becomes unresponsive for several minutes when we access the event log. During these times, mysqld consumes a significant amount of CPU:

root@nagioseit:/tmp# ps aux | sort -nrk 3,3 | head -n 5
mysql 1495 33.8 3.7 13750368 605256 ? Ssl Oct29 21943:12 /usr/sbin/mysqld
nagios 1139874 3.0 0.3 148772 63428 ? S 17:34 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios 1139875 2.9 0.3 148772 63268 ? S 17:34 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios 1139872 2.9 0.3 148772 62616 ? S 17:34 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios 1139877 2.6 0.3 148772 62848 ? S 17:34 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php


System Details:
Nagios XI version: 5.11.2
OS: Ubuntu 22.04.3 LTS
Kernel: 5.15.0-86-generic
Gnome: Not installed

We would appreciate any advice on troubleshooting and resolving these issues. If you need additional information or logs, let me know.

Thank you in advance for your support!
DoubleDoubleA
Posts: 199
Joined: Thu Feb 09, 2017 5:07 pm

Re: High Load Issues on Nagios XI

Post by DoubleDoubleA »

Hi @lciucci,

Load depends on a lot of factors: the number of checks, whether they are active or passive, the check interval, and the plugin language. Other activity, including reporting, can also add load to the server.
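
As a rough illustration of how check count and interval combine, the number of active checks the server has to run per minute is approximately the sum of (number of checks / interval in minutes). The figures below are hypothetical and not taken from this system:

```python
def checks_per_minute(groups):
    """Estimate active checks per minute from (number_of_checks, interval_minutes) pairs."""
    return sum(count / interval for count, interval in groups)

# Hypothetical environment: 2000 checks every 5 minutes plus 300 checks every minute.
hypothetical = [(2000, 5), (300, 1)]
rate = checks_per_minute(hypothetical)
print(f"about {rate:.0f} active checks per minute")
```

This ignores passive results, retries, and plugin run time, so it only gives a starting point for comparing the environment before and after a change.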

What has changed recently in your monitoring environment? At the very least, it sounds like you have more traps coming in. What else has changed?

In this case, it might be best to open a ticket at https://answerhub.nagios.com/support/s/ with the commercial support team. They can do a much more thorough job of troubleshooting than we will be able to on the forum. They'll get your system profile and start working through it with you.

Aaron