Nagios XI 5.8.5 high CPU usage

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
ignacio.sanchez
Posts: 22
Joined: Thu Jun 25, 2020 4:46 am


Post by ignacio.sanchez »

Hello.

We have 4 Nagios XI instances already updated to 5.8.5, but only one of them shows high CPU usage, and I can't find the cause.

The server has 4 CPUs and 16GB RAM.

I have already tried:

Code:

check_result_reaper_frequency=3
max_check_result_reaper_time=10

Code:

max_concurrent_checks=20
(if I set it to the default "0", the load average reaches 311.95, 232.62, 145.09)

CPU usage currently looks like this (and the web interface is very slow):

Code:

top - 08:28:32 up 1 day, 34 min,  1 user,  load average: 29.86, 25.89, 25.12
Tasks: 212 total,  28 running, 184 sleeping,   0 stopped,   0 zombie
%Cpu(s): 77.8 us, 19.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  3.0 si,  0.0 st
KiB Mem : 16431432 total,  8797700 free,  2255444 used,  5378288 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 13016932 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21233 mysql     20   0 2630900 707264   9848 S  20.8  4.3 227:45.00 mysqld
  401 nagios    20   0  251172  61528   2264 R  18.5  0.4   0:03.33 curl
  952 nagios    20   0  243004  52804   2264 R  15.3  0.3   0:02.59 curl
 1953 nagios    20   0  243004  38900   2264 R  15.3  0.2   0:01.35 curl
 5145 apache    20   0  643764  33060   6108 R  15.3  0.2   0:04.50 httpd
 2770 apache    20   0  119916   8480   2352 R  14.9  0.1   0:00.79 status.cgi
30747 apache    20   0  643892  32512   5656 R  14.6  0.2   0:08.59 httpd
31795 nagios    20   0  241748  95892   1352 R  14.3  0.6   0:04.14 send_nrdp.sh
 3242 nagios    20   0  153116  11312   2236 R  12.7  0.1   0:00.39 check_ifopersta
31200 nagios    20   0  178320  55424   4392 R  11.7  0.3   0:14.84 mrtg
17779 apache    20   0  643776  33440   6204 S  11.0  0.2   0:07.58 httpd
 3241 nagios    20   0  148164  10444   2196 R  10.4  0.1   0:00.32 check_ifopersta
 3307 nagios    20   0  148032  10180   2192 R  10.1  0.1   0:00.31 check_ifopersta
 7696 apache    20   0  643828  33368   6112 R  10.1  0.2   0:12.37 httpd
24838 apache    20   0  638756  28468   6244 S  10.1  0.2   0:28.56 httpd
 3208 nagios    20   0  147636   9912   2192 R   8.4  0.1   0:00.26 check_ifopersta
 3214 nagios    20   0  146448   8864   2188 R   8.1  0.1   0:00.25 check_ifopersta
 3239 nagios    20   0  269904   6668   4824 S   8.1  0.0   0:00.25 curl
 3277 nagios    20   0  205792  11660   4104 S   8.1  0.1   0:00.25 python
 3339 nagios    20   0  146448   8868   2188 R   8.1  0.1   0:00.25 check_ifopersta
  534 root      20   0   64000  27652  27276 S   7.5  0.2  38:35.42 systemd-journal
21348 apache    20   0  642904  32364   7200 R   7.5  0.2   0:24.72 httpd
 3333 nagios    20   0  141372   7652   2140 R   6.5  0.0   0:00.20 check_ifopersta
 3374 nagios    20   0  182792   8748   3488 R   4.9  0.1   0:00.15 python
 2257 nagios    20   0  113700   2092   1388 S   4.2  0.0   0:00.99 send_nrdp.sh
15687 nagios    20   0 1047404  33956   3608 R   4.2  0.2  63:49.93 nagios
 3406 nagios    20   0  138204   4500   2112 R   2.9  0.0   0:00.09 check_ifopersta
 3413 nagios    20   0  138600   5024   2128 R   2.9  0.0   0:00.09 check_ifopersta
 1085 root      20   0  592540  21180  18980 S   2.3  0.1  10:28.06 rsyslogd
    9 root      20   0       0      0      0 R   1.3  0.0  31:53.37 rcu_sched
 3015 nagios    20   0  115536   1700   1368 S   1.3  0.0   0:00.04 check_rrdtraf
 3435 nagios    20   0  135216   3576   2072 R   1.3  0.0   0:00.04 check_ifopersta
31147 nagios    20   0  445680  26204  10544 S   1.3  0.2   0:00.59 php
 3019 nagios    20   0  115536   1708   1368 S   1.0  0.0   0:00.03 check_rrdtraf
 3445 nagios    20   0   18868   3172   1584 R   1.0  0.0   0:00.03 python
 1753 root      20   0  162236   2436   1592 R   0.6  0.0   0:00.11 top
    1 root      20   0  191308   4240   2620 S   0.3  0.0   3:29.58 systemd
    6 root      20   0       0      0      0 S   0.3  0.0   1:34.27 ksoftirqd/0
   47 root      39  19       0      0      0 R   0.3  0.0   0:37.35 khugepaged
  712 dbus      20   0   58392   2668   1828 S   0.3  0.0   3:41.77 dbus-daemon
  717 root      20   0   26492   1848   1456 S   0.3  0.0   1:33.76 systemd-logind
 3243 nagios    20   0  115536   1700   1368 R   0.3  0.0   0:00.01 check_rrdtraf
 3460 nagios    20   0  113284   1332   1148 R   0.3  0.0   0:00.01 sh
 6524 root      20   0  159592   6184   4796 S   0.3  0.0   0:00.45 sshd
15691 nagios    20   0   10844   1116    820 S   0.3  0.0   2:16.41 nagios
15693 nagios    20   0   10844   1116    820 S   0.3  0.0   2:17.48 nagios
31186 nagios    20   0  445680  26072  10444 S   0.3  0.2   0:00.50 php
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.37 kthreadd
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      rt   0       0      0      0 S   0.0  0.0   0:23.52 migration/0
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
   10 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 lru-add-drain
   11 root      rt   0       0      0      0 S   0.0  0.0   0:16.20 watchdog/0
   12 root      rt   0       0      0      0 S   0.0  0.0   0:16.16 watchdog/1
   13 root      rt   0       0      0      0 S   0.0  0.0   0:24.32 migration/1
   14 root      20   0       0      0      0 S   0.0  0.0   1:06.53 ksoftirqd/1
   16 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
   17 root      rt   0       0      0      0 S   0.0  0.0   0:16.63 watchdog/2
   18 root      rt   0       0      0      0 S   0.0  0.0   0:25.49 migration/2
   19 root      20   0       0      0      0 S   0.0  0.0   1:04.31 ksoftirqd/2
The instance monitors 214 hosts and 3450 services (and this will increase soon).
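
To put the load average from the top output above into perspective, it helps to divide it by the number of CPUs, since load counts tasks that are running or waiting to run. Here is a minimal Python sketch, assuming the header line format that top printed in this post and the 4 CPUs mentioned earlier. The helper name load_per_cpu is made up for illustration.

```python
import re

def load_per_cpu(top_header, cpus):
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header)
    if match is None:
        raise ValueError("no load average found in line")
    return [float(value) / cpus for value in match.groups()]

# Header line copied from the top output in this post
header = "top - 08:28:32 up 1 day, 34 min,  1 user,  load average: 29.86, 25.89, 25.12"

for label, value in zip(("1 min", "5 min", "15 min"), load_per_cpu(header, cpus=4)):
    print(f"{label}: {value:.2f} tasks per CPU")
```

As a rough rule of thumb, values well above 1 per CPU mean tasks are queueing for CPU time, which would be consistent with the 0.0 id (idle) figure in the same output.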

I also changed the following options in php.ini:

Code:

max_execution_time = 120
max_input_vars = 50000
memory_limit = 1024M
Obviously, you'll need more information to troubleshoot the issue, so don't hesitate to ask.

Thanks in advance!
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm

Post by dchurch »

1. Please try running the database repair script, and let me know if that is successful. Run the following as root from the terminal.

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh
See "Repairing The Nagios XI Database" for complete instructions.

2. Sometimes decreasing the amount of time the audit logs stay in the database can help.
The Max Audit Log Age setting is controlled through the admin screens. You can get to it through Admin > System Config > Performance Settings, then click on the Database tab. The default in Nagios XI 5.7.3 and later is 180 days.

I usually recommend the following settings for better performance on larger (1000+ hosts/services) Nagios XI installs:

- Max Log Entries Age: change to 10
- Max Audit Log Age: change to 10
- Max State History Age: change to 30

See this document: Nagios XI Database Optimization

3. If that still doesn't work, try truncating some of the poorly optimized "paper-trail" tables:

If MySQL:

Code:

mysql -uroot -pnagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
If PostgreSQL:

Code:

psql -U nagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
4. If mysqld is still taking up 100% of the CPU, what is the output of the following command?

Code:

mysql -uroot -pnagiosxi --table <<< "select * from (select table_name, round(((data_length + index_length) / 1024 / 1024), 2) as sz from information_schema.tables where table_schema like 'nagios%') as x order by x.sz;"
ignacio.sanchez
Posts: 22
Joined: Thu Jun 25, 2020 4:46 am


Post by ignacio.sanchez »

Hello dchurch.
dchurch wrote: 1. Please try running the database repair script, and let me know if that is successful. Run the following as root from the terminal.

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh
See "Repairing The Nagios XI Database" for complete instructions.

Done, successfully.

dchurch wrote: 2. Sometimes decreasing the amount of time the audit logs stay in the database can help.
The Max Audit Log Age setting is controlled through the admin screens. You can get to it through Admin > System Config > Performance Settings, then click on the Database tab. The default in Nagios XI 5.7.3 and later is 180 days.

I usually recommend the following settings for better performance on larger (1000+ hosts/services) Nagios XI installs:

- Max Log Entries Age: change to 10
- Max Audit Log Age: change to 10
- Max State History Age: change to 30

See this document: Nagios XI Database Optimization

Changed.

dchurch wrote: 3. If that still doesn't work, try truncating some of the poorly optimized "paper-trail" tables:

If MySQL:

Code:

mysql -uroot -pnagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
If PostgreSQL:

Code:

psql -U nagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'

Tables truncated.

dchurch wrote: 4. If mysqld is still taking up 100% of the CPU, what is the output of the following command?

Code:

mysql -uroot -pnagiosxi --table <<< "select * from (select table_name, round(((data_length + index_length) / 1024 / 1024), 2) as sz from information_schema.tables where table_schema like 'nagios%') as x order by x.sz;"

Code:

+--------------------------------------------+---------+
| table_name                                 | sz      |
+--------------------------------------------+---------+
| nagios_hostgroups                          |    0.00 |
| nagios_host_parenthosts                    |    0.00 |
| nagios_servicegroups                       |    0.00 |
| nagios_hostescalations                     |    0.00 |
| nagios_servicegroup_members                |    0.00 |
| nagios_serviceescalations                  |    0.00 |
| nagios_contactstatus                       |    0.00 |
| nagios_externalcommands                    |    0.00 |
| nagios_serviceescalation_contacts          |    0.00 |
| nagios_timeperiods                         |    0.00 |
| nagios_serviceescalation_contactgroups     |    0.00 |
| nagios_scheduleddowntime                   |    0.00 |
| nagios_timedevents                         |    0.00 |
| nagios_runtimevariables                    |    0.00 |
| nagios_downtimehistory                     |    0.00 |
| nagios_servicedependencies                 |    0.00 |
| nagios_contactgroups                       |    0.00 |
| nagios_programstatus                       |    0.00 |
| nagios_dbversion                           |    0.00 |
| nagios_contactgroup_members                |    0.00 |
| nagios_service_parentservices              |    0.00 |
| nagios_contact_notificationcommands        |    0.00 |
| nagios_contact_addresses                   |    0.00 |
| nagios_configfilevariables                 |    0.00 |
| nagios_hostescalation_contacts             |    0.00 |
| nagios_timedeventqueue                     |    0.00 |
| nagios_configfiles                         |    0.00 |
| nagios_instances                           |    0.00 |
| nagios_hostescalation_contactgroups        |    0.00 |
| nagios_comments                            |    0.00 |
| nagios_hostdependencies                    |    0.00 |
| nagios_hostgroup_members                   |    0.01 |
| nagios_acknowledgements                    |    0.01 |
| nagios_host_contacts                       |    0.01 |
| nagios_host_contactgroups                  |    0.01 |
| nagios_contacts                            |    0.01 |
| nagios_timeperiod_timeranges               |    0.01 |
| nagios_commands                            |    0.02 |
| tbl_lnkServiceToHostgroup                  |    0.02 |
| tbl_lnkHostToContactgroup                  |    0.02 |
| tbl_lnkServicetemplateToServicetemplate    |    0.02 |
| tbl_lnkContactToContacttemplate            |    0.02 |
| xi_cmp_ccm_backups                         |    0.02 |
| tbl_lnkServiceescalationToServicegroup     |    0.02 |
| tbl_lnkServicetemplateToServicegroup       |    0.02 |
| tbl_lnkContactToContactgroup               |    0.02 |
| tbl_lnkServiceescalationToService          |    0.02 |
| tbl_lnkServiceToContactgroup               |    0.02 |
| tbl_lnkServicetemplateToHostgroup          |    0.02 |
| tbl_lnkContactToCommandService             |    0.02 |
| tbl_lnkServiceescalationToHostgroup        |    0.02 |
| tbl_lnkServiceToContact                    |    0.02 |
| tbl_lnkServicetemplateToHost               |    0.02 |
| tbl_lnkContactToCommandHost                |    0.02 |
| tbl_lnkHostgroupToHost                     |    0.02 |
| tbl_lnkServiceescalationToHost             |    0.02 |
| tbl_lnkHostescalationToHostgroup           |    0.02 |
| tbl_lnkServiceescalationToContactgroup     |    0.02 |
| tbl_lnkHosttemplateToVariabledefinition    |    0.02 |
| tbl_lnkServicetemplateToContactgroup       |    0.02 |
| tbl_lnkHostescalationToHost                |    0.02 |
| tbl_lnkServiceescalationToContact          |    0.02 |
| xi_meta                                    |    0.02 |
| tbl_lnkHosttemplateToHosttemplate          |    0.02 |
| tbl_lnkServicetemplateToContact            |    0.02 |
| tbl_timedefinition                         |    0.02 |
| tbl_lnkServicedependencyToServicegroup_S   |    0.02 |
| nagios_eventhandlers                       |    0.02 |
| tbl_lnkHosttemplateToHostgroup             |    0.02 |
| tbl_lnkServicegroupToServicegroup          |    0.02 |
| tbl_submenu                                |    0.02 |
| tbl_lnkHostescalationToContactgroup        |    0.02 |
| tbl_lnkHosttemplateToHost                  |    0.02 |
| tbl_lnkServicegroupToService               |    0.02 |
| tbl_lnkHostescalationToContact             |    0.02 |
| tbl_lnkServicedependencyToServicegroup_DS  |    0.02 |
| tbl_lnkHostToContact                       |    0.02 |
| tbl_lnkHosttemplateToContactgroup          |    0.02 |
| tbl_session_locks                          |    0.02 |
| tbl_lnkHostdependencyToHostgroup_H         |    0.02 |
| tbl_lnkServicedependencyToService_S        |    0.02 |
| tbl_lnkContacttemplateToVariabledefinition |    0.02 |
| xi_deploy_jobs                             |    0.02 |
| tbl_lnkHosttemplateToContact               |    0.02 |
| tbl_session                                |    0.02 |
| tbl_lnkHostdependencyToHostgroup_DH        |    0.02 |
| tbl_lnkServicedependencyToService_DS       |    0.02 |
| tbl_lnkContacttemplateToContacttemplate    |    0.02 |
| xi_deploy_agents                           |    0.02 |
| tbl_lnkHostdependencyToHost_H              |    0.02 |
| tbl_permission_inactive                    |    0.02 |
| tbl_lnkServicedependencyToHostgroup_H      |    0.02 |
| tbl_lnkContacttemplateToContactgroup       |    0.02 |
| xi_commands                                |    0.02 |
| tbl_lnkHostgroupToHostgroup                |    0.02 |
| tbl_lnkHostdependencyToHost_DH             |    0.02 |
| tbl_permission                             |    0.02 |
| tbl_lnkServicedependencyToHostgroup_DH     |    0.02 |
| tbl_lnkContacttemplateToCommandService     |    0.02 |
| tbl_lnkHostToVariabledefinition            |    0.02 |
| tbl_mainmenu                               |    0.02 |
| tbl_lnkServicedependencyToHost_H           |    0.02 |
| tbl_lnkContacttemplateToCommandHost        |    0.02 |
| tbl_lnkServicedependencyToHost_DH          |    0.02 |
| tbl_lnkContactgroupToContactgroup          |    0.02 |
| xi_cmp_scheduledreports_log                |    0.02 |
| tbl_lnkHostToHosttemplate                  |    0.02 |
| tbl_logbook                                |    0.02 |
| tbl_lnkHostToHostgroup                     |    0.02 |
| tbl_lnkTimeperiodToTimeperiod              |    0.02 |
| tbl_lnkContactgroupToContact               |    0.02 |
| tbl_lnkServiceToServicegroup               |    0.02 |
| tbl_lnkHostToHost                          |    0.02 |
| tbl_lnkServicetemplateToVariabledefinition |    0.02 |
| tbl_lnkContactToVariabledefinition         |    0.02 |
| tbl_domain                                 |    0.03 |
| tbl_contacttemplate                        |    0.03 |
| tbl_contactgroup                           |    0.03 |
| tbl_user                                   |    0.03 |
| tbl_contact                                |    0.03 |
| tbl_timeperiod                             |    0.03 |
| nagios_service_contactgroups               |    0.03 |
| xi_eventqueue                              |    0.03 |
| tbl_settings                               |    0.03 |
| xi_users                                   |    0.03 |
| xi_sysstat                                 |    0.03 |
| tbl_hosttemplate                           |    0.03 |
| xi_sessions                                |    0.03 |
| tbl_servicegroup                           |    0.03 |
| xi_cmp_trapdata_log                        |    0.03 |
| tbl_hostgroup                              |    0.03 |
| xi_cmp_trapdata                            |    0.03 |
| tbl_hostextinfo                            |    0.03 |
| tbl_serviceextinfo                         |    0.03 |
| tbl_hostescalation                         |    0.03 |
| tbl_serviceescalation                      |    0.03 |
| nagios_systemcommands                      |    0.03 |
| tbl_hostdependency                         |    0.03 |
| tbl_servicedependency                      |    0.03 |
| xi_cmp_favorites                           |    0.03 |
| xi_events                                  |    0.05 |
| xi_mibs                                    |    0.05 |
| tbl_command                                |    0.06 |
| nagios_customvariablestatus                |    0.06 |
| tbl_servicetemplate                        |    0.06 |
| nagios_customvariables                     |    0.06 |
| nagios_service_contacts                    |    0.06 |
| xi_options                                 |    0.06 |
| nagios_hosts                               |    0.06 |
| tbl_lnkServiceToVariabledefinition         |    0.09 |
| tbl_host                                   |    0.09 |
| tbl_lnkServiceToHost                       |    0.11 |
| tbl_lnkServiceToServicetemplate            |    0.11 |
| nagios_hostchecks                          |    0.12 |
| tbl_variabledefinition                     |    0.14 |
| nagios_flappinghistory                     |    0.14 |
| nagios_hoststatus                          |    0.14 |
| tbl_info                                   |    0.17 |
| xi_usermeta                                |    0.25 |
| nagios_processevents                       |    0.33 |
| nagios_commenthistory                      |    0.36 |
| xi_cmp_nagiosbpi_backups                   |    0.48 |
| tbl_service                                |    0.50 |
| nagios_services                            |    0.75 |
| nagios_servicechecks                       |    1.05 |
| nagios_objects                             |    1.43 |
| nagios_servicestatus                       |    1.87 |
| xi_auth_tokens                             |    2.53 |
| xi_auditlog                                |    6.17 |
| nagios_contactnotificationmethods          |   17.86 |
| nagios_contactnotifications                |   18.89 |
| nagios_notifications                       |   32.03 |
| nagios_statehistory                        |   37.43 |
| nagios_logentries                          | 2513.74 |
+--------------------------------------------+---------+
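
The sz column in the listing above is in MB, so nagios_logentries (2513.74) is by far the largest table. For longer listings, a small Python sketch like the following can pick out the biggest table. It assumes the two-column mysql --table layout shown above and uses only three rows copied from it. parse_sizes is an illustrative helper, not part of Nagios XI.

```python
def parse_sizes(table_text):
    """Parse `mysql --table` output with two columns: table_name | sz (MB)."""
    sizes = {}
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # blank lines and border lines such as +-----+-----+
        name, size = cells
        try:
            sizes[name] = float(size)
        except ValueError:
            continue  # header row: "sz" is not a number
    return sizes

# Three rows copied from the listing above
sample = """
| nagios_notifications                       |   32.03 |
| nagios_statehistory                        |   37.43 |
| nagios_logentries                          | 2513.74 |
"""

sizes = parse_sizes(sample)
largest = max(sizes, key=sizes.get)
print(f"largest table: {largest} ({sizes[largest]} MB)")
```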
I also reverted max_concurrent_checks to 0 (it used to work that way), and here is the resulting process list:

Code:

top - 09:25:46 up 2 min,  2 users,  load average: 76.43, 25.34, 9.01
Tasks: 257 total,  42 running, 215 sleeping,   0 stopped,   0 zombie
%Cpu(s): 64.8 us, 30.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  4.9 si,  0.0 st
KiB Mem : 16431432 total, 14773732 free,  1346700 used,   311000 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 14801788 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1427 mysql     20   0 2616024 135472   8896 S   9.3  0.8   0:31.48 mysqld
 1502 apache    20   0  638624  27660   5656 R   5.7  0.2   0:10.35 httpd
 1503 apache    20   0  638368  27408   5660 R   5.4  0.2   0:08.39 httpd
23355 nagios    20   0  203416  11224   4032 R   5.1  0.1   0:00.21 python
23376 nagios    20   0  202756  10700   3996 R   5.1  0.1   0:00.20 python
23377 nagios    20   0  205788  11652   4104 R   5.1  0.1   0:00.21 python
 1607 apache    20   0  639000  28096   5808 R   4.8  0.2   0:08.61 httpd
 1609 apache    20   0  643748  32528   5656 R   4.8  0.2   0:07.88 httpd
23358 nagios    20   0  203416  11220   4032 R   4.8  0.1   0:00.20 python
 1505 apache    20   0  644904  33568   6716 R   4.5  0.2   0:08.82 httpd
15742 nagios    20   0   47496  17732   2344 R   4.5  0.1   0:02.66 mrtg
22948 nagios    20   0  142032   8444   2148 R   4.5  0.1   0:00.21 check_ifopersta
23346 nagios    20   0  140976   7400   2136 R   4.5  0.0   0:00.19 check_ifopersta
23349 nagios    20   0  187332   9220   3604 R   4.5  0.1   0:00.20 python
23350 nagios    20   0  194096   9844   3660 R   4.5  0.1   0:00.18 python
23352 nagios    20   0  147108   9388   2188 R   4.5  0.1   0:00.20 check_ifopersta
23353 nagios    20   0  200436  10352   3832 R   4.5  0.1   0:00.19 python
23356 nagios    20   0  140712   7132   2136 R   4.5  0.0   0:00.19 check_ifopersta
 1598 apache    20   0  643772  32512   5652 R   4.2  0.2   0:04.07 httpd
17096 nagios    20   0  183492  62908   1352 R   4.2  0.4   0:02.20 send_nrdp.sh
20586 nagios    20   0  283992  49520   2264 R   4.2  0.3   0:00.95 curl
23348 nagios    20   0  184872   8932   3532 R   4.2  0.1   0:00.17 python
23360 nagios    20   0  200436  10344   3832 R   4.2  0.1   0:00.19 python
23361 nagios    20   0  140712   7128   2136 R   4.2  0.0   0:00.17 check_ifopersta
23373 nagios    20   0  200436  10348   3832 R   4.2  0.1   0:00.17 python
23378 nagios    20   0  187332   9224   3604 R   4.2  0.1   0:00.18 python
21523 nagios    20   0  157712  12132   2536 S   4.0  0.1   0:00.38 check_ifopersta
21760 nagios    20   0  155636  11992   2408 R   4.0  0.1   0:00.36 check_ifopersta
23357 nagios    20   0  189412   9600   3616 R   4.0  0.1   0:00.16 python
23363 nagios    20   0  141240   7656   2140 R   4.0  0.0   0:00.17 check_ifopersta
23364 nagios    20   0  189404   9604   3616 R   4.0  0.1   0:00.16 python
23367 nagios    20   0  194092  10072   3704 R   4.0  0.1   0:00.16 python
23371 nagios    20   0  203020  10692   3996 R   4.0  0.1   0:00.18 python
23375 nagios    20   0  194088   9840   3660 R   4.0  0.1   0:00.17 python
21538 nagios    20   0  155636  11988   2408 S   3.7  0.1   0:00.37 check_ifopersta
21638 nagios    20   0  155636  11996   2408 R   3.7  0.1   0:00.36 check_ifopersta
23359 nagios    20   0  184872   9012   3556 R   3.7  0.1   0:00.15 python
23366 nagios    20   0  184872   9012   3556 R   3.7  0.1   0:00.16 python
23369 nagios    20   0  140580   6864   2136 R   3.7  0.0   0:00.16 check_ifopersta
23354 nagios    20   0  184872   8972   3548 R   3.4  0.1   0:00.15 python
23368 nagios    20   0  182788   8852   3520 R   3.4  0.1   0:00.15 python
15440 nagios    20   0  453360  34728  11016 S   3.1  0.2   0:01.73 php
    9 root      20   0       0      0      0 R   2.3  0.0   0:02.03 rcu_sched
21497 nagios    20   0  155636  11992   2408 S   1.7  0.1   0:00.33 check_ifopersta
21531 nagios    20   0  115540   1700   1368 S   1.4  0.0   0:00.05 check_rrdtraf
21532 nagios    20   0  115540   1700   1368 S   1.4  0.0   0:00.06 check_rrdtraf
21561 nagios    20   0  115540   1704   1368 R   1.4  0.0   0:00.09 check_rrdtraf
21507 nagios    20   0  115540   1704   1368 S   1.1  0.0   0:00.05 check_rrdtraf
21510 nagios    20   0  115540   1704   1368 S   1.1  0.0   0:00.08 check_rrdtraf
21536 nagios    20   0  115540   1700   1368 S   0.8  0.0   0:00.03 check_rrdtraf
21546 nagios    20   0  115540   1700   1368 S   0.8  0.0   0:00.05 check_rrdtraf
21555 nagios    20   0  115540   1700   1368 S   0.8  0.0   0:00.04 check_rrdtraf
21565 nagios    20   0  115540   1704   1368 S   0.8  0.0   0:00.07 check_rrdtraf
21689 nagios    20   0  115536   1704   1368 S   0.8  0.0   0:00.05 check_rrdtraf
21739 nagios    20   0  115540   1704   1368 S   0.8  0.0   0:00.03 check_rrdtraf
   19 root      20   0       0      0      0 S   0.6  0.0   0:00.12 ksoftirqd/2
 1613 root      20   0  162372   2396   1608 R   0.6  0.0   0:00.92 top
21489 nagios    20   0  115540   1700   1368 S   0.6  0.0   0:00.04 check_rrdtraf
21490 nagios    20   0  115540   1704   1368 S   0.6  0.0   0:00.04 check_rrdtraf
21491 nagios    20   0  115540   1704   1368 S   0.6  0.0   0:00.02 check_rrdtraf
And, obviously, the Nagios web interface becomes unusable.
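
With this many short-lived processes, it can be easier to read the top output grouped by command than row by row. Here is a rough Python sketch, assuming the default 12-column top layout shown above and using a handful of rows copied from that output. cpu_by_command is a made-up helper name.

```python
from collections import defaultdict

def cpu_by_command(top_rows):
    """Return {command: (process count, summed %CPU)} for rows copied from top."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in top_rows.splitlines():
        fields = row.split()
        # Process rows have 12 columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip blank lines and the column header
        command = fields[11]
        totals[command] += float(fields[8])
        counts[command] += 1
    return {cmd: (counts[cmd], round(totals[cmd], 1)) for cmd in totals}

# A few rows copied from the top output above
sample = """
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1427 mysql     20   0 2616024 135472   8896 S   9.3  0.8   0:31.48 mysqld
 1502 apache    20   0  638624  27660   5656 R   5.7  0.2   0:10.35 httpd
23355 nagios    20   0  203416  11224   4032 R   5.1  0.1   0:00.21 python
23376 nagios    20   0  202756  10700   3996 R   5.1  0.1   0:00.20 python
22948 nagios    20   0  142032   8444   2148 R   4.5  0.1   0:00.21 check_ifopersta
23346 nagios    20   0  140976   7400   2136 R   4.5  0.0   0:00.19 check_ifopersta
"""

ranked = sorted(cpu_by_command(sample).items(), key=lambda item: -item[1][1])
for command, (count, cpu) in ranked:
    print(f"{command:<16} processes={count} cpu={cpu}")
```

Fed the full listing, a grouping like this would show how much of the CPU goes to the many individual check processes (check_ifopersta, python) as opposed to mysqld or httpd.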
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm

Post by dchurch »

Please run the following commands and post the output:

Code:

rm -f /usr/local/nagiosxi/var/dbmaint.lock
time php /usr/local/nagiosxi/cron/dbmaint.php
ignacio.sanchez
Posts: 22
Joined: Thu Jun 25, 2020 4:46 am


Post by ignacio.sanchez »

Code:

CREATING: /usr/local/nagiosxi/var/dbmaint.lock
CLEANING ndoutils TABLE 'commenthistory'...
SQL: DELETE FROM nagios_commenthistory WHERE entry_time < FROM_UNIXTIME(1565713788)
CLEANING ndoutils TABLE 'processevents'...
SQL: DELETE FROM nagios_processevents WHERE event_time < FROM_UNIXTIME(1597249788)
CLEANING ndoutils TABLE 'externalcommands'...
SQL: DELETE FROM nagios_externalcommands WHERE entry_time < FROM_UNIXTIME(1628180988)
CLEANING ndoutils TABLE 'logentries'...
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1627921788)
CLEANING ndoutils TABLE 'notifications'...
SQL: DELETE FROM nagios_notifications WHERE start_time < FROM_UNIXTIME(1621009788)
CLEANING ndoutils TABLE 'contactnotifications'...
SQL: DELETE FROM nagios_contactnotifications WHERE start_time < FROM_UNIXTIME(1621009788)
CLEANING ndoutils TABLE 'contactnotificationmethods'...
SQL: DELETE FROM nagios_contactnotificationmethods WHERE start_time < FROM_UNIXTIME(1621009788)
CLEANING ndoutils TABLE 'statehistory'...
SQL: DELETE FROM nagios_statehistory WHERE state_time < FROM_UNIXTIME(1626193788)
CLEANING ndoutils TABLE 'timedevents'...
SQL: DELETE FROM nagios_timedevents WHERE event_time < FROM_UNIXTIME(1628785488)
CLEANING ndoutils TABLE 'systemcommands'...
SQL: DELETE FROM nagios_systemcommands WHERE start_time < FROM_UNIXTIME(1628785488)
CLEANING ndoutils TABLE 'servicechecks'...
SQL: DELETE FROM nagios_servicechecks WHERE start_time < FROM_UNIXTIME(1628785488)
CLEANING ndoutils TABLE 'hostchecks'...
SQL: DELETE FROM nagios_hostchecks WHERE start_time < FROM_UNIXTIME(1628785488)
CLEANING ndoutils TABLE 'eventhandlers'...
SQL: DELETE FROM nagios_eventhandlers WHERE start_time < FROM_UNIXTIME(1628785488)
LASTOPT:  1628784905
INTERVAL: 60
NOW:      1628785788
OPTTIME:  1628788505
CLEANING nagiosxi TABLE 'commands'...
SQL: DELETE FROM xi_commands WHERE processing_time < FROM_UNIXTIME(1628756988) AND status_code = 2
CLEANING nagiosxi TABLE 'events'...
SQL: DELETE FROM xi_events WHERE processing_time < FROM_UNIXTIME(1628756988) AND status_code = 2
CLEANING nagiosxi TABLE 'auth_tokens'...
SQL: DELETE FROM xi_auth_tokens WHERE auth_valid_until < FROM_UNIXTIME(1628699388)
CLEANING nagiosxi TABLE 'cmp_trapdata_log'...
SQL: DELETE FROM xi_cmp_trapdata_log WHERE trapdata_log_datetime < FROM_UNIXTIME(1621009788)
CLEANING nagiosxi TABLE 'cmp_scheduledreports_log'...
SQL: DELETE FROM xi_cmp_scheduledreports_log WHERE report_run < FROM_UNIXTIME(1597249788)
SQL1: SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL
SQL2: Deleted 0 (DELETE FROM xi_meta WHERE meta_id IN (SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL))
CLEANING nagiosxi TABLE 'auditlog'...
SQL: DELETE FROM xi_auditlog WHERE log_time < FROM_UNIXTIME(1627921788)
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1628756988)




Repair Complete: Removing Lock File

real    0m4.361s
user    0m0.436s
sys     0m0.151s
But performance is still slow.
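
As a sanity check on the dbmaint.php output above, the FROM_UNIXTIME cutoffs can be converted back into retention periods. This is a minimal Python sketch that assumes the NOW value printed in that output is the reference time. The timestamps are copied from the DELETE statements, and retention_days is an illustrative name.

```python
NOW = 1628785788  # "NOW:" value printed by dbmaint.php above

# Cutoff timestamps copied from the DELETE statements above
cutoffs = {
    "nagios_logentries": 1627921788,
    "xi_auditlog": 1627921788,
    "nagios_statehistory": 1626193788,
    "nagios_notifications": 1621009788,
}

def retention_days(now, cutoff):
    """Age of the cutoff relative to `now`, in days."""
    return (now - cutoff) / 86400

for table, cutoff in cutoffs.items():
    print(f"{table}: {retention_days(NOW, cutoff):g} days")
```

For these values the result is 10 days for nagios_logentries and xi_auditlog, 30 days for nagios_statehistory and 90 days for nagios_notifications, which suggests the 10/10/30 settings recommended earlier in the thread were in effect for this run.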

With max_concurrent_checks=0 (web interface almost unusable):

Code:

top - 16:42:36 up  7:19,  2 users,  load average: 93.72, 38.78, 24.57
Tasks: 400 total, 128 running, 272 sleeping,   0 stopped,   0 zombie
%Cpu(s): 68.3 us, 26.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  5.3 si,  0.0 st
KiB Mem : 16431432 total,  9984132 free,  2181016 used,  4266284 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 13232672 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16662 mysql     20   0 2628720 563436   9656 S   2.8  3.4  49:20.32 mysqld
20471 nagios    20   0  153512  11952   2380 R   2.6  0.1   0:00.32 check_ifopersta
22223 nagios    20   0  134772   5488   2152 R   2.6  0.0   0:00.11 python
20456 nagios    20   0  147636   9920   2192 R   2.3  0.1   0:00.31 check_ifopersta
20528 nagios    20   0  147372   9660   2188 R   2.3  0.1   0:00.31 check_ifopersta
22164 nagios    20   0  154900   6016   2292 D   2.3  0.0   0:00.11 python
22165 nagios    20   0  139392   5808   2132 R   2.3  0.0   0:00.10 check_ifopersta
22170 nagios    20   0  136856   5524   2152 R   2.3  0.0   0:00.10 python
22175 nagios    20   0  136852   5520   2152 R   2.3  0.0   0:00.10 python
22187 nagios    20   0  138732   5020   2128 R   2.3  0.0   0:00.11 check_ifopersta
22188 nagios    20   0  138600   5024   2128 R   2.3  0.0   0:00.11 check_ifopersta
22198 nagios    20   0  138864   5288   2132 R   2.3  0.0   0:00.12 check_ifopersta
22201 nagios    20   0  138468   4760   2128 R   2.3  0.0   0:00.11 check_ifopersta
22206 nagios    20   0  138468   4764   2128 R   2.3  0.0   0:00.10 check_ifopersta
22214 nagios    20   0  177660   7776   3452 R   2.3  0.0   0:00.10 python
22377 nagios    20   0   45444   4196   2436 R   2.3  0.0   0:00.10 snmpget
 5184 apache    20   0  643728  32440   5576 R   2.1  0.2   0:02.75 httpd
11810 nagios    20   0 1228088  41900   3524 R   2.1  0.3   0:03.18 nagios
16251 apache    20   0  536244  26676   4216 R   2.1  0.2   0:00.66 httpd
16381 apache    20   0  534888  25108   4172 R   2.1  0.2   0:00.63 httpd
17132 nagios    20   0  311868  24096   7188 R   2.1  0.1   0:01.02 php
17840 apache    20   0  641944  30972   6068 R   2.1  0.2   0:08.50 httpd
20035 nagios    20   0  153248  11568   2264 R   2.1  0.1   0:00.37 check_ifopersta
20067 nagios    20   0  153380  11576   2264 R   2.1  0.1   0:00.36 check_ifopersta
20191 nagios    20   0  153116  11316   2236 R   2.1  0.1   0:00.34 check_ifopersta
20486 nagios    20   0  147768  10180   2192 R   2.1  0.1   0:00.33 check_ifopersta
20554 nagios    20   0  147768   9912   2192 R   2.1  0.1   0:00.30 check_ifopersta
22167 nagios    20   0  138068   4496   2112 R   2.1  0.0   0:00.10 check_ifopersta
22172 nagios    20   0  177524   7264   3168 R   2.1  0.0   0:00.09 python
22173 nagios    20   0  177520   7260   3168 R   2.1  0.0   0:00.10 python
22178 nagios    20   0  138204   4496   2112 R   2.1  0.0   0:00.10 check_ifopersta
22203 nagios    20   0  139128   5548   2132 R   2.1  0.0   0:00.10 check_ifopersta
22207 nagios    20   0  138068   4500   2112 R   2.1  0.0   0:00.10 check_ifopersta
22218 nagios    20   0  128080   5060   2100 R   2.1  0.0   0:00.10 python
22219 nagios    20   0  154900   6008   2292 D   2.1  0.0   0:00.09 python
22222 nagios    20   0  139128   5552   2132 R   2.1  0.0   0:00.10 check_ifopersta
22256 nagios    20   0  136852   5524   2152 R   2.1  0.0   0:00.10 python
22261 nagios    20   0  138996   5284   2132 R   2.1  0.0   0:00.10 check_ifopersta
22406 nagios    20   0  138932   5712   2168 R   2.1  0.0   0:00.09 python
22407 nagios    20   0  175304   6408   2544 R   2.1  0.0   0:00.09 python
22410 nagios    20   0  134768   5296   2136 R   2.1  0.0   0:00.09 python
22413 nagios    20   0  136856   5668   2168 R   2.1  0.0   0:00.09 python
22424 nagios    20   0  145548   5924   2224 R   2.1  0.0   0:00.09 python
22517 nagios    20   0  154900   6012   2292 D   2.1  0.0   0:00.09 python
22572 nagios    20   0  138204   4500   2112 R   2.1  0.0   0:00.09 check_ifopersta
22581 nagios    20   0  137936   4240   2088 R   2.1  0.0   0:00.09 check_ifopersta
22627 nagios    20   0  138204   4496   2112 R   2.1  0.0   0:00.09 check_ifopersta
23166 apache    20   0  653528  42260   5804 R   2.1  0.3   0:07.66 httpd
32271 apache    20   0  641168  30448   6052 R   2.1  0.2   0:10.67 httpd
  368 nagios    20   0  172516  49476   2088 R   1.9  0.3   0:03.57 mrtg
 6532 apache    20   0  643132  32480   6260 R   1.9  0.2   0:28.21 httpd
 8798 apache    20   0  534964  25632   4772 R   1.9  0.2   0:00.65 httpd
16383 apache    20   0  536940  27548   4172 R   1.9  0.2   0:00.65 httpd
18622 nagios    20   0  131536  19652   1352 R   1.9  0.1   0:00.58 send_nrdp.sh
19966 nagios    20   0  153380  11572   2264 R   1.9  0.1   0:00.34 check_ifopersta
19972 nagios    20   0  153380  11568   2264 R   1.9  0.1   0:00.35 check_ifopersta
19973 nagios    20   0  153380  11568   2264 R   1.9  0.1   0:00.34 check_ifopersta
19997 nagios    20   0  153380  11572   2264 R   1.9  0.1   0:00.35 check_ifopersta
20074 nagios    20   0  153248  11572   2264 R   1.9  0.1   0:00.36 check_ifopersta
20087 nagios    20   0  153116  11316   2236 R   1.9  0.1   0:00.36 check_ifopersta
With max_concurrent_checks=20 (the web interface is slow, not as it was before the CPU usage issue, but at least it is working):

Code: Select all

top - 16:47:41 up  7:24,  2 users,  load average: 59.40, 121.26, 72.98
Tasks: 225 total,  27 running, 196 sleeping,   0 stopped,   2 zombie
%Cpu(s): 76.0 us, 18.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  5.1 si,  0.0 st
KiB Mem : 16431432 total, 10465372 free,  1687416 used,  4278644 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 13717956 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16662 mysql     20   0 2638192 582736   9656 S  22.7  3.5  50:02.58 mysqld
23166 apache    20   0  639180  28440   5816 R  16.6  0.2   0:34.14 httpd
 2592 apache    20   0  644072  33128   6264 R  16.2  0.2   0:09.17 httpd
31314 nagios    20   0  114224   2448   1384 R  16.2  0.0   0:16.64 send_nrdp.sh
 2601 apache    20   0  643724  32460   5616 R  15.9  0.2   0:07.64 httpd
 6532 apache    20   0  640952  30312   6260 R  15.9  0.2   0:55.31 httpd
32271 apache    20   0  639648  28604   6052 R  15.6  0.2   0:37.83 httpd
24303 apache    20   0  644072  33376   6068 S  14.9  0.2   0:36.67 httpd
29116 apache    20   0  641872  31140   6052 R  14.3  0.2   0:32.67 httpd
 5318 apache    20   0  639120  28272   5712 S  13.6  0.2   0:31.15 httpd
13625 nagios    20   0  150760  11104   2236 R  11.7  0.1   0:00.36 check_ifopersta
13637 nagios    20   0  148428  10708   2220 R  11.4  0.1   0:00.35 check_ifopersta
13606 nagios    20   0  113560   1952   1388 S   9.1  0.0   0:00.28 send_nrdp.sh
13769 nagios    20   0  146184   8612   2168 R   6.8  0.1   0:00.21 check_ifopersta
13773 nagios    20   0  141900   8180   2140 R   6.8  0.0   0:00.21 check_ifopersta
  537 root      20   0   88484  40536  40156 S   5.5  0.2  10:01.19 systemd-journal
13817 nagios    20   0  187340   9224   3604 R   5.2  0.1   0:00.16 python
13832 nagios    20   0  187332   9600   3616 R   5.2  0.1   0:00.16 python
13816 nagios    20   0  184872   9016   3556 R   4.9  0.1   0:00.15 python
27771 nagios    20   0 1229272  41608   3552 S   4.9  0.3   0:08.30 nagios
13824 nagios    20   0  139524   5816   2132 R   4.5  0.0   0:00.14 check_ifopersta
13839 nagios    20   0  140316   6600   2132 R   4.2  0.0   0:00.13 check_ifopersta
    9 root      20   0       0      0      0 S   2.9  0.0   8:30.08 rcu_sched
30865 nagios    20   0  172644  49476   2088 S   2.9  0.3   0:03.34 mrtg
13628 nagios    20   0  115536   1700   1368 S   1.6  0.0   0:00.05 check_rrdtraf
30862 nagios    20   0  174596  49652   2268 S   1.6  0.3   0:03.16 mrtg
30863 nagios    20   0  172512  49448   2080 S   1.6  0.3   0:02.56 mrtg
13527 nagios    20   0  115540   1700   1368 S   1.3  0.0   0:00.04 check_rrdtraf
13881 nagios    20   0  264836   4016   2960 R   1.3  0.0   0:00.04 curl
30866 nagios    20   0  172512  49432   2080 S   1.3  0.3   0:02.50 mrtg
 5296 root      20   0  163008   3056   1608 R   1.0  0.0   0:07.04 top
27777 nagios    20   0   10844   1036    756 S   1.0  0.0   0:00.12 nagios
13551 nagios    20   0  115540   1704   1368 R   0.6  0.0   0:00.02 check_rrdtraf
   12 root      rt   0       0      0      0 S   0.3  0.0   0:04.34 watchdog/1
   19 root      20   0       0      0      0 S   0.3  0.0   0:17.64 ksoftirqd/2
 1071 root      20   0  557480  30276  28440 S   0.3  0.2   2:53.14 rsyslogd
 9165 root      20   0  161728   6216   4832 S   0.3  0.0   0:00.92 sshd
 9185 nagios    20   0  452616  33436  10524 S   0.3  0.2   0:00.98 php
13815 nagios    20   0  115536   1664   1356 S   0.3  0.0   0:00.01 check_rrdtraf
13835 nagios    20   0  115408   1648   1352 S   0.3  0.0   0:00.01 check_rrdtraf
13854 nagios    20   0  115408   1652   1352 S   0.3  0.0   0:00.01 check_rrdtraf
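A snapshot like the one above is easier to read when grouped by command. The following is only a minimal sketch, assuming the default top column layout shown in this thread (%CPU in field 9, COMMAND in field 12); the function name summarize_top and the file /tmp/top.txt are hypothetical and not part of Nagios XI.

```shell
# Summarize a saved `top` snapshot: count processes per COMMAND and sum their %CPU.
# Assumption: default top columns, so %CPU is field 9 and COMMAND is field 12.
summarize_top() {
    awk '
        $1 == "PID" { in_table = 1; next }   # process rows start after the header line
        in_table && NF >= 12 {
            count[$12]++                     # number of processes with this command name
            cpu[$12] += $9                   # accumulated %CPU for this command name
        }
        END {
            for (cmd in count)
                printf "%-18s %3d procs %6.1f %%CPU\n", cmd, count[cmd], cpu[cmd]
        }
    ' "$1" | sort -k2,2nr                    # most numerous commands first
}
```

Possible usage: save one batch-mode snapshot with `top -b -n 1 > /tmp/top.txt`, then run `summarize_top /tmp/top.txt`. This only shows which plugins or daemons dominate the process list; it does not identify the root cause.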
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Nagios xi 5.8.5 high CPU usage

Post by dchurch »

If you PM me a system profile I can diagnose further. Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.

If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:

Code: Select all

rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.
ignacio.sanchez
Posts: 22
Joined: Thu Jun 25, 2020 4:46 am

Re: Nagios xi 5.8.5 high CPU usage

Post by ignacio.sanchez »

PM sent, but it is stuck in my Outbox.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios xi 5.8.5 high CPU usage

Post by pbroste »

Hello @ignacio.sanchez

Thanks for sending over the System Profile.

After review, we see the following error:
[ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
The failure is on the nagios_logentries table. We will address it with the following:

Code: Select all

systemctl stop mariadb.service
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_logentries
systemctl start mariadb.service
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
Then:

Code: Select all

mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_notifications'
Important: Running these commands will clear all entries from the affected tables. After you truncate tables, you should repeat the repair process outlined above.

Please restart nagios.service and verify.

Thanks,
Perry
ignacio.sanchez
Posts: 22
Joined: Thu Jun 25, 2020 4:46 am

Re: Nagios xi 5.8.5 high CPU usage

Post by ignacio.sanchez »

Hello Perry.

Seems the problem is now solved, thanks a lot! ;)

For next time, where can I see the "repair failed" error? (I'm asking because after launching the repair process everything looked fine and I received no error message.)
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios xi 5.8.5 high CPU usage

Post by pbroste »

Hello @ignacio.sanchez

Most excellent, I am glad that we were able to help resolve the issue. The referenced log entry "[ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed" is found in /var/log/mysqld.log or /var/log/mariadb.log.
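
To look for this kind of entry yourself, a small helper along these lines may be enough. It is only a sketch: the function name find_crashed_tables is hypothetical, and it assumes the log line has the same wording as the entry quoted above.

```shell
# Print the unique table paths that an error log reports as "marked as crashed".
# Unreadable or missing log files are skipped silently.
find_crashed_tables() {
    for log in "$@"; do
        [ -r "$log" ] || continue
        grep -h "is marked as crashed" "$log"
    done | sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" | sort -u
}
```

For example, `find_crashed_tables /var/log/mysqld.log /var/log/mariadb.log` would print ./nagios/nagios_logentries if the entry above is present in either file.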

We will go ahead and lock.

Thanks,
Perry