Enormous Daily files in /usr/local/nagios/var/archives (~1G)
-
- Posts: 17
- Joined: Fri Dec 20, 2019 2:47 pm
Enormous Daily files in /usr/local/nagios/var/archives (~1G)
I have an issue with files in /usr/local/nagios/var/archives consuming an enormous amount of disk space.
In October, the files were around 275M each day. By May, they had grown to nearly 1G, and they are now 1.1G per day.
Installed Version: 5.6.14
Red Hat Enterprise Linux Server release 7.8 (Maipo) on VMWare
We have 320 hosts, with 6520 services being checked. We have not had a threefold increase in hosts or services, nor have we increased the frequency of the checks being performed.
I'm not really sure how to begin troubleshooting this issue, but I've looked in the log file, and I've found a number of hosts that have long since been removed from Nagios that appear in each day's log file. The last timestamp on one of the hosts is 1607061600 (very annoying that this logs using the epoch instead of standard times, or, preferably, ISO8601), and that converts to Fri Dec 04 2020 00:00:00 GMT-0600 (Central Standard Time).
If I search the system for files with the names of these systems, I find them in /usr/local/nagios/share/perfdata/. Is this log rewriting all of the data from that directory every day? If so, why is the entry in perfdata not removed when we delete the host and its services from the system? Is there an additional step we should be taking?
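For reading the epoch stamps, a small shell filter can help. This is only a sketch and not part of Nagios: it assumes bash and GNU date (as on RHEL), and the function name nagios_log_iso8601 is made up for illustration. It rewrites the leading [epoch] stamp of each log line as ISO 8601 in UTC and passes any other line through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert the leading [epoch] of Nagios log lines (stdin)
# to ISO 8601 in UTC. Assumes GNU date, which accepts "-d @SECONDS".
nagios_log_iso8601() {
    local line epoch rest
    local re='^\[([0-9]+)\](.*)$'
    while IFS= read -r line || [ -n "$line" ]; do
        if [[ $line =~ $re ]]; then
            epoch=${BASH_REMATCH[1]}
            rest=${BASH_REMATCH[2]}
            printf '[%s]%s\n' "$(date -u -d "@${epoch}" +%Y-%m-%dT%H:%M:%SZ)" "$rest"
        else
            # No leading [digits] stamp: pass the line through untouched.
            printf '%s\n' "$line"
        fi
    done
}
```

It starts one date process per line, so it is better suited to a grep-filtered slice of an archive file than to a whole 1G log.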
-
- Dreams In Code
- Posts: 7682
- Joined: Wed Feb 11, 2015 12:54 pm
Re: Enormous Daily files in /usr/local/nagios/var/archives (
Please PM me a copy of your profile. You can download it from Admin > System Profile by clicking the Download Profile button.
Please compress one of the large archive files so I can see what's consuming the space:
Code: Select all
GZIP=-9 tar czvf /tmp/archive.tar.gz /usr/local/nagios/var/archives/nagios-06-15-2021-00.log
Then PM me the resulting /tmp/archive.tar.gz file.
The /usr/local/nagios/share/perfdata files are where the graphing data is stored (in the RRD files); they are not automatically cleaned up from that directory.
You could do this to clean them up:
https://support.nagios.com/kb/article/n ... s-854.html
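Before deleting anything, it may help to see which perfdata entries have gone quiet. The following is a rough sketch, not the procedure from the KB article. It assumes bash, GNU find, and one subdirectory per host under the perfdata directory; that layout, the function name list_stale_perfdata, and the 90-day default are all assumptions. It only prints names and deletes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list subdirectories of a perfdata directory in which no
# file has been modified within the last N days. Read-only; nothing is removed.
# An empty subdirectory is also listed, since it contains no recent file.
list_stale_perfdata() {
    local dir=${1:-/usr/local/nagios/share/perfdata}
    local days=${2:-90}
    local entry
    for entry in "$dir"/*/; do
        [ -d "$entry" ] || continue
        # "-mtime -N" matches files modified less than N days ago;
        # "-quit" (GNU find) stops at the first such file.
        if [ -z "$(find "$entry" -type f -mtime -"$days" -print -quit)" ]; then
            basename "$entry"
        fi
    done
}
```

Running `list_stale_perfdata /usr/local/nagios/share/perfdata 180` would print candidates to compare against the hosts that were removed.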
-
- Posts: 17
- Joined: Fri Dec 20, 2019 2:47 pm
Re: Enormous Daily files in /usr/local/nagios/var/archives (
Any progress on identifying the cause? Were you able to download the files from the links I provided?
-
- Dreams In Code
- Posts: 7682
- Joined: Wed Feb 11, 2015 12:54 pm
Re: Enormous Daily files in /usr/local/nagios/var/archives (
A lot of your checks are failing. They are timing out, which causes them to be rechecked more often because of the retry_interval, and that results in a ton of logs.
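To see which hosts account for most of the timeouts, the archive can be tallied from the shell. This is a minimal sketch, assuming the timeout lines name the host in single quotes after the word "host", as in `Warning: Check of host 'centos3' timed out after 30.00 seconds`. The function name count_timeouts_by_host is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "timed out" log lines per host (log on stdin),
# busiest host first. Lines without a quoted host name are ignored.
count_timeouts_by_host() {
    grep 'timed out' \
        | sed -n "s/.*host '\([^']*\)'.*/\1/p" \
        | sort \
        | uniq -c \
        | sort -rn
}
```

For example: `count_timeouts_by_host < /usr/local/nagios/var/archives/nagios-06-15-2021-00.log | head -20`.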
What is the output of these commands as root:
Code: Select all
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
sysctl -p
chage -l nagios
grep standard /var/lib/pgsql/data/postgresql.conf
Additionally, please send the output of these commands:
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
-
- Posts: 17
- Joined: Fri Dec 20, 2019 2:47 pm
Re: Enormous Daily files in /usr/local/nagios/var/archives (
Okay. That was a concern of mine. These are definitely checks on existing hosts only, right? I'm still not sure why we're seeing hosts that have been deleted in the archives files.
Here are the outputs you requested.
Code: Select all
[root@nagiosxi ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31192
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31192
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@nagiosxi ~]# su -s /bin/bash -c 'ulimit -a' nagios
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31192
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@nagiosxi ~]# su -s /bin/bash -c 'ulimit -a' mysql
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31192
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@nagiosxi ~]# sysctl -p
kernel.msgmnb = 131072000
kernel.msgmax = 131072000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
[root@nagiosxi ~]# chage -l nagios
Last password change : May 01, 2020
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7
[root@nagiosxi ~]# grep standard /var/lib/pgsql/data/postgresql.conf
#standard_conforming_strings = on
Code: Select all
# echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.04 |
| nagios_commands | 0.02 |
| nagios_commenthistory | 295.40 |
| nagios_comments | 0.00 |
| nagios_configfiles | 0.00 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.48 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.03 |
| nagios_contactgroup_members | 0.01 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 8.97 |
| nagios_contactnotifications | 9.43 |
| nagios_contacts | 0.01 |
| nagios_contactstatus | 0.00 |
| nagios_customvariables | 0.49 |
| nagios_customvariablestatus | 0.52 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 9.09 |
| nagios_eventhandlers | 0.01 |
| nagios_externalcommands | 0.01 |
| nagios_flappinghistory | 7.08 |
| nagios_host_contactgroups | 0.02 |
| nagios_host_contacts | 0.00 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 0.00 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.01 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.07 |
| nagios_hoststatus | 0.17 |
| nagios_instances | 0.00 |
| nagios_logentries | 346.42 |
| nagios_notifications | 15.34 |
| nagios_objects | 1.24 |
| nagios_processevents | 0.26 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_service_contactgroups | 0.27 |
| nagios_service_contacts | 0.03 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 0.00 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.00 |
| nagios_servicegroups | 0.00 |
| nagios_services | 1.33 |
| nagios_servicestatus | 3.20 |
| nagios_statehistory | 469.92 |
| nagios_systemcommands | 0.04 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.01 |
| nagios_timeperiods | 0.00 |
| tbl_command | 0.04 |
| tbl_contact | 0.01 |
| tbl_contactgroup | 0.01 |
| tbl_contacttemplate | 0.01 |
| tbl_domain | 0.01 |
| tbl_host | 0.07 |
| tbl_hostdependency | 0.00 |
| tbl_hostescalation | 0.00 |
| tbl_hostextinfo | 0.00 |
| tbl_hostgroup | 0.01 |
| tbl_hosttemplate | 0.02 |
| tbl_info | 0.13 |
| tbl_lnkContactToCommandHost | 0.00 |
| tbl_lnkContactToCommandService | 0.00 |
| tbl_lnkContactToContactgroup | 0.00 |
| tbl_lnkContactToContacttemplate | 0.00 |
| tbl_lnkContactToVariabledefinition | 0.00 |
| tbl_lnkContactgroupToContact | 0.00 |
| tbl_lnkContactgroupToContactgroup | 0.00 |
| tbl_lnkContacttemplateToCommandHost | 0.00 |
| tbl_lnkContacttemplateToCommandService | 0.00 |
| tbl_lnkContacttemplateToContactgroup | 0.00 |
| tbl_lnkContacttemplateToContacttemplate | 0.00 |
| tbl_lnkContacttemplateToVariabledefinition | 0.00 |
| tbl_lnkHostToContact | 0.00 |
| tbl_lnkHostToContactgroup | 0.01 |
| tbl_lnkHostToHost | 0.00 |
| tbl_lnkHostToHostgroup | 0.00 |
| tbl_lnkHostToHosttemplate | 0.01 |
| tbl_lnkHostToVariabledefinition | 0.01 |
| tbl_lnkHostdependencyToHost_DH | 0.00 |
| tbl_lnkHostdependencyToHost_H | 0.00 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.00 |
| tbl_lnkHostdependencyToHostgroup_H | 0.00 |
| tbl_lnkHostescalationToContact | 0.00 |
| tbl_lnkHostescalationToContactgroup | 0.00 |
| tbl_lnkHostescalationToHost | 0.00 |
| tbl_lnkHostescalationToHostgroup | 0.00 |
| tbl_lnkHostgroupToHost | 0.01 |
| tbl_lnkHostgroupToHostgroup | 0.00 |
| tbl_lnkHosttemplateToContact | 0.00 |
| tbl_lnkHosttemplateToContactgroup | 0.00 |
| tbl_lnkHosttemplateToHost | 0.00 |
| tbl_lnkHosttemplateToHostgroup | 0.00 |
| tbl_lnkHosttemplateToHosttemplate | 0.00 |
| tbl_lnkHosttemplateToVariabledefinition | 0.00 |
| tbl_lnkServiceToContact | 0.02 |
| tbl_lnkServiceToContactgroup | 0.13 |
| tbl_lnkServiceToHost | 0.16 |
| tbl_lnkServiceToHostgroup | 0.00 |
| tbl_lnkServiceToServicegroup | 0.00 |
| tbl_lnkServiceToServicetemplate | 0.22 |
| tbl_lnkServiceToVariabledefinition | 0.16 |
| tbl_lnkServicedependencyToHost_DH | 0.00 |
| tbl_lnkServicedependencyToHost_H | 0.00 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.00 |
| tbl_lnkServicedependencyToHostgroup_H | 0.00 |
| tbl_lnkServicedependencyToService_DS | 0.00 |
| tbl_lnkServicedependencyToService_S | 0.00 |
| tbl_lnkServicedependencyToServicegroup_DS | 0.02 |
| tbl_lnkServicedependencyToServicegroup_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.00 |
| tbl_lnkServiceescalationToContactgroup | 0.00 |
| tbl_lnkServiceescalationToHost | 0.00 |
| tbl_lnkServiceescalationToHostgroup | 0.00 |
| tbl_lnkServiceescalationToService | 0.00 |
| tbl_lnkServiceescalationToServicegroup | 0.02 |
| tbl_lnkServicegroupToService | 0.00 |
| tbl_lnkServicegroupToServicegroup | 0.00 |
| tbl_lnkServicetemplateToContact | 0.00 |
| tbl_lnkServicetemplateToContactgroup | 0.00 |
| tbl_lnkServicetemplateToHost | 0.00 |
| tbl_lnkServicetemplateToHostgroup | 0.00 |
| tbl_lnkServicetemplateToServicegroup | 0.00 |
| tbl_lnkServicetemplateToServicetemplate | 0.01 |
| tbl_lnkServicetemplateToVariabledefinition | 0.00 |
| tbl_lnkTimeperiodToTimeperiod | 0.00 |
| tbl_logbook | 0.00 |
| tbl_mainmenu | 0.00 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 1.24 |
| tbl_servicedependency | 0.00 |
| tbl_serviceescalation | 0.00 |
| tbl_serviceextinfo | 0.00 |
| tbl_servicegroup | 0.01 |
| tbl_servicetemplate | 0.02 |
| tbl_session | 0.01 |
| tbl_session_locks | 0.00 |
| tbl_settings | 0.00 |
| tbl_submenu | 0.00 |
| tbl_timedefinition | 0.01 |
| tbl_timeperiod | 0.01 |
| tbl_user | 0.01 |
| tbl_variabledefinition | 0.47 |
| xi_auditlog | 0.08 |
| xi_auth_tokens | 0.03 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.02 |
| xi_eventqueue | 0.03 |
| xi_events | 0.05 |
| xi_meta | 0.02 |
| xi_mibs | 0.05 |
| xi_options | 0.03 |
| xi_sessions | 0.03 |
| xi_sysstat | 0.03 |
| xi_usermeta | 0.05 |
| xi_users | 0.03 |
+--------------------------------------------+------------+
Code: Select all
# echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
table | size | externalsize
---------------------+---------+--------------
xi_meta | 133 MB | 119 MB
xi_events | 60 MB | 60 MB
xi_auth_tokens | 4264 kB | 3864 kB
xi_auditlog | 1480 kB | 1000 kB
xi_usermeta | 360 kB | 232 kB
xi_commands | 128 kB | 72 kB
xi_sysstat | 104 kB | 72 kB
xi_options | 104 kB | 72 kB
xi_users | 72 kB | 64 kB
xi_mibs | 72 kB | 64 kB
xi_sessions | 40 kB | 40 kB
xi_eventqueue | 32 kB | 24 kB
xi_cmp_trapdata | 24 kB | 24 kB
xi_cmp_trapdata_log | 16 kB | 16 kB
xi_incidents | 0 bytes | 0 bytes
(15 rows)
-
- Dreams In Code
- Posts: 7682
- Joined: Wed Feb 11, 2015 12:54 pm
Re: Enormous Daily files in /usr/local/nagios/var/archives (
Please edit this file:
Code: Select all
/var/lib/pgsql/data/postgresql.conf
Change this:
Code: Select all
#standard_conforming_strings = on
To this:
Code: Select all
standard_conforming_strings = off
Then restart the postgresql/httpd/crond services:
Code: Select all
systemctl restart postgresql httpd crond
Then run this command:
Code: Select all
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | psql nagiosxi nagiosxi
Let me know if that resolves the issue with the old hosts showing up.
If it doesn't, try doing this as well:
Please go to Configure > Core Config Manager > Tools > Config File Management:
- Click the Delete Files button (don't worry, it's safe, they will be rewritten)
- Then click the Write Configs button
- Then apply configuration
-
- Posts: 17
- Joined: Fri Dec 20, 2019 2:47 pm
Re: Enormous Daily files in /usr/local/nagios/var/archives (
Unfortunately, even after rewriting the config files, servers that no longer exist are appearing in the archive files. And the files are still the same size, 1.1GB as of today.
-
- Posts: 1253
- Joined: Tue Mar 02, 2021 11:15 am
Re: Enormous Daily files in /usr/local/nagios/var/archives (
Hi
From a command line:
Code: Select all
mysql -u root -p nagios;
select display_name from nagios_hosts;
Do you see the "old" hosts in that list?
Thanks
-
- Posts: 17
- Joined: Fri Dec 20, 2019 2:47 pm
Re: Enormous Daily files in /usr/local/nagios/var/archives (
I do not.
As an example, I see this in the log file:
Code: Select all
[1596036618] Warning: Check of host 'centos3' timed out after 30.00 seconds
[1596036678] wproc: host=centos3; service=(null);
And that does not exist in the display_name column.
Code: Select all
MariaDB [nagios]> select * from nagios_hosts where display_name like '%centos%' ;
Empty set (0.01 sec)
But the same query syntax will find "localhost," which exists there.
Code: Select all
MariaDB [nagios]> select * from nagios_hosts where display_name like '%local%' ;
...
-------+----------------+------+------+----------------+------+------+------+------------+
1 row in set (0.00 sec)
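Rather than checking names one at a time, the comparison can be done in bulk. This is a sketch only, assuming the two line shapes shown above (`host 'NAME'` and `host=NAME;`) and a plain text file of currently configured hosts, one per line. The function name hosts_missing_from_config and the file path in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print host names found in Nagios log lines (stdin) that
# are NOT listed in the known-hosts file given as $1 (one name per line).
hosts_missing_from_config() {
    local known=$1
    # First expression handles "host 'NAME'", second handles "host=NAME;".
    sed -n "s/.*host '\([^']*\)'.*/\1/p; s/.*host=\([^;]*\);.*/\1/p" \
        | sort -u \
        | grep -vxFf "$known"
}
```

The known-hosts file could be produced with something like `mysql -u root -p -N -B -e 'select display_name from nagios_hosts' nagios > /tmp/known_hosts.txt`, after which `hosts_missing_from_config /tmp/known_hosts.txt < archive.log` lists the leftovers.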
-
- Posts: 1253
- Joined: Tue Mar 02, 2021 11:15 am
Re: Enormous Daily files in /usr/local/nagios/var/archives (
Hi
From a command line:
Code: Select all
mysql -u root -p nagios;
select name1 from nagios_objects;
Do you see the "old" hosts in that list?
Thanks