Nagios ramdisk full and no performance graphs

hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

Account was not locked.

FYI, restarting Nagios Ramdisk seems to delete the xidpe folder from /var/nagiosramdisk/spool. I have to recreate it each time. I will send you the tar file over PM.

Code:

 ls -l /var/nagiosramdisk/*perfdata
-rw-r--r-- 1 nagios nagios  71896 Nov  8 11:20 /var/nagiosramdisk/host-perfdata
-rw-r--r-- 1 nagios nagios 273566 Nov  8 11:20 /var/nagiosramdisk/service-perfdata

ls -l /var/nagiosramdisk/spool/perfdata/
total 0

ls -l /var/nagiosramdisk/spool/xidpe/
total 0

systemctl status ramdisk.service
● ramdisk.service - Ramdisk
   Loaded: loaded (/usr/lib/systemd/system/ramdisk.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2021-11-03 10:25:06 EDT; 5 days ago
 Main PID: 17274 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/ramdisk.service

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.


systemctl status ramdisk.service
● ramdisk.service - Ramdisk
   Loaded: loaded (/usr/lib/systemd/system/ramdisk.service; enabled; vendor preset: disabled)
   Active: active (exited) since Mon 2021-11-08 11:19:17 EST; 12s ago
  Process: 14565 ExecStart=/usr/bin/chown -R nagios:nagios /var/nagiosramdisk (code=exited, status=0/SUCCESS)
  Process: 14562 ExecStartPre=/usr/bin/mkdir -p -m 775 /var/nagiosramdisk/tmp/var/nagiosramdisk/spool /var/nagiosramdisk/spool/checkresults/var/nagiosramdisk/spool/xidpe /var/nagiosramdisk/spool/perfdata (code=exited, status=0/SUCCESS)
  Process: 14559 ExecStartPre=/usr/bin/mount -t tmpfs -o size=500m tmpfs /var/nagiosramdisk (code=exited, status=0/SUCCESS)
  Process: 14557 ExecStartPre=/usr/bin/mkdir -p -m 775 /var/nagiosramdisk (code=exited, status=0/SUCCESS)
 Main PID: 14565 (code=exited, status=0/SUCCESS)

Nov 08 11:19:17 HOST systemd[1]: Starting Ramdisk...
Nov 08 11:19:17 HOST systemd[1]: Started Ramdisk.
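
Since the xidpe folder has to be recreated after each ramdisk restart, a quick check for missing directories may help. The following is a minimal sketch, not part of Nagios XI: the function name `missing_spool_dirs` is made up for illustration, and the directory list is taken from the paths that appear in this thread. Note that the ExecStartPre mkdir line in the status output above shows some paths run together; if the unit file really reads that way (rather than this being a paste artifact), `mkdir -p` would create nested directories instead of spool/xidpe.

```shell
# Print every expected ramdisk directory that does not exist.
# The directory list is an assumption based on the paths seen in this thread.
missing_spool_dirs() {
    base="${1:-/var/nagiosramdisk}"   # ramdisk mount point
    for sub in tmp spool spool/checkresults spool/xidpe spool/perfdata; do
        [ -d "$base/$sub" ] || printf '%s\n' "$base/$sub"
    done
}
```

Running `missing_spool_dirs` right after `systemctl restart ramdisk.service` prints nothing when all directories are present, and one path per line otherwise.
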


pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios ramdisk full and no performance graphs

Post by pbroste »

Hello @hbouma

Thanks for sending over the configs. After review, they look good.

Circling back to our previous message: we explained that the 'tmpfs' partition behind the 'nagiosramdisk' mount point is stored in RAM, not on hard disk. When a file there is deleted while a process still holds it open, the space is not reclaimed until that process closes the file. Looking at the list of open files gives us a list or a count of such files.

Code:

yum install lsof -y
lsof | grep deleted      #list of all open files that have been deleted
lsof | grep deleted | grep -Ei 'nagiosramdisk'    #only those found under 'nagiosramdisk'
lsof | grep deleted | grep -Ei 'nagiosramdisk' | wc -l    #count of those files

The files in that list have been "flagged" as deleted but not released by the process holding them open. This can be caused by security applications that hold the files open, so we suggest taking a look at what security software is in use on the system. We see a running process, '/opt/bit9/bin/b9daemon', that appears to be associated with the 'Carbon Black' security agent, and suggest stopping 'b9daemon' to help verify.

Thanks,
Perry
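
To see how much ramdisk space deleted-but-open files are holding, the lsof output can be summed. This is a rough sketch under assumptions: the helper name `deleted_bytes` is hypothetical, and it expects the default lsof column layout (size in field 7, file name in field 9, no TID column), which matches the output posted in this thread but can differ between lsof versions.

```shell
# Read lsof output on stdin and total the sizes of files marked "(deleted)"
# whose path starts with the given prefix.
deleted_bytes() {
    # $1: path prefix to match, e.g. /var/nagiosramdisk
    awk -v prefix="$1" '
        /\(deleted\)$/ && index($9, prefix) == 1 { total += $7; count++ }
        END { printf "%d files, %d bytes\n", count, total }
    '
}
```

Typical use would be `lsof | deleted_bytes /var/nagiosramdisk`, comparing the result with what `df -h /var/nagiosramdisk` reports as used.
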
hbouma

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

I will get with our IT Security department about possibly turning off Carbon Black temporarily to see if things improve. However, it should be mentioned that we have 8 other Nagios XI servers, each set up with Nagios RamDisk and Carbon Black, that do not have this issue.

There are a few files in /var/lib/sss/mc/initgroups and a few files in /var/lib/sss/mc/passwd. The only other 2 files in lsof | grep deleted are the 2 listed below.


lsof | grep deleted | grep -Ei 'nagiosramdisk'
nagios 31016 nagios 24w REG 0,44 32229 107368773 /var/nagiosramdisk/spool/perfdata/1636404336.perfdata.host-PID-31893 (deleted)
nagios 31016 nagios 38w REG 0,44 110648 107368775 /var/nagiosramdisk/spool/perfdata/1636404335.perfdata.service-PID-31891 (deleted)

$ lsof | grep deleted | grep -Ei 'nagiosramdisk' | wc -l
2
pbroste

Re: Nagios ramdisk full and no performance graphs

Post by pbroste »

Hello @hbouma

Please also send over the output of these commands for our review:

Code:

sysctl -p
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
Thanks,
Perry
hbouma

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

sysctl -p
net.ipv6.conf.default.accept_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.default.log_martians = 1
net.ipv4.ip_forward = 0
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.log_martians = 1
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
kernel.randomize_va_space = 2
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.default.accept_ra = 0
kernel.msgmnb = 131072000
kernel.msgmax = 131072000
kernel.shmmax = 4294967295
kernel.shmall = 268435456


ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63446
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 63446
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited



su -s /bin/bash -c 'ulimit -a' nagios
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63446
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited


Our MariaDB instance is offloaded. Here are the values for the offloaded DB:
su -s /bin/bash -c 'ulimit -a' mysql
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63449
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
hbouma

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

I have turned off RAMDISK and now host-perfdata and service-perfdata temporarily started processing, then stopped again. After Carbon Black is turned back on, things are still processing. Very odd. I will reach out to Carbon Black about that.

Oddly, looking through, I am now seeing that some of the perf graphs started showing up yesterday at 11:30, a little while after Nagios RamDisk was restarted. However, not all the checks have perf graphs. For instance, the Nagios XI server's Memory check through NCPA still has no perf graph. Even odder, the gauge at the bottom of the perf graph has information.

check_xi_ncpa!-t 'TOKEN' -P 5693 -M memory/virtual/percent -w 80 -c 90!!!!!!!
Last edited by hbouma on Tue Nov 09, 2021 8:30 am, edited 2 times in total.
hbouma

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

The really odd thing is that the perfdata log keeps saying 0 files processed.

DONE. Processed 0 files.
Outbound data DISABLED Tue, 09 Nov 2021 08:13:01 -0500

DONE. Processed 0 files.
Outbound data DISABLED Tue, 09 Nov 2021 08:14:01 -0500
Outbound data DISABLED Tue, 09 Nov 2021 08:15:01 -0500

DONE. Processed 0 files.
Outbound data DISABLED Tue, 09 Nov 2021 08:15:58 -0500

DONE. Processed 0 files.
Outbound data DISABLED Tue, 09 Nov 2021 08:16:01 -0500

DONE. Processed 0 files.

DONE. Processed 0 files.
Outbound data DISABLED Tue, 09 Nov 2021 08:17:01 -0500
Outbound data DISABLED Tue, 09 Nov 2021 08:18:01 -0500

DONE. Processed 0 files.

DONE. Processed 0 files.
Outbound data DISABLED Tue, 09 Nov 2021 08:19:01 -0500

DONE. Processed 0 files.
Outbound data DISABLED Tue, 09 Nov 2021 08:20:01 -0500

DONE. Processed 0 files.
Outbound data DISABLED Tue, 09 Nov 2021 08:23:01 -0500
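
For a longer log, counting the runs by hand gets tedious. The sketch below tallies the "DONE. Processed N files." lines and how many of them report 0 files. The function name `summarize_runs` is made up for illustration, and the line format is assumed to be exactly as in the excerpt above.

```shell
# Read a perfdata processing log on stdin and count the runs,
# and how many of them processed 0 files.
summarize_runs() {
    awk '
        /^DONE\. Processed [0-9]+ files\./ {
            runs++
            if ($3 == 0) empty++   # third field is the file count
        }
        END { printf "%d runs, %d with 0 files\n", runs, empty }
    '
}
```

It could be fed with something like `tail -n 500 "$LOG" | summarize_runs`, where `$LOG` is a placeholder for whichever log file contains these lines.
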
pbroste

Re: Nagios ramdisk full and no performance graphs

Post by pbroste »

Hello @hbouma

We want to increase the logging detail. Edit this file:

Code:

 /usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:
LOG_LEVEL = 0
To:
LOG_LEVEL = 2

The 'process_perfdata.pl' script (look for it with: ps -aux | grep -Ei 'process_perfdata.pl') should now log all errors:

Code:

tail -f /usr/local/nagios/var/perfdata.log

Please provide a copy of any messages of interest.
Perry
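
The LOG_LEVEL change described above can also be scripted. This is only a sketch: `set_log_level` is a hypothetical helper, not something shipped with Nagios XI or PNP, and it assumes the setting sits on a single line starting with `LOG_LEVEL`. A backup copy with a `.bak` suffix is kept next to the file.

```shell
# Set LOG_LEVEL in a process_perfdata.cfg-style file, keeping a .bak copy.
set_log_level() {
    # $1: config file, $2: new level (0=silent, 1=normal, 2=debug)
    case "$2" in
        0|1|2) ;;
        *) echo "level must be 0, 1 or 2" >&2; return 1 ;;
    esac
    sed -i.bak -E "s/^LOG_LEVEL[[:space:]]*=.*/LOG_LEVEL = $2/" "$1"
}
```

For example, `set_log_level /usr/local/nagios/etc/pnp/process_perfdata.cfg 2` turns on debug logging, and the same call with 0 reverts it once the troubleshooting is done.
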
hbouma

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

The log file hasn't updated since January 2021. I see data in the nagiosxi/var/perfdataproc.log file though.
pbroste

Re: Nagios ramdisk full and no performance graphs

Post by pbroste »

Hello @hbouma
hbouma wrote: I have turned off RAMDISK and now host-perfdata and service-perfdata temporarily started processing, then stopped again. After Carbon Black is turned back on, things are still processing. Very odd. I will reach out to Carbon Black about that.

I understand that with Carbon Black disabled the Performance Data is coming across now. What is the current status of the RAMDISK?

hbouma wrote: Oddly, looking through, I am now seeing that some of the perf graphs started showing up yesterday at 11:30, a little while after Nagios RamDisk was restarted. However, not all the checks have perf graphs. For instance, the Nagios XI server's Memory check through NCPA still has no perf graph. Even odder, the gauge at the bottom of the perf graph has information.

check_xi_ncpa!-t 'TOKEN' -P 5693 -M memory/virtual/percent -w 80 -c 90!!!!!!!

Some of the checks are not displaying Performance Data. Is this consistent or random?

In the previous update you stated:

hbouma wrote: The log file hasn't updated since January 2021. I see data in the nagiosxi/var/perfdataproc.log file though.

Please verify that the logging config in '/usr/local/nagios/etc/pnp/process_perfdata.cfg' is correct, and find out where it is writing to:
#
LOG_FILE = /usr/local/nagios/var/perfdata.log
#
# Loglevel 0=silent 1=normal 2=debug
#
LOG_LEVEL = 2

Would you please let us know how things are looking and what we need to focus on next?

Thanks again,
Perry
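
To confirm where the log is being written, the value can be read straight from the config file. The helper below is a small sketch with a made-up name, `cfg_value`. It assumes the simple `KEY = value` layout shown in the excerpt above and skips comment lines.

```shell
# Print the value of a KEY = value setting from a config file,
# ignoring lines that start with '#'.
cfg_value() {
    # $1: config file, $2: key name (e.g. LOG_FILE)
    awk -F'=' -v key="$2" '
        $0 !~ /^[ \t]*#/ {
            k = $1; gsub(/[ \t]/, "", k)
            if (k == key) {
                v = $2
                gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding blanks
                print v
            }
        }
    ' "$1"
}
```

For example, `ls -l "$(cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg LOG_FILE)"` shows whether the configured log file exists and when it was last modified.
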