Nagios XI cannot read the /etc/passwd or /etc/groups file

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
inversecow
Posts: 44
Joined: Wed Sep 25, 2019 4:17 pm

Post by inversecow »

Ahoy folks,

I am working to set up a DR scenario involving two Nagios XI app nodes and a shared NFS mount from an off-box NFS "filer" (a basic VM with a big disk attached and shared out).
In this scenario I am using basic deployments of Nagios XI (stand-alone, all-in-one, out of the box), with some HOST and SERVICE (ACTIVE) checks against remote NCPA agents.

- app_node1.fqdn: Nagios XI 5.8.x, RHEL 7 VM
- app_node2.fqdn: Nagios XI 5.8.x, RHEL 8 VM (app install in flight)
- filer_node1.fqdn: NFS filer VM, RHEL 7 VM
- filer_node1's local disk is shared via NFS to app_node1 and app_node2

At the OS level, both XI boxes can talk to the filer (via NFS), but not to each other.
I can also interact with the remote NFS share manually from both app nodes.
However, when I try to use the Nagios XI "Scheduled Backups" facility ("local" mode), I get an unfortunate error.

"Nagios XI cannot read the /etc/passwd or /etc/groups file."

This happens only when I put my (locally mounted and available) NFS share in the "Location" field (/remote_nagiosbkup/inframon).
It is not a problem when the default value is present (/store/backups/nagiosxi).
I have not tried creating a local symlink to this share (yet), but I doubt it would get around this error.

I would appreciate any insights or recommendations on how to get past this error.
Also, any thoughts on how to get around the unfortunate "no_root_squash" NFS export option dilemma (which is not good from a security perspective)?

Once I get this working, I will take the next steps on my PROD nodes (which run off-box managed DBs, but also have remote NFS shares presented).

Signed,

- Rowan

---

# env details

## FILER node

RHEL7.9 VM

[15:24:16:user@filer_node1:~]
$ cat /etc/fstab | grep nagios
/dev/mapper/vg_nagiosbkup-lv_nagios     /nagiosbkup     ext4    defaults        0 0

[15:24:21:user@filer_node1:~]
$ mount | grep nagios
/dev/mapper/vg_nagiosbkup-lv_nagios on /nagiosbkup type ext4 (rw,relatime,data=ordered)

[15:25:36:user@filer_node1:~]
$ ls -alh / | grep nagios
drwxr-xrwx    6 root      root      4.0K Jul 16 14:06 nagiosbkup/

[15:29:33:user@filer_node1:~]
$ ls -alh /nagiosbkup/ | grep inframon
drwxr-xr-x   2 user users_group 4.0K Jul 16 15:17 inframon/

[15:32:26:user@filer_node1:~]
$ cat /etc/exports | grep nagios
/nagiosbkup app_node1.fqdn(rw,sync,insecure,no_root_squash) app_node2.fqdn(rw,sync,insecure,no_root_squash)

[15:54:36:user@filer_node1:~]
$ getent passwd nagios
nagios:x:189:175::/home/nagios:/bin/bash

## ACTIVE node

RHEL7.7 VM
Nagios XI (5.8.3)
all-in-one / stand-alone deployment (Nagios-provided DB on-box)

START:          2021:07:16:15:34:36:PDT
SOURCE IP:      app_node1.fqdn
TARGET IP:      filer_node1.fqdn
*       TEST RESULT:
**      PING RESULT:
PING filer_node1.fqdn (filer_node1.ip) 56(84) bytes of data.
64 bytes from filer_node1.fqdn (filer_node1.ip): icmp_seq=1 ttl=57 time=8.29 ms

--- filer_node1.fqdn ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.297/8.297/8.297/0.000 ms
**      PORT PROBE RESULT:
***     PORT TEST (22:TCP)
Ncat: Connected to filer_node1.ip:22.
Ncat: 0 bytes sent, 0 bytes received in 0.03 seconds.

***     PORT TEST (111:TCP)
Ncat: Connected to filer_node1.ip:111.
Ncat: 0 bytes sent, 0 bytes received in 0.03 seconds.

***     PORT TEST (2049:TCP)
Ncat: Connected to filer_node1.ip:2049.
Ncat: 0 bytes sent, 0 bytes received in 0.03 seconds.

END:            2021:07:16:15:34:36:PDT

[15:39:04:user@app_node1.fqdn:~]
$ cat /etc/fstab | grep nagios
filer_node1.fqdn:/nagiosbkup /remote_nagiosbkup nfs auto,noatime,nolock,bg,nfsvers=4,intr,tcp,actimeo=1800,retry=60,_netdev 0 0

[15:39:07:user@app_node1.fqdn:~]
$ mount | grep nagios
filer_node1.fqdn:/nagiosbkup on /remote_nagiosbkup type nfs4 (rw,noatime,vers=4.1,rsize=524288,wsize=524288,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=app_node1.ip,local_lock=none,addr=filer_node1.ip,_netdev)

[15:39:26:user@app_node1.fqdn:~]
$ ls -alh / | grep nagios
drwxr-xrwx    6 root root 4.0K Jul 16 14:06 remote_nagiosbkup

[15:39:51:user@app_node1.fqdn:~]
$ ls -alh /remote_nagiosbkup/ | grep inframon
drwxr-xrwx   2 user users_group 4.0K Jul 16 15:17 inframon

[15:55:05:user@app_node1.fqdn:~]
$ getent passwd nagios
nagios:x:1001:100::/home/nagios:/bin/bash

## STANDBY node

RHEL8.4 VM
Nagios XI 5.8.x (install in flight)
all-in-one / stand-alone deployment (Nagios-provided DB on-box)

START:          2021:07:16:15:43:56:PDT
SOURCE IP:      app_node2.ip
TARGET IP:      filer_node1.fqdn
*       TEST RESULT:
**      PING RESULT:
PING filer_node1.fqdn (filer_node1.ip) 56(84) bytes of data.
64 bytes from filer_node1.fqdn (filer_node1.ip): icmp_seq=1 ttl=56 time=8.04 ms

--- filer_node1.fqdn ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.036/8.036/8.036/0.000 ms
**      PORT PROBE RESULT:
***     PORT TEST (22:TCP)
Ncat: Connected to filer_node1.ip:22.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.

***     PORT TEST (111:TCP)
Ncat: Connected to filer_node1.ip:111.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

***     PORT TEST (2049:TCP)
Ncat: Connected to filer_node1.ip:2049.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.

END:            2021:07:16:15:43:56:PDT

[15:43:56:user@app_node2.fqdn:~]
$ cat /etc/fstab | grep nagios
filer_node1.fqdn:/nagiosbkup /remote_nagiosbkup nfs auto,noatime,nolock,bg,nfsvers=4,intr,tcp,actimeo=1800,retry=60,_netdev 0 0

[15:44:40:user@app_node2.fqdn:~]
$ mount | grep nagios
filer_node1.fqdn:/nagiosbkup on /remote_nagiosbkup type nfs4 (rw,noatime,vers=4.2,rsize=524288,wsize=524288,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=app_node2.ip,local_lock=none,addr=filer_node1.ip,_netdev)

[15:44:48:user@app_node2.fqdn:~]
$ ls -alh / | grep nagios
drwxr-xrwx    6 root root 4.0K Jul 16 14:06 remote_nagiosbkup/

[15:44:55:user@app_node2.fqdn:~]
$ ls -alh /remote_nagiosbkup/ | grep inframon
drwxr-xr-x   2 user users_group 4.0K Jul 16 15:17 inframon/

[15:55:25:user@app_node2.fqdn:~]
$ getent passwd nagios
nagios:x:188:184::/home/nagios:/bin/bash

## manual backup (cli, successful)

[15:14:37:user@app_node1.fqdn:~]
$ sudo /usr/local/nagiosxi/scripts/backup_xi.sh -p inframon_app_node1.fqdn -d /remote_nagiosbkup/inframon
\nStarting new backup....\n
Backing up Nagios Core...
tar: Removing leading `/' from member names
tar: /usr/local/nagios/var/rw/nagios.qh: socket ignored
Backing up Nagios xi...
tar: Removing leading `/' from member names
Backing up MRTG...
tar: Removing leading `/' from member names
Backing up the SNMP directories
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
Backing up NRDP...
tar: Removing leading `/' from member names
Backing up Nagvis...
tar: Removing leading `/' from member names
Backing up nagios user home dir...
tar: Removing leading `/' from member names
Backing up MySQL databases...
Backing up cronjobs for Apache...
Backing up logrotate config files...
Backing up Apache config files...
Compressing backup...

===============
BACKUP COMPLETE
===============
Backup stored in /remote_nagiosbkup/inframon/inframon_app_node1.fqdn.1626473684.tar.gz

[15:19:14:user@app_node1.fqdn:~]
$ ls -alh /remote_nagiosbkup/inframon/inframon_app_node1.fqdn.1626473684.tar.gz
-rw-r--r-- 1 nagios nagios 156M Jul 16 15:17 /remote_nagiosbkup/inframon/inframon_app_node1.fqdn.1626473684.tar.gz
“And who better understands the Unix-nature?” Master Foo asked.
“Is it he who writes the ten thousand lines, or he who, perceiving the emptiness of the task, gains merit by not coding?”
Master Foo - The ten thousand Lines
Unix Koans of Master Foo
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Post by gsmith »

Hi

On the NFS filer:

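# create the group first (skip both if nagios already exists): on RHEL,
# useradd/adduser creates a same-named group by default, so a later groupadd fails
groupadd nagios
useradd -g nagios nagios
# hand the exported directory to nagios:nagios
chown nagios:nagios /<exported dir>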
Use the "Test Permissions" button to verify things are working.
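Because the share is mounted with `sec=sys` (see the mount output in the env details), access checks use numeric UIDs and GIDs, so what matters is who owns the directory as seen from each XI node. A minimal sketch of checking that from an app node, assuming GNU coreutils `stat` (as shipped with RHEL); `owned_by` is a hypothetical helper name, not part of Nagios XI:

```shell
#!/bin/sh
# owned_by USER PATH  (hypothetical helper)
# Exit 0 when PATH's numeric owner UID equals the UID that USER has on
# this host; exit 1 on a mismatch; exit 2 if USER or PATH is unknown.
owned_by() {
    user=$1
    path=$2
    want=$(id -u "$user") || return 2            # user unknown on this host
    have=$(stat -c '%u' -- "$path") || return 2  # path missing or unreadable
    [ "$want" = "$have" ] || return 1
}

# Example, run on app_node1 or app_node2 after the chown:
#   owned_by nagios /remote_nagiosbkup/inframon && echo ok || echo "owner mismatch"
```

Running it on the filer and on each app node can give different answers for the same directory when the `nagios` UID is not the same on every host.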

Thanks
inversecow
Posts: 44
Joined: Wed Sep 25, 2019 4:17 pm

Post by inversecow »

Many thanks, that gave me the "bread crumb" I needed!

I did this on my NFS filer and then hit a different error (a permissions issue involving the `apache` user).
I double-checked that everything was set on the NFS filer (`nagios:nagios` existed there, and the exported dir was owned by `nagios:nagios`).
Then I had a minor brain wave and checked from the perspective of the XI app node, which saw different "user:group" ownership than the filer host did (the `nagios` UID/GID is not the same on the two hosts).

So I applied your recommended settings from the XI app node (which had the share mounted).
That is what got `test connection` and the scheduled backups working.
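The mismatch is visible in the `getent passwd nagios` output captured in the env details: `nagios` is 189:175 on the filer, 1001:100 on app_node1 and 188:184 on app_node2. A minimal sketch that compares those entries, using hypothetical helpers `ids_of` and `same_ids` and the lines exactly as captured in this thread:

```shell
#!/bin/sh
# ids_of PASSWD_LINE: print "UID:GID" (fields 3 and 4 of a passwd entry)
ids_of() {
    printf '%s\n' "$1" | cut -d: -f3,4
}

# same_ids LINE...: succeed only if every passwd line has the same UID:GID
same_ids() {
    first=$(ids_of "$1")
    shift
    for line in "$@"; do
        [ "$(ids_of "$line")" = "$first" ] || return 1
    done
    return 0
}

# `getent passwd nagios` output captured earlier in this thread
filer='nagios:x:189:175::/home/nagios:/bin/bash'   # filer_node1
app1='nagios:x:1001:100::/home/nagios:/bin/bash'   # app_node1
app2='nagios:x:188:184::/home/nagios:/bin/bash'    # app_node2

if same_ids "$filer" "$app1" "$app2"; then
    echo "nagios UID:GID matches on all hosts"
else
    echo "nagios UID:GID differs between hosts"
fi
```

On a live system the three lines would come from running `getent passwd nagios` on each host. Since they differ here, a directory owned by `nagios` on the filer does not show up as owned by `nagios` on either app node, which is why the ownership had to be set from the app node.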

Is it plausible that I could now also remove the `no_root_squash` option from my `/etc/exports` definitions (on the NFS filer), relying on the `nagios:nagios` ownership instead?
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Post by gsmith »

Great!

Yes, removing no_root_squash would be a good thing.
By default (root_squash), an NFS export maps the remote root user to the nfsnobody user, an unprivileged account. As a result, files that root creates on the share are owned by nfsnobody, which prevents a remote root user from planting setuid-root programs.
If no_root_squash is used, remote root users are able to change any file on the shared file system and leave trojaned applications for other users to inadvertently execute.
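A minimal sketch of making that change on filer_node1, assuming the `/nagiosbkup` export line shown in the env details is the only place `no_root_squash` appears. `drop_no_root_squash` is a hypothetical helper; since `root_squash` is already the default, deleting the option would have the same effect, and writing it out just makes the intent explicit:

```shell
#!/bin/sh
# drop_no_root_squash FILE  (hypothetical helper)
# Saves FILE.bak, then rewrites every "no_root_squash" export option in
# FILE to "root_squash".
drop_no_root_squash() {
    file=$1
    cp -p "$file" "$file.bak" || return 1
    sed -i -e 's/no_root_squash/root_squash/g' "$file"
}

# On filer_node1, as root:
#   drop_no_root_squash /etc/exports
#   exportfs -ra                    # re-export everything in /etc/exports
#   exportfs -v | grep nagiosbkup   # the options should now list root_squash
```

It is worth re-running "Test Permissions" afterwards, along with the manual `sudo backup_xi.sh` run: with root squashed, anything an app node writes to the share as root reaches the filer as nfsnobody, so whether it still succeeds depends on the target directory's permissions.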
Please let me know if you have any more questions or if I can close this topic.

Thanks