Check failing to run correctly on new xi server

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
danniiffxi
Posts: 121
Joined: Tue Jan 30, 2018 3:29 am
Location: UK

Check failing to run correctly on new xi server

Post by danniiffxi »

OK, so on the left is our current production XI server on CentOS 6; on the right is the new build on CentOS 7. It is almost complete, but I have this one issue that is bugging me.

As you can see in the screenshot, the service check on the left works fine and takes 16 seconds to execute; on the right it fails after 2 seconds.

Image

Now this is the bit that is confusing me: when I run the check from the CLI of the new server it works perfectly, but it fails when run from the GUI.

Code: Select all

[root@nagxit02 libexec]# /usr/local/nagios/libexec/check_internet
OK - Internet Bearer is via Primary
The script is a custom one I wrote that periodically checks our internet bearer status over our 10GB link. If our main site fails, the internet fails over to our secondary site; if both fail, the check should go critical.

Code: Select all

#!/bin/bash
# set -x
# Check if the Internet Bearer has switched from Primary to Backup
#

# Check which Bearer is being used
# ----------------------------------------------------------
sudo traceroute -I 8.8.8.8 > /tmp/traceroute.txt

# Alert if it's the wrong one
# ----------------------------------------------------------
cat /tmp/traceroute.txt | grep "111.111.111.111" > /dev/null 2>&1
Primary=$?
if [ ${Primary} -eq 0 ]; then
  echo "OK - Internet Bearer is via Primary"
  exit 0;
fi

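# NOTE: as posted, this is the same (apparently anonymised) address as the
# Primary check above, so this branch can never be reached; it should match
# the backup bearer's hop instead.
cat /tmp/traceroute.txt | grep "111.111.111.111" > /dev/null 2>&1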
Backup=$?
if [ ${Backup} -eq 0 ]; then
  echo "WARNING - Internet Bearer is on Backup"
  exit 1;
fi

echo "CRITICAL - Internet Bearer is DOWN !!"
cat /tmp/traceroute.txt
exit 2

Any idea how I can get this to run correctly from the GUI? It's worth noting that the other 8000+ checks are working fine.
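A side note on the script itself: as written, it cannot tell "traceroute produced no output" apart from "the bearer is down", because both cases fall through to the CRITICAL branch. Below is a minimal sketch of one way to separate the two, assuming the traceroute output is piped in on stdin. The function name `classify_bearer` and its two arguments (primary hop, backup hop) are hypothetical and not part of the original script; exit code 3 is the standard Nagios plugin code for UNKNOWN.

```shell
# Sketch only: classify traceroute output read from stdin.
# $1 = address of the primary bearer hop, $2 = address of the backup hop
# (both are placeholders supplied by the caller).
classify_bearer() {
  local primary_ip="$1"
  local backup_ip="$2"
  local output
  output=$(cat)

  # No output at all usually means traceroute never ran (e.g. sudo refused),
  # which is a different problem from the link being down.
  if [ -z "$output" ]; then
    echo "UNKNOWN - traceroute produced no output"
    return 3
  fi

  # -F: fixed string, -w: whole word, so 10.0.0.1 does not match 10.0.0.10
  if printf '%s\n' "$output" | grep -qwF -- "$primary_ip"; then
    echo "OK - Internet Bearer is via Primary"
    return 0
  fi

  if printf '%s\n' "$output" | grep -qwF -- "$backup_ip"; then
    echo "WARNING - Internet Bearer is on Backup"
    return 1
  fi

  echo "CRITICAL - Internet Bearer is DOWN !!"
  printf '%s\n' "$output"
  return 2
}
```

It could be called as `sudo -n traceroute -I 8.8.8.8 | classify_bearer "$PRIMARY_IP" "$BACKUP_IP"`, where the two variables are again placeholders. Piping the output directly also avoids the shared `/tmp/traceroute.txt` file.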
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Check failing to run correctly on new xi server

Post by scottwilkerson »

What are the permissions on this file on the server on the right?

Code: Select all

ls -l /tmp/traceroute.txt
Can it be read/written by the nagios user?
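To answer that from the nagios user's point of view rather than root's, a small helper along these lines can report what the current user is able to do with the file. This is only a sketch; `report_access` is a made-up name, and the empty-file check is an extra that `ls -l` would also show as a size of 0.

```shell
# Sketch only: report what the *current* user can do with a file.
report_access() {
  local file="$1"

  if [ ! -e "$file" ]; then
    echo "$file: does not exist"
    return 1
  fi

  if [ -r "$file" ]; then echo "$file: readable"; else echo "$file: not readable"; fi
  if [ -w "$file" ]; then echo "$file: writable"; else echo "$file: not writable"; fi

  # An empty file suggests the command writing to it produced no output.
  if [ -s "$file" ]; then echo "$file: non-empty"; else echo "$file: empty (0 bytes)"; fi
}
```

From a root shell it can be run as the nagios user with `sudo -u nagios bash -c "$(declare -f report_access); report_access /tmp/traceroute.txt"`.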
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
danniiffxi
Posts: 121
Joined: Tue Jan 30, 2018 3:29 am
Location: UK

Re: Check failing to run correctly on new xi server

Post by danniiffxi »

Hi Scott

Both servers were set the same with the following permissions.

This is from the server that works:

Code: Select all

[root@nagip01 ~]# ls -l /tmp/traceroute.txt
-rw-r--r-- 1 nagios nagios 641 Aug 18 18:14 /tmp/traceroute.txt
This is the new server

Code: Select all

[root@nagxit02 ~]# ls -l /tmp/traceroute.txt
-rw-r--r-- 1 nagios nagios 0 Aug 18 18:10 /tmp/traceroute.txt

I then did a chmod 777 and ran the test again. Unfortunately, it still fails in the GUI.

Code: Select all

[root@nagxit02 ~]# chmod 777 /tmp/traceroute.txt
[root@nagxit02 ~]# ls -l /tmp/traceroute.txt
-rwxrwxrwx 1 nagios nagios 0 Aug 18 18:10 /tmp/traceroute.txt
GUI output (reproduced by running the check as the nagios user)

Code: Select all

[nagios@nagxit02 ~]$ /usr/local/nagios/libexec/check_internet
CRITICAL - Internet Bearer is DOWN !!
CLI (as root)

Code: Select all

[root@nagxit02 ~]# /usr/local/nagios/libexec/check_internet
OK - Internet Bearer is via HQ
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Check failing to run correctly on new xi server

Post by scottwilkerson »

I also see this in your script

Code: Select all

sudo traceroute -I 8.8.8.8 > /tmp/traceroute.txt
Does the nagios user have sudoers permissions to do this on the new server?
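If sudo does refuse the nagios user, the script as posted gives little hint of it: the error goes to stderr, which may not be visible in the service output, and the redirect simply leaves an empty file. A hedged sketch of a wrapper that keeps the exit status and stderr visible is below; `run_and_report` is a hypothetical helper name, not something shipped with Nagios.

```shell
# Sketch only: run a command, pass its stdout through, and if it fails
# print the exit status and whatever it wrote to stderr.
run_and_report() {
  local err_file
  local status=0

  err_file=$(mktemp) || return 3

  "$@" 2>"$err_file" || status=$?

  if [ "$status" -ne 0 ]; then
    echo "command failed (exit $status): $*"
    sed 's/^/  stderr: /' "$err_file"
  fi

  rm -f "$err_file"
  return "$status"
}
```

Run as the nagios user, for example `run_and_report sudo -n traceroute -I 8.8.8.8`. The `-n` flag makes sudo fail with an error instead of prompting for a password, so a missing sudoers rule shows up straight away. As root, `sudo -l -U nagios` lists what the nagios user is allowed to run.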
danniiffxi
Posts: 121
Joined: Tue Jan 30, 2018 3:29 am
Location: UK

Re: Check failing to run correctly on new xi server

Post by danniiffxi »

Hi Scott

Sorry for the late reply, I have been off for a while. It's all working now, you can lock this one, thank you.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Check failing to run correctly on new xi server

Post by scottwilkerson »

danniiffxi wrote:Hi Scott

Sorry for the late reply, I have been off for a while. It's all working now, you can lock this one, thank you.
Great!

Locking thread