Service check timed out after 60.01 seconds

Mahesh786
Posts: 30
Joined: Mon Apr 05, 2021 9:21 am

Post by Mahesh786 »

Hi Team,

We are continuously getting "service check timed out" alerts for datastores. Please check and let us know why we are getting these alerts.

Alert details: (Service check timed out after 60.01 seconds) on adc-vmvhis-p001 Datastore usage for NONPRD-BE-V015 is CRITICAL

Note: The system profile has been attached.

Regards,
Venkata Reddy

Moderator note: removed attached profile and placed it on local shared with support agents
Last edited by pbroste on Fri Nov 05, 2021 12:49 pm, edited 1 time in total.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Service check timed out after 60.01 seconds

Post by pbroste »

Hello @Mahesh786

Thanks for sending over the System Profile.

We see 'NCPA' service checks timing out with the following error:

Code: Select all

error: [Errno xxx ] Address family not supported by protocol
That system may have IPv6 disabled. Let's edit your ncpa.cfg on the remote system (not the XI server) and set this under the [listener] section:

Code: Select all

ip = 0.0.0.0
Then restart the listener on the remote server (not the XI server):

Code: Select all

systemctl restart ncpa_listener

Code: Select all

sudo ss -tulpn | grep -Ei 'LISTEN|ESTABLISHED'
Results:
tcp LISTEN 0 128 0.0.0.0:5693 0.0.0.0:* users:(("ncpa_listener",pid=1588,fd=11))

Code: Select all

systemctl status ncpa_listener
Then test again and see whether that resolved it.
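If it helps, the "is the listener bound on IPv4" check can be scripted rather than read by eye. This is only a rough sketch: the function name is_listening_ipv4 is made up for illustration, and it assumes the ss column layout shown in the results above (state in column 2, local address in column 5) and the NCPA port 5693 seen there.

```shell
#!/bin/sh
# Hypothetical helper (not part of NCPA): reads `ss -tulpn` style output on
# stdin and succeeds only if some LISTEN line is bound to 0.0.0.0:<port>.
is_listening_ipv4() {
    port="$1"
    # Column 2 is the socket state, column 5 the local address:port.
    awk -v want="0.0.0.0:${port}" \
        '$2 == "LISTEN" && $5 == want { found = 1 } END { exit !found }'
}

# Example use on the remote system (assumed port 5693, as in the output above):
#   sudo ss -tulpn | is_listening_ipv4 5693 && echo "listener bound on IPv4"
```

If the function fails, the listener is either not running or not bound to 0.0.0.0, and the [listener] section of ncpa.cfg is worth a second look.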

If it's still not working, attach your ncpa.cfg from the remote system and send this information:

What OS/version is the remote system running?

Code: Select all

uname -a
cat /etc/*release
/usr/local/nagios/libexec/check_ncpa.py -V
To help tackle the timeout issues:

To increase the check timeout for 'check_ncpa', add -T XX, where XX is the timeout in seconds. Note the capital -T; in check_ncpa.py the lowercase -t is the token option.
-T TIMEOUT, --timeout=TIMEOUT
Enforced timeout, will terminate plugins after this
amount of seconds. [60]

Code: Select all

vi /usr/local/nagios/etc/nagios.cfg
Service Check Timeout
Format: service_check_timeout=<seconds>
Example: service_check_timeout=60
Host Check Timeout
Format: host_check_timeout=<seconds>
Example: host_check_timeout=120

Code: Select all

vi /usr/local/ncpa/etc/ncpa.cfg
host_check_timeout=30
service_check_timeout=120
Bump the timeouts up by 60 seconds, restart ncpa_listener.service and nagios.service, and then check to see how things look.
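To see what "up by 60 seconds" works out to for a given file, a small helper along these lines may save some manual lookups. It is a sketch only: the function name bumped_timeout is hypothetical, it assumes plain key=value lines as in the nagios.cfg format above, and it only prints the suggested line without editing anything.

```shell
#!/bin/sh
# Hypothetical helper: print "<key>=<current value + 60>" for a timeout
# directive in a key=value config file. It does not modify the file.
bumped_timeout() {
    key="$1"
    cfg="$2"
    # Use the last uncommented occurrence of the key, digits only.
    current=$(sed -n "s/^${key}=\([0-9][0-9]*\).*/\1/p" "$cfg" | tail -n 1)
    [ -n "$current" ] || return 1
    echo "${key}=$((current + 60))"
}

# Example use on the XI server:
#   bumped_timeout service_check_timeout /usr/local/nagios/etc/nagios.cfg
```

The printed line can then be applied by hand in the editor, which keeps the actual change under your control.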

We also see that performance data processing is throwing a lot of errors, and we would like you to clean that up too.

Check whether any config is throwing an error:

Code: Select all

LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
Are NPCD and crond running?

Code: Select all

service npcd status
service crond status

Code: Select all

service npcd restart
service crond status
Here are some troubleshooting docs. I'm sure you've looked at them already, but is it only the bandwidth graphs that are affected?

https://support.nagios.com/kb/article.php?id=9

https://support.nagios.com/kb/article/n ... hs-29.html

Let us know how things are looking,
Perry
Mahesh786
Posts: 30
Joined: Mon Apr 05, 2021 9:21 am

Re: Service check timed out after 60.01 seconds

Post by Mahesh786 »

Hi Team,

We did not discover this host with NCPA. It was discovered with the VMware wizard, as it is an ESXi host. Please suggest accordingly.

Regards,
Venkata Reddy
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Service check timed out after 60.01 seconds

Post by pbroste »

Hello @Mahesh786

Thanks for following up and pointing me to the correct check. We want to run 'check_vmware_api.pl' from the command line so we can see what output it returns. First, let's get the version:

Code: Select all

/usr/local/nagios/libexec/check_vmware_api.pl -H adc-vmvhis-p001 -f /usr/local/nagiosxi/etc/components/vmware/adc_vmvhis_p001_auth.txt -V
Please provide the output of this command:

Code: Select all

/usr/local/nagios/libexec/check_vmware_api.pl -H adc-vmvhis-p001 -f /usr/local/nagiosxi/etc/components/vmware/adc_vmvhis_p001_auth.txt -l mem --verbose
Then add '-t' or '--timeout' to raise the timeout above the default and see whether results come back:

Code: Select all

/usr/local/nagios/libexec/check_vmware_api.pl -H adc-vmvhis-p001 --timeout=120 -f /usr/local/nagiosxi/etc/components/vmware/adc_vmvhis_p001_auth.txt -l mem --verbose
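It would also help to know how long the plugin actually takes compared with the 60-second limit in the alert. A minimal sketch of a timing wrapper follows; the name time_check is made up for illustration, and it measures whole seconds only.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, let its output through, then print
# its exit code and the elapsed wall-clock time in whole seconds.
# Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
time_check() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "exit=${rc} elapsed=$((end - start))s"
    return "$rc"
}

# Example use, prefixing the command above:
#   time_check /usr/local/nagios/libexec/check_vmware_api.pl -H adc-vmvhis-p001 --timeout=120 -f /usr/local/nagiosxi/etc/components/vmware/adc_vmvhis_p001_auth.txt -l mem --verbose
```

An elapsed time close to or above 60 seconds would suggest the check is simply slow rather than failing outright.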
Please let us know the results,
Perry