Hi Team,
We are continuously getting "service check timed out" alerts for datastores. Please check and let us know why we are getting these alerts.
Alert details: (Service check timed out after 60.01 seconds) on adc-vmvhis-p001 Datastore usage for NONPRD-BE-V015 is CRITICAL
Note: The system profile has been attached.
Regards,
Venkata Reddy
Moderator note: removed attached profile and placed it on local shared with support agents
Service check timed out after 60.01 seconds
-
- Posts: 30
- Joined: Mon Apr 05, 2021 9:21 am
Last edited by pbroste on Fri Nov 05, 2021 12:49 pm, edited 1 time in total.
Reason: Moderator note: removed attached profile and placed it on local shared with support agents
-
- Posts: 1288
- Joined: Tue Jun 01, 2021 1:27 pm
Re: Service check timed out after 60.01 seconds
Hello @Mahesh786
Thanks for sending over the System Profile.
We see 'NCPA' service checks timing out with the following error:
Code: Select all
error: [Errno xxx ] Address family not supported by protocol
That system may have IPv6 disabled. Let's edit your ncpa.cfg on the remote system (not the XI server) and set this under the [listener] section:
Code: Select all
ip = 0.0.0.0
Then restart the listener on the remote server (not the XI server):
Code: Select all
systemctl restart ncpa_listener
Results:
Code: Select all
sudo ss -tulpn | grep -Ei 'LISTEN|ESTABLISHED'
tcp LISTEN 0 128 0.0.0.0:5693 0.0.0.0:* users:(("ncpa_listener",pid=1588,fd=11))
Then test again and see if that resolved it:
Code: Select all
systemctl status ncpa_listener
If it's still not working, attach your ncpa.cfg from the remote system and send this information:
What OS/version is the remote system running?
Code: Select all
uname -a
cat /etc/*release
/usr/local/nagios/libexec/check_ncpa.py -V
To help tackle the timeout issues:
To increase the timeout for 'check_ncpa', add -T XX to the check command, where XX is the timeout in seconds (note the uppercase -T, per the plugin help):
-T TIMEOUT, --timeout=TIMEOUT
Enforced timeout, will terminate plugins after this
amount of seconds. [60]
The global check timeouts are set in nagios.cfg:
Code: Select all
vi /usr/local/nagios/etc/nagios.cfg
Service Check Timeout
Format: service_check_timeout=<seconds>
Example: service_check_timeout=60
Host Check Timeout
Format: host_check_timeout=<seconds>
Example: host_check_timeout=120
Code: Select all
vi /usr/local/ncpa/etc/ncpa.cfg
Bump the timeouts up by 60 seconds and then check to see how things look. Restart ncpa_listener.service and nagios.service.
host_check_timeout=30
service_check_timeout=120
We also see the performance data processing throwing a lot of errors and would like you to clean that up too.
Check whether any config is throwing an error:
Code: Select all
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
Are NPCD and crond running?
Code: Select all
service npcd status
service crond status
Code: Select all
service npcd restart
service crond status
Here are some troubleshooting docs. I'm sure you've looked at them, but is it only the bandwidth graphs?
https://support.nagios.com/kb/article.php?id=9
https://support.nagios.com/kb/article/n ... hs-29.html
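Before and after bumping the timeouts, it may help to confirm which values are actually active in nagios.cfg. The following is only a minimal sketch: the function name show_check_timeouts is hypothetical (not part of Nagios), and the default path is the nagios.cfg location mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not a Nagios tool): print the active check timeout
# directives from a Nagios main config file.
# Assumption: defaults to the nagios.cfg path used in this thread;
# pass a different path as the first argument to override it.
show_check_timeouts() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Match only uncommented service_check_timeout / host_check_timeout lines.
    grep -E '^[[:space:]]*(service|host)_check_timeout[[:space:]]*=' "$cfg"
}
```

Running show_check_timeouts with no argument on the XI server should list the two directives as they are currently set; commented-out lines are skipped.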
Let us know how things are looking,
Perry
-
- Posts: 30
- Joined: Mon Apr 05, 2021 9:21 am
Re: Service check timed out after 60.01 seconds
Hi Team,
We did not add this host with NCPA. It was discovered with the VMware wizard, as this is an ESXi host. Please advise accordingly.
Regards,
Venkata Reddy
-
- Posts: 1288
- Joined: Tue Jun 01, 2021 1:27 pm
Re: Service check timed out after 60.01 seconds
Hello @Mahesh786
Thanks for following up and pointing me to the correct check. Let's run 'check_vmware_api.pl' from the command line so we can see what results it returns. First, let's get the version:
Code: Select all
/usr/local/nagios/libexec/check_vmware_api.pl -H adc-vmvhis-p001 -f /usr/local/nagiosxi/etc/components/vmware/adc_vmvhis_p001_auth.txt -V
Please provide the output of this command:
Code: Select all
/usr/local/nagios/libexec/check_vmware_api.pl -H adc-vmvhis-p001 -f /usr/local/nagiosxi/etc/components/vmware/adc_vmvhis_p001_auth.txt -l mem --verbose
Then add '-t' or '--timeout' to raise the default timeout, and let us know the results:
Code: Select all
/usr/local/nagios/libexec/check_vmware_api.pl -H adc-vmvhis-p001 --timeout=120 -f /usr/local/nagiosxi/etc/components/vmware/adc_vmvhis_p001_auth.txt -l mem --verbose
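When comparing runs with and without the longer timeout, it can be useful to record how long each run took and which state the plugin returned. Here is a rough sketch under stated assumptions: plugin_state and timed_check are hypothetical helper names, and the only thing relied on is the standard plugin exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Map a plugin exit code to its state name (standard Nagios plugin convention).
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "UNEXPECTED($1)" ;;
    esac
}

# Hypothetical helper: run any command, let its own output through,
# then print a summary line with state, exit code and elapsed whole seconds.
timed_check() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "state=$(plugin_state "$rc") exit=$rc elapsed=$((end - start))s"
    return "$rc"
}

# Usage (prefix any of the commands above), for example:
#   timed_check /usr/local/nagios/libexec/check_vmware_api.pl -H adc-vmvhis-p001 --timeout=120 \
#       -f /usr/local/nagiosxi/etc/components/vmware/adc_vmvhis_p001_auth.txt -l mem --verbose
```

If the elapsed time comes out close to or above 60 seconds, that would be consistent with the check hitting the 60-second service check timeout rather than failing for another reason.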
Perry