Problem Description
This KB article addresses the following NRPE error:
CHECK_NRPE: Socket Timeout After n Seconds
Assumed Knowledge
The following KB article contains an explanation of how NRPE works and may need to be referenced to completely understand the problem and solution that is provided here:
NRPE - Agent and Plugin Explained
Troubleshooting The Error
This is one of the harder errors to pin down. More often than not, the solution to this problem can be found by following this KB article:
NRPE - CHECK_NRPE: Error - Could Not Complete SSL Handshake
However, sometimes the error is not related to SSL or your allowed hosts. In these instances, either a plugin is taking longer than "n" seconds to return its result, or there is a firewall/port issue.
Timeout Issues
You can increase the timeout on the check, but you will have to alter both the check in Nagios XI and the command and connection timeouts in the nrpe.cfg file on the remote host.
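Before raising any timeout, it can help to know how long the plugin really takes when run directly on the remote host. The following is only a sketch: elapsed_seconds is a hypothetical helper name, and the plugin shown in the usage comment is a placeholder for whichever check is timing out.

```shell
#!/bin/sh
# Run a command and print how many whole seconds it took.
# elapsed_seconds is a hypothetical helper name used for illustration.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example on the remote host; the plugin and its arguments are placeholders:
# elapsed_seconds /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
```

If the number printed is close to or above the current timeout, the timeout layers described in the following sections are the likely cause.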
Nagios XI check_nrpe Timeout
This timeout is how long the check_nrpe command on the Nagios XI server will wait for a response from the NRPE agent. By default check_nrpe uses a timeout of 10 seconds, which is too short for certain checks (disk, filesystem and database checks, among others). In Nagios XI, however, the default has been defined as 30 seconds.
In the Nagios XI web interface, navigate to Configure > Core Config Manager > Commands. This brings up the Commands page. Type nrpe into the Search field and click Search.
Click the check_nrpe command.
You can change the timeout in Nagios XI with the -t switch in the check_nrpe command.
In the Command Line field, change -t xx to a higher value. The Nagios XI default is -t 30, which is 30 seconds.
Save your changes and then click the Apply Configuration button.
NRPE Client Timeout
This timeout is how long the NRPE client on the remote host will wait for a response from the plugin it executes before returning a result to Nagios XI. You may need to change a couple of settings in the remote host's /usr/local/nagios/etc/nrpe.cfg file, depending on how high you set the timeout in Nagios XI. Edit the file with the following command:
vi /usr/local/nagios/etc/nrpe.cfg
When using the vi editor, to make changes press i on the keyboard first to enter insert mode. Press Esc to exit insert mode.
Search for the command_timeout= and connection_timeout= settings, which may need to be altered. Set both of these to at least the value of the timeout in Nagios XI. Usually connection_timeout=300 is more than enough, as is command_timeout, which defaults to 60 seconds. If you set the timeout in Nagios XI higher than that, increase command_timeout to match.
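To see what the two settings are currently set to without opening an editor, a small helper can read them from the file. This is a minimal sketch, not part of NRPE: nrpe_cfg_value is a hypothetical name, and it assumes the settings appear as uncommented key=value lines with no spaces around the equals sign.

```shell
#!/bin/sh
# Print the value of an uncommented key=value setting from a config file.
# nrpe_cfg_value is a hypothetical helper name, not part of NRPE.
nrpe_cfg_value() {
    # $1 = path to nrpe.cfg, $2 = setting name (e.g. command_timeout)
    # If the key appears more than once, the last occurrence is printed.
    awk -F= -v key="$2" '$1 == key { val = $2 } END { if (val != "") print val }' "$1"
}

# Example usage on the remote host (path taken from this article):
# nrpe_cfg_value /usr/local/nagios/etc/nrpe.cfg command_timeout
# nrpe_cfg_value /usr/local/nagios/etc/nrpe.cfg connection_timeout
```

Commented-out lines such as #command_timeout=60 are ignored because the key no longer matches exactly.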
Plugin Timeout
You may find that certain plugins have their own timeout argument. If one exists, define your NRPE command so that it takes this timeout into account as well.
Nagios XI Global Timeout
Nagios XI by default has a global timeout for host (30 seconds) and service (60 seconds) check commands. This means that if you change the check_nrpe command timeout in Nagios XI to 120 with the -t switch, Nagios XI will not wait for 120 seconds to pass; the global timeout will stop the check at 60 seconds.
To adjust the global timeout, navigate to Configure > Core Config Manager > CCM Admin > Core Configs. This brings up the Core Configs page and by default the General [nagios.cfg] tab is selected. The two directives to change are:
host_check_timeout=30
service_check_timeout=60
Click Save Changes to update these settings and then Apply Config via Quick Tools.
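The interaction between the -t switch and the global timeout comes down to "whichever is smaller wins". The sketch below illustrates that arithmetic only; effective_timeout is a hypothetical helper name and does not read any Nagios configuration.

```shell
#!/bin/sh
# Print the timeout that actually applies to a service check: the smaller of
# the check_nrpe -t value and the global service_check_timeout.
# effective_timeout is a hypothetical helper name used for illustration.
effective_timeout() {
    nrpe_t=$1      # value passed to check_nrpe -t
    global_t=$2    # service_check_timeout from nagios.cfg
    if [ "$nrpe_t" -gt "$global_t" ]; then
        echo "$global_t"
    else
        echo "$nrpe_t"
    fi
}

effective_timeout 120 60   # prints 60
```

With -t 120 and the default service_check_timeout=60, the check is still cut off at 60 seconds, which is why the global value has to be raised as well.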
A Realistic Discussion On Timeouts
After reading all of that you might think to yourself, "I'm going to go and change all the timeouts to 120 seconds". It's not as simple as that: each layer of timeout needs to allow for the layer before it. If the Nagios XI global timeout was set to 120 seconds and NRPE had command_timeout=120, it may take a whole second before the request gets to NRPE, so each inner timeout has to be slightly shorter than the one outside it. Here's an example of the "layers":
Nagios XI Global Timeout: 120 seconds
check_nrpe timeout on Nagios XI server: 119 seconds
connection_timeout= on NRPE Client: 118 seconds
command_timeout= on NRPE Client: 117 seconds
Plugin specific timeout (if any): 116 seconds
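The layering above can be generated for any starting value. The following sketch assumes the same one-second step used in the example; that step is taken from this article's illustration and is not a fixed rule, and layered_timeouts is a hypothetical helper name.

```shell
#!/bin/sh
# Print one timeout per layer, each one second shorter than the layer above.
# The one-second step mirrors the example in this article; it is not a rule.
layered_timeouts() {
    t=$1   # global timeout in seconds
    for layer in \
        "Nagios XI global timeout" \
        "check_nrpe -t on Nagios XI server" \
        "connection_timeout on NRPE client" \
        "command_timeout on NRPE client" \
        "plugin timeout (if any)"
    do
        printf '%s: %s\n' "$layer" "$t"
        t=$((t - 1))
    done
}

layered_timeouts 120
```

Called with 120, this prints the same five values as the list above, from 120 down to 116.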
This completes the section on timeouts. The remaining part of this KB article helps identify other reasons why Socket Timeout After n Seconds may be occurring.
Check the NRPE Service Status:
You may receive this error if the NRPE daemon is not running on the remote host. If you are using xinetd, you can check the status of the service by logging onto the remote host as root and running the following command:
service xinetd status
You should see output similar to the following:
xinetd (pid 1260) is running...
If you are using the init-script method, or if your distribution does not use the "service" command, you can always grep a process listing:
ps -aef | grep nrpe
You should see output similar to the following (the parts to look for are the nrpe binary and its --daemon argument):
nagios 53213 1 0 Feb26 ? 00:00:07 /usr/libexec/nrpe -c /etc/nagios/nrpe.cfg --daemon
If NRPE/xinetd is not running, start it with the following command:
service xinetd start
Or if you are not using xinetd:
/path/to/init/script start
The following KB article provides details on the commands that each operating system uses to control NRPE:
NRPE - How To Install NRPE v3 From Source
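One thing to watch for with the ps and grep approach shown earlier is that the grep command can match itself in the process list. The sketch below is a small variation, assuming a standalone NRPE daemon rather than xinetd; nrpe_processes is a hypothetical helper name.

```shell
#!/bin/sh
# Read a process listing on stdin and print only lines for the NRPE daemon.
# The [n]rpe pattern matches "nrpe" but not the grep command itself, so
# "ps -aef | nrpe_processes" does not list its own grep.
nrpe_processes() {
    grep '[n]rpe'
}

# Report whether NRPE appears in the current process list.
if ps -aef | nrpe_processes >/dev/null; then
    echo "NRPE is running"
else
    echo "NRPE is not running"
fi
```

Any other process with nrpe in its command line, such as an open editor on nrpe.cfg, would still match, so the listed lines are worth reading rather than just counting.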
Check Firewall and Port Settings:
The last of the probable causes of this error is associated with firewalls and ports. If the NRPE traffic cannot pass through a firewall between the Nagios XI server and the remote host, you will see the checks time out. Additionally, if port 5666 is not open on the remote host's firewall, you may receive a timeout error as well. Usually xinetd will open the ports automatically, as long as the /etc/xinetd.d/nrpe file is configured correctly and NRPE's port settings have been added to /etc/services.
First, you should make sure that port 5666 is open on the remote host. The easiest way to do this is to run check_nrpe from the remote host to itself. This also doubles as a good way to check that NRPE is functioning as expected. Log into the remote host as root and execute:
/usr/local/nagios/libexec/check_nrpe -H localhost
You should get something similar to the following output:
NRPE v2.15
If not, make sure that port 5666 is open on the remote host's firewall. If you are using xinetd, go back to the previous step (Check the NRPE Service Status), as it should automatically open the port for you.
Checking Remote Host's Ports and Configuring iptables:
You may have to open port 5666 on your firewall, which in the case of most Linux distributions, is iptables. To get a listing of the current iptables rules, run the following on the remote host as root:
iptables -L
The expected output is similar to:
ACCEPT - tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5666
OR
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:nrpe
If the port is not open, you will have to add an iptables rule for it using the following commands:
iptables -I INPUT -p tcp --destination-port 5666 -j ACCEPT
service iptables save
Those commands were for TCP/IP v4. If you need TCP/IP v6 the commands are similar:
ip6tables -L
ip6tables -I INPUT -p tcp --destination-port 5666 -j ACCEPT
service ip6tables save
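On hosts with long rule sets it can be easier to filter the listing than to read it. The following sketch checks iptables -L style output for either of the two expected forms shown above; has_nrpe_accept_rule is a hypothetical helper name, and it only confirms that such a rule exists, not where it sits in the chain.

```shell
#!/bin/sh
# Read "iptables -L" style output on stdin and succeed if it contains an
# ACCEPT rule for TCP destination port 5666 (shown as 5666 or as "nrpe").
has_nrpe_accept_rule() {
    grep -Eq '^ACCEPT.*tcp.*dpt:(5666|nrpe)( |$)'
}

# Usage on the remote host, as root:
# if iptables -L | has_nrpe_accept_rule; then
#     echo "rule for port 5666 found"
# else
#     echo "no rule for port 5666"
# fi
```

The same function can be fed the output of ip6tables -L to check the TCP/IP v6 rules.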
Checking Remote Host's Ports and Configuring firewalld:
Firewalld is present on Enterprise Linux 7 and higher. To get a listing of the current firewalld rules, run the following on the remote host as root:
firewall-cmd --list-all
The expected output is similar to:
ports: 5666/tcp
If the port is not open, you will have to add a firewalld rule for it using the following commands:
firewall-cmd --zone=public --add-port=5666/tcp
firewall-cmd --zone=public --add-port=5666/tcp --permanent
firewalld applies to both TCP/IP v4 and TCP/IP v6.
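The firewalld listing contains several similarly named lines, so a quick filter can confirm that 5666/tcp is on the ports: line specifically. This is a sketch under the assumption that the output uses the usual "ports: ..." layout shown above; firewalld_has_nrpe_port is a hypothetical helper name.

```shell
#!/bin/sh
# Read "firewall-cmd --list-all" output on stdin and succeed if 5666/tcp is
# listed on the "ports:" line (not "forward-ports:" or "source-ports:").
firewalld_has_nrpe_port() {
    awk '$1 == "ports:" { for (i = 2; i <= NF; i++) if ($i == "5666/tcp") found = 1 }
         END { exit !found }'
}

# Usage on the remote host, as root:
# if firewall-cmd --list-all | firewalld_has_nrpe_port; then
#     echo "5666/tcp is open"
# else
#     echo "5666/tcp is not open"
# fi
```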
The following KB article provides details on the commands that each operating system uses to open firewall ports:
NRPE - How To Install NRPE v3 From Source
Checking Port 5666 From the Nagios XI Server with nmap:
You can use nmap (among other port scanners) to check the remote host's ports. If you do not have nmap installed, it can be installed using the following command (with yum for RHEL/CentOS systems):
yum install -y nmap
Once installed, test the connection on port 5666 from the Nagios XI server to the remote host by logging in as root on your Nagios XI server and running the following command:
nmap <remote host ip> -Pn -p 5666
Replace <remote host ip> with the IP address of your remote host. The expected output should be similar to:
PORT STATE SERVICE
5666/tcp open nrpe
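If the scan is being scripted, the state column can be pulled out of the nmap output directly. The sketch below assumes the PORT / STATE / SERVICE layout shown above; nrpe_port_state is a hypothetical helper name.

```shell
#!/bin/sh
# Read nmap output on stdin and print the state reported for 5666/tcp
# (for example "open", "closed" or "filtered"). Prints nothing if absent.
nrpe_port_state() {
    awk '$1 == "5666/tcp" { print $2 }'
}

# Usage from the Nagios XI server (replace the placeholder address):
# nmap <remote host ip> -Pn -p 5666 | nrpe_port_state
```

A state of filtered generally means a firewall is blocking the probe, while closed means the host was reached but nothing is listening on the port, which points back to the NRPE service status check.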
Final Thoughts
For any support related questions please visit the Nagios Support Forums at: