We're using passive services with freshness checks enabled (using the check_dummy command). Recently we've been getting alerts that services are out of date, and when I look at /usr/local/nagios/var/nagios.log, I see messages like the following:
[1341338993] Warning: The results of service 'Load' on host 'TEST-HOST' are stale by 0d 0h 0m 49s (threshold=0d 0h 10m 0s). I'm forcing an immediate check of the service.
It appears that the freshness check is being triggered even though the service status is only 49 seconds old.
I upgraded to the latest release of Nagios XI, 2011R3.2, but that did not resolve the problem.
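To compare the two durations in a warning like the one above numerically, the line can be parsed back into seconds. The following is a minimal C sketch that assumes the exact message wording quoted above. The function names `breakdown_to_seconds` and `parse_stale_warning` are hypothetical and are not part of Nagios.

```c
#include <stdio.h>
#include <string.h>

/* Convert a days/hours/minutes/seconds breakdown into total seconds. */
long breakdown_to_seconds(int d, int h, int m, int s)
{
    return (long)d * 86400L + (long)h * 3600L + (long)m * 60L + (long)s;
}

/*
 * Hypothetical helper: extract the "stale by" and "threshold" durations from
 * a freshness warning and return them in seconds. Returns 0 on success, or -1
 * if the line does not contain the expected wording.
 */
int parse_stale_warning(const char *line, long *stale_by, long *threshold)
{
    int sd, sh, sm, ss, td, th, tmin, ts;
    const char *p = strstr(line, "are stale by ");

    if (p == NULL)
        return -1;

    /* Each "%dd", "%dh", ... reads a number followed by its unit letter. */
    if (sscanf(p, "are stale by %dd %dh %dm %ds (threshold=%dd %dh %dm %ds)",
               &sd, &sh, &sm, &ss, &td, &th, &tmin, &ts) != 8)
        return -1;

    *stale_by = breakdown_to_seconds(sd, sh, sm, ss);
    *threshold = breakdown_to_seconds(td, th, tmin, ts);
    return 0;
}
```

Applied to the 'Load' warning above, this yields 49 for the "stale by" value and 600 for the threshold. It only extracts the numbers and does not interpret what "stale by" measures.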
Freshness checks on Nagios XI 2011R3.2
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Freshness checks on Nagios XI 2011R3.2
Can you post the configuration for this service?
Thanks.
-
- Posts: 96
- Joined: Sat Feb 11, 2012 2:16 pm
Re: Freshness checks on Nagios XI 2011R3.2
This is happening across multiple services; the Load alert I mentioned was just one example. The passive services use the xiwizard_passive_service template. Here is its config:
Code:
define service {
name xiwizard_passive_service
service_description Passive Service
use xiwizard_generic_service
check_command check_dummy!2!"Status is out of date."
initial_state o
max_check_attempts 1
check_interval 1
retry_interval 1
active_checks_enabled 0
passive_checks_enabled 1
check_period 24x7
check_freshness 0
freshness_threshold 600
flap_detection_enabled 0
notification_interval 60
first_notification_delay 10
notification_period 24x7
notification_options c
register 0
}
And here is the definition for xiwizard_generic_service:
Code:
define service {
name xiwizard_generic_service
check_command check_xi_service_none
is_volatile 0
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 60
notifications_enabled 1
failure_prediction_enabled 1
register 0
}
-
- Posts: 15
- Joined: Wed Oct 13, 2010 2:31 am
Re: Freshness checks on Nagios XI 2011R3.2
It appears that you don't have freshness checking enabled.
nagiosadmin42 wrote:
Code:
define service { check_freshness 0 ... }
This option takes precedence over the global options in nagios.cfg.
Try setting check_freshness to 1 and see if that helps.
-
- Posts: 96
- Joined: Sat Feb 11, 2012 2:16 pm
Re: Freshness checks on Nagios XI 2011R3.2
Yeah, sorry about that. I had disabled freshness checks in that template because we were getting spammed due to this problem. Other passive services that override the template and specify their own freshness-check values are still experiencing the problem. Here's one example config, where I've replaced our production values with "test":
Code:
define service {
host_name TEST-HOST
service_description Test Service
use xiwizard_passive_service
check_command check_dummy_test_command!!!!!!!!
max_check_attempts 1
check_interval 5
retry_interval 1
check_period 24x7
check_freshness 1
freshness_threshold 3600
notification_interval 60
notification_period 24x7
notification_options u,c,s
contact_groups Test Admins
stalking_options n
_xiwizard passiveobject
register 1
}
define command {
command_name check_dummy_test_command
command_line $USER1$/check_dummy 2 "Test message."
}
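Because Test Service sets check_freshness and freshness_threshold itself, those local values should win over whatever xiwizard_passive_service (and, through it, xiwizard_generic_service) defines. The C sketch below is a deliberately simplified, hypothetical model of that lookup along the `use` chain. It handles single inheritance only; real Nagios templates also support multiple templates and additive values. `struct svc_def` and `resolve_attr` are illustrative names, not Nagios internals.

```c
#include <stddef.h>

/* Hypothetical, simplified service/template object: only the two freshness
 * directives, each paired with a flag saying whether this definition sets it. */
struct svc_def {
    const char *name;
    const struct svc_def *use;      /* template named by "use", or NULL */
    int has_check_freshness;
    int check_freshness;
    int has_freshness_threshold;
    int freshness_threshold;
};

enum svc_attr { ATTR_CHECK_FRESHNESS, ATTR_FRESHNESS_THRESHOLD };

/* Walk the "use" chain from the object upward; the first definition that
 * sets the directive wins. Returns fallback if nothing in the chain sets it. */
int resolve_attr(const struct svc_def *d, enum svc_attr attr, int fallback)
{
    for (; d != NULL; d = d->use) {
        if (attr == ATTR_CHECK_FRESHNESS && d->has_check_freshness)
            return d->check_freshness;
        if (attr == ATTR_FRESHNESS_THRESHOLD && d->has_freshness_threshold)
            return d->freshness_threshold;
    }
    return fallback;
}
```

With the definitions posted in this thread, Test Service resolves to check_freshness 1 and a threshold of 3600. A service that only inherits xiwizard_passive_service would get check_freshness 0 and a threshold of 600.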
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Freshness checks on Nagios XI 2011R3.2
One thing to note: freshness_threshold is in minutes. I know there is a typo in the CCM that says "sec" next to freshness_threshold, but it should be minutes.
So the service shown has a freshness_threshold of 3600, which would actually be 60 hours instead of the likely intended 1 hour.
-
- Posts: 96
- Joined: Sat Feb 11, 2012 2:16 pm
Re: Freshness checks on Nagios XI 2011R3.2
I think the configuration is correct; at least, the log message looks like what we want. Here's a message I just saw in nagios.log on our production server (with the service name and host name changed to "test"). This service has its freshness threshold set to 3600, so "threshold=0d 1h 0m 0s" does match that if the value is interpreted as seconds:
[1341517995] Warning: The results of service 'Test Service' on host 'TEST-HOST' are stale by 0d 0h 1m 0s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the service.
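The threshold in these warnings is printed as a days/hours/minutes/seconds breakdown, which makes it possible to check the unit. Here is a minimal C sketch of that conversion, assuming the `Nd Nh Nm Ns` layout seen in the log. The names `time_breakdown` and `format_breakdown` are hypothetical. Read as seconds, 3600 renders as `0d 1h 0m 0s`, matching the log line. Read as minutes, it would be 3600 × 60 = 216000 seconds and would render as `2d 12h 0m 0s`.

```c
#include <stdio.h>

/* A duration split the way the freshness warning prints it. */
struct breakdown {
    unsigned long days, hours, minutes, seconds;
};

struct breakdown time_breakdown(unsigned long total)
{
    struct breakdown b;
    b.days = total / 86400UL;
    b.hours = (total % 86400UL) / 3600UL;
    b.minutes = (total % 3600UL) / 60UL;
    b.seconds = total % 60UL;
    return b;
}

/* Render as "Nd Nh Nm Ns"; returns snprintf's result (characters that
 * would have been written, excluding the terminating NUL). */
int format_breakdown(unsigned long total, char *buf, size_t len)
{
    struct breakdown b = time_breakdown(total);
    return snprintf(buf, len, "%lud %luh %lum %lus",
                    b.days, b.hours, b.minutes, b.seconds);
}
```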
-
- Posts: 15
- Joined: Wed Oct 13, 2010 2:31 am
Re: Freshness checks on Nagios XI 2011R3.2
This is interesting behavior indeed...
nagiosadmin42, please enable debugging by setting the debug_verbosity to 16 (Host/service check information) in nagios.cfg.
Can you post the line(s) that start with "HBC: xx, PC: xx, ..."?
Scott, hmm, are you sure that the threshold is in minutes?
Based on the source code, freshness_threshold (if specified) is not multiplied by interval_length in the calculation, as it would be if the value were really in minutes.
If it is not specified (i.e., the automatic threshold is used), Nagios uses check_interval or retry_interval plus some additional latency:
Code:
/* Automatic threshold: derived from the check or retry interval */
if(temp_service->freshness_threshold == 0) {
    if(temp_service->state_type == HARD_STATE || temp_service->current_state == STATE_OK)
        freshness_threshold = (temp_service->check_interval * interval_length) + temp_service->latency + additional_freshness_latency;
    else
        freshness_threshold = (temp_service->retry_interval * interval_length) + temp_service->latency + additional_freshness_latency;
}
/* Explicit threshold: used as-is, with no interval_length scaling */
else
    freshness_threshold = temp_service->freshness_threshold;
/* ... */
expiration_time = (time_t)(temp_service->last_check + freshness_threshold);
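The excerpt above can be turned into a small standalone C sketch to see which numbers it produces for the services in this thread. `struct svc` and the helper names are hypothetical simplifications of the Nagios service object, and `state_is_hard_or_ok` stands in for the `HARD_STATE || STATE_OK` test. `stale_by` rests on an assumption drawn from our reading of the Nagios Core source: that the "stale by" figure in the warning is the time elapsed past `expiration_time` (current time minus expiration time), not the age of the last result. If that reading is right, "stale by 0d 0h 1m 0s (threshold=0d 1h 0m 0s)" would describe a result roughly 61 minutes old.

```c
#include <time.h>

/* Hypothetical, trimmed-down stand-in for the service fields the excerpt reads. */
struct svc {
    int freshness_threshold;   /* 0 means "compute it automatically" */
    int state_is_hard_or_ok;   /* stands in for HARD_STATE || STATE_OK */
    double check_interval;     /* in interval_length units */
    double retry_interval;     /* in interval_length units */
    double latency;            /* seconds */
    time_t last_check;         /* epoch seconds of the last result */
};

/* Same branching as the excerpt: an explicit threshold is used unscaled
 * (seconds); otherwise it is derived from the check or retry interval. */
long effective_threshold(const struct svc *s, int interval_length,
                         int additional_freshness_latency)
{
    double interval;

    if (s->freshness_threshold != 0)
        return s->freshness_threshold;

    interval = s->state_is_hard_or_ok ? s->check_interval : s->retry_interval;
    return (long)(interval * interval_length + s->latency)
           + additional_freshness_latency;
}

time_t expiration_time(const struct svc *s, long threshold)
{
    return s->last_check + (time_t)threshold;
}

/* ASSUMPTION: the warning's "stale by" value is now - expiration_time,
 * i.e. how far past the threshold the result is, not its total age. */
long stale_by(const struct svc *s, long threshold, time_t now)
{
    time_t expires = expiration_time(s, threshold);
    return now > expires ? (long)(now - expires) : 0L;
}
```

For Test Service (threshold 3600), a result checked at t=1000 expires at t=4600, and a freshness check at t=4660 would report it as 60 seconds stale under this assumption.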
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Freshness checks on Nagios XI 2011R3.2
By golly, I just tested this again with the latest version, and it is in seconds. I may have been remembering a bug from a previous version.
lmilkovic wrote:
Scott, hmm, are you sure that the threshold is in minutes?
Based on the source code, freshness_threshold (if specified) is not multiplied by interval_length in the calculation, as it would be if the value were really in minutes.
In any event, as lmilkovic mentioned:
lmilkovic wrote:
nagiosadmin42, please enable debugging by setting the debug_verbosity to 16 (Host/service check information) in nagios.cfg.
Can you post the line(s) that start with "HBC: xx, PC: xx, ..."?
-
- Posts: 96
- Joined: Sat Feb 11, 2012 2:16 pm
Re: Freshness checks on Nagios XI 2011R3.2
I've edited /usr/local/nagios/etc/nagios.cfg and set debug_verbosity=16, re-enabled "Check freshness" for the xiwizard_passive_service template, and applied the configuration changes.
It's been about half an hour, and I've been monitoring /var/log/messages for the debug lines you mentioned, but so far I see nothing related to them or to the problem I described (the freshness check being triggered even though the threshold hasn't been reached).
Very strange... I'll keep you posted if I find out anything new.
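For reference, in the stock nagios.cfg shipped with Nagios Core, the category bitmask (16 = host/service checks) is set with `debug_level`. `debug_verbosity` only selects how detailed the output is (0 to 2). Debug messages are written to the file named by `debug_file` (by default under /usr/local/nagios/var/) rather than to syslog. Whether Nagios XI's nagios.cfg follows the same layout is an assumption here. The C sketch below models that filtering in simplified form. `debug_message_logged` and `DEBUG_CHECKS` are hypothetical names, and the verbosity at which the "HBC: ..." line is emitted is not confirmed.

```c
/* Category bit for host/service checks, per the comments in the sample nagios.cfg. */
#define DEBUG_CHECKS 16

/*
 * Simplified, hypothetical model of the two filters: the message's category
 * must be enabled in debug_level (a bitmask; -1 enables everything), and its
 * verbosity must not exceed debug_verbosity (0 = brief ... 2 = very detailed).
 */
int debug_message_logged(int debug_level, int debug_verbosity,
                         int category, int message_verbosity)
{
    if (debug_level != -1 && (debug_level & category) == 0)
        return 0;
    return message_verbosity <= debug_verbosity;
}
```

In this model, leaving debug_level at 0 and setting debug_verbosity=16 logs no check messages at all. Check messages appear only when debug_level includes 16 and debug_verbosity is high enough for the message in question.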