Freshness checks on Nagios XI 2011R3.2

nagiosadmin42
Posts: 96
Joined: Sat Feb 11, 2012 2:16 pm

Post by nagiosadmin42 »

We're using passive services with freshness checks enabled (using the check_dummy command). Recently we've been getting alerts that services are out of date, and when I look at /usr/local/nagios/var/nagios.log, I see messages like the following:

[1341338993] Warning: The results of service 'Load' on host 'TEST-HOST' are stale by 0d 0h 0m 49s (threshold=0d 0h 10m 0s). I'm forcing an immediate check of the service.

It appears that the freshness check is being triggered even though the service status is only 49 seconds old.

I upgraded to the latest release of Nagios XI, 2011R3.2, but that did not resolve the problem.
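
For anyone comparing many of these warnings, a small parser can pull the two durations out of each log line. This is only a sketch: the regular expression is written against the single message quoted above, and the function and field names are made up for illustration. The `implied_age` field rests on an assumption the thread does not confirm, namely that "stale by" counts the seconds elapsed past the expiry time (last check plus threshold) rather than the age of the result itself.

```python
import re

# Pattern written against the warning quoted above; other Nagios versions
# may word the message differently.
STALE_RE = re.compile(
    r"\[(?P<logged>\d+)\] Warning: The results of service '(?P<service>[^']*)' "
    r"on host '(?P<host>[^']*)' are stale by "
    r"(?P<sd>\d+)d (?P<sh>\d+)h (?P<sm>\d+)m (?P<ss>\d+)s "
    r"\(threshold=(?P<td>\d+)d (?P<th>\d+)h (?P<tm>\d+)m (?P<tsec>\d+)s\)"
)


def to_seconds(days, hours, minutes, seconds):
    """Convert a d/h/m/s breakdown into a total number of seconds."""
    return ((int(days) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)


def parse_stale_warning(line):
    """Return the fields of one freshness warning, or None if it does not match."""
    m = STALE_RE.search(line)
    if m is None:
        return None
    stale_by = to_seconds(m["sd"], m["sh"], m["sm"], m["ss"])
    threshold = to_seconds(m["td"], m["th"], m["tm"], m["tsec"])
    return {
        "logged_at": int(m["logged"]),
        "service": m["service"],
        "host": m["host"],
        "stale_by": stale_by,
        "threshold": threshold,
        # ASSUMPTION: "stale by" is time past expiry, so age = threshold + stale_by.
        "implied_age": threshold + stale_by,
    }
```

Under that assumption, the 'Load' message above would describe a result roughly 10 minutes 49 seconds old rather than 49 seconds old, which is worth ruling out before treating the alert as spurious.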
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

Can you post the configuration for this service?

Thanks.

Post by nagiosadmin42 »

This is happening across services; the Load alert I mentioned was one example. The passive services use the xiwizard_passive_service template. Here is its config:

define service {
       name                                     xiwizard_passive_service
       service_description                      Passive Service
       use                                      xiwizard_generic_service
       check_command                            check_dummy!2!"Status is out of date."
       initial_state                            o
       max_check_attempts                       1
       check_interval                           1
       retry_interval                           1
       active_checks_enabled                    0
       passive_checks_enabled                   1
       check_period                             24x7
       check_freshness                          0
       freshness_threshold                      600
       flap_detection_enabled                   0
       notification_interval                    60
       first_notification_delay                 10
       notification_period                      24x7
       notification_options                     c
       register                                 0

}
And here is the definition for xiwizard_generic_service:

define service {
       name                                     xiwizard_generic_service
       check_command                            check_xi_service_none
       is_volatile                              0
       max_check_attempts                       5
       check_interval                           5
       retry_interval                           1
       active_checks_enabled                    1
       passive_checks_enabled                   1
       parallelize_check                        1
       obsess_over_service                      1
       check_freshness                          0
       event_handler_enabled                    1
       flap_detection_enabled                   1
       process_perf_data                        1
       retain_status_information                1
       retain_nonstatus_information             1
       notification_interval                    60
       notifications_enabled                    1
       failure_prediction_enabled               1
       register                                 0

}
lmilkovic
Posts: 15
Joined: Wed Oct 13, 2010 2:31 am

Post by lmilkovic »

nagiosadmin42 wrote:

define service {
       check_freshness                          0
...

}
It appears that you don't have freshness checking enabled.
This option takes precedence over the global options in nagios.cfg.

Try setting check_freshness to 1 and see if that helps.

Post by nagiosadmin42 »

Yeah, sorry about that. I had disabled freshness checks in that template because we're getting spammed due to this problem. Other passive services that override the template and specify their own freshness-check values are still experiencing the problem. Here's one example config, where I've replaced our production values with "test":

define service {
        host_name                       TEST-HOST
        service_description             Test Service
        use                             xiwizard_passive_service
        check_command                   check_dummy_test_command!!!!!!!!
        max_check_attempts              1
        check_interval                  5
        retry_interval                  1
        check_period                    24x7
        check_freshness                 1
        freshness_threshold             3600
        notification_interval           60
        notification_period             24x7
        notification_options            u,c,s
        contact_groups                  Test Admins
        stalking_options                n
        _xiwizard                       passiveobject
        register                        1
        }

define command {
       command_name                             check_dummy_test_command
       command_line                             $USER1$/check_dummy 2 "Test message."
}

Post by scottwilkerson »

One thing to note: freshness_threshold is in minutes. I know there is a typo in the CCM that says "sec" next to freshness_threshold, but it should be minutes.

So the item shown has a freshness_threshold of 3600, which would actually be 60 hours instead of the likely intended 1 hour.

Post by nagiosadmin42 »

I think the configuration is correct, at least the log message looks like it's what we want. Here's a message I just saw in nagios.log on our production server (with the service name and host name changed to "test"). This service has the freshness threshold set to 3600, so "threshold=0d 1h 0m 0s" does match that if it's interpreted as seconds:
[1341517995] Warning: The results of service 'Test Service' on host 'TEST-HOST' are stale by 0d 0h 1m 0s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the service.
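
The unit argument can be checked with a little arithmetic. The sketch below renders the configured value in the same "Xd Xh Xm Xs" style as the log message, once as seconds and once as minutes. The helper name `format_duration` is made up for illustration and is not part of Nagios.

```python
def format_duration(seconds):
    """Render a number of seconds in the 'Xd Xh Xm Xs' style of the log message."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


configured = 3600  # freshness_threshold from the service definition above

as_seconds = format_duration(configured)       # value read as seconds
as_minutes = format_duration(configured * 60)  # value read as minutes

print("if seconds:", as_seconds)  # if seconds: 0d 1h 0m 0s
print("if minutes:", as_minutes)  # if minutes: 2d 12h 0m 0s
```

Only the seconds reading reproduces the "threshold=0d 1h 0m 0s" seen in the log, which supports the interpretation given above.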

Post by lmilkovic »

This is an interesting behavior indeed...

nagiosadmin42, please enable debugging by setting the debug_verbosity to 16 (Host/service check information) in nagios.cfg.
Can you post the line(s) that start with "HBC: xx, PC: xx, ..."?

Scott, hmm, are you sure that the threshold is in minutes?
Based on the source code, freshness_threshold (if specified) is not multiplied by the interval length in the calculations (a multiplication would imply that it really is in minutes).
If not specified (i.e. auto threshold is being used), it will use check_interval/retry_interval with some additional latencies:

/* Use an automatic threshold when none is configured for the service */
if(temp_service->freshness_threshold == 0) {
	if(temp_service->state_type == HARD_STATE || temp_service->current_state == STATE_OK)
		freshness_threshold = (temp_service->check_interval * interval_length) + temp_service->latency + additional_freshness_latency;
	else
		freshness_threshold = (temp_service->retry_interval * interval_length) + temp_service->latency + additional_freshness_latency;
	}
else
	/* A configured value is used as-is, with no interval_length multiplication */
	freshness_threshold = temp_service->freshness_threshold;
...
expiration_time = (time_t)(temp_service->last_check + freshness_threshold);
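
To experiment with the numbers without rebuilding Nagios, the logic of the snippet above can be mirrored in a few lines of Python. This is a sketch, not the Nagios implementation: the function names are made up, the constant values are placeholders, and the defaults of 60 for interval_length and 15 for additional_freshness_latency are assumptions that should be checked against your own nagios.cfg.

```python
# Placeholder constants standing in for the C macros in the snippet above.
HARD_STATE = 1
SOFT_STATE = 0
STATE_OK = 0


def effective_threshold(freshness_threshold, check_interval, retry_interval,
                        state_type, current_state, latency=0.0,
                        interval_length=60, additional_freshness_latency=15):
    """Mirror the threshold selection from the C snippet (result in seconds)."""
    if freshness_threshold == 0:
        # Auto threshold: derived from the check or retry interval.
        if state_type == HARD_STATE or current_state == STATE_OK:
            interval = check_interval
        else:
            interval = retry_interval
        return interval * interval_length + latency + additional_freshness_latency
    # Configured threshold: used as-is, i.e. treated as seconds.
    return freshness_threshold


def expiration_time(last_check, threshold):
    """Time after which the last result would be considered stale."""
    return last_check + threshold
```

With the 'Test Service' definition above (freshness_threshold 3600), `effective_threshold(3600, 5, 1, HARD_STATE, STATE_OK)` returns 3600, so the result would expire one hour after the value Nagios has stored as last_check.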

Post by scottwilkerson »

lmilkovic wrote: Scott, hmm, are you sure that the threshold is in minutes?
Based on the source code, freshness_threshold (if specified) is not multiplied by the interval length in the calculations (a multiplication would imply that it really is in minutes).
By golly, I just tested this again with the latest version and it is seconds. I may have been remembering a bug from previous versions.

In any event, as lmilkovic mentioned:
lmilkovic wrote:nagiosadmin42, please enable debugging by setting the debug_verbosity to 16 (Host/service check information) in nagios.cfg.
Can you post the line(s) that start with "HBC: xx, PC: xx, ..."?

Post by nagiosadmin42 »

I've edited /usr/local/nagios/etc/nagios.cfg and set debug_verbosity=16, re-enabled "Check freshness" for the xiwizard_passive_service template, and have applied the configuration changes.

It's been about half an hour, and I've been monitoring /var/log/messages for the debug lines you mentioned, but so far see nothing related to them or the problem I described where the freshness check is being triggered even though the threshold hasn't been reached.

Very strange... I'll keep you posted if I find out anything new.
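
One possible reason nothing shows up, offered as an assumption to verify rather than a diagnosis: as far as I know, in Nagios Core the category bitmask (where 16 selects host/service checks) is set through debug_level, debug_verbosity only takes 0, 1 or 2, and debug output is written to the file named by debug_file rather than to /var/log/messages. The sketch below checks a nagios.cfg for those three settings. The helper names, the default values and the sample file contents are made up for illustration.

```python
CHECKS_BIT = 16  # ASSUMPTION: debug_level bit for host/service check information


def parse_cfg(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def debug_problems(settings):
    """Return a list of reasons why check debugging might produce no output."""
    problems = []
    # ASSUMPTION: an unset debug_level behaves like 0 (no debug output).
    level = int(settings.get("debug_level", "0"))
    if not level & CHECKS_BIT:
        problems.append("debug_level does not include 16 (host/service checks)")
    # ASSUMPTION: an unset debug_verbosity behaves like 1.
    verbosity = int(settings.get("debug_verbosity", "1"))
    if verbosity not in (0, 1, 2):
        problems.append("debug_verbosity should be 0, 1 or 2")
    if "debug_file" not in settings:
        problems.append("debug_file is not set, so the output location is unknown")
    return problems


# Hypothetical excerpt matching the change described in the post above.
sample = """
debug_level=0
debug_verbosity=16
debug_file=/usr/local/nagios/var/nagios.debug
"""

for problem in debug_problems(parse_cfg(sample)):
    print(problem)
```

If the check reports a problem, setting debug_level=16 and watching the file named by debug_file would be the next thing to try.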