Overview
This best-practices guide covers host and service check intervals and what you should take into consideration when designing your monitoring deployment.
Check Intervals - Be Realistic
It can be very easy to set up your monitoring with the same intervals across the board. This can lead to peaks and troughs in load on the XI server, as a lot of checks can occur in the same time windows.
Think about what you are monitoring and how often you really need to check it. Disk space, for example, rarely runs out quickly, so you can check it every hour and still be confident you’ll be notified about free disk space running low in a reasonable time.
- Does it need to be checked every 5 minutes?
- Disk Free Space – every 60 minutes perhaps?
- Too long = no performance data
  - An interval of more than four hours
- However, if you are going to make it every hour, why not every 58 minutes or 61 minutes? Try to spread the load out a bit.
- Different intervals to spread the load
  - 3, 5, 7 minute intervals
  - 58, 60, 62 minute intervals
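To see why slightly different intervals spread the load, it can help to work out how often a set of checks lands in the same minute. The sketch below is only an illustration, not how the monitoring engine schedules checks: it assumes every check starts at minute 0 and then runs exactly on its interval, and the function name `minutes_until_all_coincide` is made up for this example.

```python
from functools import reduce
from math import gcd


def minutes_until_all_coincide(intervals):
    """Return the least common multiple of the given intervals (in minutes).

    Simplifying assumption: every check starts at minute 0 and then runs
    exactly on its interval, so the LCM is how often all checks line up.
    """
    return reduce(lambda a, b: a * b // gcd(a, b), intervals)


# Compare identical intervals with the staggered ones suggested above.
for intervals in ([5, 5, 5], [3, 5, 7], [60, 60, 60], [58, 60, 62]):
    print(intervals, "->", minutes_until_all_coincide(intervals), "minutes")
```

Under these assumptions, three 5 minute checks collide every 5 minutes, while 3, 5 and 7 minute checks only all collide every 105 minutes. Three 60 minute checks collide every hour, while 58, 60 and 62 minute checks only all collide every 53940 minutes, which is roughly 37 days.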
Notification & Check Intervals
Sometimes larger check intervals can have an adverse effect on notification intervals.
The monitoring engine determines if it should send a notification every time a check result is received.
Due to how the internal scheduling works, a check result might fall short of the notification window by a small time period, like 20 seconds. When that happens, the notification is not sent until the next check is run, which might be another 15 minutes later.
- e.g. 15 minute check and 60 minute notification
- Internal scheduling may cause 14min 55sec to pass, 4 x 14:55 = 59min 40sec … it’s < 60min!
- Notification not sent until 75min!
- Scheduling is geared +/- to reduce load!
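The arithmetic in this example can be checked with a small simulation. This is a simplified model rather than the engine's actual logic: it assumes a notification was sent at time 0, that checks arrive at a fixed gap, and that the next notification goes out with the first check result that arrives once the full notification interval has elapsed. The names `next_notification_time` and `as_min_sec` are hypothetical.

```python
def next_notification_time(check_gap_s, notification_interval_s):
    """Return seconds until the first check result that arrives at or after
    the end of the notification interval (simplified model, see lead-in)."""
    if check_gap_s <= 0:
        raise ValueError("check_gap_s must be positive")
    elapsed = 0
    # Step from one check result to the next until the interval has passed.
    while elapsed < notification_interval_s:
        elapsed += check_gap_s
    return elapsed


def as_min_sec(seconds):
    """Format a number of seconds as 'Xmin Ysec'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}min {secs}sec"


# Checks exactly 15 minutes apart, 60 minute notification interval.
print(as_min_sec(next_notification_time(15 * 60, 60 * 60)))       # 60min 0sec
# Checks 14min 55sec apart: the fourth check arrives at 59min 40sec, too early.
print(as_min_sec(next_notification_time(14 * 60 + 55, 60 * 60)))  # 74min 35sec
```

With a 14min 55sec gap the fifth check arrives at 74min 35sec, which is the roughly 75 minute delay described above. A check interval that divides the notification interval with some margin, or a slightly shorter notification interval, avoids falling just short of the window.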
Final Thoughts
For any support related questions please visit the Nagios Support Forums at: