Consistently inaccurate notifications

Post by **rgage_hhsc** » Fri Aug 13, 2021 2:30 pm

Our organization uses Nagios to monitor several servers against several criteria -- one check in particular has developed a very confusing pattern.

The alert being triggered is a CPU usage limit. A nightly maintenance task pushes CPU usage up for 10-20 minutes.
Nagios' usage graph portrays the situation very accurately:

Cap1.JPG

What Nagios reports in its history is even more detailed, with six events every day during that spike:

Cap2.JPG

But I consistently get THE FIRST THREE notifications about this spike every day, and nothing else: WARNING, RECOVERY, WARNING, all within a few minutes of each other; then nothing until the next day when it happens again.

Judging solely by my emails from Nagios, there are a few seconds of recovery each day in the middle of a CPU warning that has been going on for months. Looking at the graph above, this is obviously a false picture.
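
For anyone who wants to check for the same mismatch mechanically, here is a minimal Python sketch that compares the states logged in the history against the states that arrived by email. Everything in it is an assumption for illustration: the function name `missing_notifications` is hypothetical, and only the first three states in the sample data (WARNING, RECOVERY, WARNING) come from the emails described above. The last three history states are placeholders, since the real ones are only visible in the Cap2.JPG screenshot.

```python
from collections import Counter


def missing_notifications(history, notified):
    """Return the history states that have no matching notification, in order.

    Matching is by state name and occurrence count, because this sketch
    assumes the emails carry no event IDs to match on.
    """
    remaining = Counter(notified)
    missing = []
    for state in history:
        if remaining[state] > 0:
            # This history event is accounted for by one received email.
            remaining[state] -= 1
        else:
            missing.append(state)
    return missing


# Hypothetical sample data: the first three states match the emails received;
# the last three are placeholders standing in for the rest of the history.
history = ["WARNING", "RECOVERY", "WARNING", "RECOVERY", "WARNING", "RECOVERY"]
notified = ["WARNING", "RECOVERY", "WARNING"]

print(missing_notifications(history, notified))
```

With real data, the two lists would be filled in from the history view and from the mailbox; a non-empty result means some logged events never produced an email.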

This is not a critical worry; we know it's just one spike despite what Nagios' emails are telling us … but a bug's a bug, and Nagios can't be fixed unless it's reported. So, consider this reported.

Thanks!
boB