Hi,
Our system is very large and it is a real-time system that has to stay online 24/7, so we need to monitor it every minute for real-time alerting.
I believe real-time alerting is a highlight feature of Nagios, so it should support this well.
In our system, average CPU usage is normally about 30-40%; sometimes, during rush hours, it can exceed 70%. We need to receive an alert immediately when usage gets that high.
"CPU changes so fast that it is very hard to get the exact number." --> I know, and I tested the commands in parallel.
There can be a small difference, but in our case the difference is very large. So we need your support for our case.
So what can I do now?
check_ncpa get wrong alert from CPUs
Re: check_ncpa get wrong alert from CPUs
Hi sacom01,
How are you doing? ...
I would suggest that you set 'aggregate=avg' since your system is so large, with many CPUs ....
Best Regards,
Vinh
Re: check_ncpa get wrong alert from CPUs
Hi Vinh,
As I said, we first used avg to get the average across all CPUs, but it gave the wrong number, so I tried max, but that is not exactly what we need.
What can I do to make "avg" report the correct number for our system? Right now it differs too much from the actual number: 60% vs 90%.
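To illustrate why `aggregate=avg` and `aggregate=max` can disagree so much on a machine with many CPUs, here is a minimal Python sketch over hypothetical per-CPU percentages (the numbers are made up, not taken from the attached data). It assumes `avg` is the arithmetic mean of the per-CPU values and `max` is the busiest single CPU, which is what the parameter names suggest.

```python
from statistics import mean


def aggregate(per_cpu, how):
    """Collapse per-CPU utilization percentages into one number.

    Assumption: 'avg' is the arithmetic mean and 'max' is the busiest CPU,
    mirroring the aggregate=avg / aggregate=max names used with check_ncpa.
    """
    if not per_cpu:
        raise ValueError("no CPU samples")
    if how == "avg":
        return mean(per_cpu)
    if how == "max":
        return max(per_cpu)
    raise ValueError(f"unsupported aggregate: {how}")


# Hypothetical snapshot of 216 CPUs: 4 nearly saturated, 212 lightly loaded.
snapshot = [98.0] * 4 + [10.0] * 212
print(round(aggregate(snapshot, "avg"), 2))  # 11.63
print(aggregate(snapshot, "max"))            # 98.0
```

Neither number is wrong; they answer different questions. A few saturated CPUs barely move the average of 216 values, while max reports only the single worst CPU.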
Re: check_ncpa get wrong alert from CPUs
Hi,
How are you doing?
CPU spikes up and down very fast. I would suggest changing:
Check interval = 5 minutes
Retry interval = 1 minute
Max check attempts = 5
This will check every five minutes. Once an issue is identified, it will re-check every minute, and a notification is sent only if the check is still failing after 5 attempts in total (the first failing check counts as attempt 1).
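Since the goal is fast alerting, it helps to see what these settings cost in time. The Python sketch below estimates the worst-case delay between the start of a sustained problem and the notification, assuming standard Nagios semantics (the first failing check counts as attempt 1, and the remaining attempts run at the retry interval). It ignores check execution time and scheduling jitter, so treat the result as an approximation.

```python
def worst_case_notification_delay(check_interval, retry_interval, max_check_attempts):
    """Approximate worst-case minutes from problem start to notification.

    Assumes the first failing check is attempt 1, followed by
    (max_check_attempts - 1) retries before the HARD state notifies.
    A spike shorter than this window never produces a notification.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be >= 1")
    # The problem may begin just after a check, so detection can take a full interval.
    detection = check_interval
    confirmation = (max_check_attempts - 1) * retry_interval
    return detection + confirmation


print(worst_case_notification_delay(5, 1, 5))  # 9
print(worst_case_notification_delay(1, 1, 1))  # 1
```

With the suggested 5/1/5 values, a sustained spike may take up to about 9 minutes to notify, versus about 1 minute with a 1-minute interval and a single attempt; the longer window filters short spikes at the cost of speed.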
In your case, those remote servers are so large that many CPUs are not being used (0%).
What did you get when you used the curl command to get the per-CPU values and then divided their total by the number of CPU cores?
Does that match what you see in the output of Nagios's check_ncpa.py?
By the way, how many CPUs do you have on that remote machine?
Can you please take a screenshot of the performance graph of the remote machine that you said has the issue?
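The manual cross-check described above (per-CPU values from the NCPA API, summed and divided by the core count) could look like the Python sketch below. It assumes the `cpu/percent` endpoint returns JSON shaped like `{"percent": [[v1, v2, ...], "%"]}`; verify that against the actual output of your NCPA version. The sample response is hypothetical.

```python
import json


def average_cpu_from_ncpa(payload: str) -> float:
    """Average the per-CPU percentages in an NCPA cpu/percent JSON response.

    Assumed shape: {"percent": [[v1, v2, ...], "%"]}; adjust if yours differs.
    """
    data = json.loads(payload)
    values, _unit = data["percent"]
    if not values:
        raise ValueError("response contained no CPU values")
    return sum(values) / len(values)


# Hypothetical response for a 4-CPU host, e.g. saved from a curl call.
sample = '{"percent": [[12.5, 80.0, 0.0, 7.5], "%"]}'
print(average_cpu_from_ncpa(sample))  # 25.0
```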
Best Regards,
Vinh
Re: check_ncpa get wrong alert from CPUs
Hi Vinh,
In your case, those remote servers are so large that many CPUs are not being used (0%).
--> Actually, this is not related to our problem (I told you about this a few days ago).
What did you get when you used the curl command to get the per-CPU values and then divided their total by the number of CPU cores?
--> Yes, I ran the command to get the total and divided it by the number of CPUs.
Does that match what you see in the output of Nagios's check_ncpa.py?
--> The average matches the NCPA check, but it is very different from what the top and topas commands show when I check on the remote machine.
By the way, how many CPUs do you have on that remote machine?
--> 216 CPUs
Can you please take a screenshot of the performance graph of the remote machine that you said has the issue?
--> I replicated the issue as follows:
1. I wrote a script that checks CPU with the command "sar 1 1" on the client server and set up a crontab entry to run it.
2. I wrote a script that checks CPU on the client server with NCPA from Nagios XI and set up a crontab entry to run the NCPA check.
These two crontab entries run at the same time on the two servers.
Please find the attached file for details.
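One reason two collectors started at the same moment can disagree is that they average over different time windows: `sar 1 1` reports a single 1-second interval, while the NCPA check may sample over a different span. The Python sketch below uses a made-up per-second utilization series to show how the same load yields different readings depending on the window length; it does not model NCPA's actual sampling code.

```python
def window_average(series, start, seconds):
    """Mean utilization over `seconds` consecutive 1-second samples from `start`."""
    window = series[start:start + seconds]
    if len(window) != seconds:
        raise ValueError("window runs past the end of the series")
    return sum(window) / seconds


# Hypothetical per-second system-wide busy %: a short burst inside calmer load.
per_second = [35, 38, 92, 95, 90, 40, 36, 34, 37, 33]

print(window_average(per_second, 3, 1))   # 95.0 (1-second sample during the burst)
print(window_average(per_second, 0, 10))  # 53.0 (10-second average covering the burst)
```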
Re: check_ncpa get wrong alert from CPUs
Hi sacom01 (Hang),
Hope you are having a good day!! ...
Can you please share the "check_ncpa.py" command used on one of your NCPA remote VMs?
I'm not sure why you get "CRITICAL: Percent was 42.99 %" when your system is only at "42.99%".
Also, please run the "top" command on your NCPA remote VM and share that as well; a screenshot would be nice since it is easier to see ...
Here's an example of my "top" command output: as you can see in the picture above (two red circles), it lists "CPU%" and "Load average".
Those are very important since they tell us how busy the system is and how quickly it responds.
Best Regards,
Vinh
Re: check_ncpa get wrong alert from CPUs
Hi Vinh,
Can you please share the "check_ncpa.py" command used on one of your NCPA remote VMs?
--> ./check_ncpa.py -H 192.168.xxx.x -t token -P 5693 -M cpu/percent -w '20' -c '40' -q 'aggregate=avg'
I'm not sure why you get "CRITICAL: Percent was 42.99 %" when your system is only at "42.99%".
--> Just for testing purposes; not important.
Also, please run the "top" command on your NCPA remote VM and share that as well; a screenshot would be nice since it is easier to see
--> I know the top command, but top and sar serve the same purpose (checking CPU), so they show the same result (already tested).
Thanks.
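With plain-number thresholds such as `-w '20' -c '40'`, the usual Nagios plugin convention is to alert when the value rises above the number, so a reading of 42.99% is CRITICAL against `-c 40`. A minimal Python sketch of that simple case follows; it deliberately ignores the full range syntax (such as `10:20` or `@` inversions).

```python
OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(value, warn, crit):
    """Return a Nagios exit code for a plain numeric warning/critical pair.

    Simplified: a bare number N means 'alert if value < 0 or value > N',
    following the standard plugin range convention. Critical is checked first.
    """
    if value < 0 or value > crit:
        return CRITICAL
    if value > warn:
        return WARNING
    return OK


print(check_threshold(42.99, 20, 40))  # 2 (CRITICAL)
print(check_threshold(30.0, 20, 40))   # 1 (WARNING)
print(check_threshold(15.0, 20, 40))   # 0 (OK)
```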
Re: check_ncpa get wrong alert from CPUs
Hi,
Ok, now I understand ...
Your warning and critical settings of "-w '20' -c '40'" are only for testing purposes.
Best Regards,
Vinh
Re: check_ncpa get wrong alert from CPUs
You understood, so... what's next?
My issue is not resolved yet.
The -w and -c values are only for testing, but NCPA raising wrong alerts is a real problem. We need you to focus on this.
Re: check_ncpa get wrong alert from CPUs
Hi,
How are you doing?
I talked to my team member, and he suggested that you try this from your XI command prompt:
cd /usr/local/nagios/libexec
./check_ncpa.py -H 192.168.xxx.x -t token -P 5693 -M cpu/percent -w '20' -c '40' -q 'aggregate=avg&sleep=5'
Since we do not have any AIX machines internally, there is no way for me to test this.
If this does not work, I would suggest that you write your own script using "top", "sar", or "vmstat" and put it under:
/usr/local/ncpa/plugins
Then call your script as:
./check_ncpa.py -H 192.168.xxx.x -t token -P 5693 -M 'plugins/yourNewScript'
You could also check the Nagios Exchange page to see whether any modules or plugins fit your needs:
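The core of a custom plugin along these lines might look like the Python sketch below. Everything here is an assumption for illustration: the thresholds, the function names, and the parsing, which runs `sar 1 1`, finds the `%idle` column in the header row, and reports busy = 100 - idle from the last data row. `sar` layouts differ between AIX and Linux (and by locale), so check the parsing against your own output, and confirm that your NCPA install is configured to run `.py` plugins with a Python interpreter on the AIX host.

```python
import subprocess

WARN, CRIT = 70.0, 90.0  # hypothetical thresholds; tune for your environment


def run_sar():
    """Run `sar 1 1` (one 1-second sample) and return its stdout."""
    result = subprocess.run(["sar", "1", "1"], capture_output=True,
                            text=True, timeout=30, check=True)
    return result.stdout


def busy_from_sar(output):
    """Return 100 - %idle from the last data row under the header containing %idle."""
    idle_col = None
    idle = None
    for line in output.splitlines():
        fields = line.split()
        if "%idle" in fields:
            idle_col = fields.index("%idle")
            continue
        if idle_col is not None and len(fields) > idle_col:
            try:
                idle = float(fields[idle_col])
            except ValueError:
                continue  # skip rows whose %idle column is not numeric
    if idle is None:
        raise ValueError("no %idle value found in sar output")
    return 100.0 - idle


def plugin_result(busy, warn=WARN, crit=CRIT):
    """Map a busy % to a Nagios exit code and an output line with perfdata."""
    if busy > crit:
        status, code = "CRITICAL", 2
    elif busy > warn:
        status, code = "WARNING", 1
    else:
        status, code = "OK", 0
    line = f"{status}: CPU busy {busy:.2f}% | cpu_busy={busy:.2f}%;{warn};{crit};0;100"
    return code, line


# Hypothetical AIX-style `sar 1 1` output (column layout assumed, not captured).
SAMPLE = """
System configuration: lcpu=216

10:00:00    %usr    %sys    %wio   %idle
10:00:01      25      12       1      62
"""
print(plugin_result(busy_from_sar(SAMPLE)))
```

A deployed plugin would finish with an `if __name__ == "__main__":` block that passes `run_sar()` output through `busy_from_sar` and `plugin_result`, prints the line, and calls `sys.exit(code)` (returning 3, UNKNOWN, if `sar` fails). It is omitted here so the sketch can be exercised against the sample text.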
https://exchange.nagios.org/
Here's the one I found on Nagios Exchange:
https://exchange.nagios.org/directory/P ... IX/details
Best Regards,
Vinh