Hi Team,
We are currently monitoring around 300 servers and 3,500 services with Nagios XI. The Nagios XI server has 4 vCPUs and 32 GB RAM.
Every Monday we get critical alerts for the Current Load service on localhost, and it returns to OK on Tuesday.
Below is the service definition with the check command.
define service {
host_name localhost
service_description Current Load
use local-service
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
contacts SN-SCRITICAL-LOCALHOST-Service
register 1
}
I am attaching a screenshot of the performance graph.
Please suggest a suitable solution.
Thanks
Current Load on Localhost
-
- Posts: 68
- Joined: Tue Nov 24, 2020 5:55 am
-
- Posts: 1288
- Joined: Tue Jun 01, 2021 1:27 pm
Re: Current Load on Localhost
Hello @lanxessinfy
Thanks for reaching out.
This issue appears to be intermittent, so the best approach would be to set up an event handler that is triggered by the alert and catches the offending process(es).
Here are two support articles that explain the setup:
https://assets.nagios.com/downloads/nagiosxi/docs/Introduction-To-Event-Handlers-in-Nagios-XI.pdf
https://assets.nagios.com/downloads/nagiosxi/docs/Configuring-Global-Event-Handlers-In-Nagios-XI.pdf
When the alert triggers the event handler script, the script should write the output of the following commands to a log for review. This will give an overview of what is happening during the extra load.
- top -b -n 1 > top.txt
- tail -n 10 /usr/local/nagiosxi/var/eventman.log
- tail -n 10 /usr/local/nagios/var/nagios.log
- tail -n 10 /var/log/syslog (or /var/log/messages, depending on the distro)
Perry
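The list of commands above could be collected by a small helper along these lines. This is only a sketch: the snapshot() function and the SNAPSHOT_LOG path are hypothetical names chosen for illustration, not part of Nagios XI, and the script skips any log file that does not exist on the system.

```shell
#!/bin/bash
# Hypothetical helper: append a timestamped snapshot of system state to one log file.
# SNAPSHOT_LOG is an assumed location, not a Nagios XI default.
SNAPSHOT_LOG="${SNAPSHOT_LOG:-/tmp/load_snapshot.log}"

snapshot() {
    {
        echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
        # Top of the process list at the moment the handler fires.
        top -b -n 1 | head -n 20
        # Tail each log only if it is readable (syslog vs. messages depends on the distro).
        for f in /usr/local/nagiosxi/var/eventman.log \
                 /usr/local/nagios/var/nagios.log \
                 /var/log/syslog /var/log/messages; do
            if [ -r "$f" ]; then
                echo "--- $f ---"
                tail -n 10 "$f"
            fi
        done
    } >> "$SNAPSHOT_LOG" 2>&1
}

snapshot
```

Appending to a single file with a timestamp header keeps every alert's snapshot, whereas redirecting with a single > would overwrite the previous one each time.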
-
- Posts: 68
- Joined: Tue Nov 24, 2020 5:55 am
Re: Current Load on Localhost
Hi @perry,
Thanks for the response.
We have set up an event handler for the Current Load service on the Nagios server.
According to the event handler script you provided, it should restart the httpd service when the service state is critical, but in our case it did not restart.
This is the output when the CPU load is critical:
top -b -n 1 > top.txt
[**************** ~]$ top -b -n 1 > top.txt
[*****************~]$ cat top.txt
top - 14:15:30 up 10 days, 8:05, 2 users, load average: 8.62, 8.63, 7.02
Tasks: 349 total, 11 running, 338 sleeping, 0 stopped, 0 zombie
%Cpu(s): 83.8 us, 5.9 sy, 0.0 ni, 8.8 id, 0.0 wa, 0.0 hi, 1.5 si, 0.0 st
KiB Mem : 32947892 total, 1267736 free, 11574456 used, 20105700 buff/cache
KiB Swap: 2097148 total, 1717324 free, 379824 used. 19939124 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12155 apache 20 0 4275160 3.000g 15832 R 93.8 9.5 22:35.79 wkhtmltopdf
30319 apache 20 0 3988992 2.851g 15784 R 93.8 9.1 32:03.40 wkhtmltopdf
30207 apache 20 0 3835948 2.623g 16284 R 87.5 8.3 16:25.83 wkhtmltopdf
6880 mysql 20 0 2652616 491336 10252 S 31.2 1.5 1:56.99 mysqld
1793 root 20 0 546936 32916 2996 S 12.5 0.1 35:28.19 python
732 root 20 0 0 0 0 S 6.2 0.0 52:01.41 kcs-evbsync/3
8331 apache 20 0 761776 42024 14424 R 6.2 0.1 0:05.68 httpd
9991 apache 20 0 756360 35996 13852 R 6.2 0.1 0:04.82 httpd
10813 apache 20 0 758060 38000 14168 R 6.2 0.1 0:03.16 httpd
13509 apache 20 0 752576 32376 14048 S 6.2 0.1 0:03.66 httpd
16037 apache 20 0 751292 30380 13808 S 6.2 0.1 0:00.18 httpd
Can you please suggest a way to check whether the httpd service was restarted? Please also give a brief explanation of logs such as eventman.log, /var/log/syslog, and /var/log/messages.
Thanks!
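To see why the load averages in the top output above count as critical, they can be compared with the thresholds from the service definition (warning 5.0,4.0,3.0 and critical 10.0,6.0,4.0 for the 1, 5, and 15 minute averages). The following is a rough sketch of that comparison, not the check_load source: classify_load is a hypothetical function name, and it assumes a state is raised as soon as any one of the three averages exceeds its threshold.

```shell
#!/bin/bash
# Hypothetical helper: classify_load "1m 5m 15m" "w1,w5,w15" "c1,c5,c15"
# Prints OK, WARNING, or CRITICAL.
classify_load() {
    awk -v load="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        split(load, l, " "); split(warn, w, ","); split(crit, c, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # Adding 0 forces a numeric comparison instead of a string comparison.
            if (l[i] + 0 > c[i] + 0) { state = "CRITICAL"; break }
            if (l[i] + 0 > w[i] + 0) state = "WARNING"
        }
        print state
    }'
}

# Load averages from the top output, thresholds from the service definition.
classify_load "8.62 8.63 7.02" "5.0,4.0,3.0" "10.0,6.0,4.0"
```

With these numbers the 1 minute average (8.62) is below its critical threshold of 10.0, but the 5 minute average (8.63) is above 6.0, which is enough to make the result critical. On a live Linux system the first argument could come from "$(cut -d' ' -f1-3 /proc/loadavg)".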
-
- Posts: 1288
- Joined: Tue Jun 01, 2021 1:27 pm
Re: Current Load on Localhost
Hello @lanxessinfy
It appears that the top three processes are busy converting reports to PDF. Question: how many reports are you converting to PDF? Depending on what is being converted, this can be process-intensive.
You can check for an httpd service restart by viewing the system logs. The system messages include those logged during system startup and by mail, cron, services, kern, and auth:
grep -Ei 'Starting The Apache HTTP Server' /var/log/messages
or, depending on the distro:
grep -Ei 'Starting The Apache HTTP Server' /var/log/syslog
The eventman.log shows real-time handler events and can be followed with:
tail -F /usr/local/nagiosxi/var/eventman.log
Thanks,
Perry
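On a systemd-based host, another way to check for a restart is to ask systemd when the unit last became active. This is a sketch under that assumption; last_start is a hypothetical helper name, and the unit name httpd.service is taken from the script used later in this thread.

```shell
#!/bin/bash
# Hypothetical helper: print the time a systemd unit last entered the active state.
last_start() {
    # systemctl prints "ActiveEnterTimestamp=<date>"; keep only the part after the "=".
    systemctl show "$1" --property=ActiveEnterTimestamp 2>/dev/null | cut -d= -f2-
}

# Only call it where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    last_start httpd.service
fi
```

If the printed time is later than the time of the alert, the service was restarted after the alert fired.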
-
- Posts: 68
- Joined: Tue Nov 24, 2020 5:55 am
Re: Current Load on Localhost
Hi,
We are downloading/converting only one report.
Whenever we try to download a report, the current load increases.
Early this morning the current load went critical even though we had not generated any reports.
We are currently using the script below to log process information to /tmp/hostinfo.txt and restart the httpd service. The script itself runs fine, but we want it to run only when the Current Load service state is critical.
Script:
#!/bin/bash
SERVICESTATE=$1
case "$SERVICESTATE" in
CRITICAL)
top -b -n 1 | head -20 > /tmp/hostinfo.txt
systemctl restart httpd.service
;;
esac
In the script above we set the service state to "CRITICAL", but it does not work as intended.
Can you please modify the existing script, or provide a script that meets our requirement?
Thanks.
-
- Posts: 1288
- Joined: Tue Jun 01, 2021 1:27 pm
Re: Current Load on Localhost
Hello @lanxessinfy
Thanks for following up. One option is to create an event handler that restarts the 'httpd' service on a critical alert from (for example) the check_load plugin.
Here is an example of a service restart script:
/usr/local/nagios/libexec/service_restart.sh
Paste the following into the terminal session:
#!/bin/sh
case "$1" in
OK)
;;
WARNING)
;;
UNKNOWN)
;;
#Restarting Linux Services With NCPA
CRITICAL)
/usr/local/nagios/libexec/check_ncpa.py -H "$2" -P 5693 -t "$3" -M 'plugins/service_restart.sh' -a "$4"
;;
esac
exit 0
Please review and let us know if you have further questions,
Perry
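For a handler that runs locally on the Nagios server rather than through NCPA, the same case structure can also look at the state type, so that the restart happens only once the CRITICAL state is confirmed (HARD). The sketch below is not the script from the documentation: decide_action is a hypothetical helper, the command line passing $SERVICESTATE$ and $SERVICESTATETYPE$ is an assumption, and the sudo call assumes a sudoers rule for the nagios user that is not shown anywhere in this thread. Event handlers normally run as the user Nagios runs as, so a plain systemctl restart may be refused.

```shell
#!/bin/bash
# Sketch only. Assumed command line (not from the thread):
#   handler.sh $SERVICESTATE$ $SERVICESTATETYPE$

# Hypothetical helper: prints restart, wait, or none for a state and state type.
decide_action() {
    case "$1" in
        CRITICAL)
            if [ "$2" = "HARD" ]; then
                echo restart
            else
                echo wait    # SOFT state: Nagios is still retrying the check
            fi
            ;;
        *)
            echo none
            ;;
    esac
}

action=$(decide_action "${1:-}" "${2:-}")
if [ "$action" = "restart" ]; then
    # Record what was running before the restart.
    top -b -n 1 | head -n 20 > /tmp/hostinfo.txt
    date >> /tmp/hostinfo.txt
    # Assumption: a sudoers rule lets the nagios user run this without a password.
    sudo -n systemctl restart httpd.service
fi
```

Keeping the decision in its own function makes it possible to check the logic from a shell (for example, decide_action CRITICAL SOFT) without restarting anything.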
-
- Posts: 68
- Joined: Tue Nov 24, 2020 5:55 am
Re: Current Load on Localhost
Hi,
I did exactly what is in the document and ran a passive check, but could not restart the httpd service.
I ran the command " grep -Ei 'Starting The Apache HTTP Server' /var/log/messages " but there are no new records.
Please find the screenshot attached.
Thanks!
-
- Posts: 1288
- Joined: Tue Jun 01, 2021 1:27 pm
Re: Current Load on Localhost
Hello @lanxessinfy
Thanks for following up. I would like to take a look at the System Profile from your environment.
To send us your system profile:
- Login to the Nagios XI GUI using a web browser.
- Click the "Admin" > "System Profile" Menu
- Click the "Download Profile" button
- Save the profile.zip file and send via Private Message
Perry
-
- Posts: 68
- Joined: Tue Nov 24, 2020 5:55 am
Re: Current Load on Localhost
Hi,
I have sent the profile to you.
Thanks!
-
- Posts: 1288
- Joined: Tue Jun 01, 2021 1:27 pm
Re: Current Load on Localhost
Hello @lanxessinfy
Following up with my test results: I went ahead and imported your System Profile on my test VM.
The command for 'Service Restart - Linux' is set up with:
Command Name *
Service Restart - Linux
Command Line *
/home/nagios/service_restart.sh $SERVICESTATE$
Set permissions on the script to look like this:
ls -la /home/nagios
-rwxr-xr-x 1 nagios nagios 334 Oct 14 11:02 service_restart.sh
The script that I used to test:
#!/bin/bash
SERVICESTATE=$1
case "$SERVICESTATE" in
CRITICAL)
top -b -n 1 | head -20 > /tmp/hostinfo.txt
date >> /tmp/hostinfo.txt
systemctl restart httpd.service
;;
esac
*Note: enable 'log_event_handlers=1' in /usr/local/nagios/etc/nagios.cfg to output logging results for Event Handlers.
Logs to verify the 'Event Handler' executed on alert:
cat /usr/local/nagios/var/nagios.log | grep -Ei 'current load' -A 5 -B 1
My results:
[1634229490] SERVICE EVENT HANDLER: localhost;Current Load;CRITICAL;HARD;1;Service Restart - Linux
You may need to address permissions on '/tmp/hostinfo.txt' (chown nagios:nagios /tmp/hostinfo.txt).
Thanks,
Perry
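The number in square brackets at the start of a nagios.log line is a Unix epoch timestamp, which makes it hard to match against the Monday alerts by eye. A small sketch for converting it follows; humanize_nagios_line is a hypothetical helper name, and the "date -d @seconds" form is assumed to be available (it is in GNU date).

```shell
#!/bin/bash
# Hypothetical helper: replace the leading [epoch] of a nagios.log line
# with a readable timestamp in the local time zone.
humanize_nagios_line() {
    line=$1
    case "$line" in
        \[[0-9]*\]\ *)
            ts=${line#\[}        # drop the leading "["
            ts=${ts%%\]*}        # keep only the digits before the first "]"
            rest=${line#*\] }    # everything after "] "
            printf '[%s] %s\n' "$(date -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
            ;;
        *)
            # Lines without a leading timestamp pass through unchanged.
            printf '%s\n' "$line"
            ;;
    esac
}

humanize_nagios_line '[1634229490] SERVICE EVENT HANDLER: localhost;Current Load;CRITICAL;HARD;1;Service Restart - Linux'
```

To convert a whole grep result, each line can be passed through the function in a loop, for example: grep -Ei 'current load' /usr/local/nagios/var/nagios.log | while IFS= read -r l; do humanize_nagios_line "$l"; done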