Current Load on Localhost

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
lanxessinfy
Posts: 68
Joined: Tue Nov 24, 2020 5:55 am

Current Load on Localhost

Post by lanxessinfy »

Hi Team,

We are currently monitoring around 300 servers and 3,500 services with Nagios XI. The Nagios XI server has 4 vCPUs and 32 GB RAM.
We get critical alerts on the Current Load service of localhost every Monday, and it returns to OK on Tuesday.
Below is the check command.

define service {
    host_name               localhost
    service_description     Current Load
    use                     local-service
    check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
    contacts                SN-SCRITICAL-LOCALHOST-Service
    register                1
}
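
As a rough illustration of how these thresholds are read: assuming check_local_load passes its two arguments to check_load as -w and -c (the usual localhost definition, which is not shown in this thread), the 1, 5 and 15 minute load averages are each compared with their own threshold, and the worst result decides the state. The classify_load helper below is hypothetical and only mimics that comparison; it is not the plugin itself.

```shell
#!/bin/bash
# Hypothetical helper: mimics how check_load is assumed to compare
# load averages against warning/critical triplets.
# $1 = "la1,la5,la15"   $2 = warning triplet   $3 = critical triplet
classify_load() {
    awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
        split(l, la, ","); split(w, wa, ","); split(c, ca, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # Any average above its critical threshold ends the check.
            if (la[i] + 0 > ca[i] + 0) { state = "CRITICAL"; break }
            # Otherwise remember a warning and keep checking.
            if (la[i] + 0 > wa[i] + 0) state = "WARNING"
        }
        print state
    }'
}

# Thresholds from the service definition above
classify_load "8.62,8.63,7.02" "5.0,4.0,3.0" "10.0,6.0,4.0"
```

With load averages of 8.62, 8.63 and 7.02 the 1 minute value only exceeds the warning threshold, but the 5 minute value exceeds the critical threshold of 6.0, so the result is CRITICAL.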

Attaching a screenshot of the performance graph.
Current_Load.PNG
Please suggest a suitable solution.
Thanks
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Current Load on Localhost

Post by pbroste »

Hello @lanxessinfy

Thanks for reaching out.

This issue appears to be intermittent, so the best approach is to set up an event handler that is triggered by the alert and captures the offending process(es).

These two support articles explain the setup:

https://assets.nagios.com/downloads/nagiosxi/docs/Introduction-To-Event-Handlers-in-Nagios-XI.pdf
https://assets.nagios.com/downloads/nagiosxi/docs/Configuring-Global-Event-Handlers-In-Nagios-XI.pdf

When the alert triggers it, the event handler script should write the output of the following commands to a log for later review. This gives an overview of what is happening during the extra load.
  • top -b -n 1 > top.txt
  • tail -n 10 /usr/local/nagiosxi/var/eventman.log
  • tail -n 10 /usr/local/nagios/var/nagios.log
  • tail -n 10 /var/log/syslog (or /var/log/messages, depending on the distro)
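
The commands in the list above could be collected into one helper that an event handler calls. This is only a minimal sketch: the function name capture_snapshot and the output path in the usage line are made up for illustration, and log files that do not exist or are not readable by the calling user are simply skipped.

```shell
#!/bin/bash
# Hypothetical helper: append a diagnostics snapshot to the file given as $1.
capture_snapshot() {
    local out="$1"
    {
        echo "=== snapshot taken $(date) ==="
        # Busiest processes at the moment of the alert
        if command -v top >/dev/null 2>&1; then
            top -b -n 1 | head -20
        fi
        # Last lines of each log named in the list above, if readable
        for log in /usr/local/nagiosxi/var/eventman.log \
                   /usr/local/nagios/var/nagios.log \
                   /var/log/syslog /var/log/messages; do
            if [ -r "$log" ]; then
                echo "--- last 10 lines of $log ---"
                tail -n 10 "$log"
            fi
        done
    } >> "$out" 2>&1
}

# Example call (the path is an assumption):
# capture_snapshot /tmp/load_snapshot.log
```

Appending rather than overwriting keeps one snapshot per alert, so the Monday spikes can be compared with each other afterwards.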
Thanks,
Perry
lanxessinfy
Posts: 68
Joined: Tue Nov 24, 2020 5:55 am

Re: Current Load on Localhost

Post by lanxessinfy »

Hi @perry,

Thanks for the response.

We have set up an event handler for the Current Load service of the Nagios server.
According to the event handler script you provided, it should restart the httpd service when the service state is critical, but in our case it did not restart.

When the CPU load is critical, this is the output of top -b -n 1 > top.txt:

[**************** ~]$ top -b -n 1 > top.txt
[*****************~]$ cat top.txt
top - 14:15:30 up 10 days, 8:05, 2 users, load average: 8.62, 8.63, 7.02
Tasks: 349 total, 11 running, 338 sleeping, 0 stopped, 0 zombie
%Cpu(s): 83.8 us, 5.9 sy, 0.0 ni, 8.8 id, 0.0 wa, 0.0 hi, 1.5 si, 0.0 st
KiB Mem : 32947892 total, 1267736 free, 11574456 used, 20105700 buff/cache
KiB Swap: 2097148 total, 1717324 free, 379824 used. 19939124 avail Mem


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12155 apache 20 0 4275160 3.000g 15832 R 93.8 9.5 22:35.79 wkhtmltopdf
30319 apache 20 0 3988992 2.851g 15784 R 93.8 9.1 32:03.40 wkhtmltopdf
30207 apache 20 0 3835948 2.623g 16284 R 87.5 8.3 16:25.83 wkhtmltopdf
6880 mysql 20 0 2652616 491336 10252 S 31.2 1.5 1:56.99 mysqld
1793 root 20 0 546936 32916 2996 S 12.5 0.1 35:28.19 python
732 root 20 0 0 0 0 S 6.2 0.0 52:01.41 kcs-evbsync/3
8331 apache 20 0 761776 42024 14424 R 6.2 0.1 0:05.68 httpd
9991 apache 20 0 756360 35996 13852 R 6.2 0.1 0:04.82 httpd
10813 apache 20 0 758060 38000 14168 R 6.2 0.1 0:03.16 httpd
13509 apache 20 0 752576 32376 14048 S 6.2 0.1 0:03.66 httpd
16037 apache 20 0 751292 30380 13808 S 6.2 0.1 0:00.18 httpd


Can you please suggest a way to check whether the httpd service was restarted? Also, please give a brief description of logs such as eventman.log, /var/log/syslog and /var/log/messages.


Thanks!
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Current Load on Localhost

Post by pbroste »

Hello @lanxessinfy

It appears that the top three processes are busy converting reports to PDF. How many reports are you converting to PDF? Depending on what is being converted, this can be process-intensive.
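
To see at a glance which program dominates a saved top snapshot, the per-process lines can be summed by command name. This is a small sketch that assumes the default top column layout shown in the output above (%CPU in column 9, COMMAND in column 12); cpu_by_command is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: sum %CPU per command from a "top -b -n 1" dump ($1).
# Assumes the default column order: ... S %CPU %MEM TIME+ COMMAND
cpu_by_command() {
    awk 'seen && NF >= 12 { cpu[$12] += $9 }   # process lines after the header
         $1 == "PID"      { seen = 1 }         # header row marks the start
         END { for (c in cpu) printf "%.1f %s\n", cpu[c], c }' "$1" | sort -rn
}

# Example call:
# cpu_by_command top.txt
```

For the snapshot posted above, the three wkhtmltopdf processes add up to about 275% CPU (93.8 + 93.8 + 87.5), which is most of what a 4 vCPU server can deliver.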

You can check for an httpd service restart by viewing the system logs. The system messages include those logged during system startup and by mail, cron, services, the kernel and auth:

grep -Ei 'Starting The Apache HTTP Server' /var/log/messages
Or, depending on the distro:

grep -Ei 'Starting The Apache HTTP Server' /var/log/syslog
The eventman.log shows real-time handler events. To follow it:

tail -F /usr/local/nagiosxi/var/eventman.log
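
The grep check above can be wrapped so that it reports how many start events a log contains and when the most recent one happened. A minimal sketch; the function names are hypothetical and the log path is passed in, since it differs between distros.

```shell
#!/bin/bash
# Hypothetical helpers around the grep check; $1 is the system log to search.

# Number of Apache start events recorded in the log
count_apache_starts() {
    grep -ci 'Starting The Apache HTTP Server' "$1"
}

# Most recent Apache start event, if any
last_apache_start() {
    grep -i 'Starting The Apache HTTP Server' "$1" | tail -n 1
}

# Example calls:
# count_apache_starts /var/log/messages
# last_apache_start /var/log/messages
```

On systemd hosts, `systemctl show httpd.service --property=ActiveEnterTimestamp` is another way to see when the service last became active, without depending on which log file the distro uses.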
Thanks,
Perry
lanxessinfy
Posts: 68
Joined: Tue Nov 24, 2020 5:55 am

Re: Current Load on Localhost

Post by lanxessinfy »

Hi,

We are downloading/converting only one report.
Whenever we try to download a report, the current load increases.
Early this morning the current load went critical, even though we had not generated any reports.


Currently we use the script below to log service information to the hostinfo.txt file and restart the httpd service. The script itself runs fine, but we want it to run when the Current Load service state is critical.

script:

#!/bin/bash

SERVICESTATE=$1
case "$SERVICESTATE" in
    CRITICAL)
        top -b -n 1 | head -20 > /tmp/hostinfo.txt
        systemctl restart httpd.service
        ;;
esac

In the above script we set the service state to "CRITICAL", but it does not work as intended.

Can you please modify the existing script or provide a script that meets our requirement?


Thanks.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Current Load on Localhost

Post by pbroste »

Hello @lanxessinfy

Thanks for following up. One way to restart the service is to create an event handler that restarts the 'httpd' service on a critical alert from (for example) the check_load plugin.

Here is an example of a service restart script:

Create /usr/local/nagios/libexec/service_restart.sh and paste in the following:
#!/bin/sh
case "$1" in
    OK)
        ;;
    WARNING)
        ;;
    UNKNOWN)
        ;;
    # Restarting Linux services with NCPA
    CRITICAL)
        /usr/local/nagios/libexec/check_ncpa.py -H "$2" -P 5693 -t "$3" -M 'plugins/service_restart.sh' -a "$4"
        ;;
esac
exit 0
Please review and let us know if you have further questions,
Perry
lanxessinfy
Posts: 68
Joined: Tue Nov 24, 2020 5:55 am

Re: Current Load on Localhost

Post by lanxessinfy »

Hi,

I did exactly what is in the document and ran a passive check, but was not able to restart the httpd service.

I ran the command grep -Ei 'Starting The Apache HTTP Server' /var/log/messages, but there are no new records.

Please find the screenshot attached.

Thanks!
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Current Load on Localhost

Post by pbroste »

Hello @lanxessinfy

Thanks for following up. We would like to take a look at the System Profile from your environment.

To send us your System Profile:
  • Log in to the Nagios XI GUI using a web browser.
  • Click the "Admin" > "System Profile" menu.
  • Click the "Download Profile" button.
  • Save the profile.zip file and send it via Private Message.
Thanks,
Perry
lanxessinfy
Posts: 68
Joined: Tue Nov 24, 2020 5:55 am

Re: Current Load on Localhost

Post by lanxessinfy »

Hi,

I have sent the profile to you.

Thanks!
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Current Load on Localhost

Post by pbroste »

Hello @lanxessinfy

Following up with my test results: I imported your System Profile on my test VM.

I set up the 'Service Restart - Linux' command as follows:
Command Name *
Service Restart - Linux
Command Line *
/home/nagios/service_restart.sh $SERVICESTATE$
Set permissions on the script to look like this:
  • ls -la /home/nagios
-rwxr-xr-x 1 nagios nagios 334 Oct 14 11:02 service_restart.sh
The script that I used to test:
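#!/bin/bash

# $1 is the service state passed in by Nagios via $SERVICESTATE$
SERVICESTATE=$1
case "$SERVICESTATE" in
    CRITICAL)
        # Save a snapshot of the busiest processes, then restart Apache
        top -b -n 1 | head -20 > /tmp/hostinfo.txt
        date >> /tmp/hostinfo.txt
        systemctl restart httpd.service
        ;;
esac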
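
One possible reason a handler like this logs its snapshot but never restarts Apache is that event handlers run as the nagios user, which normally cannot run systemctl restart on its own. The variant below is only a sketch under two loudly stated assumptions: the command line is extended to pass $SERVICESTATETYPE$ as a second argument, and a sudoers rule (not shown in this thread) lets the nagios user run that one systemctl command without a password. The should_restart function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical variant of the handler above.
# Assumed command line: service_restart.sh $SERVICESTATE$ $SERVICESTATETYPE$
# Assumed sudoers rule: nagios may run the systemctl restart without a password.

# Act only once the CRITICAL state is confirmed (HARD), not on SOFT retries.
should_restart() {
    [ "$1" = "CRITICAL" ] && [ "$2" = "HARD" ]
}

if should_restart "${1:-}" "${2:-}"; then
    out=/tmp/hostinfo.txt
    # Snapshot first, so the offending processes are recorded before the restart
    { date; top -b -n 1 | head -20; } > "$out"
    # -n makes sudo fail immediately instead of waiting for a password prompt;
    # any error message ends up in the same file for review
    sudo -n systemctl restart httpd.service >> "$out" 2>&1
fi
```

If the restart is refused, the sudo error is written to /tmp/hostinfo.txt, which makes a permissions problem visible instead of silent.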
Logs to verify that the event handler executed on alert:

grep -Ei 'current load' -A 5 -B 1 /usr/local/nagios/var/nagios.log

Note: set 'log_event_handlers=1' in /usr/local/nagios/etc/nagios.cfg so that event handler runs are written to the log.
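
Before searching nagios.log, it can help to confirm that the log_event_handlers setting is really switched on. A small sketch; event_handler_logging_enabled is a hypothetical helper name, and the config path is passed in as an argument.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given nagios.cfg ($1) enables
# event handler logging with an uncommented log_event_handlers=1 line.
event_handler_logging_enabled() {
    grep -Eq '^[[:space:]]*log_event_handlers[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Example call:
# if event_handler_logging_enabled /usr/local/nagios/etc/nagios.cfg; then
#     echo "event handler logging is on"
# else
#     echo "event handler logging is off"
# fi
```

Nagios reads this file at startup, so a change to the setting only takes effect after the Nagios service is restarted.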

My results:
[1634229490] SERVICE EVENT HANDLER: localhost;Current Load;CRITICAL;HARD;1;Service Restart - Linux
You may need to fix the ownership of '/tmp/hostinfo.txt' so that the nagios user can write to it (chown nagios:nagios /tmp/hostinfo.txt).

Thanks,
Perry