Hello
Nagios 5.7.3
CentOS 7
mod_gearman 3
I accidentally stumbled on my Nagios load graph today, and it showed over 2 hours of doing nothing. When I started to look into it, I discovered that the mod_gearman process had died and did not start again for 2 hours.
From the Nagios event log:
Information 2020-11-12 15:14:42 Event broker module '/usr/lib64/mod_gearman/mod_gearman_nagios4.o' initialized successfully.
Information 2020-11-12 15:14:42 mod_gearman: initialized version 3.0.7 (libgearman 0.33)
Information 2020-11-12 15:14:42 Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
Process Information 2020-11-12 13:01:33 Caught SIGSEGV, shutting down...
Information 2020-11-12 12:59:50 Successfully launched command file worker with pid 68617
Runtime Warning 2020-11-12 12:59:48 WARNING: RLIMIT_NPROC is 127602, total max estimated processes is 172158! You should increase your limits (ulimit -u, or limits.conf)
Information 2020-11-12 12:59:50 Successfully launched command file worker with pid 68617
Information 2020-11-12 12:59:47 Event broker module '/usr/lib64/mod_gearman/mod_gearman_nagios4.o' initialized successfully.
Information 2020-11-12 12:59:47 mod_gearman: initialized version 3.0.7 (libgearman 0.33)
Information 2020-11-12 12:59:47 Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
Process Information 2020-11-12 12:59:41 Successfully shutdown... (PID=92225)
Process Information 2020-11-12 12:59:41 Caught SIGTERM, shutting down...
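The RLIMIT_NPROC warning in the log says the per-user process limit is below Nagios's own estimate of how many processes it may need. Nothing in this thread shows that it caused the SIGSEGV. If you want to compare the limit the running daemon actually got with what its user currently uses, a minimal bash sketch could look like this. It assumes the daemon's process name is `nagios`, that it runs as the user `nagios`, and a Linux /proc filesystem; `max_processes_soft` and `main` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: compare the nproc limit of a running Nagios daemon with the
# number of tasks its user owns. Assumes process name "nagios" and user
# "nagios" (hypothetical defaults, adjust for your install).

# Print the soft "Max processes" value from a /proc/<pid>/limits file.
# The line looks like: "Max processes   127602   127602   processes"
max_processes_soft() {
    awk '/^Max processes/ { print $3 }' "$1"
}

main() {
    local pid
    # Oldest process whose name is exactly "nagios" (the main daemon).
    pid=$(pgrep -o -x nagios 2>/dev/null)
    if [ -n "$pid" ] && [ -r "/proc/$pid/limits" ]; then
        echo "nagios PID $pid soft nproc limit: $(max_processes_soft "/proc/$pid/limits")"
    else
        echo "no running nagios process found"
    fi
    # RLIMIT_NPROC counts threads per real user ID, so list threads (-L).
    echo "tasks owned by user nagios: $(ps -L -U nagios --no-headers 2>/dev/null | wc -l)"
    # Limit for the current shell only; the daemon may have been started
    # with different limits (services started by systemd do not read limits.conf).
    echo "ulimit -u in this shell: $(ulimit -u)"
}

# Run main only when executed directly, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    main
fi
```

If the task count is close to the soft limit, the warning is worth acting on; if it is far below, the limit is probably not what killed the process.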
It seems the SIGTERMs happen regularly and the process starts right back up afterwards, but after the SIGSEGV it stayed down for 2 hours.
Any suggestions?
Mod gearman process died
- Madmin
- Posts: 9190
- Joined: Thu Oct 30, 2014 9:02 am
Re: Mod gearman process died
Can you restart the Nagios process and let it run for about 10 minutes?
Then get the /usr/local/nagios/var/nagios.log file and add it to the post.
Also, I will need you to run the following commands as root and post the output to the ticket.
Code:
ps -ef --cols=300
yum list installed | grep gear
One thing I do see in the data you provided is that the time looks like it went backwards by an hour and 45 minutes. Make sure the system's time is stable and that it is being updated from a stable time source.
Time changes could be causing the issue you are having.
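Nagios writes each nagios.log line with a Unix timestamp in square brackets, e.g. `[1605186093] Caught SIGSEGV, shutting down...`. One way to check the clock theory is to look for lines whose timestamp is lower than the line before it. This is a minimal sketch assuming that log format and the default log path mentioned above; `time_jumps_back` is a hypothetical helper name. An empty result does not prove the clock is stable, and forward jumps (such as a long outage) are not reported.

```shell
#!/usr/bin/env bash
# Sketch: report places where nagios.log timestamps go backwards.
# Assumes Nagios Core's "[<unix epoch>] message" line format.

time_jumps_back() {
    awk '
        # Only consider lines that start with "[<digits>] ".
        match($0, /^\[[0-9]+\] /) {
            ts = substr($0, 2, RLENGTH - 3) + 0
            if (seen && ts < prev)
                printf "line %d: %d -> %d (%ds backwards)\n", NR, prev, ts, prev - ts
            prev = ts
            seen = 1
        }'
}

# Usage: ./time_jumps_back.sh [path-to-nagios.log]
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    log="${1:-/usr/local/nagios/var/nagios.log}"
    if [ -r "$log" ]; then
        time_jumps_back < "$log"
    else
        echo "cannot read $log" >&2
    fi
fi
```

On CentOS 7, `timedatectl status` also shows whether NTP synchronization is currently active.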
Be sure to check out our Knowledgebase for helpful articles and solutions!
- Posts: 146
- Joined: Thu Feb 16, 2017 3:45 am
Re: Mod gearman process died
The time is correct; it is synced from two different NTP servers. The logs were taken from the Nagios Home view (event log). The timestamps jump because the process died and nothing happened until it restarted itself.
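To confirm that a gap like this is downtime rather than a clock change, one option is to pull the shutdown and startup lines out of nagios.log together with the seconds elapsed since the previous such event. This is a minimal sketch assuming Nagios Core's `[epoch] message` log format and that startup lines contain `starting... (PID=`; both that pattern and the `restart_events` name are assumptions to check against your own log.

```shell
#!/usr/bin/env bash
# Sketch: list "Caught SIG..." shutdowns and "starting... (PID=" startups
# from nagios.log with the seconds since the previous such event.
# The startup pattern is an assumption; check it against your own log.

restart_events() {
    awk '
        match($0, /^\[[0-9]+\] /) {
            ts = substr($0, 2, RLENGTH - 3) + 0
            msg = substr($0, RLENGTH + 1)
            if (msg ~ /^Caught SIG/ || msg ~ /starting\.\.\. \(PID=/) {
                # The first event has no predecessor, so its gap is "-".
                gap = "-"
                if (seen) gap = "+" (ts - prev) "s"
                printf "%s\t%s\t%s\n", ts, gap, msg
                prev = ts
                seen = 1
            }
        }'
}

# Usage: ./restart_events.sh [path-to-nagios.log]
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    log="${1:-/usr/local/nagios/var/nagios.log}"
    if [ -r "$log" ]; then
        restart_events < "$log"
    else
        echo "cannot read $log" >&2
    fi
fi
```

Each output line holds the epoch, the gap and the message; GNU `date -d @<epoch>` converts an epoch into a readable time. A large gap between a `Caught SIGSEGV` line and the next startup line would match the outage described in this thread.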
Luckily, the problem hasn't occurred again. I have scheduled a CentOS upgrade (7.9 came out recently) and a Nagios XI upgrade to the latest version.
Hopefully those will solve the issue and it will not occur again.
- Madmin
- Posts: 9190
- Joined: Thu Oct 30, 2014 9:02 am
Re: Mod gearman process died
Thanks for the update. If you have any further questions, post them here and we'll get back to you.