Nagios performance trouble

hhlodge · Post by **hhlodge** » Wed Sep 19, 2012 7:54 pm

So I've been on a RAM disk for everything but perf data, and things were great all day, but now load is up. vmstat doesn't show any blocked processes, but I see this in /var/log/mysqld.log:

Code: Select all

120919 20:47:44 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:47:44 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_externalcommands' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_externalcommands' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
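
When the log repeats the same error many times, it can help to collapse it into one line per affected table. The following is a minimal sketch, assuming the log format shown above; the function name `crashed_tables` is made up for illustration.

```shell
# Count "marked as crashed" errors per table in a MySQL error log.
# $1 is the log file (defaults to /var/log/mysqld.log, the path used in this thread).
crashed_tables() {
    grep 'is marked as crashed' "${1:-/var/log/mysqld.log}" \
        | sed "s/.*Table '\([^']*\)'.*/\1/" \
        | sort | uniq -c | sort -rn \
        | awk '{ print $2, $1 }'    # print "table count", most frequent first
}

# Usage: crashed_tables /var/log/mysqld.log
```

Run against the excerpt above, this would list ./nagios/nagios_statehistory with 6 hits and ./nagios/nagios_externalcommands with 2.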

mguthrie · Post by **mguthrie** » Thu Sep 20, 2012 9:24 am

Try running our repair procedure on the database:
http://assets.nagios.com/downloads/nagi ... tabase.pdf

Also, let's make sure Postgres is OK as well:

Code: Select all

psql nagiosxi nagiosxi
vacuum;
vacuum analyze;
vacuum full;
\q

psql postgres postgres
vacuum;
vacuum analyze;
vacuum full;
\q
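
The same statements can also be run without an interactive session, which is handy if the procedure has to be repeated. This is only a sketch: `vacuum_db` is a hypothetical helper name, and it assumes the database and user names from the commands above and that psql can authenticate without prompting. Note that VACUUM FULL takes an exclusive lock on each table while it runs.

```shell
# Run the three maintenance statements against one database, stopping on the
# first failure. $1 = database name, $2 = database user.
vacuum_db() {
    db=$1
    user=$2
    for stmt in 'VACUUM;' 'VACUUM ANALYZE;' 'VACUUM FULL;'; do
        # -c runs a single command and exits; -d and -U select database and user.
        psql -U "$user" -d "$db" -c "$stmt" || return 1
    done
}

# Usage, mirroring the interactive sessions above:
#   vacuum_db nagiosxi nagiosxi
#   vacuum_db postgres postgres
```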

hhlodge · Post by **hhlodge** » Thu Sep 20, 2012 3:04 pm

This morning I noticed I had a failed drive on my /usr/local RAID set. I checked the RAID yesterday via the remote ILO console when I rebooted and it reported okay, but maybe the drive had been failing and causing this, and finally let go. I've replaced the drive and run the suggested commands, and things look good all around now. I'll follow up if things go south again. Thanks for all the help.

slansing · Post by **slansing** » Thu Sep 20, 2012 3:31 pm

Good to hear you found the bad apple. Hopefully that's all it was.

hhlodge · Post by **hhlodge** » Mon Sep 24, 2012 1:10 pm

All good after 3 days.

hhlodge · Post by **hhlodge** » Thu Nov 01, 2012 8:20 am

I continue to have load issues. Last night I hit a load average of 19, and I continue to see blocked processes. I have run the Postgres vacuum procedure a couple of times, but it seems to be for naught. I'm at a loss as to what to do. I wanted to move this from physical to a VM, but I don't dare with this kind of performance issue. Any thoughts as to whether upgrading from 2011R3.2 to 2012R1.1 might put things into a correct state?
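
To put a number on "blocked processes", the `b` column of vmstat (processes in uninterruptible sleep, typically waiting on I/O) can be sampled and reduced to its peak. A rough sketch, with `max_blocked` as a made-up helper name and the standard two-line vmstat header assumed:

```shell
# Print the highest value seen in vmstat's "b" column.
# Reads vmstat output on stdin; the two header lines are skipped.
max_blocked() {
    awk 'NR > 2 && $2 + 0 > max { max = $2 + 0 } END { print max + 0 }'
}

# Usage: sample once per second for 30 seconds and report the peak.
#   vmstat 1 30 | max_blocked
```

Keep in mind that the first data line vmstat prints is an average since boot, not a live sample.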

mguthrie · Post by **mguthrie** » Thu Nov 01, 2012 9:17 am

With quite a while between this post and the previous issue, let's start from the top on this. What do you have showing as the top CPU-consuming processes when running:

Code: Select all

top

Check /var/log/mysqld.log and make sure there aren't any corrupted tables.
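
For a one-shot snapshot that can be pasted into a reply, `ps` can stand in for interactive `top`. This is a sketch using the procps `ps` found on most Linux systems (the `--sort` option is specific to it); `top_cpu` is a hypothetical helper name.

```shell
# Print the N processes using the most CPU (default 10), plus a header line.
top_cpu() {
    ps -eo pcpu,pmem,pid,user,comm --sort=-pcpu | head -n "$(( ${1:-10} + 1 ))"
}

# Usage: top_cpu 5
```

Unlike `top`, the %CPU figure from `ps` is averaged over each process's lifetime, so a short spike may not stand out.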

hhlodge · Post by **hhlodge** » Thu Nov 01, 2012 9:29 am

mysqld for the most part with httpd and php coming in behind it.

mguthrie · Post by **mguthrie** » Thu Nov 01, 2012 4:27 pm

I would recommend restarting Apache and then also running the MySQL DB repair procedure.

Are performance graphs updating ok?

Do you see any red dots from the Admin page on the subsystem components?

hhlodge · Post by **hhlodge** » Fri Nov 02, 2012 8:24 am

I stopped/started httpd and then did the repair but it could not start mysqld after.

Code: Select all

 recovering (with sort) MyISAM-table 'nagios_timeperiod_timeranges.MYI'
Data records: 166
- Fixing index 1
- Fixing index 2
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]

There were no errors in the repair process before this. So I tried the suggested next step, but that wasn't happening.

Code: Select all

[root@psm-itmon ~]# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[root@psm-itmon ~]# service mysqld start
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
[root@psm-itmon ~]# tail -20 /var/log/mysqld.log 
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
[root@psm-itmon ~]# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
[root@psm-itmon ~]# ps -ef | grep mys
root     17258     1  0 08:26 pts/0    00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/usr/local/var/lib/mysql --socket=/usr/local/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql    17308 17258  0 08:26 pts/0    00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/usr/local/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/usr/local/var/lib/mysql/mysql.sock
root     18000     1  0 08:27 pts/0    00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/usr/local/var/lib/mysql --socket=/usr/local/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql    18050 18000  0 08:27 pts/0    00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/usr/local/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/usr/local/var/lib/mysql/mysql.sock
root     18668 23518  0 08:28 pts/0    00:00:00 grep mys

So I rebooted, and I am immediately getting blocked processes and increasingly high load, and the web interface is dreadfully slow. I never see red dots on the Admin page. I am also getting a lot of WMI checks timing out now. Graphs don't seem to have any gaps in data.
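
The repeated "Unable to lock ./ibdata1, error: 11" messages (error 11 is EAGAIN on Linux), together with the two mysqld_safe/mysqld pairs in the ps listing above, are consistent with a second server instance being started while the first still held the InnoDB files. A quick way to check for that before running `service mysqld start` again is to count the running servers. The helper name `count_mysqld` below is made up for this sketch.

```shell
# Count lines on stdin that are exactly "mysqld", so mysqld_safe wrappers
# and the grep itself are not counted.
count_mysqld() {
    grep -cx 'mysqld'
}

# Usage: feed it one command name per line.
#   ps -eo comm= | count_mysqld
# A result greater than 1 would match the lock errors in the log.
```

If more than one instance shows up, stopping all of them and starting the service once is the usual way out, though that is an assumption about this particular system rather than something the log confirms.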
