So I've been on a RAM disk for everything but perf data, and things were great all day, but now load is up. vmstat doesn't show any blocked processes, but I see this in /var/log/mysqld.log:
120919 20:47:44 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:47:44 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_externalcommands' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_externalcommands' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
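Since the same errors repeat many times, it can help to boil the log down to the distinct tables involved. The following is only a minimal sketch: the function name crashed_tables is made up for illustration, and the default log path is taken from the post above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI or MySQL): print each table
# that mysqld has reported as "marked as crashed", once per table.
# Argument 1: path to the MySQL error log (defaults to the path in the post).
crashed_tables() {
    grep "is marked as crashed" "${1:-/var/log/mysqld.log}" \
        | sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" \
        | sort -u
}

# Example use (run by hand):
#   crashed_tables /var/log/mysqld.log
```

For the log excerpt above this would list nagios_externalcommands and nagios_statehistory. Assuming these are MyISAM tables, which is where this particular error normally comes from, each one could then be looked at individually, for example with REPAIR TABLE from the mysql client.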
This morning I noticed I had a failed drive on my /usr/local RAID set. I checked the RAID yesterday via the remote ILO console when I rebooted and it reported okay, but maybe the drive had been failing and causing this, and finally let go. I've replaced the drive and run the suggested commands, and things look good all around now. I'll follow up if things go south again. Thanks for all the help.
I continue to have load issues. Last night I hit a load average of 19, and I continue to see blocked processes. I have run the Postgres vacuum procedure a couple of times, but it seems to be for naught. I'm at a loss as to what to do. I wanted to move this from physical to a VM, but I don't dare with this kind of performance issue. Any thoughts as to whether upgrading from 2011R3.2 to 2012R1.1 might put things in a correct state?
With quite a while between this post and the previous issue, let's start from the top on this. What do you have showing as the top CPU-consuming processes when running:
[root@psm-itmon ~]# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[root@psm-itmon ~]# service mysqld start
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL: [FAILED]
[root@psm-itmon ~]# tail -20 /var/log/mysqld.log
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
[root@psm-itmon ~]# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
[root@psm-itmon ~]# ps -ef | grep mys
root 17258 1 0 08:26 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/usr/local/var/lib/mysql --socket=/usr/local/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql 17308 17258 0 08:26 pts/0 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/usr/local/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/usr/local/var/lib/mysql/mysql.sock
root 18000 1 0 08:27 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/usr/local/var/lib/mysql --socket=/usr/local/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql 18050 18000 0 08:27 pts/0 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/usr/local/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/usr/local/var/lib/mysql/mysql.sock
root 18668 23518 0 08:28 pts/0 00:00:00 grep mys
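The ps listing above shows two mysqld_safe/mysqld pairs (PIDs 17308 and 18050) pointed at the same datadir, which would fit the repeated "Unable to lock ./ibdata1, error: 11" messages: a second instance cannot take the lock the first one already holds. A minimal sketch for spotting this from ps output follows; the function name count_mysqld_by_datadir is hypothetical, and it assumes each mysqld command line carries a --datadir= argument as it does here.

```shell
#!/bin/sh
# Hypothetical helper: read ps output on stdin and print, for every
# datadir, how many mysqld server processes are using it.
# A count above 1 for one datadir means competing instances.
count_mysqld_by_datadir() {
    awk '
        /mysqld_safe/ { next }          # skip the wrapper script lines
        /\/mysqld / {                   # the actual server binary
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^--datadir=/) {
                    d = $i
                    sub(/^--datadir=/, "", d)
                    n[d]++
                }
            }
        }
        END { for (d in n) print n[d], d }
    '
}

# Example use (run by hand):
#   ps -ef | count_mysqld_by_datadir
```

Fed the listing above, this reports a count of 2 for /usr/local/var/lib/mysql. If that is what is happening, stopping every mysqld instance and starting a single one would be the usual next step before retrying the TRUNCATE.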
So I rebooted, and I am immediately getting blocked processes and increasingly high load, and the web interface is dreadfully slow. I never see red dots in the admin page. I am also getting a lot of WMI checks timing out now. The graphs don't seem to have any gaps in data.