Nagios performance trouble

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

So I've been on RAM disk for all but perf data and things were great all day, but now load is up. vmstat doesn't show any blocking processes, but I see this in /var/log/mysqld.log.

Code:

120919 20:47:44 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:47:44 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_externalcommands' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_externalcommands' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
- Kyle
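Because the same two tables repeat throughout the log, it can help to reduce the error lines to a unique list before starting any repair. The following is only a sketch, assuming the log line format shown in the post above; `crashed_tables` is a hypothetical helper name and is not part of Nagios XI or MySQL.

```shell
# crashed_tables: hypothetical helper, not part of Nagios XI or MySQL.
# Reads mysqld error-log lines on stdin and prints each table that is
# reported as "marked as crashed" exactly once, in sorted order.
crashed_tables() {
  grep "is marked as crashed" \
    | sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" \
    | sort -u
}
```

It could be run as `crashed_tables < /var/log/mysqld.log`, which for the excerpt above should list only nagios_externalcommands and nagios_statehistory.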
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios performance trouble

Post by mguthrie »

Try running our repair procedure on the database:
http://assets.nagios.com/downloads/nagi ... tabase.pdf

Also, let's make sure Postgres is OK as well:

Code:

psql nagiosxi nagiosxi
vacuum;
vacuum analyze;
vacuum full;
\q

psql postgres postgres
vacuum;
vacuum analyze;
vacuum full;
\q
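The same vacuum sequence can also be run without an interactive session, for example from a maintenance script. This is a rough sketch, not an official Nagios procedure: `vacuum_db` and the `PSQL` override variable are hypothetical names introduced here, and the database and user names are the ones from the session above. Each statement gets its own `psql -c` call because VACUUM cannot run inside a transaction block, and a `-c` string containing several statements is sent to the server as a single request.

```shell
# vacuum_db: hypothetical wrapper around the psql session shown above.
# Usage: vacuum_db DBNAME USERNAME
# PSQL may be overridden to substitute another command for psql.
vacuum_db() {
  db="$1"
  user="$2"
  for stmt in "VACUUM;" "VACUUM ANALYZE;" "VACUUM FULL;"; do
    # One psql call per statement: VACUUM cannot run inside a
    # transaction block, and a multi-statement -c string is sent as one.
    ${PSQL:-psql} -c "$stmt" "$db" "$user" || return 1
  done
}
```

A possible invocation would be `vacuum_db nagiosxi nagiosxi && vacuum_db postgres postgres`. Note that VACUUM FULL takes an exclusive lock on each table while it runs, so it is best done during a quiet period.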
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

This morning I noticed I had a failed drive on my /usr/local RAID set. I checked the RAID yesterday via the remote iLO console when I rebooted and it reported okay, but maybe the drive had been failing, causing this, and finally let go. I've replaced the drive and run the suggested commands, and things look good all around now. I'll follow up if things go south again. Thanks for all the help.
- Kyle
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios performance trouble

Post by slansing »

Good to hear you found the bad apple. Hopefully that's all it was.
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

All good after 3 days.
- Kyle
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

I continue to have load issues. Last night I hit a load average of 19 and I continue to see blocked processes. I have run the Postgres vacuum procedure a couple of times but it seems to be for naught. I'm at a loss as to what to do. I wanted to move this from physical to a VM but I don't dare with this kind of performance issue. Any thoughts as to whether upgrading from 2011R3.2 to 2012R1.1 might put things in a correct state?
- Kyle
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios performance trouble

Post by mguthrie »

With quite a while between this post and the previous issue, let's start from the top on this. What shows up as the top CPU-consuming processes when running:

Code:

top
Check /var/log/mysqld.log and make sure there aren't any corrupted tables.
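Since `top` is interactive, a plain-text snapshot is easier to paste into a reply. One way to get that, sketched here under the assumption that `ps` supports the `-eo` output format, is to sort a `ps` listing by CPU. `busiest` is a hypothetical helper name. Keep in mind that the %CPU reported by `ps` is averaged over the lifetime of each process, so it will not match the instantaneous figures in `top`.

```shell
# busiest: hypothetical helper. Reads "ps -eo pcpu,pid,user,comm" output
# on stdin and prints the N rows with the highest %CPU (default 5).
busiest() {
  n="${1:-5}"
  # Drop the header row, then sort numerically on column 1, descending.
  # LC_ALL=C keeps the handling of the decimal point predictable.
  tail -n +2 | LC_ALL=C sort -k1,1nr | head -n "$n"
}
```

For example: `ps -eo pcpu,pid,user,comm | busiest 5`.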
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

mysqld for the most part, with httpd and php coming in behind it.
- Kyle
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios performance trouble

Post by mguthrie »

I would recommend restarting Apache and then also running the MySQL DB repair procedure.

Are performance graphs updating ok?

Do you see any red dots from the Admin page on the subsystem components?
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

I stopped and started httpd and then did the repair, but mysqld would not start afterward.

Code:

 recovering (with sort) MyISAM-table 'nagios_timeperiod_timeranges.MYI'
Data records: 166
- Fixing index 1
- Fixing index 2
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
There were no errors in the repair process before this. So I tried the suggested next step, but that wasn't happening.

Code:

[root@psm-itmon ~]# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[root@psm-itmon ~]# service mysqld start
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
[root@psm-itmon ~]# tail -20 /var/log/mysqld.log 
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
[root@psm-itmon ~]# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
[root@psm-itmon ~]# ps -ef | grep mys
root     17258     1  0 08:26 pts/0    00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/usr/local/var/lib/mysql --socket=/usr/local/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql    17308 17258  0 08:26 pts/0    00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/usr/local/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/usr/local/var/lib/mysql/mysql.sock
root     18000     1  0 08:27 pts/0    00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/usr/local/var/lib/mysql --socket=/usr/local/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql    18050 18000  0 08:27 pts/0    00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/usr/local/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/usr/local/var/lib/mysql/mysql.sock
root     18668 23518  0 08:28 pts/0    00:00:00 grep mys
So I rebooted, and I am immediately getting blocked processes and increasingly high load, and the web interface is dreadfully slow. I never see red dots in the admin page. I am also getting a lot of WMI checks timing out now. Graphs don't seem to have any gaps in data.
- Kyle
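The `ps -ef` listing in the post above shows two mysqld processes, each under its own mysqld_safe parent, which would be consistent with the repeated "Unable to lock ./ibdata1, error: 11" messages asking to check for another mysqld process. A quick way to confirm how many server processes exist before attempting another start might look like the sketch below; `count_mysqld` is a hypothetical helper, and it assumes the standard `ps -ef` column layout where the command is the eighth field.

```shell
# count_mysqld: hypothetical helper. Reads "ps -ef" output on stdin and
# prints how many mysqld server processes it lists. The mysqld_safe
# wrapper scripts (run via /bin/sh) and the grep itself are not counted.
count_mysqld() {
  awk '$8 ~ /\/mysqld$/ { n++ } END { print n + 0 }'
}
```

Run as `ps -ef | count_mysqld`. Anything above 1 suggests that a stale instance should be stopped cleanly before `service mysqld start` is tried again.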