MySQL Database Crashes

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
gwakem
Posts: 238
Joined: Mon Jan 23, 2012 2:02 pm
Location: Asheville, NC

MySQL Database Crashes

Post by gwakem »

We have been seeing huge amounts of load generated by the MySQL daemon process on our database server, starting at midnight. Load spikes from around 0.5 to about 30-50 in the span of ten minutes. These symptoms pretty much match what's described in the "Repairing the Nagios XI Database" PDF, so I attempted to run the repairmysql.sh script. Output below:

./repairmysql.sh nagios *
DATABASE: nagios
TABLE:
/var/lib/mysql/nagios ~
Stopping MySQL: [ OK ]
myisamchk: error: File '*.MYI' doesn't exist
Starting MySQL: [ OK ]
~

===============
REPAIR COMPLETE
===============

Uh oh. No indexes?

ls -lh /var/lib/mysql/nagios
total 912K
-rw-rw---- 1 mysql mysql 65 Feb 24 09:28 db.opt
-rw-rw---- 1 mysql mysql 8.9K Jul 16 08:14 nagios_acknowledgements.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_commands.frm
-rw-rw---- 1 mysql mysql 9.2K Jul 16 08:14 nagios_commenthistory.frm
-rw-rw---- 1 mysql mysql 9.1K Jul 16 08:14 nagios_comments.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_configfiles.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_configfilevariables.frm
-rw-rw---- 1 mysql mysql 9.1K Jul 16 08:14 nagios_conninfo.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_contact_addresses.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_contactgroup_members.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.7K Jul 16 08:14 nagios_contact_notificationcommands.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 08:14 nagios_contactnotificationmethods.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 08:14 nagios_contactnotifications.frm
-rw-rw---- 1 mysql mysql 9.8K Jul 16 08:14 nagios_contacts.frm
-rw-rw---- 1 mysql mysql 9.1K Jul 16 08:14 nagios_contactstatus.frm
-rw-rw---- 1 mysql mysql 8.7K Jul 16 08:14 nagios_customvariables.frm
-rw-rw---- 1 mysql mysql 8.7K Jul 16 08:14 nagios_customvariablestatus.frm
-rw-rw---- 1 mysql mysql 8.4K Jul 16 08:14 nagios_dbversion.frm
-rw-rw---- 1 mysql mysql 9.3K Jul 16 08:14 nagios_downtimehistory.frm
-rw-rw---- 1 mysql mysql 34K Jul 16 08:14 nagios_eventhandlers.frm
-rw-rw---- 1 mysql mysql 8.7K Jul 16 08:14 nagios_externalcommands.frm
-rw-rw---- 1 mysql mysql 9.0K Jul 16 08:14 nagios_flappinghistory.frm
-rw-rw---- 1 mysql mysql 34K Jul 16 08:14 nagios_hostchecks.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_host_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_host_contacts.frm
-rw-rw---- 1 mysql mysql 8.9K Jul 16 08:14 nagios_hostdependencies.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_hostescalation_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_hostescalation_contacts.frm
-rw-rw---- 1 mysql mysql 9.0K Jul 16 08:14 nagios_hostescalations.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_hostgroup_members.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_hostgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_host_parenthosts.frm
-rw-rw---- 1 mysql mysql 12K Jul 16 08:14 nagios_hosts.frm
-rw-rw---- 1 mysql mysql 40K Jul 16 08:14 nagios_hoststatus.frm
-rw-rw---- 1 mysql mysql 8.5K Jul 16 08:14 nagios_instances.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 07:09 nagios_logentries.frm
-rw-rw---- 1 mysql mysql 33K Jul 16 07:09 nagios_notifications.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_objects.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 07:09 nagios_processevents.frm
-rw-rw---- 1 mysql mysql 10K Jul 16 07:09 nagios_programstatus.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_runtimevariables.frm
-rw-rw---- 1 mysql mysql 9.2K Jul 16 07:09 nagios_scheduleddowntime.frm
-rw-rw---- 1 mysql mysql 34K Jul 16 07:09 nagios_servicechecks.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_service_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_service_contacts.frm
-rw-rw---- 1 mysql mysql 9.0K Jul 16 07:09 nagios_servicedependencies.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_serviceescalation_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_serviceescalation_contacts.frm
-rw-rw---- 1 mysql mysql 9.1K Jul 16 07:09 nagios_serviceescalations.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_servicegroup_members.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_servicegroups.frm
-rw-rw---- 1 mysql mysql 12K Jul 16 07:09 nagios_services.frm
-rw-rw---- 1 mysql mysql 40K Jul 16 07:09 nagios_servicestatus.frm
-rw-rw---- 1 mysql mysql 33K Jul 16 07:09 nagios_statehistory.frm
-rw-rw---- 1 mysql mysql 33K Jul 16 07:10 nagios_systemcommands.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 07:10 nagios_timedeventqueue.frm
-rw-rw---- 1 mysql mysql 8.9K Jul 16 07:10 nagios_timedevents.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:10 nagios_timeperiods.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:10 nagios_timeperiod_timeranges.frm


Mondays just KILL me. Please help?
Last edited by gwakem on Tue Jul 17, 2012 2:29 pm, edited 2 times in total.
--
Griffin Wakem
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: MySQL Database Crashes

Post by mguthrie »

Is your database offloaded to another machine by chance? The repair has to be run from whatever machine is hosting the mysql database. I'm not sure you'd have a working XI install without any MYD or MYI files...

If you run:

service mysqld stop
myisamchk -r -f /var/lib/mysql/nagios/*.MYI
service mysqld start
on the offloaded server you would get the same result as the script.
gwakem
Posts: 238
Joined: Mon Jan 23, 2012 2:02 pm
Location: Asheville, NC

Re: MySQL Database Crashes

Post by gwakem »

That's the weird thing. I was running the repair on the offloaded DB server, and the file list was from that server.

myisamchk -r -f /var/lib/mysql/nagios/*.MYI
myisamchk: error: File '/var/lib/mysql/nagios/*.MYI' doesn't exist
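
The literal *.MYI in that error suggests the shell found nothing to expand, so myisamchk was handed the pattern itself. A minimal sketch of a pre-check, assuming bash; list_myi is a hypothetical helper name, not part of Nagios XI or MySQL:

```shell
#!/usr/bin/env bash
# list_myi is a hypothetical helper, not part of Nagios XI or MySQL.
# Prints each .MYI file in the given directory; returns 1 if there are none.
list_myi() {
  local dir="$1" f found=1
  for f in "$dir"/*.MYI; do
    # With no match the pattern stays unexpanded, so test for a real file.
    [ -e "$f" ] || continue
    printf '%s\n' "$f"
    found=0
  done
  return "$found"
}
```

Something like list_myi /var/lib/mysql/nagios || echo "no MyISAM index files here" would flag the situation before a repair is attempted.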

We can't get the Nagios server running, and while it's been hanging every night (all weekend), this is the first time it's just not coming back up. I don't know when the indexes went up in smoke.

Sooooooo.. We do have backups of the nagios.sql, but I don't know if we should import that or if there's a better way of doing this.
--
Griffin Wakem
gwakem
Posts: 238
Joined: Mon Jan 23, 2012 2:02 pm
Location: Asheville, NC

Re: MySQL Database Crashes

Post by gwakem »

Well, it appears that the nagios database is InnoDB, and not MyISAM (which may explain the lack of MYI files?). An import of a backed-up nagios.sql gives the same result: no MYI files. My coffee is bitter with tears and I am very confused. The nagiosql database is MyISAM.
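
One way to confirm the engine per table is to ask information_schema rather than infer it from the files on disk. A rough sketch, where engine_query is a hypothetical helper that only prints the SQL and touches no server:

```shell
# engine_query is a hypothetical helper; it prints SQL and contacts no server.
# TABLE_SCHEMA, TABLE_NAME and ENGINE are columns of information_schema.TABLES.
engine_query() {
  printf "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = '%s';\n" "$1"
}

# Usage, assuming a reachable server and suitable credentials:
#   engine_query nagios | mysql -u root -p
```

Printing the statement first makes it easy to review before piping it into the mysql client.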
--
Griffin Wakem
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: MySQL Database Crashes

Post by mguthrie »

Yeah, we use MyISAM for the nagios database as well. I remember someone asking about converting to InnoDB, but I had forgotten who it was ; )

So... we don't have a documented repair procedure for InnoDB, but here are some items I found. We haven't tested any of this, which is the downside of switching to InnoDB.
http://dev.mysql.com/doc/refman/5.5/en/ ... overy.html
http://www.mysqlperformanceblog.com/200 ... rruption/e
KevinD
Posts: 26
Joined: Thu Mar 29, 2012 10:26 am

Re: MySQL Database Crashes

Post by KevinD »

Just for clarification, it was us that were asking about InnoDB vs MyISAM.
But the order was reversed.

http://support.nagios.com/forum/viewtop ... oDB#p27965

We had found during an issue with the upgrade that the DB was not updated.
While digging to find what had changed, and thus which version of the DB changes we were on, we found that one of them was trying to alter the DB to use InnoDB.

If I remember right it was 4b4 -> 4b5 that had the InnoDB alterations on it.

So, we are going to change these back to MyISAM later tonight, but we didn't just change it due to our personal feelings toward InnoDB vs MyISAM.
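
A sketch of how the conversion statements could be generated for review, assuming table names can be read straight from the .frm file names (true for the plain nagios_* names listed earlier in the thread, not for names with special characters). myisam_alters is a hypothetical helper and runs nothing against the server:

```shell
# myisam_alters is a hypothetical helper; it prints SQL for review and runs nothing.
# Arguments: the database directory and the database name.
myisam_alters() {
  local dir="$1" db="$2" f
  for f in "$dir"/*.frm; do
    # Skip the unexpanded pattern when the directory has no .frm files.
    [ -e "$f" ] || continue
    printf 'ALTER TABLE `%s`.`%s` ENGINE=MyISAM;\n' "$db" "$(basename "$f" .frm)"
  done
}

# Usage, after taking a backup:
#   myisam_alters /var/lib/mysql/nagios nagios > convert.sql
```

Writing the statements to a file first leaves room to check them before anything is applied.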

I also found several reports on the interwebs about the leap second added on 6/30 causing CPU spikes in MySQL. We have forced an NTP update to resolve this (assuming it was related). Turns out our parent Nagios was 1min 6sec fast.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: MySQL Database Crashes

Post by mguthrie »

Thanks for the clarification, you are correct. Yours was somewhat of a unique support situation. Let us know if you guys run into any further issues related to this.
gwakem
Posts: 238
Joined: Mon Jan 23, 2012 2:02 pm
Location: Asheville, NC

Re: MySQL Database Crashes

Post by gwakem »

Quick rehash: at exactly midnight, load on the DB server goes from under 1 (0.3,0.7,0.9) to 80 (80.0,43.0,20.0). This causes graphing to stop and checks to stop in Nagios.

We converted the database over to MyISAM last night during a maintenance window, updated the kernel, and ran the repairdatabase.sh script. This time, however, we watched the spike live with debug enabled on the MySQL queries. We see this in the logs, starting at midnight:

Jul 17 00:00:00 sidhqmonm0 nagios: LOG ROTATION: DAILY
Jul 17 00:00:00 sidhqmonm0 nagios: LOG VERSION: 2.0

Followed by a registering of host state (which takes from 00:00:00 to 00:00:33) and service state (which takes from 00:00:33 to 00:03:17). During this time, it appears that the service and host state are not being pulled from retention.dat, but are instead being queried directly from the database (confirmed by the debug output), which causes the load spike and subsequent crashes. We saw all 200 of our MySQL threads in use, and the queue was over 10k.
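
To put a number on how many state entries follow the rotation, a rough sketch. It assumes the entries appear in the log as lines containing "CURRENT HOST STATE:" and "CURRENT SERVICE STATE:" (an assumption about the log format, not something shown above); count_state_lines is a hypothetical helper:

```shell
# count_state_lines is a hypothetical helper. It assumes the log contains
# lines with "CURRENT HOST STATE:" and "CURRENT SERVICE STATE:".
count_state_lines() {
  local log="$1"
  # grep -c prints the number of matching lines (0 when there are none).
  printf 'host=%s service=%s\n' \
    "$(grep -c 'CURRENT HOST STATE:' "$log")" \
    "$(grep -c 'CURRENT SERVICE STATE:' "$log")"
}
```

Comparing those counts with the number of configured hosts and services would show whether every object is being written out in that three-minute window.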

Strangely enough, I can't seem to find the specific entry that triggers the log rotation for nagios.log.
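
In Nagios Core 3.x the daily rotation is normally driven by the log_rotation_method directive in nagios.cfg (d means daily, at midnight) rather than by cron or logrotate. A small sketch for reading one directive out of the file; cfg_value is a hypothetical helper and assumes plain key=value lines:

```shell
# cfg_value is a hypothetical helper: prints the value of a key=value line.
# The key is used as a sed pattern, so keep it to plain directive names.
cfg_value() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file"
}

# Usage:
#   cfg_value /usr/local/nagios/etc/nagios.cfg log_rotation_method
```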

/usr/local/nagios/etc/nagios.cfg shows:

retain_state_information=1
retention_update_interval=60
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10

The retention.dat is updated at 4 minutes after the hour.
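
A quick way to check when retention.dat was last written, as a sketch. file_age_seconds is a hypothetical helper, and stat -c is the GNU coreutils form, so this assumes a Linux host:

```shell
# file_age_seconds is a hypothetical helper; stat -c %Y is GNU coreutils syntax.
# Prints how many seconds ago the file was last modified.
file_age_seconds() {
  local now mtime
  now=$(date +%s)
  mtime=$(stat -c %Y "$1") || return 1
  echo $(( now - mtime ))
}

# Usage:
#   file_age_seconds /usr/local/nagios/var/retention.dat
```

With retention_update_interval=60 the age should stay under roughly an hour while Nagios is healthy.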

So now we know how the DB crashes every night, but we still don't know why. Any ideas? We are still using 2011r3.1.
--
Griffin Wakem
gwakem
Posts: 238
Joined: Mon Jan 23, 2012 2:02 pm
Location: Asheville, NC

Re: MySQL Database Crashes

Post by gwakem »

Additional info: I was able to confirm via /usr/local/nagiosxi/var/sysstat.log that nagios, ndo2db, and npcd all retained the same PIDs through this time, so they don't appear to have restarted.
--
Griffin Wakem
gwakem
Posts: 238
Joined: Mon Jan 23, 2012 2:02 pm
Location: Asheville, NC

Re: MySQL Database Crashes

Post by gwakem »

We have to babysit the system from around 00:00 to 01:00 MST nightly, so any help tomorrow would be greatly appreciated.
--
Griffin Wakem