We have been seeing huge amounts of load generated by the MySQL daemon process on our database server, starting at midnight. Load spikes from around 0.5 to about 30-50 in the span of ten minutes. These symptoms pretty much match what's described in the "Repairing the Nagios XI Database" PDF, so I attempted to run the repairmysql.sh script. Output below:
./repairmysql.sh nagios *
DATABASE: nagios
TABLE:
/var/lib/mysql/nagios ~
Stopping MySQL: [ OK ]
myisamchk: error: File '*.MYI' doesn't exist
Starting MySQL: [ OK ]
~
===============
REPAIR COMPLETE
===============
Uh oh. No indexes?
ls -lh /var/lib/mysql/nagios
total 912K
-rw-rw---- 1 mysql mysql 65 Feb 24 09:28 db.opt
-rw-rw---- 1 mysql mysql 8.9K Jul 16 08:14 nagios_acknowledgements.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_commands.frm
-rw-rw---- 1 mysql mysql 9.2K Jul 16 08:14 nagios_commenthistory.frm
-rw-rw---- 1 mysql mysql 9.1K Jul 16 08:14 nagios_comments.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_configfiles.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_configfilevariables.frm
-rw-rw---- 1 mysql mysql 9.1K Jul 16 08:14 nagios_conninfo.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_contact_addresses.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_contactgroup_members.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.7K Jul 16 08:14 nagios_contact_notificationcommands.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 08:14 nagios_contactnotificationmethods.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 08:14 nagios_contactnotifications.frm
-rw-rw---- 1 mysql mysql 9.8K Jul 16 08:14 nagios_contacts.frm
-rw-rw---- 1 mysql mysql 9.1K Jul 16 08:14 nagios_contactstatus.frm
-rw-rw---- 1 mysql mysql 8.7K Jul 16 08:14 nagios_customvariables.frm
-rw-rw---- 1 mysql mysql 8.7K Jul 16 08:14 nagios_customvariablestatus.frm
-rw-rw---- 1 mysql mysql 8.4K Jul 16 08:14 nagios_dbversion.frm
-rw-rw---- 1 mysql mysql 9.3K Jul 16 08:14 nagios_downtimehistory.frm
-rw-rw---- 1 mysql mysql 34K Jul 16 08:14 nagios_eventhandlers.frm
-rw-rw---- 1 mysql mysql 8.7K Jul 16 08:14 nagios_externalcommands.frm
-rw-rw---- 1 mysql mysql 9.0K Jul 16 08:14 nagios_flappinghistory.frm
-rw-rw---- 1 mysql mysql 34K Jul 16 08:14 nagios_hostchecks.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_host_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_host_contacts.frm
-rw-rw---- 1 mysql mysql 8.9K Jul 16 08:14 nagios_hostdependencies.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_hostescalation_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_hostescalation_contacts.frm
-rw-rw---- 1 mysql mysql 9.0K Jul 16 08:14 nagios_hostescalations.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_hostgroup_members.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_hostgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 08:14 nagios_host_parenthosts.frm
-rw-rw---- 1 mysql mysql 12K Jul 16 08:14 nagios_hosts.frm
-rw-rw---- 1 mysql mysql 40K Jul 16 08:14 nagios_hoststatus.frm
-rw-rw---- 1 mysql mysql 8.5K Jul 16 08:14 nagios_instances.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 07:09 nagios_logentries.frm
-rw-rw---- 1 mysql mysql 33K Jul 16 07:09 nagios_notifications.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_objects.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 07:09 nagios_processevents.frm
-rw-rw---- 1 mysql mysql 10K Jul 16 07:09 nagios_programstatus.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_runtimevariables.frm
-rw-rw---- 1 mysql mysql 9.2K Jul 16 07:09 nagios_scheduleddowntime.frm
-rw-rw---- 1 mysql mysql 34K Jul 16 07:09 nagios_servicechecks.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_service_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_service_contacts.frm
-rw-rw---- 1 mysql mysql 9.0K Jul 16 07:09 nagios_servicedependencies.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_serviceescalation_contactgroups.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_serviceescalation_contacts.frm
-rw-rw---- 1 mysql mysql 9.1K Jul 16 07:09 nagios_serviceescalations.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_servicegroup_members.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:09 nagios_servicegroups.frm
-rw-rw---- 1 mysql mysql 12K Jul 16 07:09 nagios_services.frm
-rw-rw---- 1 mysql mysql 40K Jul 16 07:09 nagios_servicestatus.frm
-rw-rw---- 1 mysql mysql 33K Jul 16 07:09 nagios_statehistory.frm
-rw-rw---- 1 mysql mysql 33K Jul 16 07:10 nagios_systemcommands.frm
-rw-rw---- 1 mysql mysql 8.8K Jul 16 07:10 nagios_timedeventqueue.frm
-rw-rw---- 1 mysql mysql 8.9K Jul 16 07:10 nagios_timedevents.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:10 nagios_timeperiods.frm
-rw-rw---- 1 mysql mysql 8.6K Jul 16 07:10 nagios_timeperiod_timeranges.frm
Mondays just KILL me. Please help?
MySQL Database Crashes
-
- Posts: 238
- Joined: Mon Jan 23, 2012 2:02 pm
- Location: Asheville, NC
Last edited by gwakem on Tue Jul 17, 2012 2:29 pm, edited 2 times in total.
--
Griffin Wakem
-
- Posts: 4380
- Joined: Mon Jun 14, 2010 10:21 am
Re: MySQL Database Crashes
Is your database offloaded to another machine by chance? The repair has to be run from whatever machine is hosting the mysql database. I'm not sure you'd have a working XI install without any MYD or MYI files...
If you run:
service mysqld stop
myisamchk -r -f /var/lib/mysql/nagios/*.MYI
service mysqld start
on the offloaded server you would get the same result as the script.
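When no file matches `*.MYI`, the shell passes the pattern through literally, which is why myisamchk complains that the file '*.MYI' doesn't exist. A minimal sketch of a guard that checks for index files before attempting a repair follows; `count_myi` and `repair_if_myisam` are hypothetical helper names, not part of repairmysql.sh, and `find -maxdepth` is assumed to be available (GNU or BSD find).

```shell
#!/bin/sh
# Sketch only: count_myi and repair_if_myisam are illustrative names.

count_myi() {
    # Print the number of *.MYI files directly inside directory $1.
    find "$1" -maxdepth 1 -type f -name '*.MYI' | wc -l | tr -d ' '
}

repair_if_myisam() {
    dir=$1
    n=$(count_myi "$dir")
    if [ "$n" -eq 0 ]; then
        # Nothing for myisamchk to work on; the tables may use another engine.
        echo "no .MYI files in $dir; skipping myisamchk"
        return 1
    fi
    echo "found $n .MYI files in $dir"
    # With mysqld stopped, the repair itself would be:
    #   myisamchk -r -f "$dir"/*.MYI
    return 0
}
```

For the directory in this thread the call would be `repair_if_myisam /var/lib/mysql/nagios`, which should report that there is nothing to repair when only .frm files are present.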
-
- Posts: 238
- Joined: Mon Jan 23, 2012 2:02 pm
- Location: Asheville, NC
Re: MySQL Database Crashes
That's the weird thing.. I was running the repair on the offloaded DB, and the file list was from that server.
myisamchk -r -f /var/lib/mysql/nagios/*.MYI
myisamchk: error: File '/var/lib/mysql/nagios/*.MYI' doesn't exist
We can't get the Nagios server running, and while it's been hanging every night (all weekend), this is the first time it's just not coming back up. I don't know when the indexes went up in smoke.
Sooooooo.. We do have backups of the nagios.sql, but I don't know if we should import that or if there's a better way of doing this.
--
Griffin Wakem
-
- Posts: 238
- Joined: Mon Jan 23, 2012 2:02 pm
- Location: Asheville, NC
Re: MySQL Database Crashes
Well, it appears that the nagios database is InnoDB, and not MyISAM (which may explain the lack of MYI files?). An import of a backed-up nagios.sql gives the same result: no MYI files. My coffee is bitter with tears and I am very confused. The nagiosql database is MyISAM.
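One way to confirm which engine each table uses is to ask information_schema rather than infer it from the files on disk. The sketch below only builds the query text; `engine_query` is a hypothetical helper name, and running it assumes the mysql client can log in (for example via ~/.my.cnf).

```shell
#!/bin/sh
# Sketch only: engine_query is an illustrative name.
# Prints SQL that lists every table in schema $1 with its storage engine.
engine_query() {
    printf "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = '%s' ORDER BY TABLE_NAME;\n" "$1"
}

# Example invocation (needs a reachable MySQL server):
#   mysql -N -e "$(engine_query nagios)"
#   mysql -N -e "$(engine_query nagiosql)"
```

Tables reported as InnoDB have no .MYI or .MYD files, so myisamchk has nothing to act on for them.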
--
Griffin Wakem
-
- Posts: 4380
- Joined: Mon Jun 14, 2010 10:21 am
Re: MySQL Database Crashes
Yeah, we use MyISAM for the nagios database as well. I remember someone asking about converting to InnoDB, but I had forgotten who it was ; )
So... we don't have a documented repair procedure for InnoDB, but here are some items I found. We haven't tested any of this, which is the downside of switching to InnoDB.
http://dev.mysql.com/doc/refman/5.5/en/ ... overy.html
http://www.mysqlperformanceblog.com/200 ... rruption/e
-
- Posts: 26
- Joined: Thu Mar 29, 2012 10:26 am
Re: MySQL Database Crashes
Just for clarification, it was us that were asking about InnoDB vs MyISAM.
But the order was reversed.
http://support.nagios.com/forum/viewtop ... oDB#p27965
We had found during an issue with the upgrade that the DB was not updated.
When digging to find what had changed, and thus which version of the DB changes we were on, we found that one of them was trying to alter the DB to use InnoDB.
If I remember right it was 4b4 -> 4b5 that had the InnoDB alterations on it.
So, we are going to change these back to MyISAM later tonight, but we didn't just change it due to our personal feelings toward InnoDB vs MyISAM.
I also found several things on the interwebs about the leap second that was added on 6/30 causing CPU to spike in MySQL, so we have forced an update in NTP to resolve this (assuming it was related). Turns out our parent Nagios was 1min 6sec fast.
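The planned switch back to MyISAM could be scripted roughly as follows. This is a sketch under assumptions, not a tested Nagios XI procedure: `to_myisam_sql` is a hypothetical helper that reads table names on stdin and prints one ALTER statement per table, so the output can be reviewed before anything is run against the server.

```shell
#!/bin/sh
# Sketch only: to_myisam_sql is an illustrative name.
# Reads table names (one per line) on stdin and prints an
# ALTER TABLE ... ENGINE=MyISAM statement for each, using schema $1.
to_myisam_sql() {
    schema=$1
    while IFS= read -r table; do
        [ -n "$table" ] || continue   # skip blank lines
        printf 'ALTER TABLE `%s`.`%s` ENGINE=MyISAM;\n' "$schema" "$table"
    done
}

# Example (needs a reachable MySQL server; take a backup first):
#   mysql -N -e "SELECT TABLE_NAME FROM information_schema.TABLES
#                WHERE TABLE_SCHEMA='nagios' AND ENGINE='InnoDB'" \
#     | to_myisam_sql nagios > to_myisam.sql
```

Generating the statements into a file first, rather than piping them straight back into mysql, leaves a record of exactly which tables were converted.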
-
- Posts: 4380
- Joined: Mon Jun 14, 2010 10:21 am
Re: MySQL Database Crashes
Thanks for the clarification, you are correct. Yours was somewhat of a unique support situation. Let us know if you guys run into any further issues related to this.
-
- Posts: 238
- Joined: Mon Jan 23, 2012 2:02 pm
- Location: Asheville, NC
Re: MySQL Database Crashes
Quick rehash: at exactly midnight, load on the DB server goes from under 1 (0.3,0.7,0.9) to 80 (80.0,43.0,20.0). This causes graphing to stop and checks to stop in Nagios.
We converted the database over to MyISAM last night during a maintenance window, updated the kernel, and ran the repairdatabase.sh script. This time, however, we watched the spike live while enabling debug on the MySQL queries. We see this in the logs, starting at midnight:
Jul 17 00:00:00 sidhqmonm0 nagios: LOG ROTATION: DAILY
Jul 17 00:00:00 sidhqmonm0 nagios: LOG VERSION: 2.0
Followed by a registering of host state (which takes from 00:00:00 to 00:00:33) and service state (which takes from 00:00:33 to 00:03:17). During this time, it appears that the service and host state are not being pulled from retention.dat but are instead being queried directly from the database (confirmed by the debug output), which causes the load spike and subsequent crashes. We saw all 200 of our MySQL threads maxed out, and the queue was over 10k.
Strangely enough, I can't seem to find the specific entry that triggers the log rotation for nagios.log.
/usr/local/nagios/etc/nagios.cfg shows:
retain_state_information=1
retention_update_interval=60
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
The retention.dat is updated at 4 minutes after the hour.
So now we know how the DB crashes every night, but we still don't know why. Any ideas? We are still using 2011r3.1.
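Rather than watching the midnight spike by hand, a small check like the one below could capture evidence automatically. It is a sketch under assumptions: `load_above` is a hypothetical helper, the threshold is arbitrary, and the file format is that of Linux /proc/loadavg (1-minute load average in the first field).

```shell
#!/bin/sh
# Sketch only: load_above is an illustrative name.
# Succeeds (exit 0) when the 1-minute load average exceeds threshold $1.
# $2 is the file to read; it defaults to /proc/loadavg.
load_above() {
    threshold=$1
    file=${2:-/proc/loadavg}
    awk -v t="$threshold" '
        NR == 1 { found = ($1 + 0 > t + 0) }   # compare numerically
        END     { exit !found }                # empty file counts as "not above"
    ' "$file"
}

# Example: run every minute from cron around midnight and keep a snapshot
# of what MySQL is doing whenever load passes 10 (log path is hypothetical):
#   load_above 10 && mysqladmin processlist >> /tmp/midnight-processlist.log
```

A snapshot of the process list taken at the moment of the spike should show which queries are piling up behind the 200-thread limit.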
--
Griffin Wakem
-
- Posts: 238
- Joined: Mon Jan 23, 2012 2:02 pm
- Location: Asheville, NC
Re: MySQL Database Crashes
Additional info: I was able to confirm via /usr/local/nagiosxi/var/sysstat.log that nagios, ndo2db, and npcd all retained the same PID through this time, so they don't appear to have restarted.
--
Griffin Wakem
-
- Posts: 238
- Joined: Mon Jan 23, 2012 2:02 pm
- Location: Asheville, NC
Re: MySQL Database Crashes
We have to babysit the system from around 00:00 to 01:00 MST nightly, so any help tomorrow would be greatly appreciated.
--
Griffin Wakem