Out of disk space after only a week?

Post by **clauretano** » Mon Jan 25, 2010 12:41 pm

I figured 8GB would be sufficient for the machine, but boy was I wrong.

***** Nagios XI Alert ***** Notification Type: PROBLEM Service: Root Partition Host: localhost Address: 127.0.0.1 State: WARNING Info: DISK WARNING - free space: / 1355 MB (20% inode=93%)

Since it's a VM on a vSphere cluster and since the XI appliance uses LVM, it shouldn't be too much trouble to extend it, or so I thought.

Here are the steps I went through, including where I think I went wrong:
1. Shut down the appliance.
2. Edit the VM's settings in the vSphere Client. I expanded the disk from 8GB to 64GB.
3. Boot up the appliance and create a primary partition of type Linux LVM (8e) in the new free space at the end of the disk.
4. Extend VolGroup00 onto the new partition (/dev/sda3 in my case).
5. Extend the logical volume VolGroup00-LogVol00 to fill the free space in the volume group.
6. Expand the ext3 filesystem to fill the new space in the logical volume. This is where I messed it up.
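Steps 4 to 6 above correspond to standard LVM2 and e2fsprogs commands. Below is a minimal dry-run sketch, assuming the new partition is /dev/sda3 and has already been created with fdisk as type 8e (step 3 is interactive and left out; if the kernel does not see the new partition yet, run partprobe or reboot first). The names VolGroup00 and LogVol00 come from this appliance, the helper name `grow_root` is hypothetical, and the `+100%FREE` extent syntax assumes an lvm2 release that accepts it (older ones need an explicit extent count taken from vgdisplay). By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of steps 4-6: grow the root LV onto a new LVM partition.
# Assumed names; override them via environment variables for your system.
NEWPART=${NEWPART:-/dev/sda3}
VG=${VG:-VolGroup00}
LV=${LV:-LogVol00}
# By default RUN=echo, so the commands are only printed.
# Set RUN to an empty string (RUN=) to actually execute them as root.
RUN=${RUN-echo}

grow_root() {
    # Step 4: initialise the new partition as a physical volume, add it to the VG.
    $RUN pvcreate "$NEWPART"
    $RUN vgextend "$VG" "$NEWPART"
    # Step 5: hand every free extent in the VG to the logical volume.
    $RUN lvextend -l +100%FREE "/dev/$VG/$LV"
    # Step 6: grow the mounted ext3 filesystem on-line to fill the LV.
    $RUN resize2fs "/dev/$VG/$LV"
}

grow_root
```

Saved as, say, grow-root.sh (a hypothetical name), `sh grow-root.sh` just lists the four commands, while `RUN= sh grow-root.sh` runs them.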

I issued the command "resize2fs /dev/mapper/VolGroup00-LogVol00", and this is the output I got:

Code:

[root@nagiosxi ~]# resize2fs /dev/mapper/VolGroup00-LogVol00
resize2fs 1.39 (29-May-2006)
Filesystem at /dev/mapper/VolGroup00-LogVol00 is mounted on /; on-line resizing required
Performing an on-line resize of /dev/mapper/VolGroup00-LogVol00 to 16424960 (4k) blocks.

It has been quite a while, which is why I'm worried that it has failed. Watching CPU, memory, and disk usage in vSphere, I can see that it peaked for a few minutes, but it's basically flatlining now: there has been zero disk activity for the last 30 minutes. I did try searching the wiki for info on this topic before I proceeded, but the wiki seems to be pretty much empty. I knew I should have just rolled my own Nagios box, but mgmt likes to see support contracts, "enterprise", ajaxy interfaces and pretty pictures, all of which seem to be covered by Nagios XI.
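Whether an on-line resize like the one above has actually finished can be checked from a second shell. The sketch below is a hypothetical helper (the name `check_resize` is made up): it looks for a running resize2fs process with pgrep and prints the size df reports for the mount point. The target here was 16424960 4k blocks, i.e. 65,699,840 KB; df usually reports somewhat less than the raw block count because of filesystem metadata.

```shell
#!/bin/sh
# Hypothetical helper: is resize2fs still running, and how big is the
# filesystem mounted at the given path right now?
check_resize() {
    mnt=${1:-/}
    if pgrep -x resize2fs >/dev/null 2>&1; then
        echo "resize2fs still running"
    else
        echo "resize2fs not running"
    fi
    # POSIX-format df output in 1K blocks: column 2 is size, column 4 is available.
    df -Pk "$mnt" | awk 'NR == 2 { print "size_kb=" $2, "avail_kb=" $4 }'
}

check_resize /
```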

Post by **admin** » Tue Jan 26, 2010 6:10 pm

Strange that you ran out of space that quickly. How many hosts/services are you monitoring? Are there a lot of passive service checks?

In order to see where the space might be getting used, try running the 'du -hs' command on four directories like so:

Code:

cd /usr/local/nagios
du -hs *
cd /usr/local/nagiosxi
du -hs *
cd /var/lib/mysql
du -hs *
cd /var/lib/pgsql
du -hs *

If there are large numbers somewhere in the output, you can start digging deeper in the offending directory to try and track the source down. It could be database size, performance graphs, or Nagios event logs that are eating up some space.
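The four `du -hs *` runs above can also be folded into one sorted list, which makes the biggest consumer stand out and can be pointed at any subdirectory when digging deeper. A minimal sketch; the function name `top_usage` and the `TOP` variable are hypothetical, and sizes are in KB because `du -k` with `sort -n` also works on older coreutils that lack `sort -h`. Like `*`, the glob skips dot-files.

```shell
#!/bin/sh
# Hypothetical helper: largest entries under one or more directories,
# biggest first, sizes in KB. Missing directories are skipped quietly.
top_usage() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        du -sk "$dir"/* 2>/dev/null
    done | sort -rn | head -n "${TOP:-15}"
}

top_usage /usr/local/nagios /usr/local/nagiosxi /var/lib/mysql /var/lib/pgsql
```

To dig into an offending directory, run it again on that path, e.g. `top_usage /usr/local/nagios/var`.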

BTW, thanks for posting the notes on how you expanded the disk. I'm sure others will find the information you posted most useful if they need to upsize their drives in the future.

Post by **rseiwert** » Tue Dec 06, 2011 11:37 am

I know this is a year later, but I have the same problem. The thing that everyone seemed to miss about the original post is that the disk is 20% full but the inode list is 97% full. Expanding the disk will have no effect. Recreating the file system with mkfs should help, but the real issue is lots of temp files not being cleaned up.

To fix this problem:
1. Remove unnecessary (old, temporary, core, or log) files from the filesystem.
2. Determine whether the filesystem contains a large number of small files.

The initial allocation of inodes assumes a ratio of about four data blocks per inode. If the filesystem contains mostly files smaller than four blocks, it can run out of inodes before it runs out of space.
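To put numbers on that ratio: four 4 KB blocks per inode is 16384 bytes per inode, which is mke2fs's usual default for a filesystem of this size (an assumption; the appliance may have been built with other settings). An 8 GB filesystem then gets about half a million inodes. `df -i` shows the real counts, `mke2fs -i` sets the bytes-per-inode ratio when a filesystem is created, and growing an ext3 filesystem with resize2fs adds inodes too, because each new block group carries its own inode table. The helper name `inodes_for` is hypothetical.

```shell
#!/bin/sh
# Estimate how many inodes mke2fs allocates for a filesystem of a given size.
# Assumes 16384 bytes per inode (four 4 KB blocks) unless a ratio is given.
inodes_for() {
    size_bytes=$1
    ratio=${2:-16384}
    echo $(( size_bytes / ratio ))
}

# The 8 GB root filesystem from the original post (ignores LVM and metadata overhead).
inodes_for $(( 8 * 1024 * 1024 * 1024 ))   # → 524288

# Actual inode usage on the root filesystem (IUsed, IFree, IUse%).
df -i /
```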

Post by **rseiwert** » Tue Dec 06, 2011 11:53 am

BTW, here are the top inode counts for /usr/local/nagios:

181171 ./var/spool/perfdata
1254 ./share/images/logos
168 ./var/archives
124 ./libexec
106 ./share/docs/images
95 ./share/perfdata

Is there supposed to be a log rotator or cleanup process for these?
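A per-directory count like the one above can be produced with find, sort, and uniq. This is a sketch, not necessarily how that listing was made: it counts regular files only (directories and other inodes are left out, so numbers may differ slightly), relies on GNU find's `-printf '%h'` to print each file's parent directory, and uses `-xdev` to stay on one filesystem since inodes are per-filesystem. The function name `files_per_dir` and the `TOP` variable are hypothetical.

```shell
#!/bin/sh
# Count regular files per parent directory under a tree, biggest first.
files_per_dir() {
    find "${1:-.}" -xdev -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -n "${TOP:-10}"
}

# Run from /usr/local/nagios so the paths come out relative (./var/...).
if [ -d /usr/local/nagios ]; then
    (cd /usr/local/nagios && files_per_dir .)
fi
```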

Post by **rseiwert** » Tue Dec 06, 2011 11:55 am

Also, can these be deleted? If so, how do I delete 181,000 files from a directory? Surely rm -f * will not work, since the expanded argument list would be too long.

Post by **mguthrie** » Tue Dec 06, 2011 12:43 pm

If there are 181171 files in there, then you might have a permissions issue somewhere. Those files are supposed to be cleaned up automatically by PNP (the performance grapher). What are the permissions on that directory? The files are supposed to be "reaped" every 15 seconds: the results are processed and dumped into the RRD files, and then the files are removed. How are your performance graphs? ; )

Code:

service npcd status
service npcd restart
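For the permissions question, a quick look at who owns the spool directory and how many files are waiting in it might look like this. The default path is the one from rseiwert's listing; what the correct owner should be (for example the nagios user) is an assumption to check against your own install, `stat -c` is GNU coreutils syntax, and the function name `spool_report` is hypothetical.

```shell
#!/bin/sh
# Report owner, group, and mode of a perfdata spool directory, plus the
# number of files waiting directly inside it.
spool_report() {
    dir=${1:-/usr/local/nagios/var/spool/perfdata}
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # GNU stat: %U owner, %G group, %a octal mode, %n name.
    stat -c 'owner=%U group=%G mode=%a %n' "$dir"
    # Count only the directory's direct entries, not subdirectories.
    echo "waiting=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')"
}

spool_report || true
```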

You can delete those files, but it's hard to say whether that performance data has already been processed into the RRD files for the performance graphs. If we fix whatever issue is preventing them from being deleted, they will get processed normally, but your CPU load is going to be pretty high until they're all done.
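If you do decide to clear the backlog, the usual way around the argument-list limit rseiwert mentions is to let find remove the files itself instead of expanding `*` on the command line. A minimal sketch, assuming GNU find's `-delete` (`find ... -print0 | xargs -0 rm -f` is the portable alternative); stopping npcd first is a suggested precaution, any data still in those files is lost from the graphs, and the optional minimum age in minutes is a hypothetical safeguard so recently written files are left alone. The function name `clear_spool` is also hypothetical.

```shell
#!/bin/sh
# Delete regular files directly inside a spool directory without building
# a huge argument list: find handles each file itself, so no ARG_MAX issue.
# Optional second argument: only delete files older than that many minutes.
clear_spool() {
    dir=$1
    min_age=${2:-0}
    [ -d "$dir" ] || return 1
    if [ "$min_age" -gt 0 ]; then
        find "$dir" -maxdepth 1 -type f -mmin +"$min_age" -delete
    else
        find "$dir" -maxdepth 1 -type f -delete
    fi
}

# Example (commented out so nothing is deleted by accident):
# service npcd stop
# clear_spool /usr/local/nagios/var/spool/perfdata 60
# service npcd start
```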

Post by **rseiwert** » Wed Dec 07, 2011 9:28 am

This is a problem that's been going on for a while. I

Post by **mguthrie** » Wed Dec 07, 2011 11:23 am

I'm thinking there was supposed to be more to that message ; )

Nagios Support Forum
