Troubleshooting a Maxed Out Linux HD

Yesterday I sat down at my main desktop PC to do some work and, out of nowhere, error message after error message popped up informing me the hard drive was at 100% capacity, which meant the operating system had no room left to write. This baffled me, as I was 100% sure there should have been over 60 gigs of space available. My first inclination was to search for large files that might have grown out of hand - torrents especially.

After much searching, even starting at the / directory, I came up with nothing. So naturally I went straight to the log files. Believe it or not, the log files were not where I discovered the problem either. I thought I should share this experience with ghacks to illustrate how troubleshooting a Linux machine can go.

After the futile manual search for files I went to the logs. The first log I checked (and the first log I always turn to) was dmesg, which prints the message buffer from the kernel. To view it you just type dmesg in a terminal window. This was my first strike, as the kernel buffer knew nothing of my at-capacity drive.
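
If you would rather not scroll through the entire buffer, piping dmesg through grep narrows things down; the pattern below is only an illustration of the sort of thing I look for when a disk is misbehaving (sda happens to be my drive):

dmesg | grep -iE 'error|fail|sda'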

My next step was to head over to /var/log and take a peek at any of the log files that might offer up a clue as to why my hard drive was maxed out. My instincts always take me to /var/log/messages first. This particular log file keeps track of general system information regarding boot-up, networking, and the like. Another strike.
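
Assuming your distribution logs to /var/log/messages, a quick way to look at the most recent entries (or to check whether any "No space left on device" errors made it into the log) is something along these lines:

sudo tail -n 100 /var/log/messages
sudo grep -i 'no space left' /var/log/messages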

At this point I realized I had to take a break and clear some space, because the warnings wouldn't stop. I double-checked that the reports were correct by issuing the command:

df -h

which confirmed that /dev/sda1 was at 100% usage. I managed to free up a couple of gigs of space by deleting some torrents. The errors went away and I could continue working.
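
For what it's worth, df can also rule out inode exhaustion, which produces the same "disk full" errors even when block usage looks fine; the -i switch reports inodes instead of blocks (the mount point / is assumed here):

df -h /
df -i /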

My next step was to check the size of my proxy and DansGuardian logs. I had moved both systems over to my main desktop and had a feeling those logs needed to be rotated. I was right, but it didn't solve my problem. The individual proxy logs weren't huge (by any stretch of the imagination), but there were many of them. So I deleted the older logs and moved on.
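
To see how much room logs like these are actually taking, du is handy; the directory names below are just examples, as your proxy and DansGuardian may log elsewhere:

sudo du -sh /var/log/squid /var/log/dansguardian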

I was running out of log files to check and nothing had given me any idea what was going on.

Search and destroy

It was time to go back to the search method. But instead of searching manually (how long would it take to weed through the ENTIRE Linux file system? I didn't want to know) I opted to employ a little help from the find command. The find command allows you to add switches to your search to match on file size. In my case I wanted to first see if there were any files larger than roughly 1 GB (1,000,000 KB) in size. To do this I issued the command:

find / -size +1000000k -print0 | xargs -0 ls -l

as either root or using sudo. What this command does is tell find to search for files larger than roughly 1 GB and print their names separated by null characters (that's the -print0 switch, which keeps filenames containing spaces intact), then pipe that list to xargs, which runs ls -l on each file so you get a detailed listing. Because I was starting at the root directory, I knew this would take some time.
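
If you are on a reasonably recent GNU find and xargs, a slightly friendlier variant of the same idea uses the 1G size suffix, stays on a single filesystem with -xdev, and sorts the results by size:

sudo find / -xdev -type f -size +1G -print0 | xargs -0 -r ls -lhS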

It did. But after some time I discovered five files, each 12 gigs in size, sitting in /var/cache/. These files were from a backup program I had been working with and forgot to disable. So once a week my entire /home directory was being backed up. I deleted the files (recovering sixty gigs of space) and disabled the backup program. Problem solved.
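
In hindsight, running du on the top-level directories would have pointed me at /var/cache much faster than a filesystem-wide find; something along these lines works on GNU du and sort:

sudo du -xh --max-depth=1 /var | sort -rh | head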

Final thoughts

There are times when even the best logging system available will not tell you what you need to know. At those times you have to employ your best sleuthing techniques. Fortunately the Linux operating system encourages these kinds of administration tricks.