Maintaining a web site can be a complex and demanding task. It's important to keep the site secure and have a reliable presence on the web. Resources always seem to be scarce. This week's CGI script helps you observe the disk space used by your site. With it, you'll be able to quickly find which parts of your web site are using the most disk space. You'll be amazed at where all the space goes!
The Perl module File::Find performs most of the magic in this script. Its find function traverses a directory tree and calls a subroutine you supply for every directory and file it encounters. The usage is quite simple:
use File::Find;
find(\&wanted, '/foo','/bar');
sub wanted { ... }
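This walks every directory listed (/foo, /bar) and, for each entry it finds, sets some variables before calling the subroutine wanted: $_ holds the entry's name, $File::Find::dir the directory that contains it, and $File::Find::name the full path. For more information, check out the File::Find man page.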
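To make the role of wanted more concrete, here is a minimal sketch of a callback that adds up file counts and sizes per directory. It is not the script's actual code; the helper name tree_sizes and the layout of the %usage hash are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not from the original script): returns a hash
# reference mapping each directory to [ number of files, total bytes ].
sub tree_sizes {
    my (@roots) = @_;
    my %usage;
    find(sub {
        # lstat() does not follow symbolic links; skip unreadable entries
        my @st = lstat($_) or return;
        return unless -f _;              # count plain files only
        my $dir = $File::Find::dir;      # directory containing this file
        $usage{$dir}[0] += 1;            # file count
        $usage{$dir}[1] += $st[7];       # element 7 of lstat() is the size in bytes
    }, @roots);
    return \%usage;
}
```

Calling tree_sizes('/foo', '/bar') would return one entry per directory that contains at least one plain file. Directories that hold only subdirectories do not appear, which a real report might want to handle differently.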
Before using the script, set $TREEROOT to the directory where the search should start:
# start the search in this directory:
$TREEROOT="/home/dzs/html";
If you want to determine the size of all the documents in your entire web site, you may use something similar to the following:
$TREEROOT="/usr/local/etc/httpd/htdocs";
Of course, the script must read every directory and check the size of every file before it can display the results, so it may take a few moments if your web site is large. Unless you really enjoy waiting, you probably shouldn't set $TREEROOT="/";
After making the change above, put the script into your cgi-bin directory and make it executable (for example, with chmod 755 treesize.pl). Then point your browser at the script:
http://yoursite.com/cgi-bin/treesize.pl
When the script completes its run, you'll see a display with the directory names, the number of files in each directory, the size of each directory, and the percentage of the total space each directory uses. Note that the script does not attempt to calculate the size of symbolic links. If you're adventurous, you may add this feature, but you'll need to keep track of the inode value from stat() to avoid counting the same files more than once.
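For the adventurous, the inode bookkeeping might look something like the sketch below. It is only an illustration under stated assumptions: the helper name unique_size is hypothetical, and it pairs the inode number with the device number, since inode numbers are only unique within one filesystem.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): sums the sizes of the
# given paths, counting each underlying file only once.
sub unique_size {
    my (@paths) = @_;
    my %seen;
    my $total = 0;
    for my $path (@paths) {
        my @st = stat($path) or next;    # stat() follows symbolic links
        next unless -f _;                # plain files only
        my $key = "$st[0]:$st[1]";       # device number : inode number
        next if $seen{$key}++;           # already counted this file
        $total += $st[7];                # size in bytes
    }
    return $total;
}
```

A file reached through its real name, a symbolic link, and a hard link is counted once, because all three paths resolve to the same device and inode pair.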
Note: This program has been tested on Perl 5.004 and Linux 2.0.32 running the Apache web server. It may not, however, work correctly on all web servers, especially non-Unix ones.