the complete webmaster
Web Server Log Analysis

Every publisher needs to know his or her audience. As a webmaster, you're trying to reach a certain target audience with your web site's content. Feedback forms, newsgroups and a record of email addresses all help you stay in better contact with the visitors to your web site. Another very useful tool is your web server's access logs. The Apache web server can be configured to store vast amounts of information, organized any way you want.

Even a small sample from an Apache access log (here named access_log) holds quite a bit of information. We can see the host name or IP address of every user who connects to the web site. The log also stores the time and the HTTP request that was sent, so you can get a feel for how long a user spends on a page before clicking on to another one on your web site. It's possible to store other information as well, so look at the documentation for your web server software to discover all the possibilities of access logging.

Once you have these logs from your web server, you'll want to analyze their content. One interesting but arcane way to find the most recent requests on your web site is to issue a command like this in your Unix shell account:

    tail access_log

This will show you the last 10 lines from the log, so you can see, in near real time, what's happening and who's visiting your web site (adding the -f option keeps the display updating as new requests arrive). If you've set up your server software to log errors to a separate file, such as error_log, you'll be able to see what kinds of errors occur on your site.

The ability to configure logging in Apache is quite powerful. As the examples above show, it is useful to keep the normal access logs separate from the error log, because it makes finding problems much easier. However, it also makes tracking down the cause of the errors more difficult, because the error log tracks only errors and doesn't show which files were successfully retrieved by a user.
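To make the log fields described above a little more concrete, here is a minimal sketch that counts requests per host using standard Unix tools. It assumes the host name or IP address is the first whitespace-separated field on each line, which is true of Apache's Common Log Format but may not match your configuration. The function name top_hosts and its default of 10 hosts are made up for this illustration; this is not the script discussed in this article.

```shell
#!/bin/sh
# Sketch: list the busiest hosts in an access log.
# Assumption: the host is the first whitespace-separated field of each line.
# top_hosts is a hypothetical helper name, not part of Apache.
top_hosts() {
    logfile=$1        # path to the access log
    count=${2:-10}    # how many hosts to show (defaults to 10)

    # Pull out the first field, group identical hosts together,
    # count each group, then sort by count with the largest first.
    awk '{ print $1 }' "$logfile" | sort | uniq -c | sort -rn | head -n "$count"
}

# Usage: top_hosts access_log 5
```

Each line of output is a request count followed by the host. If your log stores the host in a different position, the awk field number would need to change to match.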
You may want to consider making a combined log which contains all the logging information. Depending upon how busy your site is, you'll also need to reset the log files periodically to save disk space.

Looking at text files is fun, but you'll probably want to do some automated analysis and formatting on your server logs. So, this week I've written a simple script that takes your server logs and provides a little summary information on them. First, take a look at the sample output from it. There are two versions available: one runs as a CGI program on your web server to provide real-time updates, and the other runs from the Unix shell whenever you want to process a batch of logs.

This version operates as a CGI program, and you'd put it in your cgi-bin directory. To configure it, you'll have to change the following code to point to your server's access log:
$logfile="access_log";
You may also have to modify the following line so that it finds the host name in your log files:
#find the host name
($h,$junk)=split(/\s/,$_,2);

This version operates from the Unix shell and is handy when you have a huge log or a batch of old, gzipped logs that you want to summarize:

    cat aprillog maylog junelog | UNIX-logtohtml.pl > summary.html

or, if you have a gzipped log:

    gunzip -c access_log.gz | UNIX-logtohtml.pl > summary.html

You'll probably also have to modify the "#find the host name" line, as in the CGI version, to find the host name in your log files.

Author: Doug Steinwand
Date: [10/28/97]
Copyright 1997, 1998 A Big Lime. All rights reserved.