the complete webmaster

This is part of a series of articles discussing CGI processing. Part I introduces the basic HTML elements that are used to create a form. Part II discusses how those elements are transferred to the web server and processed by a CGI program. Part III discusses the difference between POST and GET. Finally, Part IV shows how data collected by a CGI program can be stored on the server.

Part II: Sending Form Data

For the example below, assume that we have an HTML file named form.html that looks like this:

<HTML>
 <BODY>
  <FORM METHOD="get" ACTION="/cgi-bin/script.pl">
   <INPUT TYPE="text" NAME="in" SIZE="20" 
    MAXLENGTH="40" VALUE="hello there">
   <INPUT TYPE="submit" NAME="button" VALUE="Send">
  </FORM>
 </BODY>
</HTML>

When viewed in a web browser, the file will look something like this: a text box pre-filled with "hello there", followed by a "Send" button.

The user enters some text in the box and then clicks the "Send" button. Because this form uses the GET method, all the data sent by the web browser is visible in the URL. The copy of the form embedded in this page simply reloads the page when it is submitted, but notice how the URL changes:

Original: .../cgi/020398.htm
New:      .../cgi/020398.htm?in=hello+there&button=Send

As you can see, the data in the form has been "URL Encoded" and added to the URL for this page. Let's split the URL apart to find out what happened:

First, the URL can be split into two main parts. The part before the ? is used by the web server to determine which script to run. (In our case, the "script" is really an HTML file, but the Apache web server doesn't care, so no CGI processing is actually done in this example.) Everything after the ? is the "URL Encoded" contents of the form, which the server makes available as the "QUERY_STRING".

Script:       .../cgi/020398.htm
QUERY_STRING: in=hello+there&button=Send
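
The split at the ? can be sketched in a few lines of Perl. This is only an illustration: the helper name split_url is made up for this example, and the "..." prefix is the elided path from above, kept as-is. In a real CGI setup the web server does this step itself and passes the result to the script.

```perl
use strict;
use warnings;

# Illustrative helper (not part of the original article): split a URL
# at the first "?" into the script part and the query string.
sub split_url {
    my ($url) = @_;
    my ( $script, $query ) = split( /\?/, $url, 2 );
    $query = '' unless defined $query;    # no "?" means no form data
    return ( $script, $query );
}

my ( $script, $query ) =
    split_url('.../cgi/020398.htm?in=hello+there&button=Send');
print "Script:       $script\n";
print "QUERY_STRING: $query\n";
```

The limit of 2 passed to split matters: only the first ? separates the script from the form data, so any later ? stays in the query string.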

Next, let's find out what happened to the form data that's in QUERY_STRING. As you recall, there were two elements in the form. The text box was named "in" and the submit button was named "button". Both of those names appear in the form data, separated by &. Splitting up the form data on the &, we have:

Text box:      in=hello+there
Submit button: button=Send

All the parts of a form are made up of key-value pairs. So, the text box named "in" has the value "hello+there". The key is "in" and its value is "hello+there". Similarly, the submit button's key is "button" and its value is "Send".

But the original text box had "hello there". Where did that + come from? The contents of the form were "URL Encoded" to transform any characters that aren't allowed in a URL into characters that are accepted. Spaces are not allowed in URLs, so the HTML specification's rules for encoding form data state that spaces will be changed to +'s. Many other characters cannot appear as-is in the key-value pairs because they have a special meaning there. Those include ?, =, + and &. Such characters are transformed into escape codes of the form %xx, where xx is a two-digit hexadecimal value. Here's a brief table:

ASCII Character    %xx Code
%                  %25
&                  %26
'                  %27
+                  %2B
=                  %3D
?                  %3F

For example, if you entered the following string in the text box:

M&M's taste good? M+M=2000

The resulting URL-encoded string would look like this (the escape codes are %26, %27, %3F, %2B and %3D):

M%26M%27s+taste+good%3F+M%2BM%3D2000
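
The encoding step itself can be sketched in Perl. The helper name url_encode is hypothetical, and the set of characters left unescaped (letters, digits, *, -, . and _) is an assumption modeled on common browser behavior; other tools leave slightly different sets alone. The sketch also assumes plain ASCII input.

```perl
use strict;
use warnings;

# Illustrative helper (name is hypothetical): URL-encode one form key
# or value. Assumes ASCII input.
sub url_encode {
    my ($text) = @_;

    # Escape every character except letters, digits, "*", "-", ".",
    # "_" and space as %xx (two hexadecimal digits).
    $text =~ s/([^A-Za-z0-9*\-._ ])/sprintf( "%%%02X", ord($1) )/ge;

    # Spaces become "+". This must happen after the step above so that
    # a literal "+" in the input has already been turned into %2B.
    $text =~ tr/ /+/;

    return $text;
}

print url_encode("M&M's taste good? M+M=2000"), "\n";
```

Run on the example string, this prints the same encoded string shown above.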

Perl and CGI

Perl is often used to process CGI forms because it can handle text manipulation tasks -- such as URL encoding and decoding -- very easily. Nearly all web servers set up special environment variables that contain the data from the CGI form, along with information about the web server and the remote web browser. Here are the steps that a Perl program takes to translate the form data from a GET method back into useful strings:
  1. Split $ENV{'QUERY_STRING'} into separate key-value pairs on "&"
  2. Split each key-value pair into its key and its value on the first "="
  3. Unencode each key and value: change each "+" to a space, then translate the %xx codes back to characters.
Here's a Perl script that does just that. It will display the results back to the web browser:

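#!/usr/local/bin/perl
# a simple CGI script that demonstrates
# how to unencode form data from a GET method
use strict;
use warnings;

# this script will spit out only plain text
print "Content-type: text/plain\n\nHere's the form data:\n\n";

# QUERY_STRING may be missing when the URL has no "?"
my $query = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';

# separate each key-value pair
foreach my $pair ( split( /&/, $query ) ) {
    next if $pair eq '';    # skip empty pairs such as "a=1&&b=2"

    # separate the key and value; a pair without "=" gets an empty value
    my ( $key, $val ) = split( /=/, $pair, 2 );
    $val = '' unless defined $val;

    # translate + to spaces
    $key =~ s/\+/ /g;
    $val =~ s/\+/ /g;

    # translate %xx codes to characters
    # ("C" packs an unsigned byte, covering the full %00-%FF range)
    $key =~ s/%([0-9a-f]{2})/pack("C",hex($1))/gie;
    $val =~ s/%([0-9a-f]{2})/pack("C",hex($1))/gie;

    print "$key = $val\n";
}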
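
A script usually needs to look up form values by name rather than just print them. As a rough sketch, the same three steps can be wrapped in a subroutine that returns a hash. The names url_decode and parse_query are made up for this example, and the choice to let the last value win when a key repeats is an assumption, not something the article specifies.

```perl
use strict;
use warnings;

# Illustrative helpers (names are hypothetical, not from the script above).

# Decode one URL-encoded key or value.
sub url_decode {
    my ($text) = @_;
    $text =~ s/\+/ /g;                                     # + back to space
    $text =~ s/%([0-9a-f]{2})/pack( "C", hex($1) )/gie;    # %xx back to a byte
    return $text;
}

# Parse a query string into a hash of decoded key => decoded value.
# If a key appears more than once, the last value wins.
sub parse_query {
    my ($query) = @_;
    my %form;
    foreach my $pair ( split( /&/, $query ) ) {
        next if $pair eq '';
        my ( $key, $val ) = split( /=/, $pair, 2 );
        $val = '' unless defined $val;
        $form{ url_decode($key) } = url_decode($val);
    }
    return %form;
}

my %form = parse_query('in=hello+there&button=Send');
print "The text box contained: $form{'in'}\n";
```

In a CGI script the argument would be $ENV{'QUERY_STRING'} instead of a fixed string.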

Learning more about CGI Forms

The CGI Specification has many more examples and tips for learning and using CGI.

Author: Doug Steinwand
Date: [02/03/98]


Copyright 1997, 1998 A Big Lime. All rights reserved.
