the complete webmaster
Part II: Sending Form Data
For the example below, assume that we have an HTML file named form.html that looks like this:
<HTML>
<BODY>
<FORM METHOD="get" ACTION="/cgi-bin/script.pl">
<INPUT TYPE="text" NAME="in" SIZE="20"
MAXLENGTH="40" VALUE="hello there">
<INPUT TYPE="submit" NAME="button" VALUE="Send">
</FORM>
</BODY>
</HTML>
When viewed in a web browser, the file shows a text box that initially contains "hello there", followed by a "Send" button. The user will enter some text in the box and then click the "Send" button. Because this form uses the GET method, all the data sent by the web browser will be visible in the URL. If you're curious, you can try the form; it will simply reload this page. When you do that, however, notice that the URL changes. With the default text left in the box, it now ends like this:
?in=hello+there&button=Send
As you can see, the data in the form has been "URL Encoded" and added to the URL for this page. Let's split the URL apart to find out what happened.
First, the URL can be split into two main parts. The part before the ? will be used by the web server to determine which script to run. (In our case, the script name is really an HTML file, but the Apache web server doesn't care. So, in fact, no CGI processing will be done in this example.) Everything after the ? is the "URL Encoded" contents of the form, the "QUERY_STRING".
Next, let's find out what happened to the form data that's in QUERY_STRING. As you recall, there were two elements in the form. The text box was named "in" and the submit button was named "button". Both of those names appear in the form data, separated by &. Splitting up the form data on the &, we have:
in=hello+there
button=Send
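As a rough sketch of the two splits described above, the following Perl fragment takes an example URL apart. The URL string is an assumption based on the form in form.html (the real host and path depend on where the page lives), and the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed example URL; the real host and path depend on where the page lives.
my $url = '/form.html?in=hello+there&button=Send';

# Split once on the first "?": the part the server uses, then the query string.
my ( $path, $query ) = split( /\?/, $url, 2 );

# Split the query string on "&" to get the individual key=value pairs.
my @pairs = split( /&/, $query );

print "path:  $path\n";
print "query: $query\n";
print "pair:  $_\n" foreach @pairs;
```

Run from the command line, this prints the path, the whole query string, and then one line for each of the two pairs.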
All the parts of a form are made up of key-value pairs. The text box named "in" has the value "hello+there": the key is "in" and its value is "hello+there". Similarly, the submit button's key is "button" and its value is "Send". But the original text box had "hello there". Where did that + come from? The contents of the form were "URL Encoded" to transform any characters that aren't allowed in a URL into characters that are accepted. Spaces are not allowed in URLs, so the form encoding rules (the application/x-www-form-urlencoded format defined in the HTML specification) state that spaces will be changed to +'s. There are also other characters that cannot appear unescaped in the key-value pairs because they have a special meaning there. Those include ?, =, + and &. Those characters will be transformed into escape codes in the form %xx, where xx is a two-digit hexadecimal value. Here's a brief table:
Character   Escape code
&           %26
'           %27
+           %2B
=           %3D
?           %3F
For example, if you entered the following string in the text box:
M&M's taste good? M+M=2000
The resulting URL-encoded string would look like this:
M%26M%27s+taste+good%3F+M%2BM%3D2000
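To make the encoding step concrete, here is a minimal Perl sketch that reproduces the encoded string above. The name url_encode is a hypothetical helper, not something built into Perl, and the sketch assumes plain ASCII input. It is also deliberately conservative: it escapes everything except letters, digits and spaces, whereas browsers leave a few punctuation characters unescaped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# url_encode is a hypothetical helper name used only for this sketch.
sub url_encode {
    my ($text) = @_;

    # Escape every character except letters, digits and space as %XX.
    $text =~ s/([^A-Za-z0-9 ])/sprintf( "%%%02X", ord($1) )/ge;

    # Turn spaces into "+" last, after literal "+" signs became %2B.
    $text =~ tr/ /+/;

    return $text;
}

print url_encode("M&M's taste good? M+M=2000"), "\n";
```

The order of the two steps matters: if spaces were turned into + first, the first step would then escape those new + signs as %2B.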
Perl and CGI
Perl is often used to process CGI forms because it can handle text manipulation tasks -- such as URL encoding and decoding -- very easily. Nearly all web servers set up special environment variables that contain data from the CGI form and information about the web server and remote web browser. Here are the steps that a Perl program will take to translate the form data from a GET method back into useful strings:
#! /usr/local/bin/perl
# a simple CGI script that demonstrates
# how to decode form data sent with the GET method

# this script will output only plain text
print "Content-type: text/plain\n\nHere's the form data:\n\n";

# the web server places everything after the ? in QUERY_STRING
$query = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';

# separate each key=value pair
foreach ( split( /&/, $query ) ) {
    # separate the key and value
    ( $key, $val ) = split( /=/, $_, 2 );

    # translate + to spaces
    $key =~ s/\+/ /g;
    $val =~ s/\+/ /g;

    # translate %xx codes to characters
    # ("C" packs an unsigned char, so codes above %7F also work)
    $key =~ s/%([0-9a-f]{2})/pack("C",hex($1))/gie;
    $val =~ s/%([0-9a-f]{2})/pack("C",hex($1))/gie;

    print "$key = $val\n";
}
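The loop above prints each pair as soon as it is decoded. A script that needs the values later could collect them in a hash instead. The following is a minimal sketch of that idea, not a replacement for a full CGI library: url_decode and parse_query are hypothetical helper names, the query string is hard-coded so that the example runs outside a web server, and a repeated key simply keeps its last value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# url_decode and parse_query are hypothetical helper names for this sketch.
sub url_decode {
    my ($text) = @_;
    $text = '' unless defined $text;    # a pair without "=" has no value
    $text =~ tr/+/ /;                   # "+" back to space
    $text =~ s/%([0-9A-Fa-f]{2})/chr( hex($1) )/ge;    # %xx back to a character
    return $text;
}

sub parse_query {
    my ($query) = @_;
    my %form;
    foreach my $pair ( split( /&/, defined $query ? $query : '' ) ) {
        my ( $key, $val ) = split( /=/, $pair, 2 );
        $form{ url_decode($key) } = url_decode($val);    # last value wins
    }
    return %form;
}

# In a real CGI script the argument would be $ENV{'QUERY_STRING'}.
my %form = parse_query('in=M%26M%27s+taste+good%3F&button=Send');
print "$_ = $form{$_}\n" foreach sort keys %form;
```

As in the script above, + is translated before the %xx codes, so an encoded %2B still decodes to a literal plus sign rather than a space.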
Learning more about CGI Forms
Author: Doug Steinwand
Copyright 1997, 1998 A Big Lime. All rights reserved.