Day 10

Chapter 19 Beginning CGI Scripting

What Is a CGI Script?
- How Do CGI Scripts Work?
- A Simple Example
Can I Use CGI Scripts?
Anatomy of a CGI Script
Creating Special Script Output
- Responding by Loading Another Document
- No Response
Scripts To Process Forms
Troubleshooting
CGI Variables
Programs To Decode Form Input
Nonparsed Headers Scripts
<ISINDEX> Scripts
Summary
Q&A

CGI stands for Common Gateway Interface, a method for running programs on the Web server based on input from a Web browser. CGI scripts enable your reader to interact with your Web pages: to search for an item in a database, to offer comments on what you've written, or to select several items from a form and get a customized reply in return. If you've ever come across a fill-in form or a search dialog on the Web, you've used a CGI script. You may not have realized it at the time because most of the work happens on the Web server, behind the scenes. You see only the result.

As a Web author, you create all the sides of the CGI script: the side the reader sees, the programming on the server side to deal with the reader's input, and the result given back to the reader. CGI scripts are an extremely powerful feature of Web browser and server interaction that can completely change how you think of a Web presentation.

In this chapter, you'll learn just about everything about CGI scripts, including

What a CGI script is and how it works
What the output of a CGI script looks like
How to create CGI scripts with and without parameters or arguments
How to create scripts that return special responses
How to create scripts to process input from forms
Troubleshooting problems with your CGI scripts
CGI variables you can use in your scripts
Scripts with non-parsed headers
Searches using <ISINDEX>

Note

This chapter and the next focus primarily on Web servers running on UNIX systems, and most of the examples and instructions will apply only to UNIX. If you run your Web server on a system other than UNIX, the procedures you'll learn in this section for creating CGI scripts may not apply. But this chapter will at least give you an idea of how CGI works, and then you can combine that with the documentation of CGI on your specific server.

What Is a CGI Script?

A CGI script, most simply, is a program that is run on a Web server, triggered by input from a browser. The script is usually a link between the server and some other program running on the system; for example, a database.

CGI scripts do not have to be actual scripts; depending on what your Web server supports, they can be compiled programs or batch files or any other executable entity. For the sake of a simple term for this chapter, however, I'll call them scripts.

New Term

A CGI script is any program that runs on the Web server. CGI stands for Common Gateway Interface and is a basic set of variables and mechanisms for passing information from the browser to the server.

CGI scripts are usually used in one of two ways: as the ACTION to a form or as a direct link on a page. Scripts to process forms are used slightly differently than regular CGI scripts, but both have very similar appearances and behavior. For the first part of this chapter you'll learn about generic CGI scripts and then move on to creating scripts that process forms.

How Do CGI Scripts Work?

CGI scripts are called by the server, based on information from the browser. Figure 19.1 shows the path of how things work between the browser, the server, and the script.

Figure 19.1 : Browser to server to script to program and back again.

Here's a short version of what's actually going on:

A URL points to a CGI script. A CGI script URL can appear anywhere that a regular URL can appear; for example, in a link or in an image. Most often, a URL appears as the ACTION to a form. The browser contacts the server with that URL.
The server receives the request, notes that the URL points to a script (based on the location of the file or based on its extension, depending on the server), and executes that script.
The script performs some action based on the input, if any, from the browser. The action may include querying a database, calculating a value, or simply calling some other program on the system.
The script generates some kind of output that the Web server can understand.
The Web server receives the output from the script and passes it back to the browser, which formats and displays it for the reader.

Got it? No? Don't be worried; it can be a confusing process. Read on, it'll become clearer with a couple of examples.

A Simple Example

Here's a simple example, with a step-by-step explanation of what's happening on all sides of the process. In your browser, you encounter a page that looks like the page shown in Figure 19.2.

Figure 19.2 : A page with a script link.

The link to Display the Date is a link to a CGI script. It is embedded in the HTML code for the page just like any other link. If you were to look at the HTML code for that page, that link might look like this:

<A HREF="http://www.somesite.com/cgi-bin/getdate">Display the Date</A>

The fact that there's a cgi-bin in the pathname is a strong hint that this is a CGI script. In many servers cgi-bin is the only place that CGI scripts can be kept.

When you select the link, your browser requests that URL from the server at the site www.somesite.com. The server receives the request and figures out from its configuration that the URL it's been given is a script called getdate. It executes that script.

The getdate script, in this case a shell script to be executed on a UNIX system, looks something like this:

#!/bin/sh
echo Content-type: text/plain
echo
/bin/date

The first line is a special command that tells UNIX this is a shell script; the real fun begins on the line after that. This script does two things. First, it outputs the line Content-type: text/plain, followed by a blank line. Second, it calls the standard UNIX date program, which prints out the date and time. So the complete output of the script looks something like this:

Content-type: text/plain

Tue Oct 25 16:15:57 EDT 1994

What's that Content-type thing? That's a special code the Web server passes on to the browser to tell it what kind of document this is. The browser then uses that code to figure out if it can display the document or not, or if it needs to load an external viewer. You'll learn specifics about this line later in this chapter.

So, after the script is finished executing, the server gets the result and passes it back to the browser over the Net. The browser has been waiting patiently all this time for some kind of response. When the browser gets the input from the server, it simply displays it (Figure 19.3).

Figure 19.3 : The result of the date script.

That's the basic idea. Although things can get much more complicated, it's this interaction between browser, server, and script that is at the heart of how CGI scripts work.

Can I Use CGI Scripts?

Before you can use CGI scripts in your Web presentations, there are several basic conditions that must be met by both you and your server. CGI scripting is an advanced Web feature and requires knowledge on your part as well as the cooperation of your Web server provider.

Make sure you can answer all the questions in this section before going on.

Is Your Server Configured To Allow CGI Scripts?

In order to write and run CGI scripts, you will need a Web server. Unlike with regular HTML files, you cannot write and test CGI scripts on your local system; you have to go through a Web server to do so.

But even if you have a Web server, that server has to be specially configured to run CGI scripts. That usually means that all your scripts will be kept in a special directory called cgi-bin.

Before trying out CGI scripts, ask your server administrator if you are allowed to install and run CGI scripts and, if so, where to put them when you're done writing them. Also, you must have a real Web server to run CGI scripts; if you publish your Web pages on an FTP or Gopher server, you cannot use CGI.

If you run your own server, you'll have to specially create a cgi-bin directory and configure your server to recognize that directory as a script directory (part of your server configuration, which of course varies from server to server). Also keep in mind the following issues that CGI scripts bring up:

Each script is a program, and it runs on your system when the browser requests it, using CPU time and memory during its execution. What happens to the system if dozens or hundreds or thousands of these scripts are running at the same time? Your system may not be able to handle the load, making it crash or unusable for normal work.
Unless you are very careful with the CGI scripts you write, you can potentially open yourself up to someone breaking into or damaging your system by passing arguments to your CGI script that are different from those it expects.
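One common precaution against that second problem is to check each argument against a list of allowed characters before doing anything with it. Here's a minimal sketch of the idea, not a complete security measure; the function name is_safe_name and the set of characters it allows are assumptions I've made for illustration.

```shell
#!/bin/sh
# is_safe_name (hypothetical helper): succeed only if the argument is
# non-empty and made up entirely of letters, digits, and underscores.
is_safe_name() {
    case "$1" in
        "")               return 1 ;;   # empty input
        *[!A-Za-z0-9_]*)  return 1 ;;   # contains some other character
        *)                return 0 ;;
    esac
}

if is_safe_name "$1"; then
    echo "argument accepted"
else
    echo "argument rejected"
fi
```

Rejecting anything unexpected is usually simpler and safer than trying to list every dangerous character.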

Can You Program?

Beginner beware! In order to do CGI, process forms, or do any sort of interactivity on the World Wide Web, you must have a basic grasp of programming concepts and methods, and you should have some familiarity with the system on which you are working. If you don't have this background, I strongly suggest that you consult with someone who does, pick up a book on programming basics, or take a class in programming at your local college. This book is far too short for me to explain both introductory programming and CGI programming at the same time; in this chapter in particular, I am going to assume that you can read and understand the code in these examples.

What Programming Language Should You Use?

You can use just about any programming language you're familiar with to write CGI scripts, as long as your script follows the rules in the next section, and as long as that language can run on the system your Web server runs on. Some servers, however, may only support programs written in a particular language. For example, MacHTTP and WebStar use AppleScript for their CGI scripts; WinHTTPD and WebSite use Visual Basic. To write CGI scripts for your server, you must program in the language that server accepts.

In this chapter and throughout this book, I'm going to be writing these CGI scripts in two languages: the UNIX Bourne shell and the Perl language. The Bourne shell is available on nearly any UNIX system and is reasonably easy to learn, but doing anything complicated with it can be difficult. Perl, on the other hand, is freely available, but you'll have to download and compile it on your system. The language itself is extremely flexible and powerful (nearly as powerful as a programming language such as C), but it is also very difficult to learn.

Is Your Server Set Up Right?

To run any CGI scripts, whether they are simple scripts or scripts to process forms, your server needs to be set up explicitly to run them. This might mean your scripts must be kept in a special directory or they must have a special file extension, depending on which server you're using and how it's set up.

If you are renting space on a Web server, or if someone else is in charge of administering your Web server, you have to ask the person in charge whether CGI scripts are allowed and, if so, where to put them.

If you run your own server, check with the documentation for that server to see how it handles CGI scripts.

What If You're Not on UNIX?

If you're not on UNIX, stick around. There's still lots of general information about CGI that might apply to your server. But just for general background, here's some information about CGI on other common Web servers.

WinHTTPD for Windows 3.x, and WebSite for Windows 95 and NT, both include CGI capabilities with which you can manage form and CGI input. Both servers include a DOS and Windows CGI mode, the latter of which allows you to manage CGI through Visual Basic. The DOS mode can be configured to handle CGI scripts using Perl or Tcl (or any other language). WebSite also has a CGI mode for running Perl and Windows shell script CGI programs.

MacHTTP has CGI capabilities in the form of AppleScript scripts. (The new version of MacHTTP will be called WebStar and is available from StarNine.) Jon Wiederspan has written an excellent tutorial on using AppleScript CGI, which is included as part of the MacHTTP documentation.

Anatomy of a CGI Script

If you've made it this far, past all the warnings and configuration, congratulations! You can write CGI scripts and create forms for your presentations. In this section you'll learn about how your scripts should behave so your server can talk to them and get the correct response back.

The Output Header

Your CGI scripts will generally get some sort of input from the browser by way of the server. You can do anything you want with that information in the body of your script, but the output of that script has to follow a special form.

Note

By "script output," I'm referring to the data your script sends back to the server. On UNIX, the output is sent to the standard output, and the server picks it up from there. On other systems and other servers, your script output may go somewhere else; for example, you may write to a file on the disk or send the output explicitly to another program. Again, this is a case where you should carefully examine the documentation for your server to see how CGI scripts have been implemented in that server.

The first thing your script should output is a special header that gives the server, and eventually the browser, information about the rest of the data your script is going to create. The header isn't actually part of the document; it's never displayed anywhere. Web servers and browsers actually send information like this back and forth all the time; you just never see it.

There are three types of headers that you can output from scripts: Content-type, Location, and Status. Content-type is the most popular, so I'll explain it here; you'll learn about Location and Status later in this chapter.

You learned about the content-type header earlier in this book; content-types are used by the browser to figure out what kind of data it's receiving. Because script output doesn't have a file extension, you have to explicitly tell the browser what kind of data you're sending back. To do this, you use the Content-type header. A Content-type header has the words Content-type, a special code for describing the kind of file you're sending, and a blank line, like this:

Content-type: text/html

In this example, the contents of the data to follow are of the type text/html; in other words, it's an HTML file. Each file format you work with when you're creating Web presentations has a corresponding content-type, so you should match the format of the output of your script to the appropriate one. Table 19.1 shows some common formats and their equivalent content-types.

Table 19.1. Common formats and content-types.

Format	Content-Type
HTML	`text/html`
Text	`text/plain`
GIF	`image/gif`
JPEG	`image/jpeg`
PostScript	`application/postscript`
MPEG	`video/mpeg`
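If a script sends back existing files, it can pick the content-type from the filename. The following is a rough sketch of that lookup using the entries in Table 19.1; the function name content_type_for, the file extensions, and the fallback to text/plain are my own assumptions and are not part of CGI.

```shell
#!/bin/sh
# content_type_for (hypothetical helper): print the content-type from
# Table 19.1 that matches a filename's extension.
content_type_for() {
    case "$1" in
        *.html)        echo "text/html" ;;
        *.txt)         echo "text/plain" ;;
        *.gif)         echo "image/gif" ;;
        *.jpg|*.jpeg)  echo "image/jpeg" ;;
        *.ps)          echo "application/postscript" ;;
        *.mpg|*.mpeg)  echo "video/mpeg" ;;
        *)             echo "text/plain" ;;   # assumed fallback, not from the table
    esac
}

content_type_for "logo.gif"
```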

Note that the content-type line must be followed by a blank line. The server will not be able to figure out where the header ends if you don't include the blank line.
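To see why the blank line matters, here's a small sketch that splits script output at the first blank line, roughly the way a server has to. This is only an illustration, not how any particular server is implemented; the names cgi_header, cgi_body, and sample are hypothetical.

```shell
#!/bin/sh
# Both helpers read a script's complete output on standard input.
# The names are hypothetical; a real server does this internally.

# Print everything before the first blank line (the header).
cgi_header() {
    sed '/^$/,$d'
}

# Print everything after the first blank line (the data).
cgi_body() {
    sed '1,/^$/d'
}

# Stand-in for the output of a CGI script.
sample() {
    echo "Content-type: text/plain"
    echo
    echo "Hello from the script"
}

sample | cgi_header
sample | cgi_body
```

Without the blank line there is nothing to split on, so the data would be treated as part of the header.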

The Output Data

The remainder of your script is the actual data that you want to send back to the browser. The content you output in this part should match the content-type you told the server you were giving it; that is, if you use a content-type of text/html, the rest of the output should be in HTML. If you use a content-type of image/gif, the remainder of the output should be a binary GIF file, and so on for all the content-types.

Exercise 19.1: Try it.

This exercise is similar to the simple example from earlier in this chapter, the one that printed out the date. This CGI script checks to see if I'm logged into my Web server and reports back what it found (as shown in Figure 19.4).

Figure 19.4 : The pinglaura script results.

This is the most simple form of a CGI script, which can be called from a Web page by just linking to it like this:

<A HREF="http://www.lne.com/cgi-bin/pinglaura">Is Laura Logged in?</A>

When you link to a CGI script like this, selecting that link runs the script. There is no input to the script; it just runs and returns data.

First, determine the content-type you'll be outputting. Since this will be an HTML document, the content-type is text/html. So the first part of your script simply prints out a line containing the content-type, and a blank line after that (don't forget that blank line!):

#!/bin/sh
echo Content-type: text/html
echo

Now, add the remainder of the script: the body of the HTML document, which you had to construct yourself from inside the script. Basically what you're going to do here is

Print out the tags that make up the first part of the HTML document.
Test to see if I'm logged in, and output an appropriate message.
Print out the last bit of HTML tags to finish up the document.

Start with the first bit of the HTML. The following commands will do this in the UNIX shell:

echo "<HTML><HEAD>"
echo "<TITLE>Is Laura There?</TITLE>"
echo "</HEAD><BODY>"

Now test to see whether I'm logged into the system using the who command (my login ID is lemay), and store the result in the variable ison. If I'm logged in, the ison variable will have something in it; otherwise, ison will be empty.

ison=`who | grep lemay`

Test the result and return the appropriate message as part of the script output:

if [ ! -z "$ison" ]; then
    echo "<P>Laura is logged in.</P>"
else
    echo "<P>Laura isn't logged in.</P>"
fi

Finally, close up the remainder of the HTML tags:

echo "</BODY></HTML>"

And that's it. If you ran the program by itself from a command line to test its output, you would get a result that says I'm not logged into your system, something like this (unless, of course, I am logged into your system):

Content-type: text/html

<HTML><HEAD>
<TITLE>Is Laura There?</TITLE>
</HEAD><BODY>
<P>Laura isn't logged in.</P>
</BODY></HTML>

Looks like your basic HTML document, doesn't it? That's precisely the point. The output from your script is what is sent back to the server and then out to the browser, so it should be in a format the server and browser can understand (here, an HTML file).

Now, install this script in the proper place for your server. This step will vary depending on the platform you're on and the server you're using. Most of the time, on UNIX servers, there will be a special cgi-bin directory for scripts. Copy the script there and make it executable.

Note

If you don't have access to the cgi-bin directory, you must ask your Web server administrator for access. You cannot just create a cgi-bin directory and copy the script there; that won't work. See your Webmaster.

Now that you've got a script ready to go, you can call it from a Web page by linking to it, as I mentioned earlier. Just for reference, here's what the final script looks like:

#!/bin/sh
echo "Content-type: text/html"
echo
echo "<HTML><HEAD>"
echo "<TITLE>Is Laura There?</TITLE>"
echo "</HEAD><BODY>"
ison=`who | grep lemay`
if [ ! -z "$ison" ]; then
    echo "<P>Laura is logged in.</P>"
else
    echo "<P>Laura isn't logged in.</P>"
fi
echo "</BODY></HTML>"

Scripts with Arguments

CGI scripts are most useful if they're written to be as generic as possible. For example, if you want to check whether different people are logged into the system using the script in the previous example, you might have to write several different scripts (pinglaura, pingeric, pingelsa, and so on). It would make more sense to have a single generic script, and then send the name you want to check for as an argument to the script.

To pass arguments to a script, specify those arguments in the script's URL with a question mark (?) separating the name of the script from the arguments, and with plus signs (+) separating each individual argument, like this:

<A HREF="/cgi-bin/myscript?arg1+arg2+arg3">run my script</A>

When the server receives the script request, it passes arg1, arg2, and arg3 to the script as arguments. You can then parse and use those arguments in the body of the script.
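As a rough sketch of what such a script might do with its arguments, the following lists each one in an HTML page. It assumes the server hands the arguments to the script as ordinary command-line arguments, as described above; the helper name list_args is hypothetical.

```shell
#!/bin/sh
# list_args (hypothetical helper): print one HTML list item for each
# argument it is given.
list_args() {
    echo "<UL>"
    for arg in "$@"; do
        printf '<LI>%s\n' "$arg"
    done
    echo "</UL>"
}

echo "Content-type: text/html"
echo
echo "<HTML><HEAD><TITLE>Arguments</TITLE></HEAD><BODY>"
echo "<P>This script received $# argument(s):"
list_args "$@"
echo "</BODY></HTML>"
```

Requested as /cgi-bin/myscript?arg1+arg2+arg3, a script like this would list three items.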

This method of passing arguments to a script is sometimes called a query, because it is how browsers communicated search keys in an earlier version of searches called ISINDEX searches (you'll learn more about these later on). These days, most searches are done using forms, but this form of encoding arguments is still used; you should be familiar with it if you use CGI scripts often.

Exercise 19.2: Check to see whether anyone is logged in.

Now that you know how to pass arguments to a script, let's modify the pinglaura script so that it is more generic. We'll call this script pinggeneric.

Start with the beginning of the script we used in the previous example, with a slightly different title:

#!/bin/sh
echo "Content-type: text/html"
echo
echo "<HTML><HEAD>"
echo "<TITLE>Are You There?</TITLE>"
echo "</HEAD><BODY>"

In the previous example, the next step was to test whether I was logged on. Here's where the script becomes generic. Instead of hardcoding the name lemay into the script, use ${1}, which holds the first argument passed to the script (${2} holds the second, ${3} the third, and so on).

ison=`who | grep "${1}"`

Note

Why the extra quotes around the ${1}? That's to keep nasty people from passing weird arguments to your script. It's a security issue that I'll explain in greater detail in Chapter 28, "Web Server Security and Access Control."
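A small experiment shows what the quotes change. In this sketch, the helper count_args and the sample value are made up for illustration; the point is only that an unquoted variable is split into separate words by the shell, while a quoted one stays whole.

```shell
#!/bin/sh
# count_args (hypothetical helper): print how many arguments it received.
count_args() {
    echo $#
}

name="lemay /etc/passwd"   # one value that happens to contain a space

count_args ${name}      # unquoted: the shell splits it into two words
count_args "${name}"    # quoted: the value stays a single word
```

The first call prints 2 and the second prints 1. Without the quotes, grep would have treated the second word as the name of a file to search.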

All that's left is to modify the rest of the script to use the argument instead of the hardcoded name:

if [ ! -z "$ison" ]; then
    echo "<P>$1 is logged in"
else
    echo "<P>$1 isn't logged in"
fi

Now finish up with the closing </BODY> and </HTML> tags:

echo "</BODY></HTML>"

With the script complete, let's modify the HTML page that calls that script. The pinglaura script was called with an HTML link, like this:

<A HREF="http://www.lne.com/cgi-bin/pinglaura">Is Laura Logged in?</A>

The generic version is called in a similar way, with the argument at the end of the URL, like this (this one tests for someone named John):

<A HREF="http://www.lne.com/cgi-bin/pinggeneric?john">Is John Logged in?</A>

Try it on your own server, with your own login ID in the URL for the script to see what kind of result you get.

Passing Other Information to the Script

In addition to the arguments passed to a script through query arguments, there is a second way of passing information to a CGI script (that still isn't forms). The second way is called path information and is used for arguments that can't change between invocations of the script, such as the name of a temporary file or the name of the file that called the script itself. As you'll see in the section on forms, the arguments after the question mark can indeed change based on input from the user. Path info is used for other information to be passed for the script (and indeed, you can use it for anything you want).

New Term

Path information is a way of passing extra information to a CGI script when that information does not change as frequently as regular script arguments. Path information often refers to files on the Web server such as configuration files, temporary files, or the file that actually called the script in question.

To use path information, append the information you want to include to the end of the URL for the script, after the script name but before the ? and the rest of the arguments, as in the following example:

http://myhost/cgi-bin/myscript/remaining_path_info?arg1+arg2

When the script is run, the information in the path is placed in the environment variable PATH_INFO. You can then use that information any way you want to in the body of your script.

For example, let's say you had multiple links on multiple pages to the same script. You could use the path information to indicate the name of the HTML file that had the link. Then, after you've finished processing your script, when you send back an HTML file, you could include a link in that file back to the page that your reader came from.
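Here's a minimal sketch of that idea. It assumes each link to the script puts the path of its own page in the path information, so that PATH_INFO holds something like /docs/page1.html; the helper name back_link and the wording of the page are hypothetical.

```shell
#!/bin/sh
# back_link (hypothetical helper): print an HTML link to the page named
# in PATH_INFO, or nothing if no path information was supplied.
back_link() {
    if [ -n "$PATH_INFO" ]; then
        echo "<P><A HREF=\"$PATH_INFO\">Go back to the page you came from</A>"
    fi
}

echo "Content-type: text/html"
echo
echo "<HTML><HEAD><TITLE>Done</TITLE></HEAD><BODY>"
echo "<P>Your request has been processed."
back_link
echo "</BODY></HTML>"
```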

You'll learn more about path information in Chapter 20, "Useful Forms and Scripts," when we work through a "guestbook" example that employs path information.

Creating Special Script Output

In the couple of examples you've created so far in this chapter, you've written scripts that output data, usually HTML data, and that data is sent to the browser for interpretation and display. But what if you don't want to send a stream of data as a result of a script's actions? What if you want to load an existing page instead? What if you just want the script to do something and not give any response back to the browser?

Fear not, you can do those things in CGI scripts. This section explains how.

Responding by Loading Another Document

CGI output doesn't have to be a stream of data. Sometimes it's easier just to tell the browser to go to another page you have stored on your server (or on any server, for that matter). To send this message, you use a line similar to the following:

Location: ../docs/final.html

The Location line is used in place of the normal output; that is, if you use Location, you do not need to use Content-type or include any other data in the output (and, in fact, you can't include any other data in the output). As with Content-type, however, you must also include a blank line after the Location line.

The pathname to the file can be either a full URL or a relative pathname. All relative pathnames will be relative to the location of the script itself. This one looks for the document final.html in a directory called docs one level up from the current directory:

echo Location: ../docs/final.html
echo
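A script can also decide at run time which document to send the browser to. The following sketch chooses between two pages based on its first argument; the helper name redirect_for and the second page name (index.html) are made up for this example.

```shell
#!/bin/sh
# redirect_for (hypothetical helper): print a Location header, followed
# by the required blank line, for the page matching the first argument.
# The page names are made up for this example.
redirect_for() {
    case "$1" in
        final)  echo "Location: ../docs/final.html" ;;
        *)      echo "Location: ../docs/index.html" ;;
    esac
    echo
}

redirect_for "$1"
```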

Note

You cannot combine Content-type and Location output. For example, if you want to output a standard page and then add custom content to the bottom of that same page, you'll have to use Content-type and construct both parts yourself. Note that you could use script commands to open up a local file and print it directly to the output; for example, cat filename would send the contents of the file filename as data.
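The cat approach might look something like the following sketch. The function name page_with_footer is hypothetical, and the footer line is only an example of custom content; the file you pass in should leave out its closing tags if you want your script to add them afterward.

```shell
#!/bin/sh
# page_with_footer (hypothetical helper): send an existing file as the
# start of the response, then add custom content after it.
# $1 is the path to the file; the footer text is only an example.
page_with_footer() {
    echo "Content-type: text/html"
    echo
    cat "$1"
    echo "<P>This page was generated on `date`."
}
```

A script would call the function with the path to the standard page, for example page_with_footer standard.html (a made-up filename).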

No Response

Sometimes it may be appropriate for a CGI script to have no output at all. Sometimes you just want to take the information you get from the reader. You may not want to load a new document, either by outputting the result or by opening an existing file. The document that was on the browser's screen before should just stay there.

Fortunately, doing this is quite easy. Instead of outputting a Content-type or Location header, use the following commands instead (with a blank line after it, as always):

echo Status: 204 No Response
echo

The Status header provides status codes to the server (and to the browser). The particular status of 204 is passed on to the browser, and the browser, if it can figure out what to do with it, should do nothing.

You'll need no other output from your script since you don't want the browser to do anything with it; the one Status line with the blank line is enough. Of course, your script should do something; otherwise, why bother calling the script at all?
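As a sketch of a script that does something useful and then stays quiet, the following function appends a line to a log file before sending the 204 status. The function name record_visit and the idea of passing the log file's path as an argument are assumptions for this example.

```shell
#!/bin/sh
# record_visit (hypothetical helper): append a timestamped line to a log
# file, then tell the browser there is nothing to display.
# $1 is the log file; where you keep it is up to you.
record_visit() {
    echo "`date`: script was called" >> "$1"
    echo "Status: 204 No Response"
    echo
}
```

The script would end by calling record_visit with the path to a log file that the server is allowed to write to.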

Note

Although No Response is part of the official HTTP specification, it may not be supported in all browsers or may produce strange results. Before using a No Response header, you might want to experiment with several different browsers to see what the result will be.

Scripts To Process Forms

Most uses of CGI scripts these days are for processing form input. Calling a CGI script directly from a link can execute only that script with the hardcoded arguments. Forms allow any amount of information to be entered by the reader of the form, sent back to the server, and processed by a CGI script. They're the same scripts, and they behave in the same ways. You still use Content-type and Location headers to send a response back to the browser. However, there are a few differences, including how the CGI script is called and how the data is sent from the browser to the server.

Remember, most forms have two parts: the HTML layout for the form and the CGI script to process that form's data. The CGI script is called using attributes to the <FORM> tag.

Form Layout and Form Scripts

As you learned yesterday, every form you see on the Web has two parts: the HTML code for the form, which is displayed in the browser, and the script to process the contents of that form, which runs on the server. They are linked together in the HTML code.

The ACTION attribute inside the <FORM> tag contains the name of the script to process the form:

<FORM ACTION="http://www.myserver.com/cgi-bin/processorscript">

In addition to this reference to the script, each input field in the form (a text field, a radio button, and so on) has a NAME attribute, which names that form element. When the form data is submitted to the CGI script you named in ACTION, the name of each field and the contents of that field are passed to the script as name/value pairs. In your script you can then get to the contents of each field (the value) by referring to that field's name.

`GET` and `POST`

One part of forms I didn't mention yesterday (except in passing) was the METHOD attribute. METHOD indicates the way the form data will be sent from the browser to the server to the script. METHOD has one of two values, GET and POST.

GET works just like the script arguments you learned about earlier in this chapter. The form data is packaged and appended to the end of the URL you specified in the ACTION attribute as arguments. So, if your ACTION attribute looks like this:

ACTION="/cgi-bin/myscript"

and your form has two input fields named username and phone, the final URL sent by the browser to the server when the form is submitted might look like this:

http://myhost/cgi-bin/myscript?username=Agamemnon&phone=555-6666

Note that this formatting is slightly different than the arguments you passed to the CGI script by hand; this format is called URL encoding and is explained in more detail later in this chapter.

When the server executes your CGI script to process the form, it sets the environment variable QUERY_STRING to everything after the question mark in the URL.

POST does much the same thing as GET, except that it sends the data separately from the actual call to the script. Your script then gets the form data through the standard input. (Some Web servers might store it in a temporary file instead; UNIX servers use standard input.) The QUERY_STRING environment variable is not set if you use POST.

Which one should you use? POST is the safer method, particularly if you expect a lot of form data. When you use GET, the server assigns all the encoded form data to the QUERY_STRING variable, and there might be limits on the amount of data you can store in that variable. In other words, if you have lots of form data and you use GET, you might lose some of that data.

If you use POST, you can have as much data as you want, because the data is sent as a separate stream and is never assigned to a variable.
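Here's a rough sketch of how a shell script might pick up the raw form data either way. It relies on two environment variables that the text above hasn't covered, REQUEST_METHOD and CONTENT_LENGTH, which are standard CGI variables set by the server; the function name read_form_data is hypothetical.

```shell
#!/bin/sh
# read_form_data (hypothetical helper): print the raw, still-encoded form
# data, whichever method the form used.
# REQUEST_METHOD and CONTENT_LENGTH are standard CGI environment
# variables set by the server.
read_form_data() {
    if [ "$REQUEST_METHOD" = "POST" ]; then
        # POST: read exactly CONTENT_LENGTH bytes from standard input.
        dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
    else
        # GET: the data is everything after the ? in the URL.
        printf '%s' "$QUERY_STRING"
    fi
}
```

Reading a fixed number of bytes, rather than reading until the input ends, is the cautious choice because a server is not required to close the input stream after the data.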

URL Encoding

URL encoding is the format that the browser uses to package the input to the form when it sends it to the server. The browser gets all the names and values from the form input, encodes them as name/value pairs, translates any characters that won't transfer over the wire, lines up all the data, and, depending on whether you're using GET or POST, sends them to the server either as part of the URL or separately through a direct link to the server. In either case, the form input ends up on the server side (and therefore in your script) as gobbledygook that looks something like this:

theName=Ichabod+Crane&gender=male&status=missing&headless=yes

URL encoding follows these rules:

Name/value pairs are separated from one another by an ampersand (&).
Within each pair, the name and the value are separated by an equal sign (=). In cases when the user of the form did not enter a value for a particular tag, the name still appears in the input, but with no value (as in "name=").
Any special characters (characters that are not simple seven-bit ASCII) are encoded in hexadecimal preceded by a percent sign (%NN). Special characters include the =, &, and % characters if they appear in the input itself.
Spaces in the input are indicated by plus signs (+).
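
To make the rules above concrete, here is a minimal Bourne shell sketch of the encoding step. In real life the browser does this work, so you would use something like it only to build test input by hand. The function name url_encode is an assumption for illustration, and the sketch handles plain ASCII input only.

```shell
# url_encode VALUE: print VALUE encoded by the rules above.
# Hypothetical helper; a sketch for plain ASCII input only.
url_encode() (
    LC_ALL=C
    in=$1
    out=
    while [ -n "$in" ]; do
        rest=${in#?}            # everything after the first character
        c=${in%"$rest"}         # the first character itself
        case $c in
            [A-Za-z0-9._-]) out="$out$c" ;;      # ordinary characters pass through
            ' ')            out="$out+" ;;       # spaces become plus signs
            *)              out="$out$(printf '%%%02X' "'$c")" ;;  # anything else becomes %NN
        esac
        in=$rest
    done
    printf '%s\n' "$out"
)
```

For example, url_encode 'Ichabod Crane' prints Ichabod+Crane, and an ampersand inside a value comes out as %26 so that it cannot be mistaken for a pair separator.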

Because form input is passed to your script in this URL-encoded form, you'll have to decode it before you can use it. However, because decoding this information is a common task, there are lots of tools for doing just that. There's no reason for you to write your own decoding program unless you want to do something very unusual. The decoding programs that are out there can do a fine job, and they might consider things that you haven't, such as how to avoid having your script break because someone gave your form funny input.

I've noted a few programs for decoding form input later on in this chapter, but the program I'm going to use for the examples in this book is called uncgi, which decodes the input from a form submission for you and creates a set of environment variables from the name/value pairs. Each environment variable has the same name as the name in the name/value pair, with the prefix WWW_ added to it. Each value in the name/value pair is then assigned to its respective environment variable. So, for example, if you had a form with a name in it called username, the resulting environment variable uncgi created would be WWW_username, and its value would be whatever the reader typed in that form element. Once you've got the environment variables, you can test them just as you would any other variable.

You can get the source for uncgi from http://www.hyperion.com/~koreth/uncgi.html. Compile uncgi using the instructions that come with the source, install it in your cgi-bin directory, and you're ready to go.

Exercise 19.3: Tell me your name, Part 2.

Remember the form you created yesterday that prompts you for your name? Let's create the script to handle that form (the form is shown again in Figure 19.5, in case you've forgotten). Using this form, you would type in your name and submit the form using the Submit button.

Figure 19.5 : The Tell Me Your Name form.

The input is sent to the script, which sends back an HTML document that displays a hello message with your name in it (see Figure 19.6).

Figure 19.6 : The result of the name form.

What if you didn't type anything at the Enter your Name prompt? The script would send you the response shown in Figure 19.7.

Figure 19.7 : Another result.

Modify the HTML for the Form

In the examples yesterday, we used a testing program called post-query as the script to call in the ACTION attribute to the <FORM> tag. Now that we're working with real scripts, we'll modify the form so that it points to a real CGI script. The value of ACTION can be a full URL or a relative pathname to a script on your server. So, for example, the following <FORM> tag would call a script called form-name in a cgi-bin directory one level up from the current directory:

<FORM METHOD=POST ACTION="../cgi-bin/form-name"> </FORM>

If you're using uncgi to decode form input, as I am in these examples, things are slightly different. To make uncgi work properly, you call uncgi first, and then append the name of the actual script as if uncgi were a directory, like this:

<FORM METHOD=POST ACTION="../cgi-bin/uncgi/form-name"> </FORM>

Other than this one modification, you don't need to modify the HTML for the form at all; the original HTML code works just fine. Let's move on to the script to process the form.

The Script

The script to process the form input is a CGI script, just like the ones you've been creating up to this point in the chapter. All the same rules apply for Content-type headers and passing the data back to the browser.

The first step in a form script is usually to decode the information that was passed to your script through the POST method. In this example, however, because we're using uncgi to decode form input, the form decoding has already been done for you. Remember how you put uncgi in the ACTION attribute to the form, followed by the name of your script? What happens there is that when the form input is submitted, the server passes that input to the uncgi program, which decodes the form input for you, and then calls your script with everything already decoded. Now, at the start of your script, all the name/value pairs are there for you to use.

Moving on, print out the usual CGI headers and HTML code to begin the page:

echo Content-type: text/html
echo
echo "<HTML><HEAD>"
echo "<TITLE>Hello</TITLE>"
echo "</HEAD><BODY>"
echo "<P>"

Now comes the meat of the script. You have two branches to deal with: one to accuse the reader of not entering a name, and one to say hello when they do.

The value of the theName element, as you named the text field in your form, is contained in the WWW_theName environment variable. Using a simple Bourne shell test (-z), you can see if this environment variable is empty and include the appropriate response in the output:

if [ ! -z "$WWW_theName" ]; then
    echo "Hello, "
    echo "$WWW_theName"
else
    echo "You don't have a name?"
fi

Finally, add the last bit of HTML code to include the "go back" link. This link points back to the URL of the original form (here, called name1.html, in a directory one level up from cgi-bin):

echo '</P><P><A HREF="../lemay/name1.html">Go Back</A></P>'
echo "</BODY></HTML>"
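
Assembled into one file, the pieces of this exercise might look like the following sketch. Wrapping the body in a function called say_hello is an assumption made here for illustration (it keeps the example easy to try from a command line); the rest follows the steps described above and assumes uncgi has already set WWW_theName.

```shell
#!/bin/sh
# form-name: reply to the Tell Me Your Name form.
# Assumes uncgi has already decoded the input into WWW_theName.
# The function name say_hello is hypothetical.

say_hello() {
    # The CGI header, followed by the blank line that ends the headers
    echo "Content-type: text/html"
    echo
    echo "<HTML><HEAD>"
    echo "<TITLE>Hello</TITLE>"
    echo "</HEAD><BODY>"
    echo "<P>"

    # -z is true for an empty string, so ! -z means a name was entered
    if [ ! -z "${WWW_theName-}" ]; then
        echo "Hello, "
        printf '%s\n' "$WWW_theName"
    else
        echo "You don't have a name?"
    fi

    # Single quotes let the HREF keep its own double quotes
    echo '</P><P><A HREF="../lemay/name1.html">Go Back</A></P>'
    echo "</BODY></HTML>"
}
```

In the installed script, the last line would simply call say_hello.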

And that's it! That's all there is to it. Learning how to do CGI scripts is the hard part; linking them together with forms is easy. Even if you're confused and don't quite have it, bear with me; there are lots more examples to look at and work through tomorrow.

Troubleshooting

Here are some of the most common problems with CGI scripts and how to fix them:

The content of the script is being displayed, not executed.
Have you configured your server to accept CGI scripts? Are your scripts contained in the appropriate CGI directory (usually cgi-bin)? If your server allows CGI files with .cgi extensions, does your script have that extension?
Error 500: Server doesn't support POST.
You'll get this error from forms that use the POST method. This error most often means that you either haven't set up CGI scripts in your server, or you're trying to access a script that isn't contained in a CGI directory (see the previous bullet).
It can also mean, however, that you've misspelled the path to the script itself. Check the pathname in your form, and if it's correct, make sure that your script is in the appropriate CGI directory (usually cgi-bin) and that it has a .cgi extension (if your server allows this).
Document contains no data.
Make sure you included a blank line between your headers and the data in your script.
Error 500: Bad Script Request.
Make sure your script is executable (on UNIX, make sure you've done chmod +x yourscript.cgi to the script). You should be able to run your scripts from a command line before you try to call them from a browser.
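
Two of the checks above are easy to run from a command line before you involve a browser at all. The following sketch assumes two hypothetical helper functions, run_cgi_get and check_cgi_output; neither is part of any server distribution. The first runs a script as if it had been called with GET, and the second checks that the output starts with headers followed by a blank line.

```shell
# run_cgi_get SCRIPT QUERY: run SCRIPT by hand the way a server would
# for a GET request. Hypothetical helper for testing only.
run_cgi_get() {
    if [ ! -x "$1" ]; then
        echo "$1 is not executable; try: chmod +x $1" >&2
        return 1
    fi
    REQUEST_METHOD=GET QUERY_STRING="$2" "$1"
}

# check_cgi_output: read script output on standard input and succeed
# only if one or more header lines are followed by a blank line.
check_cgi_output() {
    awk '
        !done && $0 == ""              { done = 1; ok = (NR > 1) }
        !done && $0 !~ /^[A-Za-z-]+:/  { done = 1; ok = 0 }
        END { exit (ok ? 0 : 1) }
    '
}
```

You might use the two together like this: run_cgi_get ./form-name 'theName=Laura' | check_cgi_output && echo "headers look fine".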

CGI Variables

CGI variables are a set of special variables that are set in the environment when a CGI script is called. All of these variables are available to you in your script to use as you see fit. Table 19.2 summarizes these variables.

Table 19.2. CGI environment variables.

Environment Variable	What It Means
`SERVER_NAME`	The hostname or IP address on which the CGI script is running, as it appears in the URL.
`SERVER_SOFTWARE`	The type of server you are running: for example, CERN/3.0 or NCSA/1.3.
`GATEWAY_INTERFACE`	The version of CGI running on the server. For UNIX servers, this should be CGI/1.1.
`SERVER_PROTOCOL`	The HTTP protocol the server is running. This should be HTTP/1.0.
`SERVER_PORT`	The TCP port on which the server is running. Usually port 80 for Web servers.
`REQUEST_METHOD`	`POST` or `GET`, depending on how the form was submitted.
`HTTP_ACCEPT`	A list of Content-types the browser can accept directly, as defined by the HTTP Accept header.
`HTTP_USER_AGENT`	The browser that submitted the form information. Browser information usually contains the browser name, the version number, and extra information about the platform or extra capabilities.
`HTTP_REFERER`	The URL of the document that this form submission came from. Not all browsers send this value; do not rely on it.
`PATH_INFO`	Extra path information that follows the script name in the URL (for example, `/form-name` in `/cgi-bin/uncgi/form-name`).
`PATH_TRANSLATED`	The actual system-specific pathname of the path contained in `PATH_INFO`.
`SCRIPT_NAME`	The pathname to this CGI script, as it appears in the URL (for example, `/cgi-bin/thescript`).
`QUERY_STRING`	The arguments to the script or the form input (if submitted using `GET`). `QUERY_STRING` contains everything after the question mark in the URL.
`REMOTE_HOST`	The name of the host that submitted the script. This value might not be set.
`REMOTE_ADDR`	The IP address of the host that submitted the script.
`REMOTE_USER`	The name of the user that submitted the script. This value will be set only if server authentication is turned on.
`REMOTE_IDENT`	If the Web server is running `ident` (a protocol to verify the user connecting to you), and the system that submitted the form or script is also running `ident`, this variable contains the value returned by `ident`.
`CONTENT_TYPE`	In forms submitted with `POST`, the value will be `application/x-www-form-urlencoded`. In forms with file upload, content-type will be `multipart/form-data`.
`CONTENT_LENGTH`	For forms submitted with `POST`, the number of bytes in the standard input.
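
One way to get a feel for these variables is a script that simply reports them. The sketch below prints a selection of the variables from Table 19.2 as plain text; the function name show_cgi_vars and the choice of variables are assumptions for illustration.

```shell
# show_cgi_vars: print some of the CGI variables as a text/plain page.
# Hypothetical helper; an unset or empty variable is shown as (not set).
show_cgi_vars() (
    echo "Content-type: text/plain"
    echo
    for name in SERVER_NAME SERVER_SOFTWARE GATEWAY_INTERFACE SERVER_PROTOCOL \
                SERVER_PORT REQUEST_METHOD PATH_INFO SCRIPT_NAME QUERY_STRING \
                REMOTE_HOST REMOTE_ADDR REMOTE_USER CONTENT_TYPE CONTENT_LENGTH
    do
        # Look up the variable whose name is stored in $name
        eval "value=\${$name-}"
        [ -n "$value" ] || value="(not set)"
        printf '%s=%s\n' "$name" "$value"
    done
)
```

Installed as a CGI script that ends with a call to show_cgi_vars, this gives you a quick view of what your server actually passes along.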

Programs To Decode Form Input

The one major difference between a plain CGI script and a CGI script that processes a form is that, because you get data back from the form in URL-encoded format, you need a method of decoding that data. Fortunately, because everyone who writes a CGI script to process a form needs to do this, programs exist to do it for you and to decode the name/value pairs into something you can more easily work with. I like two programs: uncgi for general-purpose use, and cgi-lib.pl, a Perl library for use when you're writing CGI scripts in Perl. You can, however, write your own program if the ones I've mentioned here aren't good enough.

Programs also exist to decode data sent from form-based file uploads, although there are fewer of them. At the end of this section, I mention a few that I've found.

`uncgi`

Steven Grimm's uncgi is a program written in C that decodes form input for you. You can get information and the source to uncgi from http://www.hyperion.com/~koreth/uncgi.html.

To use uncgi, it's best to install it in your cgi-bin directory. Before you compile, edit the makefile so that it points to the location of that directory on your system; uncgi needs this to find your scripts.

To use uncgi in a form, you'll have to slightly modify the ACTION attribute in the FORM tag. Instead of calling your CGI script directly in ACTION, you call uncgi with the name of the script appended. So, for example, if you had a CGI script called sleep2.cgi, the usual way to call it would be this:

<FORM METHOD=POST ACTION="http://www.myserver.com/cgi-bin/sleep2.cgi">

If you were using uncgi, you would do this:

<FORM METHOD=POST ACTION="http://www.myserver.com/cgi-bin/uncgi/sleep2.cgi">

Note

The uncgi program is an excellent example of how path information is used. The uncgi script uses the name of the actual script from the path information to know which script to call.

The uncgi program reads the form input from either the GET or POST input (it figures out which automatically), decodes it, and creates a set of variables with the same names as the values of each NAME attribute, with WWW_ prepended to them. So, for example, if your form contained a text field with the name theName, the uncgi variable containing the value for theName would be WWW_theName.

If there are multiple name/value pairs in the input with the same name, uncgi creates only one environment variable with the individual values separated by hash signs (#). For example, if the input contains the name/value pairs shopping=butter, shopping=milk, and shopping=beer, the resulting WWW_shopping environment variable contains butter#milk#beer. It is up to you in your script to handle this information properly.
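
Handling the hash-separated list in a Bourne shell script can be as simple as splitting on the # character. Here is a minimal sketch; the function name list_values is an assumption, and note that a value that itself contains a hash sign cannot be told apart from a separator this way.

```shell
# list_values STRING: print each #-separated value on its own line.
# Hypothetical helper; runs in a subshell so IFS is not changed for the caller.
list_values() (
    IFS='#'
    set -f                  # no filename expansion while splitting
    for item in $1; do
        printf '%s\n' "$item"
    done
)
```

In a script called through uncgi, you would call it as list_values "$WWW_shopping" and loop over the lines it prints.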

`cgi-lib.pl`

The cgi-lib.pl package, written by Steve Brenner, is a set of routines for the Perl language to help you manage form input. It can take form input from GET or POST and put it in a Perl list or associative array. Newer versions can also handle file upload from forms. You can get information about (and source for) cgi-lib.pl from http://www.bio.cam.ac.uk/cgi-lib. If you decide to use the Perl language to handle your form input, cgi-lib.pl is a great library to have.

To use cgi-lib.pl, retrieve the source from the URL listed in the previous paragraph and put it in your Perl libraries directory (often /usr/lib/perl). Then in your Perl script itself, use the following line to include the subroutines from the library in your script:

require 'cgi-lib.pl';

Although there are several subroutines in cgi-lib.pl for managing forms, the most important one is the ReadParse subroutine. ReadParse reads either GET or POST input and conveniently stores the name/value pairs in a Perl associative array. It's usually called in your Perl script something like this:

&ReadParse(*in);

In this example, the name of the array is in, but you can call it anything you want to.

Then, after the form input is decoded, you can read and process the name/value pairs by accessing the name part in your Perl script like this:

print $in{'theName'};

This particular example just prints out the value of the pair whose name is theName.

If there are multiple name/value pairs with the same name, cgi-lib.pl separates the multiple values in the associative array with null characters (\0). It's up to you in your script to handle this information properly.

Decoding File Upload Input

Because form-based file upload is a newer feature requiring a different kind of form input, there are few programs that will decode the input you get back from a form used to upload local files.

Recent versions of cgi-lib.pl handle file uploads very nicely, storing them in associative arrays without the need to do anything extra to deal with them. See the home page for cgi-lib.pl at http://www.bio.cam.ac.uk/cgi-lib/ for more information.

Another library for handling CGI data in Perl 5, CGI.pl, also deals with file uploads. See http://valine.ncsa.uiuc.edu/cgi_docs.html for details.

Doing It Yourself

Decoding form input is the sort of task that most people will want to leave up to a program such as the ones I've mentioned in this section. But, in case you don't have access to any of these programs, if you're using a system that these programs don't run on, or you feel you can write a better program, here's some information that will help you write your own.

The first thing your decoder program should check for is whether the form input was sent via the POST or GET method. Fortunately, this is easy. The CGI environment variable REQUEST_METHOD, set by the server before your program is called, indicates the method and tells you how to proceed.

If the form input is sent to the server using the GET method, the form input will be contained in the QUERY_STRING environment variable.

If the form input is sent to the server using the POST method, the form input is sent to your script through the standard input. The CONTENT_LENGTH environment variable indicates the number of bytes that the browser submitted. In your decoder, you should make sure you only read the number of bytes contained in CONTENT_LENGTH and then stop. Some browsers might not conveniently terminate the standard input for you.
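
The choice between the two methods might be sketched in a Bourne shell function like the one below. The name read_form_data is an assumption; the sketch uses dd with a block size of one byte so that it reads exactly CONTENT_LENGTH bytes from the standard input and no more, which is slow for large input but never reads too far.

```shell
# read_form_data: print the raw (still URL-encoded) form input,
# wherever the server put it. Hypothetical helper.
read_form_data() {
    case ${REQUEST_METHOD-} in
        GET)
            printf '%s' "${QUERY_STRING-}"
            ;;
        POST)
            # Read exactly CONTENT_LENGTH bytes, one byte per block
            dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
            ;;
        *)
            echo "unsupported request method: ${REQUEST_METHOD-}" >&2
            return 1
            ;;
    esac
}
```

A decoder would typically capture the result with something like data=$(read_form_data) and then go on to split it into pairs.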

A typical decoder script performs the following steps:

Separate the individual name/value pairs (separated by &).
Separate the name from the value (separated by =).
If the same name appears more than once with different values, you should have some method of preserving all those values.
Replace any plus signs with spaces.
Decode any hex characters (%NN) to their ASCII equivalents on your system.
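
The steps above can be sketched in Bourne shell as follows. This is a minimal illustration, not a replacement for a tested decoder such as uncgi; the function names url_decode and parse_form are assumptions, and the sketch prints one name and value per line (separated by a tab), so repeated names are preserved simply by appearing on several lines.

```shell
# url_decode STRING: turn + into a space and %NN into the character it encodes.
# Hypothetical helper. A decoded trailing newline is lost, because command
# substitution strips trailing newlines.
url_decode() (
    in=$1
    out=
    while [ -n "$in" ]; do
        case $in in
            +*)
                out="$out "
                in=${in#?}
                ;;
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${in#%}
                hex=${hex%"${hex#??}"}              # the two hex digits
                octal=$(printf '%03o' "0x$hex")     # printf escapes want octal
                out="$out$(printf "\\$octal")"
                in=${in#???}
                ;;
            *)
                out="$out${in%"${in#?}"}"           # copy one ordinary character
                in=${in#?}
                ;;
        esac
    done
    printf '%s\n' "$out"
)

# parse_form DATA: print "name<TAB>value" for every pair in DATA.
parse_form() (
    IFS='&'
    set -f                              # no filename expansion while splitting
    for pair in $1; do                  # split the pairs apart at each &
        name=${pair%%=*}                # text before the first =
        case $pair in
            *=*) value=${pair#*=} ;;    # text after the first =
            *)   value= ;;
        esac
        printf '%s\t%s\n' "$(url_decode "$name")" "$(url_decode "$value")"
    done
)
```

Note that the pairs are split apart before anything is decoded. Decoding first would turn an encoded %26 back into an ampersand and break the value in two.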

Interested in decoding input from file uploads? The rules are entirely different. In particular, the input you'll get from file uploads conforms to MIME multipart messages, so you'll have to deal with lots of different kinds of data. If you're interested, you'll want to see the specifications for file upload, which will explain more. See those specifications at ftp://ds.internic.net/rfc/rfc1867.txt.

Nonparsed Headers Scripts

If you followed the basic rules outlined in this section for writing a CGI script, the output of your script (headers and data, if any) will be read by the server and sent back to the browser over the network. In most cases, this will be fine because the server can then do any checking it needs to do and add its own headers to yours.

In some cases, however, you might want to bypass the server and send your output straight to the browser: for example, to reduce the amount of time it takes for your script output to get back to the browser, or to send data back to the browser that the server might question. For most forms and CGI scripts, however, you won't need a script that does this.

CGI scripts to do this are called NPH (non-parsed headers) scripts. If you do need an NPH script, you'll need to modify your script slightly:

The script should have an nph- prefix: for example, nph-pinglaura or nph-fixdata.
Your script must send extra HTTP headers instead of just the Content-type, Location, or Status headers.

The headers are the most obvious change you'll need to make to your script. In particular, the first header you output should be an HTTP/1.0 header with a status code, like this:

HTTP/1.0 200 OK

This header with the 200 status code means "everything's fine, the data is on its way." Another status code could be

HTTP/1.0 204 No Response

As you learned earlier in this section, this means that there is no data coming back from your script, and so the browser should not do anything (such as try to load a new page).

A second header you should probably include is the Server header. There is some confusion over whether this header is required, but it's probably a good idea to include it. After all, by using an NPH script you're trying to pretend you are a server, so including it can't hurt.

The Server header simply indicates the version of the server you're running, as in either of the following examples:

Server: NCSA/1.3
Server: CERN/3.0pre6

After including these two headers, you must also include any of the other headers for your script, including Content-type or Location. The browser still needs this information in order to know how to deal with the data you're sending it.
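
Put together, the headers of an NPH script might look like the following sketch. The function name nph_hello is an assumption (the script file itself would need the nph- prefix, as in nph-hello), and taking the Server value from the SERVER_SOFTWARE variable, with NCSA/1.3 as a fallback, is one possible choice rather than a requirement. Because no server tidies up NPH output, the sketch ends each header line with the carriage return and line feed that HTTP uses.

```shell
# nph_hello: output for a hypothetical script installed as nph-hello.
nph_hello() {
    # The status line must be the very first thing the browser sees
    printf 'HTTP/1.0 200 OK\r\n'
    # Identify the server; fall back to a sample value if the variable is empty
    printf 'Server: %s\r\n' "${SERVER_SOFTWARE:-NCSA/1.3}"
    # The usual CGI header is still required
    printf 'Content-type: text/html\r\n'
    # A blank line ends the headers
    printf '\r\n'
    echo "<HTML><HEAD><TITLE>Hello</TITLE></HEAD>"
    echo "<BODY><P>Hello from an NPH script.</P></BODY></HTML>"
}
```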

Again, most of the time you won't need NPH scripts; the normal CGI scripts should be just fine.

`<ISINDEX>` Scripts

To finish off the discussion on CGI, let's talk about what are called <ISINDEX> searches. The use of the <ISINDEX> tag was the way browsers sent information (usually search keys) back to the server in the early days of the Web. <ISINDEX> searches are all but obsolete these days because of forms; forms are much more flexible both in layout and with different form elements, and also in the scripts you use to process them. But since I'm a completist, I'll include a short description of how ISINDEX searches work here as well.

<ISINDEX> searches are CGI scripts that take arguments, just like the scripts you wrote earlier in this chapter to find out if someone was logged in. The CGI script for an <ISINDEX> search operates in the following ways:

If the script is called with no arguments, the HTML that is returned should prompt the reader for search keys. Use the <ISINDEX> tag to provide a way for the reader to enter them (remember, this was before there were forms).
When the reader submits the search keys, the ISINDEX script is called again with the search keys as the arguments, which are appended to the URLs as they would be if you had included them in a link. Your ISINDEX script then operates on those arguments in some way, returning the appropriate HTML file. Just as you would pass arguments to CGI scripts through links, you can get to <ISINDEX> search keys using $1, $2, and so on in a UNIX shell script.
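
The two cases above might be sketched in a Bourne shell script like this one. The function name isindex_search is an assumption, and the sketch only echoes the search keys back; a real script would look them up somewhere before building its reply.

```shell
# isindex_search [KEY...]: reply to an <ISINDEX> query.
# Hypothetical sketch; it lists the keys instead of searching for them.
isindex_search() {
    echo "Content-type: text/html"
    echo
    echo "<HTML><HEAD><TITLE>Search</TITLE>"
    if [ $# -eq 0 ]; then
        # Called with no arguments: turn on searching and prompt the reader
        echo "<ISINDEX>"
        echo "</HEAD><BODY>"
        echo "<P>Enter the keys you want to search for.</P>"
    else
        # Called again with the search keys as arguments
        echo "</HEAD><BODY>"
        echo "<P>You searched for:</P><UL>"
        for key in "$@"; do
            printf '<LI>%s\n' "$key"
        done
        echo "</UL>"
    fi
    echo "</BODY></HTML>"
}
```

The script itself would end with the line isindex_search "$@", which passes along whatever arguments the server supplied.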

The core of the <ISINDEX> searches is the <ISINDEX> tag. It is a special HTML tag used for these kinds of searches. It doesn't enclose any text, nor does it have a closing tag.

So what does <ISINDEX> do? It "turns on" searching in the browser that is reading this document. Depending on the browser, this may involve enabling a search button in the browser itself (see Figure 19.8). For newer browsers, it may involve including an input field on the page (see Figure 19.9). The reader can then enter a string to search for, and then press Enter or click on the button to submit the query to the server.

Figure 19.8 : A search prompt in the browser window.

Figure 19.9 : A search prompt on the page itself.

According to the HTML 2.0 specification, the <ISINDEX> tag should go inside the <HEAD> part of the HTML document (it's one of the few tags that goes into <HEAD>, <TITLE> being the other obvious example). In older browsers, where there was a single location for the search prompt, this made sense because neither the search prompt nor the <ISINDEX> tag was actually part of the data of the document. However, because more recent browsers display the input field on the HTML page itself, it is useful to be able to put <ISINDEX> in the body of the document so that you can control where on the page the input field appears (if it's in the <HEAD>, it'll always be the first thing on the page). Most browsers will now accept an <ISINDEX> tag anywhere in the body of an HTML document and will draw the input box wherever that tag appears.

Finally, there is an HTML extension to the <ISINDEX> tag that allows you to define the search prompt. Again, in older browsers, the search prompt was fixed (it was usually something confusing like "This is a Searchable index. Enter keywords"). The new PROMPT attribute to <ISINDEX> allows you to define the string that will be used to indicate the input field, as in the following code:

<P> To search for a student in the online directory, enter the name (last name first):
<ISINDEX PROMPT="Student's name: ">

Figure 19.10 shows the result of this tag in Netscape.

Figure 19.10 : A Netscape search prompt.

<ISINDEX> is useful only in the context of an ISINDEX search. Although you can put it into any HTML document, it won't do anything unless it was a CGI script that generated that HTML page to begin with.

Most of the time creating HTML forms is a far easier way of prompting the user for information.

Summary

CGI scripts, sometimes called server-side scripts or gateway scripts, make it possible for programs to be run on the server, and HTML or other files to be generated on-the-fly.

In this chapter, you reviewed all the basics of creating CGI scripts: both simple scripts and scripts to process forms, including the special headers you use in your scripts; the difference between GET and POST in form input; and how to decode the information you get from the form input. Plus you learned some extras about path information, URL encoding, <ISINDEX> searches, and the various CGI variables you can use in your CGI scripts. From here, you should be able to write CGI scripts to accomplish just about anything.

Q&A

Q	What if I don't know how to program? Can I still use CGI scripts?
A	If you have access to a Web server through a commercial provider, you may be able to get help from the provider with your CGI scripts (for a fee, of course). Also, if you know even a little programming, but you're unsure of what you're doing, there are many examples available for the platform and server you're working with. Usually these examples are either part of the server distribution or at the same FTP location. See the documentation that came with your server; it often has pointers to further help. In fact, for the operation you want to accomplish, there may already be a script you can use with only slight modification. But be careful; if you don't know what you're doing, you can rapidly get in over your head, or end up creating scripts with security holes that you don't know about.
Q	My Web server has a `cgi-bin` directory, but I don't have access to it. So I created my own `cgi-bin` directory and put my script there, but calling it from my Web pages didn't work. What did I do wrong?
A	Web servers must be specially configured to run CGI scripts, and usually that means indicating specific directories or files that are meant to be scripts. You cannot just create a directory or a file with a special extension without knowing how your Webmaster has set up your server; most of the time you'll guess wrong and your scripts will not work. Ask your Webmaster for help with installing your scripts.
Q	My Webmaster tells me I can just create a `cgi-bin` directory in my home directory, install my scripts there, and then call them using a special URL called `cgiwrap`. You haven't mentioned this way of having personal `cgi-bin` directories.
A	`cgiwrap` is a neat program that provides a secure wrapper for CGI scripts and allows users of public UNIX systems to have their own personal CGI directories. However, your Webmaster has to specifically set up and configure `cgiwrap` for your server before you can use it. If your Webmaster has allowed the use of `cgiwrap`, congratulations! CGI scripts will be easy for you to install and use. If you are a Webmaster and you're interested in finding out more, check out `http://wwwcgi.umr.edu/~cgiwrap/`.
Q	My scripts aren't working!
A	Did you look in the section on troubleshooting for the errors you're getting and the possible solutions? I covered most of the common problems you might be having in that section.
Q	My Web provider won't give me access to `cgi-bin` at all. No way, no how. I really want to use forms. Is there any way at all I can do this?
A	There is one way; it's called a Mailto form. Using Mailto forms, you use a Mailto URL with your e-mail address in the `ACTION` part of the form, like this: `<FORM METHOD=POST ACTION="mailto:lemay@lne.com"> ... </FORM>` Then, when the form is submitted by your reader, the contents of the form will be sent to you via e-mail (or at least they will if you include your e-mail address in the `mailto` instead of mine). No server scripts are required to do this. There are a few major problems with this solution, however. The first is that the e-mail you get will have all the form input in encoded form. Sometimes you may be able to read it anyhow, but it's messy. To get around URL encoding, there are special programs created just for Mailto forms that will decode the input for you. Check out `http://homepage.interaccess.com/~arachnid/mtfinfo.html` for more information. The second problem with Mailto forms is that they don't give any indication that the form input has been sent. There's no page to send back saying "Thank you, I got your form input." Your readers will just click Submit, and nothing will appear to happen. Because there's no feedback to your readers, they may very well submit the same information to you repeatedly. It might be useful to include a warning to your readers on the page itself to let them know that they won't get any feedback. The third problem with Mailto forms is that they are not supported by all browsers, so your forms may not work for everyone who reads your page. Most of the major commercial browsers do support Mailto forms, however.
Q	I'm writing a decoder program for form input. The last `name=value` pair in my list keeps getting all this garbage stuck to the end of it.
A	Are you reading only the number of bytes indicated by the `CONTENT_LENGTH` environment variable? You should test for that value and stop reading when you reach the end, or you might end up reading too far. Not all browsers will terminate the standard input for you.
