CGI Output

Every CGI script must print a header line, which the server uses to build the full HTTP headers of its response. If your CGI script produces invalid headers or no headers, the web server will generate a valid response for the client -- generally a 500 Internal Server Error message.

Your CGI has the option of displaying full or partial headers. By default, CGI scripts should return only partial headers.

Partial Headers

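CGI scripts must output one of the following three headers: a Content-type header specifying the media type of the content that follows, a Location header specifying a URL to redirect the client to, or a Status header specifying the status code for the response.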

Let's review each of these options.

Outputting documents

The most common response for CGI scripts is to return HTML. A script must indicate to the server the media type of content it is returning prior to outputting any content. This is why all of the CGI scripts you have seen in the previous examples contained the following line:

print "Content-type: text/html\n\n";

You can send other HTTP headers from a CGI script, but this header field is the minimum necessary in order to output a document. HTML documents are by no means the only form of media type that may be outputted by CGI scripts. By specifying a different media type, you can output any type of document that you can imagine. For example, Example 3-4 later in this chapter shows how to return a dynamic image.
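As a minimal sketch of returning something other than HTML, the following script sends a plain-text document. The content_header helper and the message it prints are our own illustrative names and are not part of CGI or Perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the Content-type header followed by
# the blank line that ends the header block.
sub content_header {
    my $media_type = shift;
    return "Content-type: $media_type\n\n";
}

# Send the header first, then the body in the declared media type.
print content_header( "text/plain" );
print "Server time: ", scalar( localtime ), "\n";
```

Only the media type changes from one kind of document to another; the header must still be printed before any of the content.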

The two newlines at the end of the Content-type header tell the web server that this is the last header line and that subsequent lines are part of the body of the message. This corresponds to the extra CRLF that we discussed in the last chapter, which separates HTTP headers from the content body (see the upcoming sidebar, "Line Endings").

Line Endings

Many operating systems use different combinations of line feeds and carriage returns to represent the end of a line of text. Unix systems use a line feed; Macintosh systems use a carriage return; and Microsoft systems use both a carriage return and a line feed, often abbreviated as CRLF. HTTP headers require a CRLF as well -- each header line must end with a carriage return and a line feed.

In Perl (on Unix), a line feed is represented as "\n", and a carriage return is represented as "\r". Thus, you may wonder why our previous examples have included this:

print "Content-type: text/html\n\n";

and not this:

print "Content-type: text/html\r\n\r\n";

The second format would work, but only if your script runs on Unix. Because Perl began on Unix and has since become a cross-platform language, printing "\n" in a script always outputs the operating system's default line ending, which is not a bare line feed on every system.

There is a simple solution. CGI requires that the web server translate your operating system's conventional line ending into a CRLF for you. Thus, for the sake of portability, it is always best practice to print a simple line feed ("\n"): Perl will output the operating system's default line ending, and the web server will automatically convert this to the CRLF required by HTTP.

Forwarding to another URL

Sometimes, it's not necessary to build an HTML document with your CGI script. In fact, unless the output varies from one visit to another, it is a good idea to create a simple, static HTML page (in addition to the CGI script), and forward the user to that page by using the Location header. Why? Interface changes are far more common than program logic changes, and it is much easier to reformat an HTML page than to make changes to a CGI script. Plus, if you have multiple CGI scripts that return the same message, then having them all forward to a common document reduces the number of resources you need to maintain. Finally, you get better performance. Perl is fast, but your web server will always be faster. It's a good idea to take advantage of any opportunity you have to shift work from your CGI scripts to your web server.

To forward a user to another URL, simply print the Location header with the URL to the new location:

print "Location: static_response.html\n\n";

The URL may be absolute or relative. An absolute URL or a relative URL with a relative path is sent back to the browser, which then creates another request for the new URL. A relative URL with a full path produces an internal redirect. An internal redirect is handled by the web server without talking to the browser. The server gets the contents of the new resource as if it had received a new request, but it then returns the content for the new resource as if it were the output of your CGI script. This avoids a network response and request; the only difference to users is a faster response. The URL displayed by their browser does not change for internal redirects; it continues to show the URL of the original CGI script. See Figure 3-4 for a visual display of server redirection.

Figure 3-4. Server redirection
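The rules above can be sketched as a small classifier. This is only an illustration of the distinction described in the text, not a description of how any particular web server decides; the redirect_kind function and the example URLs are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report how a Location value would be handled
# according to the rules described in the text.
sub redirect_kind {
    my $url = shift;

    # Absolute URL (starts with a scheme such as "http:"): sent to the browser
    return "browser"  if $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*:};

    # Relative URL with a full path (single leading slash): internal redirect
    return "internal" if $url =~ m{^/(?!/)};

    # Anything else, such as a relative path: sent to the browser
    return "browser";
}

# Example URLs are placeholders for illustration only.
for my $url ( "http://www.example.com/static_response.html",
              "static_response.html",
              "/static_response.html" ) {
    printf "%-45s %s\n", $url, redirect_kind( $url );
}
```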

When redirecting to absolute URLs, you may include a Content-type header and content body for the sake of older browsers, which may not forward automatically. Modern browsers will immediately fetch the new URL without displaying this content.
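A redirect that carries such a fallback body might look like the following sketch. The redirect_with_fallback helper, the page title, and the example.com URL are assumptions made for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a redirect response that also carries a
# short HTML body for browsers that do not follow the Location header.
sub redirect_with_fallback {
    my $url = shift;
    return "Location: $url\n"
         . "Content-type: text/html\n\n"
         . "<HTML>\n"
         . "<HEAD><TITLE>Document Moved</TITLE></HEAD>\n"
         . "<BODY>\n"
         . "<P>This document has moved to <A HREF=\"$url\">$url</A>.</P>\n"
         . "</BODY>\n"
         . "</HTML>\n";
}

# The URL below is a placeholder for illustration only.
print redirect_with_fallback( "http://www.example.com/static_response.html" );
```

The sketch assumes a fixed, trusted URL. A URL taken from user input would need to be validated and HTML-escaped before being placed in the body.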

Specifying status codes

The Status header is different from the other headers because it does not map directly to an HTTP header, although it is associated with the status line. This field is used only to exchange information between the CGI script and the web server. It specifies the status code the server should include in the status line of the response. This field is optional: if you do not print it, the web server will automatically add a status of 200 OK to your output if you print a Content-type header, and a status of 302 Found if you print a Location header.

If you do print a status code, you are not bound to use the status code's associated message, but you should not try to use a status code for something other than that for which it was intended. For example, if your CGI script must connect to a database in order to generate its output, you might return 503 Database Unavailable if the database has no free connections. The standard error message for 503 is Service Unavailable, so our database message is an appropriately similar use of this status code.

Whenever you return an error status code, you should also return a Content-type header and a message body describing the reason for the error in human terms. Some browsers provide their own messages to users when they receive status codes indicating an error, but most do not. So unless you provide a message, many users will get an empty page or a message telling them "The document contains no data." If you don't want to admit to having a problem, you can always fall back to the ever-popular slogan, "The system is currently unavailable while we perform routine maintenance."

Here is the code to report our database error:

print <<END_OF_HTML;
Status: 503 Database Unavailable
Content-type: text/html

<HTML>
<HEAD><TITLE>503 Database Unavailable</TITLE></HEAD>
<BODY>
<H1>Error</H1>
<P>Sorry, the database is currently not available. Please try again later.</P>
</BODY>
</HTML>
END_OF_HTML
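If several scripts need to report errors this way, the same response can be wrapped in a small helper. The following is a sketch only; error_response and its parameter names are our own and do not come from any module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a complete error response from a status
# code, a status message, and a human-readable explanation.
sub error_response {
    my( $code, $message, $explanation ) = @_;
    return <<END_OF_HTML;
Status: $code $message
Content-type: text/html

<HTML>
<HEAD><TITLE>$code $message</TITLE></HEAD>
<BODY>
<H1>Error</H1>
<P>$explanation</P>
</BODY>
</HTML>
END_OF_HTML
}

print error_response( 503, "Database Unavailable",
    "Sorry, the database is currently not available. Please try again later." );
```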

Below is a short description of the common status headers along with when (and whether) to use them in your CGI scripts:

We list these status codes here to be complete, but keep in mind that you do not have to print your own status code, even for errors. Although sending a status code to report an error might be the most appropriate action according to the HTTP protocol, you may prefer to simply redirect users to a help page or return a summary of the error as normal output (with a 200 OK status).

Complete (Non-Parsed) Headers

Thus far, all the CGI scripts that we've discussed simply return partial header information. We leave it up to the server to fill in the other headers and return the document to the browser. We don't have to rely on the server though. We can also develop CGI scripts that generate a complete header.

CGI scripts that generate their own headers are called nph (non-parsed headers) scripts. The server must know in advance whether the particular CGI script intends to return a complete set of headers. Web servers handle this differently, but most recognize CGI scripts with an nph- prefix in their filename.

When sending complete headers, you must send at least the status line plus the Content-type and Server headers. You must print the entire status line; you should not print the Status header. As you will recall, the status line includes the protocol and version string (e.g., "HTTP/1.1"), which CGI provides to you in the environment variable SERVER_PROTOCOL. Always use this variable in your CGI scripts instead of hardcoding the version, because the value of SERVER_PROTOCOL may vary for older clients.
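The minimum header block described above could be assembled by a helper like the one below. This is a sketch under stated assumptions: nph_headers is a hypothetical name, and the fallback values are our own choice so that the script can also be run outside a web server, where the CGI environment variables are not set:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the minimum complete header block for an
# nph script. The fallback values are assumptions that apply only when
# the script is run outside a web server (e.g., from the command line).
sub nph_headers {
    my( $status, $media_type ) = @_;
    my $protocol = $ENV{SERVER_PROTOCOL} || "HTTP/1.0";
    my $software = $ENV{SERVER_SOFTWARE} || "Unknown";
    return "$protocol $status\n"
         . "Server: $software\n"
         . "Content-type: $media_type\n\n";
}

print nph_headers( "200 OK", "text/plain" );
print "Hello from an nph script\n";
```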

Example 3-3 provides a simple example that illustrates nph scripts.

Example 3-3. nph-count.cgi



#!/usr/bin/perl -wT

use strict;

print "$ENV{SERVER_PROTOCOL} 200 OK\n";
print "Server: $ENV{SERVER_SOFTWARE}\n";
print "Content-type: text/plain\n\n";

print "OK, starting time consuming process ... \n";

# Tell Perl not to buffer our output
$| = 1;

for ( my $loop = 1; $loop <= 30; $loop++ ) {
    print "Iteration: $loop\n";
    ## Perform some time consuming task here ##
    sleep 1;
}

print "All Done!\n";

nph scripts were more common in the past, because versions of Apache prior to 1.3 buffered the output of standard CGI scripts (those generating partial headers) but did not buffer the output of nph scripts. By creating nph scripts, you could have your output sent to the browser immediately as it was generated. However, Apache 1.3 no longer buffers CGI output, so this feature of nph scripts is no longer needed with Apache. Other web servers, such as iPlanet Enterprise Server 4, buffer both standard CGI and nph output. You can find out how your web server handles buffering by running Example 3-3.

Save the file as nph-count.cgi and access it from your browser; then save a copy as count.cgi and update it to output partial headers by commenting out the status line and the Server header:

# print "$ENV{SERVER_PROTOCOL} 200 OK\n";
# print "Server: $ENV{SERVER_SOFTWARE}\n";

Access this copy of the CGI script and compare the result. If your browser pauses for thirty seconds before displaying the page, then the server is buffering the output; if you see the lines displayed in real time, then it is not.