Creating CGI Programs with Bash: Introduction

Quick Introduction

What is a CGI?

A CGI (Common Gateway Interface) program is a program that a web server runs on request. CGIs are typically called from HTML forms, and are usually designed to run quickly and return an HTML page someone can use in a browser.

What is Bash?

Bash (Bourne-Again SHell) is a shell for UNIX, Linux, and other operating systems. Bash is similar to the Bourne shell (sh) and can run most Bourne-shell scripts, but adds many capabilities and improvements. As a shell, Bash can be used in two ways: interactively, where it presents a powerful command line to the user, and as a script interpreter, which is how it is used to run CGI programs.

Why use Bash to write CGI programs?

While Bash is not the most common language for writing CGI programs, it has some advantages:

  1. Bash is available on most UNIX or Linux servers, so if your site is hosted with a UNIX or Linux server and your host allows CGI scripts, you probably already have all the tools you need to write Bash CGIs on your server.
  2. Bash is easy to learn, especially for simple things. If you don't already know a language commonly used for CGI programming and don't intend to write extremely complex CGI programs, Bash is a quick way to get started.
  3. Bash allows you to use standard UNIX tools, which are powerful and generally not horribly difficult to learn, at least enough to use in your Bash programs. These tools include sed, grep, awk, wc, cat, echo, and ls.

Why not to use Bash for writing CGI programs?

Although Bash has its advantages when it comes to writing CGI programs, it also has some disadvantages that are worth mentioning:

  1. Bash is not the most efficient language and requires more resources on your server than some other languages. This can sometimes be partially mitigated by using a lighter interpreter (sh, ash, dash, etc.), which may work, provided your programs do not require any Bash-specific features.
  2. To do much with Bash, you need to use standard UNIX tools like sed, awk, grep, cat, and echo. Each program has its own learning curve, which ranges from almost nothing (echo) to potentially quite steep (sed). Also, each call to one of these programs that is not built into Bash starts a separate process, which can incur a performance hit.
  3. For performance and maintenance reasons, Bash is not well suited for large or complex CGI programs, nor is it well suited for CGI programs you expect to get huge amounts of use.
  4. If you are using a Windows server, Bash is probably not available and therefore can't be used. This also means that if your site comes to depend on many Bash CGI programs, it may be difficult to move to a Windows server. This is a minor issue.

The main thing to realize is that Bash works for writing CGI programs, and can work well, but it is not optimized for the task, so it has some performance implications. Bash isn't the best choice if your CGI programs are very complex (complex programs are also harder to write in Bash), or if they will get extremely heavy usage.

Finally, a note of caution

This guide is about writing CGI programs. Any CGI program carries a potential security risk, since it allows data to be sent to a web server and processed there. A poorly written CGI program, in any language, including Bash, can leave a server open to attack or consume its resources, slowing it down or crashing it. The biggest risk of a CGI program is that it will be vulnerable to some attack and allow the server (and site) running it to become compromised. You should be aware of these risks before writing CGI code, and take care to write good, secure code that checks for errors and for unexpected or unwanted input or behavior, and handles them safely. You should also be aware that not every hosting provider allows you to write and run your own CGI programs, and that many that do will deactivate your site if you write a CGI program that causes problems for the host's server or causes it to get compromised. Talk to your hosting provider if you are unsure whether you are allowed to run CGIs. Regardless, it is your responsibility to keep your CGIs safe and secure, and not to write CGIs that can do potentially harmful things (for example, it would be a bad idea to write a program that allows someone to remove files from your site, even if it's at a hidden URL). Don't write CGIs that can do things you wouldn't want someone else, or some automated virus or bot, doing to your site!

Beginning a Bash Script

Shell scripts start with a line identifying the file as a script and telling the operating system what script interpreter to use. Since we're using Bash, we'll tell the system to use Bash as the script interpreter:

The #! on the first line tells the system that the file is a shell script, and everything after the ! is the path to the interpreter to use to process the script. On most systems, the path to Bash is /bin/bash, but it may be something else on your particular system. If /bin/bash doesn't work, try /usr/bin/bash, and if that doesn't work, ask your system administrator or host provider what the path to Bash is on your system.
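As a minimal sketch of the layout described above, assuming Bash lives at /bin/bash on your system (the variable name and message text are only for illustration):

```shell
#!/bin/bash
# The line above must be the very first line of the file.
# It tells the operating system to run this script with /bin/bash.

# Everything after it is ordinary Bash code.
MESSAGE="This script is being run by Bash"
echo "$MESSAGE"
```

If the script fails to start, running command -v bash at a shell prompt prints the path of the Bash found on your PATH, which is the path to put after the #!.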

The rest of a Bash script consists of either Bash code or other programs. While programming in Bash, you will use many other programs found on the system to accomplish tasks not built into Bash, such as text parsing or sending HTML back to the browser.

Variables in Bash

Variables in Bash can be created at any time and have no specific type. The following code creates several variables:

In this example a variable VARIABLE is created which holds "This is a variable", while a variable named ANOTHER is created that holds the output of the command wc -l /tmp/example. Capturing the output of a command and saving it in a variable is a very important tool in Bash. To do so, use VARIABLE_NAME=$(command), or the older equivalent form with backticks around the command.
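The assignments described above might look like the following sketch. It creates its own sample file with mktemp so that it runs anywhere; EXAMPLE_FILE is a hypothetical stand-in for the /tmp/example file named in the text.

```shell
#!/bin/bash
# Create a small sample file so the example is self-contained.
# (Hypothetical stand-in for the /tmp/example file mentioned in the text.)
EXAMPLE_FILE=$(mktemp /tmp/example.XXXXXX)
printf 'line one\nline two\nline three\n' > "$EXAMPLE_FILE"

# A variable holding plain text. Note: no spaces around the = sign.
VARIABLE="This is a variable"

# A variable holding the output of a command (line count, then file name).
ANOTHER=$(wc -l "$EXAMPLE_FILE")

echo "$VARIABLE"
echo "$ANOTHER"

# Remove the sample file again.
rm -f "$EXAMPLE_FILE"
```

Note that Bash does not allow spaces on either side of the = sign in an assignment.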

To reference a variable (i.e., get its value), put a dollar sign in front of the variable name, as in the example below:

Which outputs

In this example, the command echo is used to output text. The outputted text is VARIABLE= and then the content of the variable VARIABLE, which is referenced with $VARIABLE.
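A short sketch of referencing a variable this way, reusing the VARIABLE name from the text:

```shell
#!/bin/bash
VARIABLE="This is a variable"

# $VARIABLE is replaced by the variable's value before echo runs.
echo "VARIABLE=$VARIABLE"
# prints: VARIABLE=This is a variable
```

Putting the reference inside double quotes keeps any spaces in the value intact.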

Creating an Extremely Simple CGI in Bash

Below is an extremely simple CGI program in Bash. All it does is show the user a webpage that says "Welcome to your Bash CGI!"

This Bash program, when run by the web server, will send a page that says "Welcome to your Bash CGI!" to a browser window. The output looks like the screenshot below:

[Screenshot: output of the simple Bash program]

The most important lines in this file are:

These two lines tell your browser that the rest of the content coming from the program is HTML, and should be treated as such. Leaving these lines out will often cause your browser to download the output of the program to disk as a text file instead of displaying it, since it doesn't understand that it is HTML!

This program uses the echo command to send text to a browser. Normally, echo prints whatever comes after it to the screen (standard output), but when a web server runs a CGI, the server collects that output and sends it to the browser. Using echo you can send any text to the browser. Remember that the browser will treat it as HTML, so you can also include HTML formatting tags to make the output of your CGI look however you want.
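A minimal sketch of such a CGI is shown here. The function name send_welcome_page and the exact HTML are assumptions made for illustration; a real script could just as well run the echo commands at the top level.

```shell
#!/bin/bash
# send_welcome_page is a hypothetical helper name used to keep the sketch tidy.
send_welcome_page() {
    # The header line and the blank line after it must come first.
    echo "Content-type: text/html"
    echo ""

    # Everything from here on is the HTML the browser displays.
    echo "<html><head><title>Welcome</title></head>"
    echo "<body>Welcome to your Bash CGI!</body></html>"
}

send_welcome_page
```

For a web server to run the script, the file usually has to be executable (chmod +x) and placed wherever the server is configured to look for CGI programs, which is often a cgi-bin directory.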

Creating a Form in HTML

Now that handling variables is covered, we'll create a simple CGI program in Bash that gets some data from an HTML form. CGI programs are often started by a user submitting HTML form data, with the CGI program then processing the result and giving some kind of output.

First, let's start by creating a simple HTML form, which will submit data to our CGI program, which is /cgi-bin/

For those unfamiliar with HTML forms, here's a breakdown of the above:

When a CGI program sees the data submitted from this form, it sees something like this:

This assumes the user entered "John Doe" into the text box and selected the "create" radio button.

Getting Form Data in a Bash CGI

We've created a great HTML form that submits data to a CGI program. Now all we need to do is create a CGI program that does something with this data. Below is a simple Bash program that will tell the user what they entered:

This will give the person who clicked submit in our form a webpage that tells them what they entered into the form. For example, if the person entered "John Doe" as their username and clicked the "Create" radio button, they would get a webpage that says:

Let's look at how this works, line by line:

We are already familiar with the first 5 lines, so we'll skip ahead to the first new one:

puts the form data named "username" into a variable named USERNAME.

puts the form data named "whatToDo" into a variable named WHATTODO.

The next two lines send some HTML code to the browser.

The second-to-last line sends the browser the text telling the user what they entered. Since the variable names on this line are preceded with a $, the values of the variables are substituted for the variable names.

Finally, the last line ends the HTML code for the browser. At this point, the CGI is finished and the program on the web server ends.
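Putting the pieces of this walkthrough together, a CGI of this kind could be sketched as follows. The form element names username and whatToDo come from the text; the wrapper function show_form_data, the page wording, and the exact sed patterns are assumptions for illustration.

```shell
#!/bin/bash
# show_form_data is a hypothetical wrapper; a real CGI could run these
# commands at the top level of the script instead.
show_form_data() {
    echo "Content-type: text/html"
    echo ""

    # Pull each form value out of QUERY_STRING, then turn %20 back into spaces.
    USERNAME=$(echo "$QUERY_STRING" | sed -n 's/^.*username=\([^&]*\).*$/\1/p' | sed 's/%20/ /g')
    WHATTODO=$(echo "$QUERY_STRING" | sed -n 's/^.*whatToDo=\([^&]*\).*$/\1/p' | sed 's/%20/ /g')

    echo "<html><head><title>What You Said</title></head>"
    echo "<body>Here's what you said:"
    # Caution: the values are echoed as-is. A real CGI should validate or
    # escape them before putting them into HTML.
    echo "You entered $USERNAME and wanted to $WHATTODO"
    echo "</body></html>"
}

show_form_data
```

You can try a script like this from the command line, without a web server, by setting QUERY_STRING yourself before running it, for example QUERY_STRING='username=John%20Doe&whatToDo=create'.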

The most important lines are 3, 4, 6, and 7, discussed below:

The lines that actually get the form input from the form into the CGI are copied below, and color coded:

How does this work?

When your user hits the submit button in your HTML form, the web browser sends everything they put into the form to the CGI program referenced in the form. Since the form uses the GET method to transmit data, the web server takes the data it receives from the form and puts it in an environment variable named QUERY_STRING. It then starts the CGI program referenced in the form, which is our Bash program.

Our Bash program begins with the normal initialization of a Bash CGI program. Eventually, it needs to read in the form data. Remember that this is stored in an environment variable. Lines 3 and 4 of our Bash CGI, color coded above, do the work of pulling the data out of the environment variable and into variables in our Bash program.

First, a variable in Bash is created (colored red). The variable is created by reading the QUERY_STRING environment variable set by the web server (green). The QUERY_STRING variable is sent through the sed program, which looks for a pattern matching the name of a form element (blue) and collects any data belonging to that form element. Finally, the result is sent to another sed program (orange), which replaces any occurrences of %20 with a space, since when the form data is transferred from the browser to the server, spaces are turned into %20 (so "Hi there" becomes "Hi%20there").

You can use any Bash variable you like, and to get different form elements, simply change the name sed looks for, which in this example is highlighted in blue.
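Because only the element name changes from one extraction to the next, the two-step sed pipeline can be wrapped in a small function. The sketch below assumes a hypothetical helper named get_form_value and sets a sample QUERY_STRING by hand in place of the web server.

```shell
#!/bin/bash
# get_form_value NAME
# Hypothetical helper: prints the value of form element NAME from QUERY_STRING.
# Limitations of this sketch: NAME is matched anywhere in the string (so
# "name" would also match inside "username"), and only %20 is decoded.
get_form_value() {
    printf '%s\n' "$QUERY_STRING" |
        sed -n "s/^.*$1=\([^&]*\).*\$/\1/p" |
        sed 's/%20/ /g'
}

# Sample data standing in for what the web server would provide.
QUERY_STRING='username=John%20Doe&whatToDo=create'

USERNAME=$(get_form_value username)
WHATTODO=$(get_form_value whatToDo)
echo "USERNAME=$USERNAME"
echo "WHATTODO=$WHATTODO"
```

If the requested element is not present in QUERY_STRING, the first sed prints nothing and the variable ends up empty, which a CGI can check for before using the value.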

Differences between POST and GET

On the previous page, handling data from forms submitted using GET was discussed, and from that page you should be able to figure out how to write a Bash CGI that collects and acts on data submitted with GET. GET, however, is only one of the ways a form may submit its data. The other way is POST.

GET data is sent as part of the URL. You may notice when browsing URLs such as this nicely color-coded example:

The green part of this URL is the site and path to the CGI program, and the actual CGI program is in red. Everything in blue is GET data. As you can see, GET data is identified by a ? after the CGI that will handle the data. The data takes the form form_element_name=value, with the pairs separated by ampersands (&). In the example above, we see that an HTML form contained elements named "action", "item", "price", and "sub" and their values were "buy", "Big Table" (remember spaces are turned into %20 when the data is submitted), "32.99", and "Submit".

Note: One nice thing about GET data is that it doesn't necessarily have to come from an HTML form. Since it is simply part of a URL, you can create links to CGI scripts that accept GET data and simply pass the data you want to the program in the URL of the link, without having to create a form and have the user click "submit."

For example, say I had a CGI program that displayed a page that showed either the date or time, depending on what something called "whatToShow" equaled. If "whatToShow" equals "date" it shows the date, if it equals "time" it shows the time. One way to access this CGI would be to create an HTML form and have the user submit this, as shown below:

Alternatively, since GET data is just part of the URL, I could just create two links to the CGI, one for the date and one for the time, and get the same result, but with less typing and a different look:

This trick is very useful for times when your CGI program only accepts some simple, predefined input that doesn't require a whole form to get, and you want the user to access the CGI by just clicking a link, instead of having to fill out and submit an entire form. This trick is also useful for debugging CGIs that accept GET data: you can keep running them with different data without having to keep changing your HTML form; just change the URL to contain the new data you want to submit to your CGI.

GET has several benefits: it is easy to use, you can send GET data in a regular link (see the "Note" above), and you can see the data being transmitted (great for debugging). However, GET has some drawbacks:
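The CGI side of the date-or-time example could be sketched like this. The element name whatToShow and its two values come from the text; the function name show_date_or_time, the page wording, and the date formats are assumptions.

```shell
#!/bin/bash
# show_date_or_time is a hypothetical name for the CGI's main routine.
show_date_or_time() {
    local what
    # Extract the value of whatToShow from the GET data.
    what=$(printf '%s\n' "$QUERY_STRING" | sed -n 's/^.*whatToShow=\([^&]*\).*$/\1/p')

    echo "Content-type: text/html"
    echo ""
    echo "<html><body>"
    # Only the two expected values do anything; all other input is ignored.
    case "$what" in
        date) echo "Today's date is $(date +%Y-%m-%d)" ;;
        time) echo "The time is $(date +%H:%M:%S)" ;;
        *)    echo "Please choose date or time." ;;
    esac
    echo "</body></html>"
}

show_date_or_time
```

A link ending in ?whatToShow=date or ?whatToShow=time after the CGI's URL would then select which of the two is shown. Matching the input against a fixed list of allowed values, as the case statement does, also means unexpected input never reaches a command.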

Fortunately, an alternative to GET exists which solves most of these problems: POST. POST data is not sent as part of the URL. Instead, it is sent as its own separate stream. POST can send any type of data, can send any amount of data (no limitations on the amount of data like with GET), and the data isn't easily visible to the user in the URL. (Note: Just because the data sent using POST isn't easily visible to the user doesn't mean it is sent securely. Unless you are using an encrypted connection (HTTPS), POST data can be viewed without too much difficulty by anyone between your user and your web server. Don't think sending passwords via POST is secure just because someone can't look at the URL to see the password. Unless you are running over an encrypted connection, POST is really no more secure or safe than GET.)

Since POST data can be any length and any type, it is not put into an environment variable on the web server like GET data is. This means our methods for handling GET data sent from a browser, which involved using echo and sed to pull values out of the QUERY_STRING environment variable, won't work for POST data. The following code will read both POST and GET data and assign the values from the POST and GET data into variables in your Bash program:

(Credit: The above code was written by Phillippe Kehi and came from: )

How this code works is not as important as what it does. With that in mind, we'll skip a discussion on how the code works and instead jump right into how to use it in your programs:

This code should be early in your CGI program, probably right after the two echo lines that identify the output of your CGI as HTML (echo "Content-type: text/html" and echo ""). The above code consists of three functions, but the magic line is the last one, cgi_getvars BOTH ALL. This line uses the three functions to create variables in your program that are named the same as the form elements they correspond to, and with the values of those form elements. By using BOTH and ALL, the code will get all form elements and will work with both GET and POST data.

Here's an example of how to use this code. Suppose you have a web page that allows users to enter comments and some additional information. You know some people type a lot, so you don't want to use GET, since your users might type more than GET supports. So you use POST. Your HTML form on your web page looks like this:

If the CGI that our form runs to process its data contained the code above for POST data processing, it would end up with variables named "name", "likeProduct", and "comments", since these were the names of the form elements in the HTML form that invoked the CGI, and the code copies the form element names into Bash variables of the same name. Since the code also sets the variables to the values from the form, our script ends up with variables like this:

As you can see, this is a useful piece of code, since it essentially maps form input with variable names in Bash. You can use the values of these variables however you want in your CGI programs.
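For readers curious about the underlying idea, here is a much simplified sketch of reading POST data. It is not the credited code discussed above. It relies on two standard CGI environment variables, REQUEST_METHOD and CONTENT_LENGTH, and on the web server supplying the POST data on standard input. The helper names read_post_body and field_from are hypothetical.

```shell
#!/bin/bash
# read_post_body: hypothetical helper that prints the raw POST data.
# The web server supplies the data on standard input and its size in bytes
# in the CONTENT_LENGTH environment variable.
read_post_body() {
    case "$CONTENT_LENGTH" in
        ''|*[!0-9]*) return 0 ;;   # missing or not a number: nothing to read
    esac
    if [ "$REQUEST_METHOD" = "POST" ]; then
        head -c "$CONTENT_LENGTH"
    fi
}

# field_from DATA NAME: hypothetical helper that prints the value of NAME.
# Many browsers send spaces in form data as +, so both + and %20 are turned
# back into spaces here. Other encoded characters are left untouched.
field_from() {
    printf '%s\n' "$1" |
        sed -n "s/^.*$2=\([^&]*\).*\$/\1/p" |
        sed 's/+/ /g; s/%20/ /g'
}

POST_DATA=$(read_post_body)
name=$(field_from "$POST_DATA" name)
likeProduct=$(field_from "$POST_DATA" likeProduct)
comments=$(field_from "$POST_DATA" comments)
```

Unlike the credited code, this sketch does not create variables automatically from every form element; each element has to be requested by name.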

Being able to read and write files on your server is one of the biggest reasons to write CGI programs. You might have a file on your server that contains the high scores for a game, or a guest book, or simply a list of images which you want to randomly display to your visitors. In each case, to properly work, you need to be able to read, and possibly write, to and from these files. To do so, you need some kind of server-side program, and one possibility is a Bash CGI. In this section, the basics of reading and writing to files will be discussed. To keep things simple (and practical), only reading and writing from simple text files will be covered, and more advanced things like working with databases won't be discussed at all. Bash is probably not the best choice if that is the kind of thing you want to do, anyway. However, for reading and writing to flat text files, Bash is a very good choice.

Reading Lines from a Text File with Bash

One way to read files is line by line. Each line might contain some piece of information which you want to read into your program. To do this, two things are needed:

  1. Something to determine the number of lines in the file.
  2. A loop to read each line of the file into a variable, stopping at the end.

This is easy to do in Bash. The example below will read in and print back every line in a file.

Here's how it works:

  1. The first line is the familiar #!/bin/bash identifying this as a Bash program.

  2. Next, the name of the file to read is stored in a variable named FILE.

  3. The next line, in red, stores the number of lines in the file in a variable named LINES. It does this by sending the result of the command wc -l $FILE, which returns the number of lines and the filename, through a sed command that returns only the number part of the wc output. Thus, only the line count gets stored in the variable.

  4. Next, a loop is created which starts at 1 and runs for the number of lines in the file (seq 1 $LINES will go from 1 to whatever number is in variable LINES, which in this case is the number of lines in the file). The current iteration of the loop is stored in a variable i. For each iteration of the loop, everything between do and done gets executed. Everything necessary for the loop is in green.

  5. Inside the loop, two things happen:

    1. A variable THIS_LINE gets set to the current line, which is i. This works by using the command cat to write the entire file to the sed program, which then returns only the line numbered whatever variable i is.
    2. Next, the echo command is used to display the variable THIS_LINE, which contains the current line of the file.

Of course, your program would probably do something else with the lines of the file, especially since there are easier ways to just print the contents of a file (cat $FILE would work, and in fact would do everything the loop in the above example does with only one command and one line). However, being able to read each line is useful if, for example, your program is looking for a specific line, perhaps one that matches the value of some element in a form, which it will then do something with.
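The steps above can be sketched as follows. The sample file and its contents are assumptions so that the example runs on its own, and the exact sed patterns are one possible choice. The line count is stored in LINE_COUNT rather than LINES here, because Bash itself can set a variable named LINES (it holds the terminal height).

```shell
#!/bin/bash
# Sample file so the sketch is self-contained (contents are made up).
FILE=$(mktemp /tmp/lines.XXXXXX)
printf 'first line\nsecond line\nthird line\n' > "$FILE"

# wc -l prints the line count followed by the file name; sed keeps the number.
LINE_COUNT=$(wc -l "$FILE" | sed 's/^ *\([0-9]*\).*$/\1/')

# Loop from 1 to the number of lines in the file.
for i in $(seq 1 "$LINE_COUNT"); do
    # sed -n "Np" prints only line number N of its input.
    THIS_LINE=$(cat "$FILE" | sed -n "${i}p")
    echo "$THIS_LINE"
done

rm -f "$FILE"
```

Bash also offers a loop of the form while IFS= read -r THIS_LINE; do ... done < "$FILE", which reads the file once instead of once per line. It is worth considering for longer files.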

Reading only part of a line

The above example reads an entire line, which might be useful to some programs. However, often more useful is the ability to read just part of a line. For example, consider the file below:

The file consists of a number followed by some text. Suppose we want to read the file and see which number is associated with what text. An easy way to do this with Bash is to read the file line by line like above, but instead of reading the entire line, modify the sed command to return only the number or only the text after the number. So, after modifying the sed command, we get the following Bash program:

Running this program in a directory with a file named fruitANDnumbers.txt (which contains the fruit and numbers above), gives the following result:

Note: This program isn't a valid CGI because it doesn't contain the lines

However, if you added these two lines right after #!/bin/bash you'd have a valid Bash CGI which would display its output correctly in a web browser.

Note: The above file and Bash program are a good starting point for a Bash program that reads high scores. The numbers are the scores, and the text after each number is the name of the player with that score. So, above, Grapefruit leads with a score of 1000, while Cranberries are in last place with a score of only 9.

The way to read different parts of files or files with different formatting is to change the sed commands to match only the part of the line or file that you want. A more in-depth discussion of sed is beyond the scope of this section, however.
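As a sketch of this idea, the function below splits each line into its leading number and the text after it. The two sample lines use the Grapefruit and Cranberries scores mentioned above. The function name print_scores, the output wording, and the sed patterns are assumptions, and the sketch expects exactly one space between the number and the text.

```shell
#!/bin/bash
# print_scores FILE: hypothetical helper that prints "text has number"
# for every line of FILE, where each line looks like "1000 Grapefruit".
print_scores() {
    local file=$1 count i number text
    count=$(wc -l < "$file" | tr -d ' ')
    for i in $(seq 1 "$count"); do
        # Keep only the digits at the start of the line.
        number=$(sed -n "${i}p" "$file" | sed 's/^\([0-9]*\) .*$/\1/')
        # Remove the leading digits and the space, keeping the text.
        text=$(sed -n "${i}p" "$file" | sed 's/^[0-9]* //')
        echo "$text has $number"
    done
}

# Sample data in a temporary file.
SCORES=$(mktemp /tmp/fruit.XXXXXX)
printf '1000 Grapefruit\n9 Cranberries\n' > "$SCORES"
print_scores "$SCORES"
rm -f "$SCORES"
```

Changing what is read from each line only requires changing the two sed patterns.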

Working with Files Formatted in Columns

If you have a large amount of information (perhaps a small database), you might format it in columns and rows, with some character used to separate each column. For example, take the following file:

Here, columns are separated by colons (in bold red), and each line is a new row.

The awk program can be used to separate the data between the delimiters, which in this case are colons. The following will separate the values on each line of a file named "example.txt" into variables NAME, FAVFOOD, FAVNUMBER, and FAVCOLOR and print them to the screen.

This code first reads in the number of lines in the file (using the wc program) and then loops through each line, putting each part of the line separated by a ":" into a variable. It uses the awk program to do this. It sends awk the current line, tells it the delimiter to use (-F:), and tells it which delimiter-separated part of the line to keep ('{ print $num }', where num is the section of the line to keep).
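A sketch of this approach is shown here. The variable names NAME, FAVFOOD, FAVNUMBER, and FAVCOLOR come from the text; the function name print_favorites, the sample rows, and the output wording are made up for illustration.

```shell
#!/bin/bash
# print_favorites FILE: hypothetical helper that splits each colon-separated
# line of FILE into four variables and prints them.
print_favorites() {
    local file=$1 count i line
    count=$(wc -l < "$file" | tr -d ' ')
    for i in $(seq 1 "$count"); do
        line=$(sed -n "${i}p" "$file")
        # -F: sets the column separator; $1..$4 are awk's column numbers.
        NAME=$(echo "$line" | awk -F: '{ print $1 }')
        FAVFOOD=$(echo "$line" | awk -F: '{ print $2 }')
        FAVNUMBER=$(echo "$line" | awk -F: '{ print $3 }')
        FAVCOLOR=$(echo "$line" | awk -F: '{ print $4 }')
        echo "$NAME likes $FAVFOOD, the number $FAVNUMBER, and the color $FAVCOLOR"
    done
}

# Made-up sample rows in the form name:food:number:color.
EXAMPLE=$(mktemp /tmp/example.XXXXXX)
printf 'Alice:pizza:7:blue\nBob:tacos:42:green\n' > "$EXAMPLE"
print_favorites "$EXAMPLE"
rm -f "$EXAMPLE"
```

Inside the single-quoted awk program, $1 through $4 are awk's column numbers, not Bash variables, which is why the single quotes matter.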

The output of this code is below: