Setting Up the Apache Web Server

You probably already know how it feels to use the Web, but you may not know how to set up a Web server so that you, too, can provide information to the world through Web pages. To become an information provider on the Web, you have to run a Web server on your CentOS Linux PC on the Internet. You also have to prepare the Web pages for your website, a task that may be more demanding than the Web server setup.

Web servers provide information using HTTP. Web servers are also known as HTTP daemons (because continuously running server processes are called daemons in UNIX) or HTTPD for short. The Web server program is usually named httpd.

Among the available Web servers, the Apache Web server is the most popular. The Apache Web server started out as an improved version of the NCSA HTTPD server but soon grew into a separate development effort. Like NCSA HTTPD, the Apache server is developed and maintained by a team of collaborators. Apache is freely available over the Internet.

The following sections describe the installation and configuration of the Apache Web server.

Learning More about the Apache Web Server

The Apache Web server has too many options and configuration directives to describe in detail in this book. Whole books are devoted to configuring the Apache Web server. You should consult one of these books for more information:

  • Mohammed J. Kabir, Apache Server 2 Bible, John Wiley & Sons, 2002.

  • Mohammed J. Kabir, Apache Server Administrator's Handbook, Hungry Minds, Inc., 1999.

  • Ken A. L. Coar, Apache Server for Dummies, IDG Books Worldwide, 1998.

You can also find late-breaking news and detailed information about the latest version of Apache HTTPD from Apache's website at . In particular, you can browse a complete list of Apache directives at .org/docs-2.0/mod/directives.html.

Installing the Apache Web Server

Installing CentOS Linux from this book's companion CD-ROMs gives you the option to install the Apache Web server. As described in , simply select the Web Server package group when you are prompted for the components to install. This package group includes the Apache Web server. The Web server program is called httpd, so the Apache Web server package is called httpd.

Perform these steps to verify that the Apache Web server software is installed on your system:

  1. Use the rpm -q command to check whether or not the Apache package is installed:

    rpm -q httpd
    httpd-2.0.40-16

    If the output shows an httpd package name, you have installed the Apache software.

  2. Type the following command to check whether or not the httpd process is running (httpd is the name of the Apache Web server program):

    ps ax | grep httpd

    If the Apache Web server is running, the output should show a number of httpd processes. It is common to run several Web server processes (one parent and several child processes) so that several HTTP requests can be handled efficiently by assigning each request to an httpd process. If there is no httpd process, log in as root and start the httpd service with the following command:

    service httpd start

  3. Use the telnet program on your Linux system, and use the HTTP HEAD command to query the Web server, as follows:

    telnet localhost 80
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    HEAD / HTTP/1.0  (press Enter twice)
    HTTP/1.1 403 Forbidden
    Date: Sat, 15 Feb 2003 22:12:28 GMT
    Server: Apache/2.0.40 (CentOS Linux)
    Accept-Ranges: bytes
    Content-Length: 2898
    Connection: close
    Content-Type: text/html; charset=ISO-8859-1
    Connection closed by foreign host.

    If you get a response such as that in the preceding code, your system already has the Apache Web server installed and set up correctly. All you have to do is understand the configuration so that you can place the HTML documents in the proper directory.
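The same check can be scripted. The following is a minimal sketch in Python that sends a HEAD request with the standard http.client module; the function names head_request and parse_status_line are illustrative and do not come from Apache or from this book. Note that http.client speaks HTTP/1.1 rather than the HTTP/1.0 request typed in the telnet session, so the exact headers may differ.

```python
import http.client


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 403 Forbidden' into its parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


def head_request(host="localhost", port=80, path="/", timeout=5.0):
    """Send an HTTP HEAD request and return (status, reason, headers)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.reason, dict(response.getheaders())
    finally:
        conn.close()
```

On a system where httpd is running, calling head_request() should return a status code and a set of headers comparable to those in the telnet session. If no server is listening on port 80, the call raises a connection error instead.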

Use a Web browser to load the home page from your system. For instance, if your system's IP address is 192.168.0.100, use the URL or try and see what happens. You should see a Web page with the title 'Test Page for the Apache Web Server on CentOS Linux.'

Configuring the Apache Web Server

CentOS Linux configures the Apache Web server software to use these files and directories:

Apache Configuration Directives

The Apache httpd server's operation is controlled by the directives stored in the httpd.conf file located in the /etc/httpd/conf directory as well as separate .conf files located in the /etc/httpd/conf.d directory. The directives in these configuration files specify general attributes of the server, such as the server's name, the port number, and the directory in which the server's directories are located. The configuration directives also specify information about the server resources (the documents and other information the Web server provides to users) and access-control directives that control access to the entire Web server as well as to specific directories.

The next few sections show you the key information about the Apache httpd configuration directives. Typically, you do not have to change much in the configuration files to use the Apache Web server, except for setting the ServerName in the httpd.conf file. However, it is useful to know the format of the configuration files and the meaning of the various keywords used in them.

As you study the /etc/httpd/conf/httpd.conf file, keep these syntax rules in mind:

The following sections show the Apache directives grouped into three separate categories: general HTTPD directives, resource configuration directives, and access-control directives.

General HTTPD Directives

Some interesting items from the httpd.conf file are

Many more directives control the way that the Apache Web server works. The following list summarizes some of the directives you can use in the httpd.conf file. You can leave most of these directives in their default settings, but it's important to know about them if you are maintaining a Web server.

Resource Configuration Directives

The resource configuration directives specify the location of the Web pages, as well as how to specify the data types of various files. To get started, you can leave the directives at their default settings. These are some of the resource configuration directives for the Apache Web server:

Access-Control Directives

Access-control directives enable you to control who can access different directories in the system. These are the global access-configuration directives. In each directory containing documents served by the Apache Web server, you can have another access-configuration file with the name specified by the AccessFileName directive. (That per-directory access-configuration file is named .htaccess by default.)

Stripped of most of the comment lines, the access-control directive has this format:

# First, we configure the "default" to be a
# very restrictive set of permissions.
<Directory />
    Options None
    AllowOverride None
</Directory>
# The following directory name should
# match DocumentRoot in httpd.conf
<Directory /var/www/html>
    Options Indexes Includes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
# The directory name should match the
# location of the cgi-bin directory
<Directory "/var/www/cgi-bin">
    AllowOverride None
    Options ExecCGI
    Order allow,deny
    Allow from all
</Directory>

Access-control directives use a different syntax from the other Apache directives. The syntax is like that of HTML. Various access-control directives are enclosed within pairs of tags, such as <Directory> ... </Directory>.
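To make the tag-like structure concrete, here is a rough Python sketch that groups the directives in a configuration text by the <Directory> block that encloses them. It is an illustration only, not how Apache itself reads its configuration: the function name directory_sections is hypothetical, and the sketch ignores nested blocks and every container other than <Directory>.

```python
import re

# Opening tag, with or without quotes around the directory path
SECTION_OPEN = re.compile(r'<Directory\s+"?([^">]+)"?\s*>', re.IGNORECASE)
SECTION_CLOSE = re.compile(r"</Directory\s*>", re.IGNORECASE)


def directory_sections(config_text):
    """Map each <Directory> path to a list of (directive, arguments) pairs."""
    sections = {}
    current = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        opened = SECTION_OPEN.match(line)
        if opened:
            current = opened.group(1)
            sections[current] = []
        elif SECTION_CLOSE.match(line):
            current = None
        elif current is not None:
            parts = line.split(None, 1)
            arguments = parts[1] if len(parts) > 1 else ""
            sections[current].append((parts[0], arguments))
    return sections
```

Given the listing above as input, the function returns one entry for each of the three directories, keyed by path, which makes it easy to see which Options and AllowOverride settings apply where.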

The following list describes some of the access-control directives. In particular, notice the AuthUserFile directive; you can have password-based access control for specific directories.

Supporting Virtual Hosts with the Apache HTTP Server

A useful feature of the Apache HTTP server is its ability to handle virtual Web servers. This ability enables a single server to respond to many different IP addresses and to serve Web pages from different directories, depending on the IP address. That means that you can set up a single Web server to respond to both and and serve a unique home page for each hostname. A server with this capability is known as a multihomed Web server, a virtual Web server, or a server with virtual host support.

As you might guess, ISPs use virtual host capability to offer virtual websites to their customers. You must meet the following requirements to support virtual hosts:

For the latest information on how to set up virtual hosts in an Apache HTTP server, consult the following URL:

http://httpd.apache.org/docs-2.0/vhosts/index.html

The Apache HTTP server can respond to different host names with different home pages. You have two options when supporting virtual hosts:

You should run multiple HTTP daemons only if you do not expect heavy traffic on your system; the system may not be able to respond well because of the overhead associated with running multiple daemons. However, you may need multiple HTTP daemons if each virtual host has a unique configuration need for the following directives:

For a site with heavy traffic, you should configure the Web server so that a single HTTP daemon can serve multiple virtual hosts. Of course, this recommendation implies that there is only one configuration file. In that configuration file, use the VirtualHost directive to configure each virtual host.

Most ISPs use the VirtualHost capability of Apache HTTP server to provide virtual websites to their customers. Unless you pay for a dedicated Web host, you typically get a virtual site where you have your own domain name, but share the server and the actual host with many other customers.

The syntax of the VirtualHost directive is as follows:

<VirtualHost hostaddr> 
    ... directives that apply to this host
    ... 
</VirtualHost> 

With this syntax, you use <VirtualHost> and </VirtualHost> to enclose a group of directives that will apply only to the particular virtual host identified by the hostaddr parameter. The hostaddr can be an IP address, or the fully qualified domain name of the virtual host.
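If you manage many virtual hosts, you might generate these blocks from a script instead of typing them by hand. The following Python sketch shows one way to do that. The function name render_virtual_host, the host name www.example.com, and the directory path are all hypothetical values chosen for illustration; only the DocumentRoot and ServerName directive names are Apache's.

```python
def render_virtual_host(hostaddr, directives):
    """Return the text of a <VirtualHost> block for the given directives."""
    lines = [f"<VirtualHost {hostaddr}>"]
    for name, value in directives:
        lines.append(f"    {name} {value}")
    lines.append("</VirtualHost>")
    return "\n".join(lines)


# Hypothetical host name and path, used only to show the output shape
block = render_virtual_host(
    "www.example.com",
    [
        ("DocumentRoot", "/var/www/example/htdocs"),
        ("ServerName", "www.example.com"),
    ],
)
print(block)
```

The sketch only formats text. It does not check that a directive is valid inside a <VirtualHost> block, so review the generated text before adding it to httpd.conf.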

You can place almost any Apache directives within the <VirtualHost> block. At a minimum, Webmasters include the following directives in the <VirtualHost> block:

When the server receives a request for a document in a particular virtual host's DocumentRoot directory, it uses the configuration parameters within that server's <VirtualHost> block to handle that request.

Here is a typical example of a <VirtualHost> directive that sets up the virtual host :

<VirtualHost www.lnbsoft.com>
    DocumentRoot    /home/naba/httpd/htdocs
    ServerName   www.lnbsoft.com
    ServerAdmin   webmaster@lnbsoft.com
    ScriptAlias   /cgi-bin/   /home/naba/httpd/cgi-bin/
    ErrorLog  /usr/home/naba/httpd/logs/error_log
    CustomLog   /home/naba/httpd/logs/access_log common
</VirtualHost> 

Here, the name common in the CustomLog directive refers to the name of a format defined earlier in the httpd.conf file by the LogFormat directive, as follows:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

This format string for the log produces lines in the log file that look like this:

dial236.dc.psn.net - - [29/Oct/2002:18:09:00 -0500] "GET / HTTP/1.0" 200 1243

The format string contains tokens that start with a percent sign (%). The meaning of these tokens is shown in Table 14-1.

Table 14-1: LogFormat Tokens

Token   Meaning
-----   -------
%b      The number of bytes sent to the client, excluding header information
%h      The hostname of the client machine
%l      The identity of the user, if available
%r      The HTTP request from the client (for example, GET / HTTP/1.0)
%s      The server response code from the Web server
%t      The current local date and time
%u      The user name the user supplies (only when access-control rules require user name/password authentication)
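Because each token maps to one field in the log line, a log entry in the common format can be split apart with a regular expression. The following Python sketch does this for the sample line shown earlier. The function name parse_common_log and the field names are choices made for this example, and the pattern assumes the request string contains no embedded double quotes.

```python
import re

# One named group per token in: %h %l %u %t "%r" %>s %b
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common_log(line):
    """Return a dict of fields for one common-format log line, or None."""
    match = COMMON_LOG.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # %b is logged as "-" when no bytes were sent
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


entry = parse_common_log(
    'dial236.dc.psn.net - - [29/Oct/2002:18:09:00 -0500] "GET / HTTP/1.0" 200 1243'
)
```

For the sample line, entry["host"] is dial236.dc.psn.net, entry["status"] is 200, and entry["bytes"] is 1243. The ident and user fields are both a hyphen because no identity or user name was available.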

Configuring Apache for Server-Side Includes (SSI)

'Server-side include' (SSI) refers to a feature of the Apache Web server whereby it can include a file or the value of an environment variable in an HTML document. The feature is like the include files in many programming languages such as C and C++. Just as a preprocessor processes the include files in a programming language, the Web server reads the HTML file and parses the server-side includes before returning the document to the Web browser.

Server-side includes provide a convenient way to insert the date, a file's size, or the contents of another file into an HTML document. The SSI directives look like special comments in the HTML file. For example, you can show the size of a graphics file by placing the following SSI directive in the HTML file:

File size = <!--#fsize file="nbphoto.png"-->

The Web server replaces everything to the right of the equal sign with the size of the file nbphoto.png.

Similarly, to display today's date, you can use the following SSI directive:

Today is <!--#echo var="DATE_LOCAL" --> 

To enable SSI on the Apache Web server, place the following directive in the /etc/httpd/conf/httpd.conf file:

Options +Includes

Apache directives can apply to specific directories. Therefore, it's best if you place this directive in the block of directives that apply to the directory where you want to allow SSI.
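To picture what the server does with an SSI directive, consider this small Python sketch that mimics the substitution for the echo directive only. It is an illustration of the idea, not Apache's implementation: the function name expand_echo, the placeholder text for unset variables, and the date string are all assumptions made for the example.

```python
import re

# Matches directives such as <!--#echo var="DATE_LOCAL" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_echo(html_text, variables):
    """Replace each #echo directive with the value of the named variable."""
    def substitute(match):
        # "(none)" is the placeholder this sketch uses for an unset variable
        return variables.get(match.group(1), "(none)")
    return ECHO_DIRECTIVE.sub(substitute, html_text)


page = 'Today is <!--#echo var="DATE_LOCAL" -->'
# The date string below is a made-up value for illustration
print(expand_echo(page, {"DATE_LOCAL": "Saturday, 15-Feb-2003 17:12:28 EST"}))
```

The browser never sees the directive itself, only the text that replaced it. That is why the directives can be written as HTML comments without affecting the page.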

Supporting CGI Programs in Apache

Sometimes an HTML document's content may not be known in advance. For example, if a website provides a search capability, the result of a search depends on which keywords the user enters in the search form. To handle these needs, the Web server relies on external programs called gateways.

A gateway program accepts the user input and responds with the requested data formatted as an HTML document. Often, the gateway program acts as a bridge between the Web server and some other repository of information such as a database.

Gateway programs have to interact with the Web server. To allow anyone to write a gateway program, the method of interaction between the Web server and the gateway program had to be specified completely. Common Gateway Interface (CGI) is the standard method used by gateway programs to exchange information with the Web server. The Apache Web server supports CGI programs.
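As a rough illustration of the interface, here is a minimal CGI program sketched in Python. For a GET request, the server passes the query string to the program in the QUERY_STRING environment variable, and the program writes a header, a blank line, and then the document to standard output. The form field name keyword and the function name build_response are hypothetical, chosen to match the search example above.

```python
#!/usr/bin/env python3
import html
import os
from urllib.parse import parse_qs


def build_response(query_string):
    """Return a complete CGI response: header lines, blank line, HTML body."""
    params = parse_qs(query_string)
    # "keyword" is a hypothetical form field name used for illustration
    keyword = params.get("keyword", [""])[0]
    body = (
        "<html><body>"
        f"<p>You searched for: {html.escape(keyword)}</p>"
        "</body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # For GET requests, the server passes the query string in QUERY_STRING
    print(build_response(os.environ.get("QUERY_STRING", "")), end="")
```

The call to html.escape matters: user input is echoed back into the page, so markup characters in the keyword are escaped before they reach the browser.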

The URL specifying a CGI program looks like any other URL, but the Apache Web server can examine the directory name and determine whether the URL is a normal document or a CGI program. Typically, a directory is set aside for CGI programs, and you specify that directory through the ScriptAlias directive in the /etc/httpd/conf/httpd.conf file. You can use multiple ScriptAlias directives to specify multiple directories for CGI programs. Here's a typical ScriptAlias directive:

ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

This tells Apache that CGI URLs use the /cgi-bin/ directory name. The ScriptAlias directive also specifies that Apache should substitute the full pathname /var/www/cgi-bin/ for /cgi-bin/.

For example, if a URL specifies , then the Apache server at recognizes this as a CGI URL and invokes the dbquery program. The ScriptAlias directive in that Web server's httpd.conf configuration file indicates which directory on the server's system contains the CGI programs. In other words, for the example ScriptAlias directive, the Web server translates the CGI program name /cgi-bin/dbquery to the full pathname /var/www/cgi-bin/dbquery.
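The translation amounts to a prefix substitution, which the following Python sketch imitates. The function name translate_script_alias is hypothetical, and the sketch leaves out the path normalization and permission checks that a real server performs.

```python
def translate_script_alias(url_path, alias="/cgi-bin/", target="/var/www/cgi-bin/"):
    """Map a URL path under the alias to a filesystem path, else return None."""
    if not url_path.startswith(alias):
        return None  # not a CGI URL under this alias
    return target + url_path[len(alias):]


print(translate_script_alias("/cgi-bin/dbquery"))  # prints /var/www/cgi-bin/dbquery
```

A URL path that does not begin with /cgi-bin/, such as /index.html, returns None in this sketch, which corresponds to the server treating the URL as a normal document.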