If a client already has a persistent connection open to the server, it can use that connection to send its request. Otherwise, the client needs to open a new connection to the server (refer back to Chapter 4 to review HTTP connection-management technology).

Handling New Connections

When a client requests a TCP connection to the web server, the web server establishes the connection and determines which client is on the other side of the connection, extracting the IP address from the TCP connection. Once a new connection is established and accepted, the server adds the new connection to its list of existing web server connections and prepares to watch for data on the connection.

Different operating systems have different interfaces and data structures for manipulating TCP connections. In Unix environments, the TCP connection is represented by a socket, and the IP address of the client can be found from the socket using the getpeername call.
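As a rough illustration of the Unix socket interface described above, the following Python sketch accepts one connection and reads the client's address with getpeername. The function names accept_and_identify and demo_loopback are hypothetical, and the demo connects to itself over the loopback interface only so that it can run without a real client.

```python
import socket


def accept_and_identify(listener):
    """Accept one pending connection and return (conn, peer_ip, peer_port)."""
    conn, _ = listener.accept()
    # getpeername() asks the operating system for the remote end of the socket.
    peer_ip, peer_port = conn.getpeername()[:2]
    return conn, peer_ip, peer_port


def demo_loopback():
    """Hypothetical demo: connect to our own listener and identify the peer."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(5)
    client = socket.create_connection(listener.getsockname())
    conn, peer_ip, peer_port = accept_and_identify(listener)
    print(f"connection from {peer_ip}:{peer_port}")
    for s in (conn, client, listener):
        s.close()
    return peer_ip, peer_port
```

A real server would keep the accepted socket in its set of open connections and watch it for incoming request data rather than closing it immediately.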

The web server is free to reject and immediately close any connection. Some web servers close connections because the client IP address or hostname is unauthorized or belongs to a known malicious client. Other identification techniques can also be used.
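One way a server might reject connections by client IP address is sketched below. This is a minimal illustration, not how any particular web server implements access control: BLOCKED_NETWORKS, is_blocked, and admit_or_close are hypothetical names, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical deny list; these ranges are reserved for documentation use.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blocked(client_ip, blocked=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in blocked)


def admit_or_close(conn):
    """Close the connection and return False if the peer is blocked."""
    peer_ip = conn.getpeername()[0]
    if is_blocked(peer_ip):
        conn.close()
        return False
    return True
```

Here conn is assumed to be an accepted socket object; any object with getpeername and close methods would work.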

Client Hostname Identification

Most web servers can be configured to convert client IP addresses into client hostnames, using "reverse DNS." Web servers can use the client hostname for detailed access control and logging. Be warned that hostname lookups can take a very long time, slowing down web transactions. Many high-capacity web servers either disable hostname resolution or enable it only for particular content.
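The reverse DNS step can be sketched in Python with the standard socket.gethostbyaddr call. The function name lookup_hostname and its resolver parameter are assumptions made for illustration; the parameter exists only so the lookup can be swapped out or tested without touching DNS.

```python
import socket


def lookup_hostname(client_ip, resolver=socket.gethostbyaddr):
    """Return the hostname for client_ip, or None if the lookup fails."""
    try:
        # gethostbyaddr returns (hostname, alias_list, ip_address_list).
        hostname, _aliases, _addresses = resolver(client_ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname
```

The call blocks until the resolver answers or gives up, which is the delay the text warns about. A busy server would likely cache results or perform lookups outside the request path.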

You can enable hostname lookups in Apache with the HostnameLookups configuration directive. For example, the Apache configuration directives in Example 5-2 turn on hostname resolution for only HTML and CGI resources.

Example 5-2. Configuring Apache to look up hostnames for HTML and CGI resources

HostnameLookups off
<Files ~ "\.(html|htm|cgi)$">
 HostnameLookups on
</Files>

Determining the Client User Through ident

Some web servers also support the IETF ident protocol. The ident protocol lets servers find out what username initiated an HTTP connection. This information is particularly useful for web server logging: the second field of the popular Common Log Format contains the ident username of each HTTP request.

This Common Log Format ident field is called "rfc931," after an outdated version of the RFC defining the ident protocol (the updated ident specification is documented by RFC 1413).

If a client supports the ident protocol, the client listens on TCP port 113 for ident requests. Screenshot 5-4 shows how the ident protocol works. In Screenshot 5-4a, the client opens an HTTP connection. The server then opens its own connection back to the client's identd server port (113), sends a simple request asking for the username corresponding to the new connection (specified by client and server port numbers), and retrieves from the client the response containing the username.

Screenshot 5-4. Using the ident protocol to determine HTTP client username
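The exchange described above can be sketched from the web server's side as follows. This is a simplified illustration based on the request and reply format in RFC 1413, not a complete implementation: the function names are hypothetical, and stripping whitespace around the returned username is a simplification.

```python
import socket

IDENT_PORT = 113  # the identd port named in the text


def build_ident_query(client_port, server_port):
    """Build the ident request for one HTTP connection.

    The port on the machine running identd (the HTTP client) comes first,
    followed by the port on the querying machine (the web server).
    """
    return f"{client_port} , {server_port}\r\n".encode("ascii")


def parse_ident_reply(reply):
    """Return the username from a USERID reply, or None otherwise."""
    text = reply.decode("ascii", errors="replace").strip()
    # Expected shape: "<ports> : USERID : <opsys> : <user-id>"
    parts = [part.strip() for part in text.split(":", 3)]
    if len(parts) == 4 and parts[1].upper() == "USERID":
        return parts[3]
    return None  # ERROR replies and malformed lines yield no username


def query_ident(client_ip, client_port, server_port, timeout=2.0):
    """Ask the client's identd which user owns the connection."""
    try:
        with socket.create_connection((client_ip, IDENT_PORT), timeout=timeout) as s:
            s.sendall(build_ident_query(client_port, server_port))
            with s.makefile("rb") as f:
                reply = f.readline()
    except OSError:
        return None  # no identd, blocked by a firewall, or timed out
    return parse_ident_reply(reply)
```

The short timeout reflects the fact that many clients do not answer on port 113 at all, so a server that waits indefinitely would stall the HTTP transaction.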

ident can work inside organizations, but it does not work well across the public Internet for many reasons, including:

· Many client PCs don't run the identd Identification Protocol daemon software.
· The ident protocol significantly delays HTTP transactions.
· Many firewalls won't permit incoming ident traffic.
· The ident protocol is insecure and easy to fabricate.
· The ident protocol doesn't support virtual IP addresses well.
· There are privacy concerns about exposing client usernames.

You can tell Apache web servers to use ident lookups with Apache's IdentityCheck on directive. If no ident information is available, Apache will fill ident log fields with hyphens (-). Common Log Format log files typically contain hyphens in the second field because no ident information is available.
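To show how the ident field appears to a program that reads a log, the sketch below pulls the second whitespace-separated field out of a Common Log Format line and treats a hyphen as "no ident information." The function name ident_user_from_clf and the sample log lines are hypothetical.

```python
def ident_user_from_clf(line):
    """Return the ident username from a log line, or None if absent."""
    fields = line.split(None, 2)  # only the first two fields matter here
    if len(fields) < 2:
        return None
    ident = fields[1]
    return None if ident == "-" else ident


# Hypothetical log lines for illustration only.
SAMPLE_WITHOUT_IDENT = (
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326'
)
SAMPLE_WITH_IDENT = (
    '192.0.2.10 bob - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326'
)
```

Calling ident_user_from_clf on the first sample yields None, and on the second yields the string "bob".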

 

