What to Log?
For the most part, logging is done for two reasons: to look for problems on the server or proxy (e.g., which requests are failing), and to generate statistics about how web sites are accessed. Statistics are useful for marketing, billing, and capacity planning (for instance, determining the need for additional servers or bandwidth).
You could log all of the headers in an HTTP transaction, but for servers and proxies that process millions of transactions per day, the sheer bulk of that data would quickly get out of hand. You would also end up logging a lot of information that you don't really care about and may never look at.
Typically, just the basics of a transaction are logged. A few examples of commonly logged fields are:
· HTTP method
· HTTP version of client and server
· URL of the requested resource
· HTTP status code of the response
· Size of the request and response messages (including any entity bodies)
· Timestamp of when the transaction occurred
· Referer and User-Agent header values
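The fields listed above can be gathered into one record per transaction. The following is a minimal sketch in Python, not a standard log format: the names `TransactionLog` and `format_entry`, the tab-separated layout, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TransactionLog:
    """One logged HTTP transaction (hypothetical record layout)."""
    timestamp: datetime      # when the transaction occurred
    method: str              # HTTP method, e.g. "GET"
    url: str                 # URL of the requested resource
    client_version: str      # HTTP version sent by the client
    server_version: str      # HTTP version used in the response
    status: int              # HTTP status code of the response
    request_bytes: int       # size of the request message, body included
    response_bytes: int      # size of the response message, body included
    referer: str = "-"       # Referer header value, "-" if absent
    user_agent: str = "-"    # User-Agent header value, "-" if absent


def format_entry(entry: TransactionLog) -> str:
    """Render a record as one tab-separated line (illustrative format only)."""
    fields = [
        entry.timestamp.isoformat(),
        entry.method,
        entry.url,
        entry.client_version,
        entry.server_version,
        str(entry.status),
        str(entry.request_bytes),
        str(entry.response_bytes),
        entry.referer,
        entry.user_agent,
    ]
    return "\t".join(fields)


# Sample values are invented for the example.
entry = TransactionLog(
    timestamp=datetime(2002, 9, 1, 12, 0, 0, tzinfo=timezone.utc),
    method="GET",
    url="/index.html",
    client_version="HTTP/1.1",
    server_version="HTTP/1.1",
    status=200,
    request_bytes=312,
    response_bytes=4096,
    user_agent="ExampleBrowser/1.0",
)
print(format_entry(entry))
```

Keeping the record to a fixed, small set of fields is what keeps the log volume manageable; any header not named in the record is simply not stored.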
The HTTP method and URL tell what the request was trying to do (for example, GETting a resource or POSTing an order form). The URL can be used to track the popularity of pages on the web site.
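As a rough illustration of tracking page popularity from logged URLs, the sketch below counts requests per URL. The `logged_requests` data and the `page_popularity` helper are hypothetical; a real log would first have to be parsed into such pairs.

```python
from collections import Counter

# Hypothetical (method, URL) pairs as they might be read back from a log.
logged_requests = [
    ("GET", "/index.html"),
    ("GET", "/products.html"),
    ("POST", "/order"),
    ("GET", "/index.html"),
    ("GET", "/index.html"),
]


def page_popularity(requests):
    """Count how many times each URL was requested."""
    return Counter(url for _method, url in requests)


popularity = page_popularity(logged_requests)
for url, hits in popularity.most_common():
    print(f"{hits:3d}  {url}")
```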
The version strings give hints about the client and server, which are useful in debugging strange or unexpected interactions between clients and servers. For example, if requests are failing at a higher-than-expected rate, the version information may point to a new release of a browser that is unable to interact with the server.
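One way to apply this in practice is to group failures by the logged version string and look for a version that fails unusually often. The sketch below assumes hypothetical sample data, a hypothetical helper named `failure_rate_by_version`, and a simple rule that any 4xx or 5xx status counts as a failure; a real analysis might define failure differently.

```python
from collections import defaultdict

# Hypothetical (client version string, status code) pairs from a log.
entries = [
    ("HTTP/1.1", 200),
    ("HTTP/1.1", 200),
    ("HTTP/1.1", 304),
    ("HTTP/1.0", 400),
    ("HTTP/1.0", 200),
    ("HTTP/1.0", 500),
]


def failure_rate_by_version(entries):
    """Return {version: fraction of requests with a 4xx or 5xx status}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for version, status in entries:
        totals[version] += 1
        if status >= 400:  # assumption: treat 4xx and 5xx as failures
            failures[version] += 1
    return {version: failures[version] / totals[version] for version in totals}


rates = failure_rate_by_version(entries)
for version, rate in sorted(rates.items()):
    print(f"{version}: {rate:.0%} failed")
```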
The HTTP status code tells what happened to the request: whether it was successful, whether an authorization attempt failed, whether the resource was found, and so on. (See Section 3.2.2.4 for a list of HTTP status codes.)
The sizes of the request and response messages and the timestamp are used mainly for accounting purposes, i.e., to track how many bytes flowed into, out of, or through the application. The timestamp can also be used to correlate observed problems with the requests that were being made at the time.
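Both uses can be sketched in a few lines of Python. The entries, the helper names `byte_totals` and `requests_near`, and the one-minute window are assumptions chosen for illustration, not part of any particular logging tool.

```python
from datetime import datetime, timedelta

# Hypothetical entries: (timestamp, URL, request bytes, response bytes).
entries = [
    (datetime(2002, 9, 1, 12, 0, 5), "/index.html", 300, 4000),
    (datetime(2002, 9, 1, 12, 0, 40), "/big.zip", 350, 900000),
    (datetime(2002, 9, 1, 12, 7, 0), "/order", 1200, 500),
]


def byte_totals(entries):
    """Sum the bytes received (requests) and sent (responses)."""
    bytes_in = sum(req for _ts, _url, req, _resp in entries)
    bytes_out = sum(resp for _ts, _url, _req, resp in entries)
    return bytes_in, bytes_out


def requests_near(entries, when, window=timedelta(minutes=1)):
    """Return the entries logged within `window` of the time `when`."""
    return [e for e in entries if abs(e[0] - when) <= window]


bytes_in, bytes_out = byte_totals(entries)
print(f"in: {bytes_in} bytes, out: {bytes_out} bytes")

# Suppose a problem was observed at 12:00:30; list what was being requested.
problem_time = datetime(2002, 9, 1, 12, 0, 30)
for ts, url, _req, _resp in requests_near(entries, problem_time):
    print(ts.isoformat(), url)
```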