Measuring performance

The NFS RPC mixture is useful for tuning the server to handle the load placed on it, but the real measure of success is whether the clients see a faster server. Users may still get "server not responding" messages after some bottlenecks are eliminated, either because you haven't removed all of the constraints or because something other than the server is causing performance problems. Measuring the success of a tuning effort requires you to measure the average response time as seen by an average client. There are two schools of thought on how to set the threshold for this value:

Use an absolute value for the "threshold of pain" in average server response time. The system begins to appear sluggish as response time approaches 40 milliseconds. As of this writing, typical NFS servers are capable of providing response times well below this threshold, in the range of one to ten milliseconds, and they keep getting faster.
Base the threshold on the performance of the server with a minimal load, such as only one client. When the server's average response time exceeds twice this "ideal" response time, the server has become loaded.
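Both rules reduce to a simple comparison, sketched below in Python. The function names and the idea of passing in a measured baseline are illustrative assumptions; only the 40 millisecond figure and the factor of two come from the text.

```python
# Illustrative sketch only: the function names are hypothetical.
# The 40 ms threshold and the factor of two are the values quoted in the text.

ABSOLUTE_THRESHOLD_MS = 40.0  # "threshold of pain" for average response time


def is_sluggish_absolute(avg_ms, threshold_ms=ABSOLUTE_THRESHOLD_MS):
    """First school: compare the average against a fixed threshold."""
    return avg_ms >= threshold_ms


def is_loaded_relative(avg_ms, baseline_ms, factor=2.0):
    """Second school: compare the average against a multiple of the
    response time measured with a minimal load (for example, one client)."""
    if baseline_ms <= 0:
        raise ValueError("baseline response time must be positive")
    return avg_ms > factor * baseline_ms


if __name__ == "__main__":
    # A server that answers in 4 ms when idle and 9 ms under load is
    # "loaded" by the relative rule but far from the absolute threshold.
    print(is_loaded_relative(9.0, baseline_ms=4.0))
    print(is_sluggish_absolute(9.0))
```

The two rules can disagree: a fast server may double its idle response time long before users notice anything, which is why the choice between them depends on whether you are tracking server headroom or user-visible sluggishness.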

It's easy to measure the average server response time on a client by dividing the elapsed time by the number of NFS RPC calls completed in that time. Use the nfsstat utility to track the number of NFS calls, and a clock or the Unix time command to measure the elapsed time in a benchmark or network observation. Obviously, this must be done over a short, well-monitored period of time when the client is generating NFS requests nearly continuously. Any gap in the NFS requests adds idle time to the measurement and inflates the computed average. You can also use NFS benchmark traffic generators such as the SPEC[44] SFS97 RPC-generating benchmark, or review the smoothed response times recorded by some versions of nfsstat -m.
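As a rough sketch of the arithmetic, the average is the elapsed time divided by the number of calls completed in it. The Python function below assumes you have already read the client's total NFS call count (for instance, from nfsstat output) before and after the measurement interval; the function name and parameters are hypothetical, and no nfsstat parsing is attempted because output formats vary between systems.

```python
# Illustrative sketch: inputs are call counters read by hand (or by your own
# script) at the start and end of the interval, plus the elapsed wall time.

def average_response_ms(calls_start, calls_end, elapsed_seconds):
    """Return the average time per NFS call in milliseconds.

    Only meaningful when the client issued requests nearly continuously
    during the interval; idle gaps are counted as response time.
    """
    calls = calls_end - calls_start
    if calls <= 0:
        raise ValueError("no NFS calls completed during the interval")
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return elapsed_seconds * 1000.0 / calls


if __name__ == "__main__":
    # 12,000 calls completed during a 60-second benchmark run.
    print(average_response_ms(250_000, 262_000, 60.0))  # 5.0
```

Note that this figure equals the per-request latency only if the client has roughly one request outstanding at a time. A client issuing many requests in parallel will show a lower time per call than any single request actually experienced.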

[44] The mission of the Standard Performance Evaluation Corporation (http://www.spec.org) is to "establish, maintain, and endorse a standardized set of relevant benchmarks and metrics for performance evaluation of modern computer systems."

You'll get different average response times for different RPC mixtures, since disk-intensive client activity is likely to raise the average response time. However, it is the average response that matters most. The first request may always take a little longer, as caches get flushed and the server begins fetching data from a new part of the disk. Over time, these initial bumps may be smoothed out, although applications with very poor locality of reference may suffer more of them. You must take the average over the full range of RPC operations, and measure response over a long enough period of time to iron out any short-term fluctuations.

Users are most sensitive to the sum of response times for all requests in an operation. One or two slow responses may not be noticed in the sequence of an operation with several hundred NFS requests, but a train of requests with long response times will produce complaints of system sluggishness. An NFS server must be able to handle the traffic bursts without a prolonged increase in response time.

The randomness of the NFS requests modulates the server's response time curve, subject to various constraints on the server. Disk bandwidth and CPU scheduling constraints can increase the time required for the server's response time to return to its average value. Ideally, the average response time curve should remain relatively "flat" as the number of NFS requests increases. During bursts of NFS activity, the server's response time may increase, but it should return to the average level quickly. If a server requires a relatively long time to recover from the burst, then its average response time will remain inflated even when the level of activity subsides. During this period of increased response time, some clients may experience RPC timeouts, and retransmit their requests. This additional load increases the server's response time again, increasing the total burst recovery time.

NFS performance does not scale linearly above the point at which a system constraint is hit. The NFS retransmission algorithm introduces positive feedback when the server just can't keep up with the request arrival rate. As the average response time increases, the server becomes even more loaded from retransmitted requests. A slow server removes some of the random elements from the network: the server's clients that are retransmitting requests generate them with a fairly uniform distribution; the clients fall into lock step waiting for the server, and the server itself becomes saturated. Tuning a server and its clients should move the "knee" of the performance curve out as far as possible, as shown in Figure 16-1.
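The feedback loop can be illustrated with a toy model in Python. Everything in it is an assumption made for illustration and not a description of how any real NFS server behaves: the server is treated as a simple single queue with exponentially distributed response times, each request that exceeds the timeout is retransmitted once, and the capacity (1000 requests per second) and timeout (50 milliseconds) are made-up values. Real clients also back off between retries, which this sketch ignores.

```python
import math

# Toy model, for illustration only. Assumptions (not from the text):
#   - the server behaves like a single queue whose response times are
#     exponentially distributed with rate (capacity - load)
#   - a request that exceeds `timeout` is retransmitted exactly once
#   - capacity and timeout below are hypothetical values


def effective_load(offered, capacity, timeout, iterations=100):
    """Iterate: load -> fraction timing out -> retransmissions -> load.

    Returns the settled load in requests/sec, or None once the feedback
    pushes the load to or past the server's capacity.
    """
    load = offered
    for _ in range(iterations):
        if load >= capacity:
            return None
        p_timeout = math.exp(-(capacity - load) * timeout)
        load = offered * (1.0 + p_timeout)
    return load if load < capacity else None


def mean_response_ms(offered, capacity=1000.0, timeout=0.05):
    """Mean response time in ms, or infinity if the server saturates."""
    load = effective_load(offered, capacity, timeout)
    if load is None:
        return math.inf
    return 1000.0 / (capacity - load)


if __name__ == "__main__":
    for offered in (100, 500, 800, 900, 950):
        print(offered, mean_response_ms(offered))
```

In this model the response time grows slowly at light load, then rises sharply once retransmissions start adding to the offered load, and past a certain point the iteration never settles below capacity. Raising the capacity or lengthening the timeout shifts that point to a higher request rate, which is the sense in which tuning moves the knee outward.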

Figure 16-1. Ideal versus actual server response

Knowing what to measure and how to measure it lets you evaluate the relative success of your tuning efforts, and provides valuable data for evaluating NFS server benchmarks.