Identifying NFS performance bottlenecks

The stateless design of NFS makes crash recovery simple, but it also makes it impossible for a client to distinguish between a server that is slow and one that has crashed. In either case, the client does not receive an RPC reply before the RPC timeout period expires. Clients can't tell why a server appears slow, either: packets could be dropped by the network and never reach the server, or the server could simply be overloaded. Using NFS performance figures alone, it is hard to distinguish a slow server from an unreliable network. Users complain that "the system is slow," but there are several areas that contribute to system sluggishness. An overloaded server responds to all packets that it enqueues for its nfsd daemons, perhaps dropping some incoming packets due to the high load. Those requests that are received generate a response, albeit a response that arrives sometime after the client has retransmitted the request. If the network itself is to blame, then packets may not make it from the client or server onto the wire, or they may vanish in transit between the two hosts.

Problem areas

The potential bottlenecks in the client-server relationship are:

Throughput

The next two sections summarize NFS throughput issues.

NFS writes (NFS Version 2 versus NFS Version 3)

Write operations over NFS Version 2 are synchronous, forcing the server to flush data to disk[45] before it can reply to the NFS client. This severely limits the rate at which the client can issue write requests, since it must wait for the server's acknowledgment before generating the next request. NFS Version 3 overcomes this limitation by introducing a two-phase commit write operation. The NFS Version 3 client generates asynchronous write requests, allowing the server to acknowledge them without first flushing the data to disk. This shortens the time the client waits for each reply, allowing requests to be sent in quicker succession. Since the server no longer flushes the data to disk before it replies, the data may be lost if the server crashes or reboots unexpectedly. The NFS Version 3 client assumes responsibility for recovering from these conditions by caching a copy of the data. The client must issue a commit operation to the server, and receive a successful reply, before it can discard its cached copy of the data. In response to the commit request, the server either ensures the data has been written to disk and responds affirmatively, or, if it has crashed and lost the data, responds with an error that causes the client to retransmit its cached copy of the data synchronously. In short, the client is still responsible for holding on to the data until the server acknowledges that the data has been flushed to disk.
[45] The effect of NVRAM is discussed in "Disk array caching and Prestoserve" later in this chapter.
For all practical purposes, the NFS Version 3 protocol removes any limit on the size of the data block that can be transmitted, although the block size may still be limited by the underlying transport. Most NFS Version 3 implementations use a 32 KB data block size. The larger NFS writes reduce protocol overhead and disk seek time, resulting in much higher sequential file access throughput.
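To see which protocol version, transport, and transfer sizes a client actually negotiated for each mount, you can run nfsstat -m on the client. The output below is only a sketch: the server name and mount point are hypothetical, the flag list is abbreviated, and the exact fields vary between releases.

% nfsstat -m
/mnt/export from wahoo:/export
 Flags: vers=3,proto=tcp,sec=sys,hard,intr,rsize=32768,wsize=32768,retrans=5,timeo=600

A 32 KB rsize and wsize here indicate that the client and server agreed on the larger NFS Version 3 transfer size.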

NFS/TCP versus NFS/UDP

TCP handles retransmission and flow control for NFS, so only the individual packets that are lost need to be retransmitted, making NFS practical over lossy and wide area networks. In contrast, UDP requires the entire NFS operation to be retransmitted if one or more of its packets is lost, making it impractical over lossy networks. TCP also allows the read and write transfer sizes to be increased from 8 KB to 32 KB. By default, Solaris clients attempt to mount NFS filesystems using NFS Version 3 over TCP when the server supports it. Note that workloads that mainly access attributes or consist of short reads benefit less from the larger transfer size, so you may want to reduce the default read block size by using the rsize=n option of the mount command. This is explored in more detail in "Client-Side Performance Tuning".
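If you want to override these defaults, the version, transport, and transfer sizes can be given explicitly as mount options. The following is a minimal sketch, assuming a Solaris client and a hypothetical server named wahoo exporting /export; the 8 KB figures are illustrative, not a recommendation:

# mount -o vers=3,proto=tcp,rsize=8192,wsize=8192 wahoo:/export /mnt

The same options can also be placed in the automounter maps or /etc/vfstab entries that normally establish the mount.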

Locating bottlenecks

Given all of the areas in which NFS can break down, it is hard to pick a starting point for performance analysis. Inspecting server behavior, for example, may not tell you anything if the network is overly congested or dropping packets. One approach is to start with a typical NFS client, and evaluate its view of the network's services. Tools that examine the local network interface, the network load perceived by the client, and NFS timeout and retransmission statistics indicate whether the bulk of your performance problems are due to the network or the NFS servers. In this and the next two chapters, we look at performance problems from excessive server loading to network congestion, and offer suggestions for easing constraints at each of the problem areas outlined above. However, you may want to get a rough idea of whether your NFS servers or your network is the biggest contributor to performance problems before walking through all diagnostic steps. On a typical NFS client, use the nfsstat tool to compare the retransmission and duplicate reply rates:

% nfsstat -rc
Client rpc:
Connection oriented:
calls      badcalls   badxids    timeouts   newcreds   badverfs
1753584    1412       18         64         0          0
timers     cantconn   nomem      interrupts
0          1317       0          18
Connectionless:
calls      badcalls   retrans    badxids    timeouts   newcreds
12443      41         334        80         166        0
badverfs   timers     nomem      cantsend
0          4321       0          206

The timeouts value indicates the number of NFS RPC calls that did not complete within the RPC timeout period. Divide timeouts by calls to determine the retransmission rate for this client. We'll look at an equation for calculating the maximum allowable retransmission rate on each client in "Retransmission rate thresholds".

If the client-side RPC counts for timeouts and badxid are close in value, the network is healthy: requests are making it to the server, but the server cannot handle them and generate replies before the client's RPC calls time out. The server eventually works its way through the backlog of requests, generating duplicate replies that increment the badxid count. In this case, the emphasis should be on improving server response time.

Alternatively, nfsstat may show that timeouts is large while badxid is zero or negligible. In this case, packets are never making it to the server, and the network interfaces of client and server, as well as the network itself, should be examined. NFS does not query the lower protocol layers to determine where packets are being consumed; to NFS, the entire RPC and transport mechanism is a black box. Note that NFS is like spray in this regard -- it doesn't matter whether it was the local host's interface, network congestion, or the remote host's interface that dropped the packet -- the packets are simply lost. To eliminate all network-related effects, you must examine each of these areas.
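For example, using the connectionless figures from the sample nfsstat output shown earlier (166 timeouts out of 12,443 calls), the retransmission rate works out to roughly 1.3 percent. A quick way to do the arithmetic on the command line:

% echo "scale=4; 166 / 12443 * 100" | bc
1.3300

The connection-oriented side of the same output tells a much happier story: 64 timeouts out of 1,753,584 calls is a retransmission rate well below a hundredth of a percent.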