NFS statistics

The client- and server-side implementations of NFS compile per-call statistics of NFS service usage at both the RPC and application layers. nfsstat -c displays the client-side statistics while nfsstat -s shows the server tallies. With no arguments, nfsstat prints out both sets of statistics:

% nfsstat -s

Server rpc:
Connection oriented:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks
10733943   0          0          0          0          1935861
dupreqs
0
Connectionless:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks
136499     0          0          0          0          0
dupreqs
0

Server nfs:
calls      badcalls
10870161   14
Version 2: (1716 calls)
null         getattr      setattr      root         lookup       readlink
48 2%        0 0%         0 0%         0 0%         1537 89%     13 0%
read         wrcache      write        create       remove       rename
0 0%         0 0%         0 0%         0 0%         0 0%         0 0%
link         symlink      mkdir        rmdir        readdir      statfs
0 0%         0 0%         0 0%         0 0%         111 6%       7 0%
Version 3: (10856042 calls)
null         getattr      setattr      lookup       access       readlink
136447 1%    4245200 39%  95412 0%     1430880 13%  2436623 22%  74093 0%
read         write        create       mkdir        symlink      mknod
376522 3%    277812 2%    165838 1%    25497 0%     24480 0%     0 0%
remove       rmdir        rename       link         readdir      readdirplus
359460 3%    33293 0%     8211 0%      69484 0%     69898 0%     876367 8%
fsstat       fsinfo       pathconf     commit
1579 0%      7698 0%      4253 0%      136995 1%

Server nfs_acl:
Version 2: (2357 calls)
null         getacl       setacl       getattr      access
0 0%         5 0%         0 0%         2170 92%     182 7%
Version 3: (10046 calls)
null         getacl       setacl
0 0%         10039 99%    7 0%


The server-side RPC fields indicate whether there are problems removing packets from the NFS service endpoint. The kernel reports statistics on connection-oriented RPC and connectionless RPC separately; each field corresponds to a particular kind of problem.

The statistics for each NFS version are reported independently, showing the total number of NFS calls made to this server using each version of the protocol. A version-specific breakdown by procedure of the calls handled is also provided. Each of the call types corresponds to a procedure within the NFS RPC and NFS_ACL RPC services.

The null procedure is included in every RPC program for pinging the RPC server. The null procedure returns no value, but a successful return from a call to null ensures that the network is operational and that the server host is alive. rpcinfo calls the null procedure to check RPC server health (an example appears after the client-side output below). The automounter (see "The Automounter") calls the null procedure of all NFS servers in parallel when multiple machines are listed for a single mount point. Together, the automounter and rpcinfo should account for the total null calls reported by nfsstat.

Client-side RPC statistics include the number of calls of each type made to all servers, while the client NFS statistics indicate how successful the client machine is in reaching NFS servers:

% nfsstat -c

Client rpc:
Connection oriented:
calls      badcalls   badxids    timeouts   newcreds   badverfs
1753584    1412       18         64         0          0
timers     cantconn   nomem      interrupts
0          1317       0          18
Connectionless:
calls      badcalls   retrans    badxids    timeouts   newcreds
12443      41         334        80         166        0
badverfs   timers     nomem      cantsend
0          4321       0          206

Client nfs:
calls      badcalls   clgets     cltoomany
1661217    23         1661217    3521
Version 2: (234258 calls)
null         getattr      setattr      root         lookup       readlink
0 0%         37 0%        0 0%         0 0%         184504 78%   811 0%
read         wrcache      write        create       remove       rename
49 0%        0 0%         24301 10%    3 0%         2 0%         0 0%
link         symlink      mkdir        rmdir        readdir      statfs
0 0%         0 0%         12 0%        12 0%        24500 10%    27 0%
Version 3: (1011525 calls)
null         getattr      setattr      lookup       access       readlink
0 0%         417691 41%   14598 1%     223609 22%   47438 4%     695 0%
read         write        create       mkdir        symlink      mknod
56347 5%     221334 21%   1565 0%      106 0%       48 0%        0 0%
remove       rmdir        rename       link         readdir      readdirplus
807 0%       14 0%        676 0%       24 0%        475 0%       5204 0%
fsstat       fsinfo       pathconf     commit
8 0%         10612 1%     95 0%        10179 1%

Client nfs_acl:
Version 2: (411477 calls)
null         getacl       setacl       getattr      access
0 0%         181399 44%   0 0%         185858 45%   44220 10%
Version 3: (3957 calls)
null         getacl       setacl
0 0%         3957 100%    0 0%


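As noted earlier, rpcinfo exercises the null procedure to check an RPC server's health. A quick probe of an NFS server looks like the following sketch, where the hostname wahoo is a stand-in for one of your servers; rpcinfo pings the null procedure of each NFS protocol version the server has registered:

% rpcinfo -u wahoo nfs
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting

Each of these probes is counted in the null column of the server's nfsstat report.
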
In addition to the total number of NFS calls made and the number of rejected NFS calls (badcalls), the client-side statistics indicate whether NFS calls are being delayed by a shortage of client RPC handles. Client RPC handles are opaque pointers used by the kernel to hold server connection information. In SunOS 4.x, the number of client handles was fixed, so an NFS call blocked until a client handle became available. In Solaris, client handles are allocated dynamically: the kernel maintains a cache of up to 16 client handles, which are reused to speed up communication with the server. The clgets count indicates the number of times a client handle has been requested. If an NFS call cannot find an unused client handle in the cache, it does not block until one frees up; instead, it creates a brand new client handle and proceeds, and the cltoomany count is incremented. The client handle is destroyed when the reply to the NFS call arrives. This count is of little use to system administrators, since nothing can be done to increase the cache size and reduce the number of misses.

The client RPC statistics also include counts for various failures experienced while trying to send NFS requests to a server. The timeout, retrans, and badcalls counts are related as follows:

timeout + badcalls >= retrans


The final retransmission of a request on a soft-mounted filesystem increments badcalls (as previously explained). For example, if a filesystem is mounted with retrans=5, the client reissues the same request five times before noting an RPC failure. All five requests are counted in timeout, since no replies are received. Of the failed attempts, four are counted in the retrans statistic and the last shows up in badcalls. Plugging these values into the relation above: timeout + badcalls = 5 + 1 = 6, which is indeed at least retrans = 4.

The statistics shown by nfsstat are cumulative from the time the machine was booted, or the last time they were zeroed using nfsstat -z:

nfsstat -z      Resets all counters.
nfsstat -sz     Zeros server-side RPC and NFS statistics.
nfsstat -cz     Zeros client-side RPC and NFS statistics.
nfsstat -crz    Zeros client-side RPC statistics only.


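Because the counters are cumulative, a useful pattern is to zero them immediately before a test, run the workload of interest, and then display only the calls that workload generated. A minimal sketch, run on the client as root:

# nfsstat -cz
   ...run the workload of interest...
# nfsstat -c

The second nfsstat reports only the calls made since the counters were zeroed.
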
Only the superuser can reset the counters.

nfsstat provides a very coarse look at NFS activity and is limited in its usefulness for resolving performance problems. Server statistics are collected for all clients, although in many cases it is important to know the distribution of calls from each client. Similarly, client-side statistics are aggregated for all NFS servers.

However, you can still glean useful information from nfsstat. Consider the case where a client reports a high number of bad verifiers. A high badverfs count is most likely an indication that the client is having to retransmit its secure RPC requests. As explained in "User-oriented network security", every secure RPC call has a unique credential and verifier with a unique timestamp (in the case of AUTH_DES) or a unique sequence number (in the case of RPCSEC_GSS). The client expects the server to include this verifier (or some form of it) in its reply, so that the client can verify that it is indeed obtaining the reply from the server it called.

Consider the scenario where the client makes a secure RPC call using AUTH_DES, using timestamp T1 to generate its verifier. If no reply is received within the timeout period, the client retransmits the request, using timestamp T1+delta to generate its verifier (bumping up the retrans count). In the meantime, the server replies to the original request, using timestamp T1 to generate its verifier:

RPC call (T1)                --->
        ** time out **
RPC call (retry: T1+delta)   --->
                             <--- Server reply to first RPC call (T1 verifier)


The reply to the client's original request causes the verifier check to fail, because the client now expects T1+delta in the verifier, not T1. This bumps up the badverfs count. Fortunately, the Solaris client waits for further replies to its retransmissions, and if a later reply passes the verifier test, an NFS authentication error is avoided. Bad verifiers are not a big problem unless the count gets too high, particularly when the system starts experiencing NFS authentication errors. Increasing the NFS timeo on the mount or in the automounter map may help alleviate this problem. Note also that this is less of a problem with TCP than with UDP. Analysis of situations such as this will be the focus of "Characterization of NFS behavior", "Network Performance Analysis", and "Client-Side Performance Tuning".

For completeness, we should mention that verifier failures can also occur when the security context expires before the response is received. This is rare but possible. It usually happens when a network partition lasts longer than the lifetime of the security context. Other causes include a significant time skew between the client and server, or a router that has held on to a ghost packet and delivers it after a very long delay; this last case is not a problem with TCP.
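
If retransmissions due to slow replies are driving up the badverfs count, lengthening the client's timeout is straightforward. A minimal sketch, assuming a UDP mount from a hypothetical server wahoo (timeo is expressed in tenths of a second):

# mount -F nfs -o timeo=30 wahoo:/export /mnt

The mount options currently in effect for each NFS filesystem, including timeo and retrans, can be verified with nfsstat -m.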

I/O statistics

Solaris' iostat utility has been extended to report I/O statistics on NFS-mounted filesystems, in addition to its traditional reports on disk and tape I/O, terminal activity, and CPU utilization. The iostat utility helps you measure and monitor performance by providing disk and network I/O throughput, utilization, queue lengths, and response times. The -x and -n options instruct iostat to report extended disk statistics in tabular form and to display device names in descriptive format (for example, server:/export/path). The following example shows the output of iostat -xn 20 on a client while it concurrently reads from two separate NFS filesystems. The server assisi is attached to the same hub as the client, while the test server paris sits on the far side of the hub and of the building's network switches. The two servers are otherwise identical, with the same memory, CPU, and OS configuration:

% iostat -xn 20
...
                        extended device statistics
 r/s   w/s    kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
 0.0   0.1     0.0    0.4   0.0   0.0     0.0     3.6   0    0  c0t0d0
 0.0   0.0     0.0    0.0   0.0   0.0     0.0     0.0   0    0  fd0
 0.0   0.0     0.0    0.0   0.0   0.0     0.0     0.0   0    0  rome:vold(pid239)
 9.7   0.0   310.4    0.0   0.0   3.3     0.2   336.7   0  100  paris:/export
34.1   0.0  1092.4    0.0   0.0   3.2     0.2    93.2   0   99  assisi:/export


The iostat utility iteratively reports the disk statistics every 20 seconds, calculating each report as a delta from the previous values. The first set of statistics is usually uninteresting, since it reports the cumulative values since boot time; focus your attention on the subsequent sets, which report current disk and network activity. Note that the previous example does not show the cumulative statistics: the output shown is the second set of values, reporting the I/O activity of the preceding 20 seconds.

The first two lines of the report form the header; every disk and NFS filesystem on the system then appears on a separate line. The first line reports statistics for the local hard disk c0t0d0, the second for the local floppy disk fd0, and the third for the volume manager vold (in Solaris, the volume manager is implemented as a user-level NFS server). The fourth and fifth lines report statistics for the NFS filesystems mounted on this host.

Included in the statistics are various values that will help you analyze the performance of the NFS activity:

r/s       Reads per second
w/s       Writes per second
kr/s      Kilobytes read per second
kw/s      Kilobytes written per second
wait      Average number of transactions waiting for service
actv      Average number of transactions actively being serviced
wsvc_t    Average time spent in the wait queue, in milliseconds
asvc_t    Average time spent being serviced, in milliseconds (for an
          NFS filesystem, this includes time spent on the network and
          on the server)
%w        Percent of time transactions are waiting for service
%b        Percent of time the device is busy
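
A quick sanity check of the example output, assuming the default 32 Kbyte NFS Version 3 transfer size: assisi delivered 1092.4 Kbytes per second over 34.1 reads per second, or about 32 Kbytes per read, and paris delivered 310.4 Kbytes per second over 9.7 reads per second, again about 32 Kbytes per read. Both mounts also carry roughly the same number of outstanding requests (actv of 3.2 and 3.3), so the difference in throughput follows directly from the difference in response time: paris takes 336.7 ms per request versus 93.2 ms for assisi, consistent with its longer path through the building's network switches rather than any difference in workload or server configuration.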