Design and Implementation of the Sun Network File System
Sandberg, Goldberg, Kleiman, Walsh, Lyon (1985)
What kind of paper is this?
- Motivate need for system.
- Establish goals.
- Describe real system.
- Evaluate Performance.
- Design modifications into the system rather than gluing them onto the side.
Goals
- Machine and OS Independence
- Simple crash recovery for both clients and servers.
- Transparent access to files (i.e. use vanilla pathnames).
- Provide UNIX semantics to clients.
- "Reasonable" performance.
Overall Design
- Motivate the VFS/vnode design.
  - Virtual File System (VFS): encapsulates operations on file systems (mount, unmount, sync).
  - Virtual Node (Vnode): encapsulates operations on objects within a file system (read, write).
- Protocol
  - Use Sun RPC for communication (runs over UDP/IP).
  - Use XDR for data representation.
  - Stateless: allows for fast, simple recovery, at some cost in performance.
    - Server crash: server does nothing; client resends requests until it receives a reply.
Computer Science 261
Copyright 2005
    - Client crash: do nothing (application crashed too).
  - File Handles: used to identify files in messages (fsid, file id, generation number).
  - Protocol routines: very similar to Vnode-Ops.
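The recovery story for a stateless server can be sketched as a client that simply resends the same request until a reply arrives. This is a toy simulation, not Sun RPC: `FlakyServer`, `call_with_retry`, and the request tuple layout are hypothetical names invented for illustration. The point is that each request carries everything the server needs (file handle, offset, count), so repeating it after a server crash is harmless.

```python
class FlakyServer:
    """Toy server that loses a fixed number of requests, as if it had crashed."""

    def __init__(self, files, drops):
        self.files = files      # file handle -> file contents (bytes)
        self.drops = drops      # number of requests to lose before answering

    def handle(self, request):
        if self.drops > 0:
            self.drops -= 1
            return None         # no reply: the client sees a timeout
        op, fh, offset, count = request
        assert op == "read"
        # Every argument needed is in the request; no per-client state is kept.
        return self.files[fh][offset:offset + count]


def call_with_retry(server, request, max_tries=10):
    """Resend the identical request until a reply arrives."""
    for attempt in range(1, max_tries + 1):
        reply = server.handle(request)
        if reply is not None:
            return reply, attempt
    raise TimeoutError("server not responding")


server = FlakyServer({"fh1": b"hello, nfs"}, drops=3)
data, attempts = call_with_retry(server, ("read", "fh1", 7, 3))
print(data, attempts)
```

The client needs no recovery code beyond the retry loop, and the server needs none at all; that simplicity is what the notes credit to statelessness.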
- Server
  - Statelessness: implies synchronous operations.
    - Not only data, but all meta-data synchronously written on every update.
  - Add generation numbers to inodes to distinguish newly created files from old files that used the same inode.
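A minimal sketch of how a generation number lets a server reject a handle for a file that has since been deleted and whose inode number was reused. The handle fields follow the notes (fsid, file id, generation number); `ToyServer`, `StaleHandle`, and the bookkeeping inside them are hypothetical and much simpler than a real inode table.

```python
from collections import namedtuple

# Field names follow the notes: (fsid, file id, generation number).
FileHandle = namedtuple("FileHandle", "fsid fileid generation")


class StaleHandle(Exception):
    """Raised when a handle no longer names a live file."""


class ToyServer:
    """Hypothetical inode bookkeeping for a single exported file system."""

    def __init__(self, fsid):
        self.fsid = fsid
        self.generation = {}    # file id -> generation of its latest use
        self.live = set()       # file ids currently allocated

    def create(self, fileid):
        # Each reuse of a file id bumps its generation number.
        self.generation[fileid] = self.generation.get(fileid, 0) + 1
        self.live.add(fileid)
        return FileHandle(self.fsid, fileid, self.generation[fileid])

    def remove(self, fileid):
        # The slot is freed, but its generation count is remembered.
        self.live.discard(fileid)

    def check(self, fh):
        ok = (fh.fsid == self.fsid
              and fh.fileid in self.live
              and self.generation[fh.fileid] == fh.generation)
        if not ok:
            raise StaleHandle(fh)
        return True


server = ToyServer(fsid=1)
old = server.create(fileid=42)
server.remove(42)
new = server.create(fileid=42)   # same file id, different file
```

Here `server.check(new)` succeeds while `server.check(old)` raises `StaleHandle`, even though both handles carry the same file id.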
- Client
  - Mount remote file systems into the file system name space (just like regular mounts).
  - The VFS/Vnode interface gets you into the appropriate code to handle local or remote accesses.
  - NFS operations map cleanly into Vnode operations.
  - Vnode interface also provides interface to buffer management system.
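The vnode idea can be illustrated as ordinary dynamic dispatch: the system call layer invokes an operation on a vnode and never asks which kind of file system is behind it. The sketch below is only an analogy in Python, not the kernel's C interface; `Vnode`, `LocalVnode`, `NfsVnode`, `sys_read`, and `fake_rpc` are hypothetical names, and the RPC layer is faked with an in-memory dictionary.

```python
class Vnode:
    """Operations the kernel calls; each file system type supplies its own."""

    def read(self, offset, count):
        raise NotImplementedError

    def write(self, offset, data):
        raise NotImplementedError


class LocalVnode(Vnode):
    """Stands in for a file on a local disk."""

    def __init__(self, data=b""):
        self.data = bytearray(data)

    def read(self, offset, count):
        return bytes(self.data[offset:offset + count])

    def write(self, offset, data):
        self.data[offset:offset + len(data)] = data
        return len(data)


class NfsVnode(Vnode):
    """Turns each vnode operation into a remote procedure call."""

    def __init__(self, rpc, fh):
        self.rpc, self.fh = rpc, fh

    def read(self, offset, count):
        return self.rpc("read", self.fh, offset, count)

    def write(self, offset, data):
        return self.rpc("write", self.fh, offset, data)


def sys_read(vnode, offset, count):
    """Stand-in for the system call layer: no file-system-type checks here."""
    return vnode.read(offset, count)


# Fake "server" so the example runs without a network.
remote = {"fh7": bytearray(b"remote bytes")}
calls = []


def fake_rpc(proc, fh, offset, arg):
    calls.append(proc)
    buf = remote[fh]
    if proc == "read":
        return bytes(buf[offset:offset + arg])
    buf[offset:offset + len(arg)] = arg
    return len(arg)


local_result = sys_read(LocalVnode(b"local bytes"), 0, 5)
remote_result = sys_read(NfsVnode(fake_rpc, "fh7"), 0, 6)
```

Because `NfsVnode.read` and the protocol's read take the same arguments, the mapping from vnode operation to protocol routine is nearly one-to-one.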
Implementation Issues
- Convert kernel to vnodes.
- identify all places that use inodes explicitly.
- convert all calls to jump through vnodes.
- rewrite namei to use vnode op (lookup).
- abstraction cost up to 2% in performance.
- Add RPC and XDR to system.
- took about 3 months.
- tuned RPC round-trip to 8.8 ms. (where did it start?)
- Write the NFS XDR calls.
- modify kernel to do synchronous writes.
- build mount protocol; break out from NFS.
- two types of mounts: hard and soft (retry or fail).
- implement user-level NFS server daemons (nfsd).
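For a feel of what "writing the XDR calls" involves, here is a small sketch of XDR-style encoding using only Python's `struct` module. XDR represents integers as 4-byte big-endian values and pads variable-length data to a 4-byte boundary. The helper names and the argument layout in `encode_lookup_args` are hypothetical and chosen for illustration; they are not the actual NFS wire format.

```python
import struct


def xdr_uint(value):
    """Unsigned integer: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_opaque(data):
    """Variable-length opaque data: length, bytes, zero padding to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


def xdr_string(text):
    """Strings are encoded like variable-length opaque data."""
    return xdr_opaque(text.encode("ascii"))


def encode_lookup_args(dir_handle, name):
    """Hypothetical layout: a directory handle followed by a file name."""
    return xdr_opaque(dir_handle) + xdr_string(name)


message = encode_lookup_args(b"\x01" * 8, "foo")
print(len(message), message.hex())
```

Every encoded item is a multiple of 4 bytes long, which keeps fields aligned regardless of the byte order or word size of the machines at either end.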
Challenging Issues
- Root file systems: punt; no NFS-mounted root (/tmp: files named by PID; /dev; /etc/crontab).
- Chaos: every client can mount file systems wherever it likes; a consistent name space is enforced by policy.
- Authentication based on uid/gid. Assumes consistent mappings across machines. This is problematic; it prompted the development of YP (Yellow Pages).
- Turn off root mapping on most machines.
- No network locking (still no good solution).
- Deletes while file open: implemented as rename, delete on close
(leaves garbage around in case of crashes).
- Time skew can be problematic.
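The delete-while-open workaround can be sketched as client-side bookkeeping: removing an open file becomes a rename to a hidden name, and the real remove is issued when the last local opener closes it. Everything below is a hypothetical simulation; `ToyClient`, `OpenFile`, and the `.deleted0001` naming scheme are invented for illustration, and `server_dir` is a plain dictionary standing in for a remote directory.

```python
import itertools


class OpenFile:
    """Client-side record of one open file."""

    def __init__(self, name):
        self.name = name
        self.delete_on_close = False


class ToyClient:
    def __init__(self, server_dir):
        self.server_dir = server_dir       # name -> contents, "on the server"
        self.open_files = []
        self._seq = itertools.count(1)

    def open(self, name):
        f = OpenFile(name)
        self.open_files.append(f)
        return f

    def read(self, f):
        return self.server_dir[f.name]

    def remove(self, name):
        holders = [f for f in self.open_files if f.name == name]
        if not holders:
            del self.server_dir[name]      # nobody has it open: plain remove
            return
        hidden = ".deleted%04d" % next(self._seq)   # hypothetical naming scheme
        self.server_dir[hidden] = self.server_dir.pop(name)  # rename instead
        for f in holders:
            f.name = hidden
            f.delete_on_close = True

    def close(self, f):
        self.open_files.remove(f)
        still_open = any(g.name == f.name for g in self.open_files)
        if f.delete_on_close and not still_open:
            del self.server_dir[f.name]    # the deferred remove


server_dir = {"scratch": b"temp data"}
client = ToyClient(server_dir)
f = client.open("scratch")
client.remove("scratch")
names_while_open = sorted(server_dir)
data = client.read(f)
client.close(f)
```

If the client crashes between `remove` and `close`, the hidden file is never deleted, which is the garbage the notes mention. The trick also only covers opens on the same client; another client removing the file gets no such protection.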
Performance
- Base performance on common UNIX utilities (compile, tbl, nroff, f77,
sort, matrix inversion, make).
- Measurements: number of runs? standard deviations?
- Improvements (basic engineering):
- Client caching.
- Enlarge UDP packets (from 2048 to 9000 bytes).
- Remove one bcopy from path length.
- Added client attribute cache.
- Read-ahead small executables.
- Added name caching.
- The multiple-getattr hack finally became part of NFS version 3.
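A client attribute cache can be sketched as a small time-limited map in front of the getattr call. The class below is a hypothetical illustration: `AttrCache`, `fake_getattr`, and the 3-second lifetime are assumptions made for the example, and the clock is injected so the behavior can be shown without waiting.

```python
import time


class AttrCache:
    """Caches file attributes for a short, fixed lifetime."""

    def __init__(self, getattr_rpc, ttl=3.0, clock=time.monotonic):
        self.getattr_rpc = getattr_rpc     # function that asks the server
        self.ttl = ttl                     # assumed lifetime, in seconds
        self.clock = clock
        self.entries = {}                  # file handle -> (attrs, expiry time)

    def getattr(self, fh):
        now = self.clock()
        hit = self.entries.get(fh)
        if hit is not None and now < hit[1]:
            return hit[0]                  # fresh enough: no RPC
        attrs = self.getattr_rpc(fh)
        self.entries[fh] = (attrs, now + self.ttl)
        return attrs


rpc_calls = []


def fake_getattr(fh):
    rpc_calls.append(fh)
    return {"size": 100 + len(rpc_calls)}


now = [0.0]                                # fake clock, advanced by hand
cache = AttrCache(fake_getattr, ttl=3.0, clock=lambda: now[0])
first = cache.getattr("fh1")               # miss: goes to the server
second = cache.getattr("fh1")              # hit: served from the cache
now[0] = 5.0
third = cache.getattr("fh1")               # expired: goes to the server again
```

The lifetime trades consistency for speed: a shorter value means more getattr traffic, and a longer one means a client may act on stale attributes for longer.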