NetApp Tech OnTap

Accelerating Shared Data Access for
Compute Clusters

Access to shared data is critical to the performance of today's compute clusters running a variety of scientific, engineering, and business applications. NFS, the most widely used standard for shared data access, can become a bottleneck for large-scale compute clusters, which can overwhelm the file server that acts as the single point of access for all files in a shared file system.

Figure 1) Standard NFS file servers may become a bottleneck as the size of your compute cluster grows. Latency (L) increases to unacceptable levels as throughput (T) approaches server limits.

Unfortunately, most of the solutions available to provide higher-performance, shared data access have been more or less proprietary and have failed to gain the kind of heterogeneous system support and widespread adoption that standard protocols such as NFS have achieved.

Parallel NFS (pNFS) is a new standard, part of the NFS version 4.1 protocol specification, that addresses the single-server bottleneck and shows great promise as the solution for parallel data access. In this article we explain how pNFS works and describe the current state of the standards effort.

Understanding pNFS

The pNFS protocol gives clients direct access to files striped across two or more data servers. By accessing multiple data servers in parallel, clients achieve significant I/O acceleration. The pNFS protocol has been designed to deliver graceful performance scaling on both a per-client and per-file basis, without sacrificing backward compatibility with the standard NFS protocol; clients without the pNFS extension are still able to access data.
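To make the striping idea concrete, here is a minimal sketch of how a client might map a byte offset within a file to a particular data server under simple round-robin striping. The stripe size, server names, and `locate` function are invented for illustration; real placement is dictated by the layout the metadata server hands out.

```python
# Hypothetical illustration of round-robin file striping across pNFS
# data servers; actual placement comes from the server-provided layout.

STRIPE_SIZE = 64 * 1024                     # 64 KB stripe unit (illustrative)
DATA_SERVERS = ["ds1", "ds2", "ds3", "ds4"]

def locate(offset):
    """Map a byte offset in a file to (data server, offset within that
    server's portion) under simple round-robin striping."""
    stripe_index = offset // STRIPE_SIZE
    server = DATA_SERVERS[stripe_index % len(DATA_SERVERS)]
    # Offset inside the server's portion: the full stripes this server
    # already holds, plus the remainder within the current stripe.
    local = (stripe_index // len(DATA_SERVERS)) * STRIPE_SIZE \
            + offset % STRIPE_SIZE
    return server, local

print(locate(0))                      # ('ds1', 0)
print(locate(3 * STRIPE_SIZE))        # ('ds4', 0)
print(locate(4 * STRIPE_SIZE + 100))  # ('ds1', 65636)
```

Because consecutive stripes land on different servers, a client reading a large file can issue requests to all four servers concurrently, which is the source of the I/O acceleration described above.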

pNFS Architecture and Core Protocols

The pNFS architecture consists of three main components:
  • The metadata server (MDS), which handles all nondata traffic. The metadata server is responsible for maintaining metadata that describes where and how each file is stored.
  • Data servers, which store file data and respond directly to client read and write requests. File data can be striped across a number of data servers.
  • One or more clients that are able to access data servers directly based on information in the metadata received from the metadata server.
There are three types of protocols used between the clients, metadata server, and data servers:
  • A control protocol is used between the metadata server and data servers to provide synchronization.
  • The pNFS protocol is used between clients and the metadata server. It is essentially NFSv4 with a few pNFS-specific extensions and is used to retrieve and manipulate layouts, the metadata that describes the location of a file's data and the storage access protocol required to access it on the data servers.
  • A set of storage access protocols used by clients to directly access data servers. The pNFS specification currently has three categories of storage protocols: file-based, block-based, and object-based. These allow pNFS to accommodate various layout types to support different kinds of storage infrastructure.

Layout Types

The storage access protocol that is employed depends on the type of storage on the underlying data servers. The layout for a file that the metadata server sends to a client provides the client with the information to determine where each stripe of a file is stored, how to access it, and with what protocol.

  • When the file layout is employed, pNFS uses multiple NFSv4.1 file servers as its data servers. NFSv4 itself serves as the file access protocol.
  • When the block layout is used, disk LUNs are hosted on a SAN. Either the iSCSI or Fibre Channel protocol is used to access SAN devices using the SCSI block command set.
  • Object layouts allow data to be stored on object-based storage devices (OSDs) and accessed via the T10 object-based storage device protocol currently being standardized.
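One way to picture the three layout types is as variants of a single layout record that differ in how they name their storage targets. The sketch below is purely illustrative: the field names and target formats are invented and bear no relation to the actual XDR wire encoding defined by the specification.

```python
# Illustrative sketch (not the wire format) of the information a pNFS
# layout conveys for each of the three layout types.

from dataclasses import dataclass, field

@dataclass
class Layout:
    layout_type: str   # "file", "block", or "object"
    stripe_size: int   # stripe unit in bytes
    targets: list = field(default_factory=list)  # where the stripes live

# File layout: stripes live on NFSv4.1 file servers.
file_layout = Layout("file", 65536, ["nfs://ds1/f", "nfs://ds2/f"])

# Block layout: stripes live in LUN extents reached over iSCSI or FC.
block_layout = Layout("block", 65536, [("lun0", 0), ("lun1", 0)])

# Object layout: stripes live in T10 OSD objects.
object_layout = Layout("object", 65536, [("osd1", "obj-17"), ("osd2", "obj-18")])
```

In each case the client receives enough information to address every stripe directly; only the addressing scheme and access protocol change.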

Example of File Access

Figure 2) Elements of pNFS. Clients request layout from metadata server (1, pNFS protocol) and then access data servers directly (2, storage access protocol).

Regardless of the layout type, to access a file a client contacts the metadata server to open the file and request the file's layout. Once the client receives the file layout, it uses that information to perform I/O directly to and from the data servers in parallel, using the appropriate storage access protocol without further involving the metadata server. When the client completes its I/O, it sends modified metadata to the metadata server and closes the file.
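The sequence above can be sketched schematically. Everything in this example is invented for illustration (the class names, methods, and in-memory "stripes"); it mimics the open/layout, parallel I/O, and commit/close steps rather than any real pNFS client API.

```python
# Schematic sketch of the pNFS access sequence; names are hypothetical.

from concurrent.futures import ThreadPoolExecutor

class DataServer:
    def __init__(self):
        self.stripes = {}                 # (file name, stripe index) -> bytes
    def read(self, name, idx):
        return self.stripes[(name, idx)]

class MetadataServer:
    def __init__(self, servers, stripe_size=4):
        self.servers, self.stripe_size = servers, stripe_size
    def open_and_get_layout(self, name, length):
        # The layout tells the client which data server holds each stripe.
        nstripes = -(-length // self.stripe_size)   # ceiling division
        return [(i, self.servers[i % len(self.servers)])
                for i in range(nstripes)]
    def commit(self, name, metadata):
        pass  # client sends modified metadata here, then closes the file

def pnfs_read(mds, name, length):
    layout = mds.open_and_get_layout(name, length)   # step 1: pNFS protocol
    with ThreadPoolExecutor() as pool:               # step 2: parallel I/O
        parts = pool.map(lambda s: s[1].read(name, s[0]), layout)
    data = b"".join(parts)[:length]
    mds.commit(name, {"size": length})               # step 3: commit and close
    return data

# Populate two data servers with 4-byte stripes of a small payload.
ds = [DataServer(), DataServer()]
payload = b"hello world!"
for i in range(0, len(payload), 4):
    ds[(i // 4) % 2].stripes[("f", i // 4)] = payload[i:i + 4]

mds = MetadataServer(ds)
print(pnfs_read(mds, "f", len(payload)))  # b'hello world!'
```

Note that the metadata server is involved only at open and close; all bulk data movement bypasses it, which is what removes the single-server bottleneck.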

Choosing a Layout Type

Here are a few criteria you will want to consider when deciding on a layout to use with pNFS:

Starting Point

  • If you are starting from a NAS environment, a file layout obviously makes the most sense. You can use your existing NAS systems and networks.
  • If you have an existing Fibre Channel SAN, you will probably want to choose a block-based layout.
  • If you are starting from scratch, keep reading.

Network Infrastructure

  • With a file-based layout, you can use your existing NAS storage and your existing Ethernet infrastructure. You don't necessarily have to add more bandwidth.
  • With a block- or object-based layout, you will most likely need a Fibre Channel storage area network (SAN). All clients must connect to the FC SAN to access the data servers directly; iSCSI may be a less expensive alternative.

Security

  • With a file-based back end, security is enforced by each data server, which uses the same methods as a standard NFS server, including Kerberos authentication, ACLs, and so on. Security is well understood and familiar.
  • Using pNFS with a block- or object-based back end puts a large part of the security burden on the pNFS client implementation. Because the client is part of the client operating system, you may have little control over the security it provides (or fails to provide). If you cannot choose among client implementations, you may prefer a layout in which the data servers are responsible for security.

Multiple Client Access and Management

  • With a file-based layout, two different pNFS clients can access the same logical region of the same file for reading or writing.
  • With the block- or object-based layout, only one pNFS client at a time can hold a writable layout to a region of a file.

Availability of Building Blocks

  • Available sources for object-based back-end storage might be limited.

Current Status of pNFS

The pNFS Standard

The NFSv4.1 standards effort, of which pNFS is a part, is a broad-based effort within the Internet Engineering Task Force (IETF). The working group includes members from a wide cross section of leading storage and system vendors and research institutions, including NetApp, EMC, IBM, the University of Michigan, and Sun Microsystems. The NFSv4.1 and pNFS standard is nearing completion, and the IETF is expected to finalize the specification before the end of 2008.

NetApp has been a major driver of both NFSv4.1 and pNFS, cochairing the efforts of the working group. In addition, NetApp has authored and edited a significant portion of the NFSv4.1 specification. This is consistent with our commitment to tackle the problems of storage using industry standards.

For more information on the IETF specification for NFSv4.1 and pNFS, visit the IETF NFS v4 working group Web site.

NFSv4.1/pNFS Testing

Interoperability is obviously essential to the adoption of pNFS, as it was for NFS. Having client and server implementations that can work seamlessly together will accelerate adoption. Interoperability testing of various pNFS implementations has been under way since March 2005. NFSv4.1 and pNFS have been tested at the annual Connectathon, a vendor-neutral forum for testing hardware and software interoperability.

In addition, Bake-a-thons are held several times a year. The most recent Bake-a-thon was held in September 2008. Six pNFS server implementations and two client implementations were tested. No issues were found with either the NFSv4.1 or the pNFS specifications, so the protocol is maturing nicely. You can usually find the latest status on pNFS testing at Mike's NFS blog.

Linux Client and Server Development

Because of the prevalence of Linux® in the compute clusters used by many of the scientific, engineering, and other applications that stand to benefit most from pNFS, it is important to have a well-designed and tested pNFS client for Linux that will meet these applications' performance needs. NetApp and others have recognized this need and are investing in the creation of a robust Linux pNFS client. As the client matures, the implementation will be merged into the mainline Linux kernel, allowing pNFS to be used on Linux machines without the need to install and maintain any additional software.

NetApp is contributing heavily to the pNFS client and the file layout driver and is also developing a server for Linux. Having a simple pNFS server in addition to the client will give potential pNFS adopters a platform for testing, proofs of concept, and familiarization.


With a recognized standard nearing completion, widespread vendor support, and the ability to leverage existing storage infrastructure, pNFS has a good shot at becoming a widely adopted standard for high-performance, parallel I/O, capable of meeting the needs of any application that requires more I/O performance than a single file server can provide.

Got Opinions About pNFS?

Ask questions, exchange ideas, and share your thoughts online in NetApp communities.
Mike Eisler
Senior Technical Director

Mike is the leader of NetApp's NFS-related development efforts. He is the author of the NFSv4 specification and several other specifications relating to NFS and security. Mike's first exposure to NFS and NIS came while working for Lachman Associates, Inc., where he was responsible for porting NFS and NIS to System V platforms. He joined NetApp from Sun, where he was responsible for several NFS and security-related projects.

Joshua Konkle
Technical Evangelist for NAS and Engineering Applications

Joshua champions technologies and solutions that help customers be more productive. His background includes both UNIX® and Windows® experience, with emphasis on security. He has spoken on numerous storage- and security-related topics at various industry and technical venues.