Talk to enterprise data storage vendors today, and all you hear is NVMe-this, NVMe-that, blah blah blah.
There's a good reason for all the noise. Implementing NVMe (NVM Express) as the end-to-end data transfer protocol in your SAN environment can significantly improve throughput and reduce latency, delivering a vastly better experience to your users. But the "end-to-end" part is crucial; unfortunately, many of the NVMe data storage products now on the market deliver only a small fraction of NVMe's potential performance improvements.
That's because the NVMe data transfer standard has two distinct aspects: the back end, where NVMe-attached flash media connect to the storage controller across a PCIe bus, and the front end, where NVMe over Fabrics (NVMe-oF) replaces SCSI-based protocols for data transfer between hosts and the storage array.
The reason it's important to read the fine print is that in most cases, less than 20% of NVMe's potential speed boost comes from the back-end NVMe media, with 80% or more of the benefit coming from using NVMe-oF to replace SCSI-based front-end data transfer protocols. Some data center marketing blogs are pure baloney on this point, so always ascertain whether the storage system in question is actually running NVMe-oF rather than just NVMe flash media on the back end.
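If it helps to see that split as arithmetic, here's a toy Python calculation. The latency figures in it are entirely hypothetical and exist only to show how an 80/20 benefit split shakes out; they are not measured results.

```python
# Illustrative (hypothetical) numbers only: a back-of-the-envelope split of
# where an NVMe upgrade's latency savings come from, matching the rough
# ~20% media / ~80% fabric proportions described above.

media_savings_us = 25.0    # hypothetical gain from NVMe-attached back-end media
fabric_savings_us = 100.0  # hypothetical gain from NVMe-oF on the front end

total_savings = media_savings_us + fabric_savings_us
print(f"Share of gain from back-end NVMe media: {media_savings_us / total_savings:.0%}")
print(f"Share of gain from front-end NVMe-oF:   {fabric_savings_us / total_savings:.0%}")
```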
Bringing NVMe's massive parallelism to the Data Fabric promises to deliver huge performance improvements. So the question faced by IT leaders and architects is which flavor of fabric to adopt, given the big differences in performance, reliability, and cost among them.
From its inception in 2016, the NVMe-oF standard was designed to ensure that the NVMe command set could be transported by the widest possible variety of fabric and network transports.
Today, the IT world's main data transport protocols are Fibre Channel, RDMA (most commonly InfiniBand or RoCE), and TCP/IP. The three corresponding types of fabric supported by NVMe are NVMe/FC, NVMe over RDMA (NVMe/RDMA), and NVMe/TCP.
Let's look at the technology underlying these three ways of implementing NVMe across a Data Fabric, and then examine the pros and cons of each approach.
The main data transfer protocols used by SAN systems today are FCP (Fibre Channel Protocol), iSCSI, and FCoE (Fibre Channel over Ethernet). You can ignore those acronyms from now on, because they're all built on top of SCSI, a set of interface standards with roots in the late 1970s that were designed for floppy disks and hard disk drives.
The NVMe standard was developed over the past decade and is specifically designed to take full advantage of flash memory, solid-state drives (SSDs), NVMe-attached SSDs, and even storage technologies that haven't been invented yet. Instead of SCSI's single command queue (with a depth of 32 commands), NVMe supports 65K queues with 65K commands per queue, which means that a far greater number of commands can be executed simultaneously.
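A quick back-of-the-envelope comparison makes that parallelism difference concrete. The sketch below simply multiplies out the queue limits cited above (one 32-deep SCSI queue versus NVMe's 65,535 queues of 65,535 commands each); it isn't drawn from any benchmark.

```python
# Compare the maximum number of outstanding commands under SCSI vs. NVMe,
# using the queue limits cited above (NVMe spec maximums, often quoted as 65K).

scsi_queues, scsi_queue_depth = 1, 32
nvme_queues, nvme_queue_depth = 65_535, 65_535

scsi_outstanding = scsi_queues * scsi_queue_depth
nvme_outstanding = nvme_queues * nvme_queue_depth

print(f"SCSI: {scsi_outstanding:,} commands in flight")
print(f"NVMe: {nvme_outstanding:,} commands in flight "
      f"({nvme_outstanding // scsi_outstanding:,}x more parallelism)")
```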
The first iterations of NVMe focused on optimizing I/O between a host computer and local NVMe media connected across a high-speed Peripheral Component Interconnect Express (PCIe) bus. When it evolved to NVMe-oF, a key design objective was to ensure that it supported the widest possible variety of fabric and network protocols. Today that means three main data transport protocols: NVMe/FC, NVMe over RDMA (NVMe/RDMA), and NVMe/TCP.
Most enterprises currently entrust their mission-critical workloads to FC-based SAN systems, because of their consistently high speed, efficiency, and availability.
RDMA is a way of exchanging data between the main memory of two computers in a network without involving the processor, cache, or OS of either computer. Because RDMA bypasses the OS, it is generally the fastest and lowest-overhead mechanism for communicating data across a network.
There are two main RDMA variants in enterprise computing: InfiniBand and RDMA over Converged Ethernet (RoCE).
InfiniBand was one of the earliest implementations of RDMA, and is known for blazing-fast performance. NetApp has been shipping E-Series hybrid and all-flash arrays supporting 100Gbps InfiniBand since 2017, providing sub-100-microsecond latency for big data analytics workloads. Despite its advantages, InfiniBand is not as popular as its close relative, RoCE, nor the enterprise standard FC.
Among RDMA protocols, the up-and-coming contender is RoCE, which runs on Converged Ethernet, a set of data center bridging (DCB) enhancements to the Ethernet protocol that aim to make it lossless. RoCE v1 operates at layer 2, the data link layer in the Open Systems Interconnection (OSI) model; it therefore cannot be routed between subnets and only supports communication between hosts on the same Ethernet network. RoCE v2 is much more valuable because it encapsulates its payload in UDP/IP and thus, like NVMe/TCP, can be routed at OSI layer 3.
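To make the layer-2/layer-3 distinction concrete, here's a toy Python sketch. It does not implement RoCE; real RoCE v2 traffic is generated by RDMA-capable NICs, not by application sockets. It only illustrates why UDP/IP encapsulation (RoCE v2 uses well-known UDP port 4791) produces packets that ordinary routers can forward between subnets, whereas a raw layer-2 Ethernet frame never leaves its broadcast domain. The target address is a placeholder.

```python
import socket

# Toy illustration only: real RoCE v2 frames are built and sent by the NIC's
# RDMA engine, not by application sockets. The point is simply that RoCE v2
# payloads ride inside UDP/IP datagrams (well-known UDP destination port 4791),
# so they carry an IP header and can be forwarded by ordinary layer-3 routers,
# unlike RoCE v1 frames, which are plain layer-2 Ethernet and never leave their subnet.

ROCE_V2_UDP_PORT = 4791
target = ("192.0.2.10", ROCE_V2_UDP_PORT)  # RFC 5737 documentation address, purely illustrative

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    try:
        sock.sendto(b"not-a-real-roce-payload", target)
        print(f"UDP datagram handed to the IP stack for {target[0]}:{target[1]} -- "
              "routers can forward it across subnets")
    except OSError as err:
        print(f"Send failed (no route?): {err}")
```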
To date, the cost of FC or InfiniBand networks has kept some organizations out of the NVMe-oF market. To address this gap, NetApp and other members of the NVMe.org consortium developed and published a new NVMe-oF standard, NVMe/TCP, that uses a standard Ethernet LAN with TCP as the transport.
In fact, in November 2018 the NVMe standards body ratified NVMe/TCP as a new transport mechanism. In the future, it's likely that TCP/IP will evolve to be an important data center transport for NVMe.
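One practical consequence of running over plain TCP/IP is that basic connectivity checks need nothing more exotic than a socket. The sketch below is a minimal reachability probe against a hypothetical discovery controller address; port 8009 is commonly used for NVMe-oF discovery services, but your array's documentation is the authority on the actual IP and port.

```python
import socket

# Minimal sketch: verify that an NVMe/TCP discovery controller is reachable
# over an ordinary TCP connection. The address below is hypothetical; port 8009
# is commonly used for NVMe-oF discovery services, but check your storage
# array's documentation for the values it actually uses.

DISCOVERY_CONTROLLER = ("198.51.100.20", 8009)

try:
    with socket.create_connection(DISCOVERY_CONTROLLER, timeout=3):
        print(f"TCP connection to {DISCOVERY_CONTROLLER[0]}:{DISCOVERY_CONTROLLER[1]} succeeded")
except OSError as err:
    print(f"Could not reach the discovery controller: {err}")
```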
For enterprise IT architects planning to upgrade their infrastructure to support NVMe-oF, the main question is which fabric to choose. Naturally, the answer will depend on their current infrastructure, plus their plans and budgets for the future.
The other key factor is timing. NVMe/RoCE v2 shows great potential, but it will probably need a couple more years to evolve before it's ready to reliably take on tier 1 enterprise workloads. And NVMe/TCP also looks likely to deliver excellent price/performance value when the technology matures, but that's also a few years down the road.
For now, most IT architects have concluded that FC provides the most mature data transfer protocol for enterprise mission-critical workloads, making NVMe/FC the right fabric choice. A 2018 report from the technical analysts at Demartek, Performance Benefits of NVMe over Fibre Channel, confirms the magnitude of the performance gains attributable to the NVMe/FC fabric, shown in the following figure.
For a typical Oracle workload running on a NetApp AFF A700 system, IOPS were around 50% higher for NVMe/FC than for SCSI FCP. The lab tests were performed on a single-node AFF A700 system using a simulated Oracle workload with an 80/20 read/write mix at an 8KB block size (typical OLTP database I/O), plus a small amount of 64KB sequential writes (typical of redo logs). At 375μs of latency, NVMe/FC achieved 58% higher IOPS than SCSI FC Protocol.
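For readers who like to translate IOPS claims into bandwidth, here's a small worked calculation. Only the 8KB block size and the 58% uplift come from the test description above; the baseline IOPS figure is a made-up placeholder, so consult the Demartek report for the measured numbers.

```python
# Convert IOPS at a given block size into throughput, then apply the 58%
# uplift described above. The baseline IOPS value is hypothetical -- see the
# Demartek report for the measured figures.

block_size_kib = 8        # 8KB blocks, per the OLTP-style test mix
baseline_iops = 200_000   # hypothetical SCSI FC Protocol baseline
nvme_fc_iops = baseline_iops * 1.58

def throughput_mib_s(iops: float, block_kib: int) -> float:
    """Rough throughput in MiB/s for a given IOPS rate and block size."""
    return iops * block_kib / 1024

print(f"SCSI FCP : {baseline_iops:,.0f} IOPS ≈ {throughput_mib_s(baseline_iops, block_size_kib):,.0f} MiB/s")
print(f"NVMe/FC  : {nvme_fc_iops:,.0f} IOPS ≈ {throughput_mib_s(nvme_fc_iops, block_size_kib):,.0f} MiB/s")
```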
We've seen similar results in our labs with the AFF A800 SAN storage systems, which have been shipping since May 2018. These systems deliver complete end-to-end NVMe connectivity, with both NVMe-attached flash media and NVMe/FC connectivity across the fabric between storage controllers and hosts. The test results confirm that although NVMe-attached media at the back end provide a measurable performance boost when running Oracle apps on the AFF A800, the highly parallelized front-end NVMe-oF contributes most of the improvement.
For enterprises that adopt NVMe/FC today, it's the best of both worlds: they can nondisruptively implement today's most mature storage networking technology while preparing for the all-NVMe future that's coming.
To learn more, read the Performance Benefits of NVMe over Fibre Channel report (no registration required).
Mike Kieran is a Technical Marketing Engineer at NetApp, responsible for SAN and business applications. He has more than 20 years of experience as a technology strategist and communicator, focusing on the concerns of enterprise data center buyers. Mike’s work centers on creating messaging programs, go-to-market materials, and technical content. He writes and edits technical reports, white papers, presentations, customer stories, and blog posts. Mike was previously at Nimble Storage, Gigya, Socialtext, and E-Color, where he managed teams responsible for creating outstanding customer satisfaction programs and compelling B2B tech marketing content. Mike studied physics and astronomy at the University of Toronto, and is the author of four books and 150+ magazine features, primarily on digital imaging technology. Many evenings you’ll find him in his woodshop, quietly building heirloom-quality hardwood toys and furniture.