In my last blog post, “NetApp end-to-end NVMe,” I ended by referring to the innovations NetApp made in its EF-Series controllers. As great as supercomputing performance is for solving big science problems like gene sequencing, NetApp wanted to make this technology available to everyone. Although it was tempting to join the marketing hype, our commitment to solid engineering meant we had to wait for a few more pieces of the puzzle, including operating system support and network readiness, before we could broaden the use case for end-to-end NVMe into the data center.
Network readiness was the biggest hurdle. InfiniBand is common in supercomputing environments, but it’s nowhere to be found in most data centers. Many people, including me, were enthusiastic about RDMA over Converged Ethernet (RoCE). Its technical underpinnings are similar to those of the ill-fated Fibre Channel over Ethernet (FCoE), which also seemed promising but never gained the expected market adoption. FCoE looked great on paper, but most network teams simply didn’t put enough care into designing storage-class, lossless networks with Ethernet. The same is true for RoCE, usually pronounced “rocky,” which seems oddly prophetic given what early adopters went through trying to build resilient networks and tune operating systems to make those networks work. In several cases, customers needed brand-new, dedicated Ethernet switches and RDMA-capable network interface cards.
This is the technology Pure Storage chose, first in May 2018, when they used relatively low-bandwidth 25GbE RoCE connections for their external NVMe shelf, and later in 2019, when they used it for host connections.
Coincidentally, in May 2018, just as Pure started to deliver NVMe over Fabrics for shelf connections, NetApp released the industry’s first true end-to-end NVMe array, with NVMe all the way from the server to the media. While we were developing this array, we evaluated every option. If you’re interested in what went into that analysis, check out “When You’re Implementing NVMe Over Fabrics, the Fabric Really Matters.” Like Pure, NetApp opted for RoCE for its external NVMe shelf, but we took it far more seriously: the NS224 shelf uses multiple 100GbE connections that provide up to 400Gbps to avoid bottlenecks. Unlike Pure, however, NetApp didn’t expect its customers to build an entirely new Ethernet-based storage environment to get the benefits of end-to-end NVMe. We met customers where they were.
The most commonly deployed high-performance, high-resilience storage-class lossless network in the data center was—and still is—Fibre Channel (FC). As NetApp’s high-performance database expert Jeff Steiner says, “Fibre Channel is just better.... FC is everywhere, FC was designed for storage networking, it has built-in quality of service, and FC zones are a proven way of securing your data.” So when we released the world’s first end-to-end NVMe array, we designed it so it could communicate with servers by using NVMe over Fibre Channel (NVMe/FC).
The following illustration compares a traditional FC array with local NVMe and true end-to-end NVMe.
With this release, NetApp was the first to give customers the kinds of NVMe improvements that Pure had been hyping when they released their FlashArray //X platform.
With the release of the NetApp® AFF A800 all-flash storage system, we extended the benefits of centralized data services while delivering performance and CPU efficiency similar to that of local NVMe media. Furthermore, because most of the functionality is delivered by software, just as it was with the NetApp EF570 all-flash array, these benefits are available to almost every NetApp customer. That includes customers with arrays running NetApp ONTAP® software that date back as far as 2014, even those still using SAS media, and they get these benefits through a simple, nondisruptive software upgrade.
These facts support the superiority of our software-defined approach and rigorous engineering, and the proof is in the published benchmark data from both NetApp and Pure. I’ll talk more about those benchmarks later, but keep in mind that when Pure announced their proprietary NVMe media, they claimed: “This end-to-end, software-to-raw flash optimization reduces latency by up to 50 percent, and increases write bandwidth by up to 2x and performance density by up to 4x.” That was almost 2 years before they actually released a true end-to-end product with NVMe all the way through to the host. When Pure finally released a true end-to-end NVMe array and was asked how much better the performance would really be, they were uncharacteristically meek in their interview with Computer Weekly:
Regarding performance, Pure has kept its cards close to its chest about gains expected in the new arrays. No IOPS or throughput specs have been furnished by the array maker for the new systems while specs for existing arrays have also disappeared from Pure’s website.
According to Pure, IOPS specs don’t reflect the performance of the arrays in real conditions and the firm therefore prefers to keep silent about the subject. Pure has indicated, however, that the latency of the new arrays will be close to 250µs and that customers can expect significant gains in throughput.
Keep that 250-microsecond figure in mind; it will matter by the time you reach the end of this post.
In 2018, with the release of ONTAP 9.5, NetApp began delivering on the promise of end-to-end NVMe, not as a science experiment but as something you can use for mission-critical workloads. Since then, we’ve invested heavily in integration work with other leading vendors such as Brocade and VMware. The goal of this integration is to make it easy for our customers to get the benefits of NVMe while still taking advantage of all the built-in goodness of ONTAP and its hybrid cloud ecosystem. For example, in 2019, ONTAP 9.6 introduced features such as 512-byte blocks and Atomic Test and Set (ATS), which make it significantly easier to use NVMe with VMware. In May 2021, we added support for vVols with ONTAP 9.9.1.
This level of integration, as outlined in “Power of 3: NetApp, VMware, and Broadcom Redefining Enterprise IT Architecture,” lets NetApp customers double the performance of core business applications running on VMware just by switching to NVMe. No extra hardware is required. All it takes is better, smarter software.
It’s true that Pure finally released NVMe/FC, but, unusually for them, they didn’t make a big deal about it. Maybe that’s because they were well behind NetApp in delivery, or maybe it’s because the results didn’t justify the effort. Although Pure claimed that their DirectFlash end-to-end NVMe would deliver the same performance as local NVMe, their benchmarks suggest that what they actually delivered fell far short of the 250-microsecond latency they were aiming for. If you really want to reduce latency by 50% and get 4 times the performance density, you should be speaking to NetApp.
NetApp’s innovation story doesn’t stop there. Some of you might have already guessed what we’re about to announce, but if you can wait just a little longer, all will be revealed in my next blog post. In the meantime, to learn more about the AFF end-to-end NVMe systems, visit the AFF A-Series product page and our market comparisons microsite, where we explain why NetApp is the best choice for flash.
Ricky Martin leads NetApp’s global market strategy for its portfolio of hybrid cloud solutions, providing technology insights and market intelligence on trends that affect NetApp and its customers. With nearly 40 years of IT industry experience, Ricky joined NetApp as a systems engineer in 2006 and has served in various leadership roles in the NetApp APAC region, including developing and advocating NetApp’s solutions for artificial intelligence, machine learning, and large-scale data lakes.