

AFX shows linear scaling during NVIDIA Magnum IO GPUDirect® Storage testing



Pranoop Erasani

NetApp made some announcements last week at NetApp INSIGHT® that will supercharge your organization’s AI investments. We announced a new architecture for AI workloads in the NetApp® AFX 1K; an end-to-end AI data service for GenAI, RAG, and agentic AI in the NetApp AI Data Engine; and much more.

In this blog post, I’ll dive deep into validation of this powerful, AI-ready architecture: NetApp AFX. A proven AI workhorse, AFX is certified to bring disaggregated ONTAP® to NVIDIA DGX SuperPOD™ with DGX™ GB300 systems. 

Additionally, we completed extensive testing with NVIDIA’s Magnum IO GPUDirect Storage (GDS) capabilities. GDS testing benchmarks performance to evaluate how efficiently data can move between storage systems and NVIDIA accelerated computing using Magnum IO GDS, which enables GPUs to bypass the CPU and system memory when accessing data from storage. Instead of data traveling through multiple layers (CPU, RAM, etc.), it moves directly from storage to GPU memory, reducing latency and increasing throughput.

For enterprises running AI workloads, GDS testing helps answer these critical questions: 

  • Can the storage system keep up with GPU demand? 
  • Will performance scale predictably as workloads grow? 
  • Can this level of performance be achieved without disrupting operations or requiring specialized infrastructure? 

The 2025 GDS testing validated how well NetApp AFX supports high-performance AI workloads without compromising enterprise-grade reliability, security, and scalability. 

NetApp AFX testing results

In our latest testing, NetApp achieved 457GiB/s of sustained throughput across 8 AFX nodes for NVIDIA Magnum IO GPUDirect workloads—a 33% increase over last year’s results. Thanks to the disaggregated architecture of AFX, where performance is decoupled from capacity, we have eliminated the need for overprovisioning capacity to meet performance requirements.  

The 33% boost in performance was delivered using just 1/8th the storage capacity compared to previous results. We demonstrated the linear scaling of performance with AFX nodes without diminishing returns. Bottom line: NetApp AFX brings the best of enterprise-ready ONTAP, enabling multi-tenancy, nondisruptive scaling, and IT-friendly NFS over RDMA, making it the ideal platform for today’s most demanding AI workloads. 

AFX: Built for the AI-powered enterprise

Enterprise AI is no longer a science project. AI workloads now run at the heart of business operations, where disruption, outages, and fragile infrastructure mean real monetary costs. Enterprise infrastructure teams can’t afford brittle, siloed architectures that require forklift upgrades every time performance needs grow, nor can they afford to hire specialized staff to babysit bespoke architectures. 

That’s why this year’s results matter: NetApp tested the new AFX disaggregated storage systems, which are built for the AI-powered enterprise. Through disaggregation, these systems deliver independently scalable performance and capacity with linear gains, letting you scale throughput predictably. Customers get high-performance computing capabilities without having to sacrifice security or uptime. Additionally, because it’s ONTAP, AFX integrates seamlessly into the customer’s enterprise data estates and delivers granular, policy-based security so that AI accesses only the specified data. 

The AI advantages of AFX in the enterprise

AFX empowers enterprises to accelerate AI adoption with simplicity, scalability, and seamless integration—without disrupting existing workflows or infrastructure. 

  • Simple. AFX uses the same IT-friendly NFS over RDMA stack you already know, with pNFS and session trunking for full-cluster performance from a single mount point. No exotic file systems, no specialized training. Grow your AI capabilities without growing your headcount. 
  • Scalable. Add compute nodes or storage enclosures independently to meet performance and capacity demands. Scale linearly without rearchitecting your environment or overprovisioning. 
  • Silo free. AFX runs the same ONTAP data management OS that’s trusted for over 100 exabytes of enterprise data, delivering hybrid cloud integration and multi-tenant isolation and enabling you to rapidly onboard AI for every business function, all on one platform. 

Nonstop innovation

  • 2023. We proved that ONTAP could deliver HPC-class performance with NVIDIA GPUDirect Storage, hitting 171GiB/s on an AFF A800 cluster. 
  • 2024. We doubled that on AFF A90—351GiB/s on a 4-system cluster—while simplifying RDMA/NFS operations and enabling nondisruptive upgrades. 
  • 2025. We delivered 457GiB/s with NetApp AFX disaggregated storage—a 33% performance increase with 1/8th of the capacity deployed.

How we tested

We pushed our new disaggregated AFX system to its limits to deliver this result. GPUs are data hungry, and AFX is built to support them. To drive the system to those limits, we needed a lot of GPUs. 

The NetApp Performance Team built a disaggregated AFX storage cluster using 8 AFX nodes and a single disk shelf. Compared with our A90 publication, we used 87% fewer NVMe SSDs to achieve even higher performance. 

The AFX controllers used 3 high-performance I/O expansion cards running at 200GbE with support for up to 400GbE (NetApp P/N X50131A – NVIDIA ConnectX™-7). The team stayed with a pair of NVIDIA Spectrum-3 SN4600 switches for an apples-to-apples comparison with our previous GDSIO results. 

We continued using a mix of NVIDIA DGX A100 systems and Lenovo SR675 V3 servers with NVIDIA L40S GPUs to drive I/O. Clients operated at 200GbE for consistency across the network. Clients leveraged CUDA 12.6, GDS 1.11.0.15, and cuFile tuned for NUMA locality. 
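On the client side, GDS readiness is commonly verified with the gdscheck utility bundled with CUDA (the path shown is the usual default install location and may vary by CUDA version; this is a sketch, not the exact command sequence used in the testing):

```shell
# Print the GDS driver/filesystem support status on a client.
# Requires a CUDA installation that includes GPUDirect Storage.
/usr/local/cuda/gds/tools/gdscheck -p
```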

The AFX cluster presented a single FlexGroup volume spanning all 8 AFX nodes, harnessing the full performance of the entire cluster in a single, simple-to-manage namespace. The FlexGroup volume was mounted over NFSv4.1 with pNFS enabled, 1M rsize/wsize, and session trunking with 16 sessions, and of course with Magnum IO GPUDirect Storage enabled for GDSIO to leverage. To enable more than 2 subnets/network links to work as anticipated, we used two ARP configuration values to eliminate cross-subnet responses: 

net.ipv4.conf.all.arp_announce = 2 

net.ipv4.conf.all.arp_ignore = 2 
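As a rough client-side sketch, the settings above could be applied like this. The sysctl values come from the testing; the cluster hostname, export path, and exact mount option spellings (including the use of nconnect to request the 16 trunked sessions) are assumptions and may differ from the tested configuration:

```shell
# Eliminate cross-subnet ARP responses (values from the testing above)
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.all.arp_ignore=2

# Mount the FlexGroup over NFSv4.1 with pNFS, 1M I/O sizes, and RDMA.
# "afx-cluster" and "/ai_data" are hypothetical names; nconnect=16 is one
# way to request 16 trunked connections on recent Linux kernels.
mount -t nfs -o vers=4.1,proto=rdma,port=20049,rsize=1048576,wsize=1048576,nconnect=16 \
    afx-cluster:/ai_data /mnt/ai_data
```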

System deployment

Storage systems

Benchmark 

  • NVIDIA GDSIO tool for GDS validation 
  • Sequential read workloads for model loads; multithreaded streams 
  • GDSIO parameters (consistent with previous years):
    • -s 1g Size of the files used by each GDSIO thread 
    • -i 256k Transfer block size 
    • -x 0 Transfer mode = GPUDirect 
    • -I 0 Transfer type = read 
    • -T 3600 Runtime = 1 hour 
    • -D <mount point> Single mount point used across all hosts 
    • -d <GPU device> 
    • -n <numa node> Optimal NUMA node for each GPU determined by nvidia-smi topo -m 
    • -w <number of threads> Number of threads per mount point was adjusted on a per-host basis as the system was scaled 
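Assembled into a single command line, the parameters above might look like the following. The mount point, GPU index, NUMA node, and thread count are illustrative placeholders, not the exact per-host values used in the testing:

```shell
# Hypothetical gdsio invocation mirroring the listed parameters:
# 1 GiB files, 256 KiB transfers, GPUDirect reads, 1-hour runtime
gdsio -s 1g -i 256k -x 0 -I 0 -T 3600 \
      -D /mnt/ai_data -d 0 -n 0 -w 16
```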

Note: All reported results are based on cached read performance. For the 2-node configuration, we used a lower thread count to keep the working set size within AFX system memory. 

Results 

Throughput scaling 

  • 2 nodes → 114 GiB/s 
  • 4 nodes → 232 GiB/s 
  • 8 nodes → 457 GiB/s 
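A quick sanity check of the linearity claim: dividing each throughput figure by its node count gives a nearly constant per-node rate. This is just arithmetic on the numbers above:

```shell
# Per-node throughput at each tested cluster size (totals in GiB/s)
for pair in "2 114" "4 232" "8 457"; do
  set -- $pair
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%d nodes: %.1f GiB/s per node\n", n, t/n }'
done
```

Each cluster size lands between 57 and 58 GiB/s per node, which is what linear scaling looks like.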

Note: 2-node clusters are not supported for customer deployment, but were used in this scenario to demonstrate scaling. 

Key insight: We achieved these results without adding any storage capacity.

AI innovation delivered: NetApp AFX is enterprise ready

As AI becomes foundational to business strategy, the infrastructure supporting it must evolve beyond raw performance. It must be scalable, secure, and enterprise ready. NetApp AFX delivers exactly that—HPC-class throughput with the operational simplicity and resilience that enterprises demand. 

This year’s testing proves that NetApp isn’t just keeping pace with AI innovation—we’re enabling it. With disaggregated architecture, linear performance scaling, and the power of ONTAP, AFX empowers organizations to grow their AI capabilities without growing complexity. Whether expanding AI across departments or deploying new models into production, NetApp AFX means that your data infrastructure is ready for what’s next. 

Check out NetApp AFX

Learn about our AI approach 

Pranoop Erasani

Pranoop Erasani is vice president of engineering in the Shared Platform team at NetApp, where he is responsible for AI/ML, NAS, and replication technologies for NetApp’s hybrid cloud storage platform, ONTAP. In this role, he is tasked with building a cost-effective next-generation ONTAP data platform optimized for AI/ML applications. Leveraging the best of ONTAP data management capabilities, the next-generation data platform for AI/ML will not only be optimized for the AI workflows of training, inference, and checkpointing, but will also provide the ability to make data ready for AI/ML applications natively on storage.

