AFX shows linear scaling during NVIDIA Magnum IO™ GPUDirect® Storage testing
NetApp made some announcements last week at NetApp INSIGHT® that will supercharge your organization’s AI investments. We announced a new architecture for AI workloads in the NetApp® AFX 1K; an end-to-end AI data service for GenAI, RAG, and agentic AI in the NetApp AI Data Engine; and much more.
In this blog post, I’ll dive deep into validation of this powerful, AI-ready architecture: NetApp AFX. A proven AI workhorse, AFX is certified to bring disaggregated ONTAP® to NVIDIA DGX SuperPOD™ with DGX™ GB300 systems.
Additionally, we completed extensive testing with NVIDIA Magnum IO GPUDirect Storage (GDS). GDS testing benchmarks how efficiently data moves between storage systems and NVIDIA accelerated computing. GDS enables GPUs to bypass the CPU and system memory when accessing data from storage: instead of traveling through multiple layers (CPU, RAM, and so on), data moves directly from storage to GPU memory, reducing latency and increasing throughput.
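To make that bypass concrete, here’s a minimal sketch using NVIDIA’s gdsio benchmarking tool (bundled with the GDS tools). The mount path, GPU index, and sizes are illustrative assumptions, not our test configuration; transfer-type codes follow NVIDIA’s GDS benchmarking documentation.

# Traditional path: storage -> CPU/system memory -> GPU (bounce buffer)
gdsio -D /mnt/afx/gds -d 0 -w 8 -s 1G -i 1M -x 2 -I 0 -T 60

# GPUDirect Storage path: storage -> GPU memory, bypassing the CPU bounce
gdsio -D /mnt/afx/gds -d 0 -w 8 -s 1G -i 1M -x 0 -I 0 -T 60

Here -D is the test directory, -d the GPU index, -w the worker threads, -s the per-thread file size, -i the I/O size, -I 0 a sequential read workload, and -T the duration in seconds.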
For enterprises running AI workloads, GDS testing helps answer critical questions about whether storage can keep data-hungry GPUs fed at scale.
The 2025 GDS testing validated how well NetApp AFX supports high-performance AI workloads without compromising enterprise-grade reliability, security, and scalability.
In our latest testing, NetApp achieved 457 GiB/s of sustained throughput across 8 AFX nodes for NVIDIA Magnum IO GPUDirect Storage workloads, a 33% increase over last year’s results. Thanks to the disaggregated architecture of AFX, where performance is decoupled from capacity, we have eliminated the need to overprovision capacity to meet performance requirements.
The 33% performance boost was delivered with just one-eighth of the storage capacity used in the previous results, and performance scaled linearly as AFX nodes were added, with no diminishing returns. Bottom line: NetApp AFX brings the best of enterprise-ready ONTAP, with multi-tenancy, nondisruptive scaling, and IT-friendly NFS over RDMA, making it the ideal platform for today’s most demanding AI workloads.
Enterprise AI is no longer a science project. AI workloads now run at the heart of business operations, where disruption, outages, and fragile infrastructure mean real monetary costs. Enterprise infrastructure teams can’t afford brittle, siloed architectures that require forklift upgrades every time performance needs grow, nor can they afford to hire specialized staff to babysit bespoke architectures.
That’s why this year’s results matter: NetApp tested the new AFX disaggregated storage systems, which are built for the AI-powered enterprise. Through disaggregation, these systems deliver independently scalable performance and capacity with linear gains, letting you scale throughput predictably. Customers get high-performance computing capabilities without having to sacrifice security or uptime. Additionally, because it’s ONTAP, AFX integrates seamlessly into the customer’s enterprise data estates and delivers granular, policy-based security so that AI accesses only the specified data.
AFX empowers enterprises to accelerate AI adoption with simplicity, scalability, and seamless integration—without disrupting existing workflows or infrastructure.
We pushed our new disaggregated AFX system to its limits to deliver this result. GPUs are data hungry, and AFX is built to keep them fed; driving the system to its limits took a lot of GPUs.
The NetApp Performance Team built a disaggregated AFX storage cluster using 8 AFX nodes and a single disk shelf. Compared with our A90 publication, we used 87% fewer NVMe SSDs to achieve even higher performance.
The AFX controllers used three high-performance I/O expansion cards running at 200GbE, with support for up to 400GbE (NetApp P/N X50131A, NVIDIA ConnectX™-7). The team stayed with a pair of NVIDIA Spectrum-3 SN4600 switches for an apples-to-apples comparison with our previous GDSIO results.
We continued using a mix of NVIDIA DGX A100 systems and Lenovo SR675 V3 servers with NVIDIA L40S GPUs to drive I/O. Clients operated at 200GbE for consistency across the network and ran CUDA 12.6 and GDS 1.11.0.15, with cuFile tuned for NUMA locality.
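As a quick sanity check before a run like this, the gdscheck utility that ships with GDS reports whether the GPUDirect Storage stack is active on a client; the path below assumes a default CUDA toolkit install and may differ per system.

# Verify GDS platform support and print the effective cuFile configuration
/usr/local/cuda/gds/tools/gdscheck -p

# NUMA locality is steered through /etc/cufile.json; for example, the
# rdma_dev_addr_list property constrains cuFile I/O to specific RDMA NICs.
# (Property name per NVIDIA's cuFile documentation; values are site specific.)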
The AFX cluster presented a single FlexGroup volume spanning all 8 AFX nodes, harnessing the full performance of the entire cluster in a single, simple-to-manage namespace. The FlexGroup volume was mounted over NFSv4.1 with pNFS enabled, 1M rsize/wsize, session trunking with 16 sessions, and, of course, Magnum IO GPUDirect Storage enabled for GDSIO to leverage. To let more than two subnets/network links work as anticipated, we used two ARP configuration values to eliminate cross-subnet responses:
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 2
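For reference, here’s a sketch of how a client might persist those ARP settings and mount the FlexGroup volume as described above. The server name, export path, and mount options are illustrative assumptions; NFSv4.1 trunking option names and RDMA support vary by kernel and ONTAP version, and pNFS is negotiated automatically when the server supports it.

# Persist the ARP settings across reboots
cat <<'EOF' > /etc/sysctl.d/99-gds-arp.conf
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 2
EOF
sysctl --system

# Representative NFSv4.1-over-RDMA mount with 1M rsize/wsize and
# 16 trunked connections (hypothetical server and paths)
mount -t nfs -o vers=4.1,proto=rdma,port=20049,nconnect=16,rsize=1048576,wsize=1048576 afx-cluster:/ai_data /mnt/afx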
Benchmark
Note: All reported results are based on cached read performance. For the 2-node configuration, we used a lower thread count to keep the working set small enough to fit in AFX system memory.
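To illustrate that note: with a gdsio-style read workload, the working set is roughly the thread count times the per-thread file size, so lowering -w shrinks the data set that must stay cache-resident. The numbers below are made up for illustration, not our benchmark parameters.

# Illustrative cached-read workload; reduce -w (threads) to keep
# (threads x per-thread file size) within AFX system memory
gdsio -D /mnt/afx/gds -d 0 -w 16 -s 4G -i 1M -x 0 -I 0 -T 300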
Results
Throughput scaling
Note: 2-node clusters are not supported for customer deployment, but were used in this scenario to demonstrate scaling.
As AI becomes foundational to business strategy, the infrastructure supporting it must evolve beyond raw performance. It must be scalable, secure, and enterprise ready. NetApp AFX delivers exactly that—HPC-class throughput with the operational simplicity and resilience that enterprises demand.
This year’s testing proves that NetApp isn’t just keeping pace with AI innovation—we’re enabling it. With disaggregated architecture, linear performance scaling, and the power of ONTAP, AFX empowers organizations to grow their AI capabilities without growing complexity. Whether expanding AI across departments or deploying new models into production, NetApp AFX means that your data infrastructure is ready for what’s next.
Check out NetApp AFX
Learn about our AI approach
Pranoop Erasani is vice president of engineering in the Shared Platform team at NetApp, where he is responsible for AI/ML, NAS, and replication technologies for NetApp’s hybrid cloud storage platform, ONTAP. In this role, he is tasked with building a cost-effective next-generation ONTAP data platform optimized for AI/ML applications. Leveraging the best of ONTAP data management capabilities, the next-generation data platform for AI/ML will not only be optimized for AI workflows across training, inference, and checkpointing, but will also make data ready for AI/ML applications natively on storage.