Hi, I'm Dave Arnett, a principal technical marketing engineer on the NetApp solutions team, focused on GPU-accelerated workloads. I'm going to be talking about GPUDirect Storage (GDS) from NVIDIA, a host-based software capability that can dramatically improve storage performance for workloads that can take advantage of it. Now, obviously, this is a storage solution, and I'm going to talk about the NetApp solutions for GDS in a minute, but I'm going to start with a quick explanation of how GDS improves performance inside the host.

Remote direct memory access (RDMA) has been around for a long time, and it provides a means for a system to move data from an external device, through the NIC, into main system memory without invoking the CPU. That means no system calls and no network protocol layers that would add latency. It also allows the CPU to keep running the actual workload processes, and the processes themselves can make sure the data they need is resident in memory when they need it. Now, this is great when the process is running on the CPU, which sits right next to the memory banks. But what happens when the process is running on a GPU, on another device? Without GDS, that would ultimately require another CPU interrupt and some bounce buffering to move the data over into the GPU. That happens pretty quickly, because the RDMA transfer is already fast and the rest is internal to the system, but it still adds latency, and with some of the workloads coming down the pipe, we need to reduce latency as much as possible.

To make GDS possible, NVIDIA developed a couple of key technologies. The first is the ability to expose the memory space on the GPU so that RDMA processes can access that memory directly, just like they do system memory. The second is the ability for devices on the PCIe bus to transfer data directly between themselves, so the NIC can move data directly into the GPU without involving the CPU. Now, all of this is happening inside the host.
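Applications reach this direct storage-to-GPU path through NVIDIA's cuFile API, which ships as part of GDS. The following is a minimal sketch, not a production example: it assumes a GDS-enabled Linux host with the cuFile library installed, the file path and buffer size are hypothetical, and error checking is omitted for brevity.

```c
/* Illustrative sketch: read a file straight into GPU memory via cuFile (GDS).
 * Requires a GDS-enabled system; "/data/sample.bin" is a hypothetical path.
 * Compile with something like: nvcc gds_read.c -lcufile -o gds_read */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main(void) {
    const size_t size = 1 << 20;                /* 1 MiB, arbitrary for the example */
    void *gpu_buf = NULL;

    cuFileDriverOpen();                         /* initialize the GDS driver */

    int fd = open("/data/sample.bin", O_RDONLY | O_DIRECT);
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);          /* register the file with cuFile */

    cudaMalloc(&gpu_buf, size);                 /* destination lives in GPU memory */
    cuFileBufRegister(gpu_buf, size, 0);        /* register/pin the GPU buffer */

    /* DMA from storage directly into GPU memory -- no CPU bounce buffer */
    ssize_t n = cuFileRead(fh, gpu_buf, size, /*file_offset=*/0, /*buf_offset=*/0);
    printf("read %zd bytes into GPU memory\n", n);

    cuFileBufDeregister(gpu_buf);
    cudaFree(gpu_buf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

The key difference from an ordinary `read()` is that the destination pointer is GPU device memory: the transfer lands there directly, which is the bounce-buffer elimination described above.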
And that means, from a storage system perspective, it's still I/O that we handle just like any other I/O, as long as the system supports RDMA. We've been supporting RDMA on the E-Series for a long time using InfiniBand, and NetApp now sells and supports the BeeGFS file system, which allows customers to scale performance and capacity as far as they'd like to go. This is the solution we're working to certify on the NVIDIA SuperPOD architecture for world-class computing, typically used for things like large-scale natural language processing or HPC simulations that would use the entire cluster for a single job. And now, starting with ONTAP 9.10.1, ONTAP will support NFS over RDMA and be able to support GDS in environments for customers who either don't have or don't want InfiniBand or parallel file systems, or for use cases and workloads where the native data management capabilities of ONTAP can really help streamline data pipelines and data science workflows in very large-scale artificial intelligence development environments.

So, I hope this explanation of GDS has been helpful. If you'd like more information, please go to netapp.com/ai. Thanks very much, and have a great day.
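For the ONTAP route, the client side is just a standard NFS mount with the RDMA transport selected. A minimal sketch, assuming a Linux client with an RDMA-capable NIC; the SVM name `svm-ai` and export path `/vol_ai` are hypothetical, and the NFS versions supported over RDMA depend on your ONTAP release, so check the documentation for your version.

```shell
# Mount an ONTAP NFS export over RDMA. Port 20049 is the conventional
# NFS-over-RDMA port; "svm-ai" and "/vol_ai" are placeholder names.
sudo mount -t nfs -o proto=rdma,port=20049,vers=4.0 svm-ai:/vol_ai /mnt/ai

# Confirm the mount is using the RDMA transport
mount | grep /mnt/ai
```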
Dramatically improve storage performance for GPU workloads. Keep your data free-flowing with a storage system that can keep pace with NVIDIA GPUs.