It’s no secret that data is growing at a mind-boggling pace. In fact, IDC predicts that the collective sum of the world’s data will grow to 175ZB by 2025, a compound annual growth rate of 61%. IDC also estimates that 90ZB of that data will be created on Internet of Things (IoT) devices. What happens to all this data that is being created?
Data from multiple sources—seismic research, medical images, IoT devices, and more—is captured in storage and then streamed to NVIDIA GPU-accelerated applications where data analytics and artificial intelligence (AI) are applied. This process requires a high-performance computing (HPC) infrastructure that enables fast and reliable access to petabytes of data.
Traditional HPC solutions are configured so that the CPU controls the loading of data from storage to the GPU for processing. As GPU performance continues to outpace CPU performance, input/output bottlenecks on the path between storage and the GPU are becoming a problem, especially in use cases where access to real-time data is crucial.
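To make the bottleneck concrete, here is a minimal sketch of that conventional path, assuming a CUDA-capable host; the file path and transfer size are hypothetical and error handling is trimmed. Every byte is staged through a CPU-side bounce buffer before it reaches GPU memory.

```cpp
// Conventional storage-to-GPU path: the CPU stages data through a host
// "bounce buffer" before copying it to GPU memory over PCIe.
// Hypothetical file path and size; error handling omitted for brevity.
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    const size_t size = 1 << 26;                 // 64 MiB read, for illustration
    int fd = open("/mnt/ef600/dataset.bin", O_RDONLY);

    void *host_buf;
    cudaMallocHost(&host_buf, size);             // pinned host bounce buffer
    void *dev_buf;
    cudaMalloc(&dev_buf, size);

    // Step 1: storage -> CPU memory (CPU cycles and host memory bandwidth spent here)
    ssize_t n = pread(fd, host_buf, size, 0);

    // Step 2: CPU memory -> GPU memory
    cudaMemcpy(dev_buf, host_buf, n, cudaMemcpyHostToDevice);

    // ... launch kernels that consume dev_buf ...

    cudaFree(dev_buf);
    cudaFreeHost(host_buf);
    close(fd);
    return 0;
}
```

Both the read into host memory and the copy to the GPU consume CPU cycles and host memory bandwidth, which is exactly where the bottleneck forms as data volumes and GPU counts grow.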
NetApp and NVIDIA are collaborating to deliver an innovative way to accelerate HPC workloads. NVIDIA Magnum IO is a multi-GPU, multinode networking and storage I/O optimization stack. Its APIs integrate computing, networking, distributed file systems, and storage elements to maximize I/O performance and functionality for multi-GPU, multinode accelerated systems. NVIDIA Magnum IO interfaces with CUDA-X HPC and AI libraries to accelerate I/O for a broad range of AI, HPC, data analytics, and visualization use cases.
NVIDIA Magnum IO includes innovative I/O optimization technologies such as the NVIDIA Collective Communications Library (NCCL), NVIDIA GPUDirect remote direct memory access (RDMA), and NVIDIA Fabric Manager. GPUDirect Storage, a key feature of NVIDIA Magnum IO, opens a direct data path between GPU memory and storage, avoiding the CPU altogether. This direct path can increase bandwidth, decrease latency, and decrease the utilization load on the CPU and GPU. By relieving the I/O bottleneck created by the CPU, GPUDirect Storage gives the GPU full and free access to the data it needs.
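The sketch below shows the same read performed through the cuFile API that GPUDirect Storage exposes, assuming a GDS-enabled driver stack with the libcufile library installed; the file path is hypothetical and error handling is trimmed. The data lands directly in GPU memory with a single call and no host bounce buffer.

```cpp
// GPUDirect Storage path via the cuFile API (part of NVIDIA Magnum IO):
// data moves from NVMe/NVMe-oF storage directly into GPU memory.
// Hypothetical file path; error handling omitted for brevity.
#include <cuda_runtime.h>
#include <cufile.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

int main() {
    const size_t size = 1 << 26;                          // 64 MiB read, for illustration
    cuFileDriverOpen();                                    // initialize the GDS driver

    int fd = open("/mnt/ef600/dataset.bin", O_RDONLY | O_DIRECT);

    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    void *dev_buf;
    cudaMalloc(&dev_buf, size);
    cuFileBufRegister(dev_buf, size, 0);                   // register GPU buffer for DMA

    // One call: storage -> GPU memory, bypassing CPU memory entirely
    ssize_t n = cuFileRead(handle, dev_buf, size, 0 /*file offset*/, 0 /*buffer offset*/);
    (void)n;

    // ... launch kernels that consume dev_buf ...

    cuFileBufDeregister(dev_buf);
    cuFileHandleDeregister(handle);
    cudaFree(dev_buf);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

Compared with the conventional path, the CPU’s only role is to issue the read; the DMA engines move the data. A program like this would typically be built against the CUDA toolkit and linked with -lcufile and -lcudart.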
The key to keeping data flowing freely is a storage system that can match the pace of NVIDIA GPUs.
With support for RDMA, NetApp® EF600 all-flash NVMe storage delivers the speed and reliability required to continuously feed data to GPU applications. EF600 systems deliver up to 2 million sustained IOPS, response times under 100 microseconds, 44GB/s of throughput, and 99.9999% availability. Because E-Series systems offer massive scalability, you can easily accommodate data coming in from the IoT as well as data generated by machine learning and deep learning training.
With industry-leading density, NetApp E-Series storage helps reduce your power, cooling, and support costs to significantly lower your TCO.
NVIDIA Magnum IO is a game changer for HPC environments. To learn more about the groundbreaking technology behind it, read this NVIDIA Developer Blog: GPUDirect Storage: A Direct Path Between Storage and GPU Memory. To learn more about RDMA at NetApp, read the NetApp E-Series and NVMe over Fabrics Support technical report.
Stan Skelton is Chief Architect and Senior Director of Business Development for the NetApp E-Series product line. With nearly four decades of experience in the industry, Stan has held extensive roles in engineering, product management, advanced development, architecture, and business development at NCR, AT&T, Symbios, LSI, Engenio, and NetApp. A true visionary, Stan is continually looking beyond the horizon to the market’s future. When Stan is not looking into the next major technology innovation, he can most likely be found traveling with his wife, riding a bicycle, or both. An avid cyclist, Stan has a passion for bicycles that includes everything from building and riding them to immersing himself in the culture and studying the industry as a great example of innovation.