Hi, I'm Chris Weber, a software engineer at NetApp, and today I'm here to talk about NVIDIA DGX SuperPOD with NetApp storage. Just as the cloud changed computing and data storage across the world, artificial intelligence and machine learning are transforming how businesses operate today. Organizations at the forefront of this change, however, are finding that working with A.I. at scale is extremely difficult: you essentially need to design, build and maintain an entire supercomputer. That's where NVIDIA's SuperPOD reference architecture comes in. It takes the hard work out of getting massive A.I. infrastructure up and running quickly, and NetApp is now proud to be a certified storage partner for the SuperPOD reference architecture. Today I'm going to talk more about SuperPOD, the value NetApp brings as the certified storage behind SuperPOD, why we chose BeeGFS as our parallel file system, and a few of the integrations we've added to BeeGFS along the way.
So what is SuperPOD? As I mentioned, it's essentially a reference architecture for a supercomputer, but it's more than just the hardware. It includes all of the libraries and application frameworks you need to get started with A.I., and in addition it has NVIDIA Base Command Manager to help monitor, report on and orchestrate the entire cluster. What we're really excited about is that NetApp is now the certified storage in the hardware stack of the SuperPOD reference architecture.
Let's dig into this a little more. The nice thing about SuperPOD is that it's a validated deployment with best-of-breed compute, networking and storage. A SuperPOD starts at 20 DGX A100 nodes in one scalable unit, and you can add nodes 20 at a time to build a SuperPOD of whatever size your workloads need. This is a lot of power, so the use cases we're targeting are A.I. research, medium- to large-scale A.I. training, simulation models and deep learning.
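The scaling rule described in the talk, scalable units (SUs) of 20 nodes each, amounts to rounding a target node count up to the next multiple of 20. The Python sketch below shows only that arithmetic. It assumes node count is the sole sizing input, and the helper names are hypothetical; real SuperPOD sizing also has to account for networking, storage and supported configuration limits that the talk doesn't cover.

```python
# Hypothetical sizing helper: assumes node count is the only sizing input.
NODES_PER_SU = 20  # from the talk: one scalable unit (SU) = 20 DGX A100 nodes


def scalable_units_needed(target_nodes: int) -> int:
    """Smallest number of whole SUs that provides at least target_nodes nodes."""
    if target_nodes < 1:
        raise ValueError("target_nodes must be at least 1")
    # Integer ceiling division: the SuperPOD grows one full SU at a time.
    return (target_nodes + NODES_PER_SU - 1) // NODES_PER_SU


def deployed_nodes(target_nodes: int) -> int:
    """Nodes actually deployed once the request is rounded up to whole SUs."""
    return scalable_units_needed(target_nodes) * NODES_PER_SU


if __name__ == "__main__":
    for want in (20, 50, 140):
        print(f"{want} nodes requested -> {scalable_units_needed(want)} SU(s), "
              f"{deployed_nodes(want)} nodes deployed")
```

For example, a lab that needs 50 nodes would deploy three SUs, or 60 nodes, while a request for 140 nodes fits exactly into seven SUs.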
We're talking about enterprises with big labs, universities and defense research institutions. And in case you're curious what a scalable unit of a SuperPOD looks like, here's one example; you can see the NetApp storage off on the right side of this SuperPOD deployment. On the back end, the SuperPOD reference architecture uses NetApp EF600 systems as the block storage. If you're not familiar with E-Series, we've shipped over one million systems to date, and we have field-proven six nines of reliability on the EF-Series and E-Series systems. These systems were designed for high availability, to protect your SuperPOD deployment against data outages and data loss. And the reference architecture lets us scale up by adding building blocks as we need them.
So why did we pick BeeGFS? It's hardware-vendor neutral, so you don't get locked in: in theory, you could swap out the NetApp storage for something else. But we have optimized BeeGFS for our EF600s and built a lot of integrations around that combination. More generally, BeeGFS was designed as a high-performance computing file system for workloads that need a parallel file system. The nice thing about it is that it's a modern one. You don't need to recompile your kernel to get it installed and running. It's easy to deploy, maintain and monitor, and it supports many of the Linux distributions out there.
Now for some of the BeeGFS integrations NetApp has built to date. We have Ansible playbooks to help deploy BeeGFS and EF600s in a SuperPOD deployment; it's really as simple as setting the IP addresses and running some playbooks. We have developed an open-source BeeGFS CSI driver for use with Kubernetes. That means you can use BeeGFS for persistent volumes in your Kubernetes deployments, either provisioned on demand or backed by existing BeeGFS directories. And we've also built some monitoring integrations for BeeGFS using Grafana.
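To make the CSI driver point concrete: once a dynamic-provisioning StorageClass is in place, a workload requests BeeGFS-backed storage through an ordinary Kubernetes PersistentVolumeClaim. The Python sketch below builds such a claim as a plain dictionary and prints it as JSON, which `kubectl apply -f -` accepts as well as YAML. The StorageClass name `beegfs-dyn-sc` and the `beegfs_pvc_manifest` helper are hypothetical; the StorageClass itself, which points at the BeeGFS CSI driver and your BeeGFS file system, is configured separately by a cluster administrator following the driver's documentation. The ReadWriteMany access mode is an assumption based on BeeGFS being a shared file system; check which access modes your driver version supports.

```python
import json


def beegfs_pvc_manifest(name: str, size: str, storage_class: str = "beegfs-dyn-sc") -> dict:
    """Build a PersistentVolumeClaim asking a BeeGFS-backed StorageClass
    for a dynamically provisioned volume.

    storage_class defaults to a hypothetical name; substitute the
    StorageClass your cluster admin created for the BeeGFS CSI driver.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Assumption: ReadWriteMany, so pods on different nodes
            # (e.g. distributed training workers) can share the volume.
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
            "storageClassName": storage_class,
        },
    }


if __name__ == "__main__":
    # Pipe the output to `kubectl apply -f -` to create the claim.
    print(json.dumps(beegfs_pvc_manifest("training-data", "10Ti"), indent=2))
```

Accessing an existing BeeGFS directory instead would use static provisioning, where an administrator pre-creates the PersistentVolume; this sketch covers only the on-demand case.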
So let's recap what we covered today. SuperPOD, built around the NetApp BeeGFS and EF600 building blocks, gives you a flexible A.I. supercomputer reference architecture that you can scale and add to, up to whatever size you see fit. It's simple to use and monitor, and NetApp and NVIDIA can be with you all along the way, helping you design and deploy it so you hit the ground running. It's a trusted, pre-validated, guaranteed solution, and we've done the legwork to make sure it's going to work. NVIDIA and NetApp will support it hand in hand, from start to finish and beyond. If you'd like to learn more, go to netapp.com/ai, where you can see NetApp's whole suite of A.I. solutions, including SuperPOD and others. And if you'd like to get in touch with someone, you can easily email or chat with a NetApp A.I. specialist from that web page. Thank you very much.
NetApp EF600 all-flash NVMe storage combined with the BeeGFS parallel file system is certified for NVIDIA DGX SuperPOD, simplifying artificial intelligence and high-performance computing infrastructure and enabling very fast implementation.