Many enterprises that recently adopted modern artificial intelligence techniques are now discovering that training an AI model is only a small part of the overall AI lifecycle. Once you have a trained model that you’re happy with, you need to actually do something with it to make it useful: you’re going to want to use it to make real-world predictions that deliver business value. In AI lingo, this is referred to as “inferencing,” and unfortunately, deploying a model for inferencing is complicated.
There are lots of questions that need to be answered before a model can be deployed for inferencing. A model is typically just a collection of stored weights and biases that have been “trained” to recognize certain types of inputs. Before you can deploy a model, you need to know a lot about the applications and/or users that are going to use the model to make predictions. How will users or consuming applications interact with the model? How will they pass input data into the model? How will they receive the model results?
Even when you have the answers to these questions, you’re still not done. You now need to figure out how to actually deploy the model in a way that meets those requirements. Until recently, deployment required extensive DevOps expertise, which proved to be a challenge for many data science teams. Coinciding with the rise of MLOps practices, tools and platforms have been created to fill this gap. Triton Inference Server, developed by NVIDIA, has emerged as one of the preeminent open-source inferencing platforms. It streamlines AI inference by enabling teams to deploy and scale trained AI models in a standardized manner, and it supports all of the popular AI frameworks, including TensorFlow, PyTorch, MXNet, and OpenVINO, among others. This support allows data scientists to continue using their framework of choice without affecting the production deployment method.
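To give a sense of what “standardized” means here, Triton serves models from a model repository with a simple, documented directory layout that stays the same regardless of which framework produced each model. A minimal repository containing a single ONNX model might look like the sketch below (the model name and file are illustrative):

    model_repository/
        my_model/
            config.pbtxt        <- model configuration (backend, inputs, outputs)
            1/
                model.onnx      <- version 1 of the model

Depending on the model control mode you choose, Triton loads the models in this repository at startup or picks up new models and versions as they are added.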
So now that you have an awesome open-source tool like Triton Inference Server, all is well and good in the world, right? Not so fast… There’s still the matter of storing the model repository containing the models that Triton will make available for inferencing. Triton Inference Server is typically deployed in a container, often on Kubernetes, and therefore persistent storage is required for the model repository. If you’re working in one of the major public clouds, then this part is easy because Triton offers integrations with native cloud storage offerings. If you’re deploying Triton Inference Server on premises, however, or if you’re concerned about storage efficiency or data sovereignty, then you have to customize your installation. Once again, extensive DevOps expertise is required. This is where many customers hit a roadblock.
Well, I have good news! At NetApp, we’ve torn down this roadblock. We’ve built new Triton Inference Server management capabilities into the NetApp® DataOps Toolkit for Kubernetes that greatly simplify and streamline the process of deploying Triton Inference Server with NetApp persistent storage. With one simple command, a data scientist or MLOps engineer can deploy a Triton Inference Server instance with a model repository that is backed by NetApp persistent storage. Yes, you read that right—all that’s required is one simple "create triton-server" command.
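To make that concrete, here’s roughly what the workflow looks like from the command line. The volume and server names below are placeholders, and the exact flag syntax is documented in the toolkit’s GitHub repo:

    # Provision a persistent volume (PVC) to hold the model repository
    netapp_dataops_k8s_cli.py create volume --pvc-name=model-repo --size=100Gi

    # Deploy a Triton Inference Server instance backed by that volume
    netapp_dataops_k8s_cli.py create triton-server --server-name=inference1 --model-repo-pvc-name=model-repo

Once the server is up, client applications interact with it through Triton’s standard HTTP/gRPC endpoints.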
In fact, you can use this same command to deploy a Triton Inference Server instance to any Kubernetes cluster. The only prerequisite is that the Kubernetes cluster must have access to NetApp persistent storage. Deploying to a Kubernetes cluster in the public cloud? You can use a NetApp cloud storage service, such as Amazon FSx for NetApp ONTAP. Deploying to a Kubernetes cluster that is hosted in your data center? You can use the NetApp ONTAP® cluster that you already have. The best part? By implementing a data fabric powered by NetApp, you can seamlessly synchronize or copy models and data between these environments. But wait, there’s more! Because your model repository is backed by NetApp persistent storage, you can take advantage of all of NetApp’s enterprise-class data protection features, including NetApp Snapshot™ copies for quick and efficient backup, which can be managed using the same NetApp DataOps Toolkit for Kubernetes.
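For example, before publishing a new model version, you can capture a near-instantaneous Snapshot copy of the model repository volume with a single toolkit command (again, the PVC name is a placeholder; see the repo for the full set of options):

    # Snapshot the PVC that backs the Triton model repository
    netapp_dataops_k8s_cli.py create volume-snapshot --pvc-name=model-repo

If a bad model ever makes it into the repository, the volume can be restored to a known-good state just as quickly.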
In a world full of complicated architectures, confusing jargon, and other roadblocks, NetApp and NVIDIA have made inferencing simple. A standardized inference server, backed by persistent storage, can be deployed with one simple command. So what are you waiting for? Visit our GitHub repo to check out the NetApp DataOps Toolkit’s Triton Inference Server management capabilities today. While you’re at it, you can also visit the NetApp AI landing page to check out all of NetApp’s AI solutions. The DataOps Toolkit’s Triton Inference Server management capabilities are compatible with almost all of them.
Mike is a Technical Marketing Engineer at NetApp focused on MLOps and data pipeline solutions. He architects and validates full-stack AI/ML/DL data and experiment management solutions that span the hybrid cloud. Mike has a DevOps background and a strong knowledge of DevOps processes and tools. Prior to joining NetApp, Mike worked on a line-of-business application development team at a large global financial services company. Outside of work, Mike loves to travel. One of his passions is experiencing other places and cultures through their food.