Hi, I'm Mike Oglesby, and I'm a technical marketing engineer focused on MLOps solutions here at NetApp. Today I'm going to show you how you can use the NetApp DataOps Toolkit to accelerate your AI and analytics workflows.

So let's say that you have a data set, and you want to use this data set to run some sort of high-performance analytics job, or maybe to train a deep neural network model. Typically, you're going to need to modify this data set in some way before you can use it for the specific job that you want to run. And you usually don't want to modify the original copy of the data set itself, because that's your gold source; if you accidentally delete or change something within your gold source, you won't be able to restore the original copy. So what you're usually going to do is create a copy of this data set. Let's create a copy and call it copy A. Especially when we're talking about things like deep neural networks and modern deep learning, we're talking about pretty large data sets, so this copy operation can take hours or sometimes even days. That's a lot of time that, as a data scientist, you're just sitting around waiting before you can proceed with the specific job that you want to run. And actually, it's even more of a problem than that, because often there's more than one data scientist, and often these data scientists are working with the same data set at the same time, either running different analytics jobs or experimenting with different model architectures as they both work on training the same type of model. So you're usually going to have to create more than one copy of this data set, one for each data scientist. Let's create a second copy and call it copy B.
And each time you create a copy (sometimes you're creating five or even ten copies), it's going to take hours or, again, even days. So this is a pretty big bottleneck in your process; it's a lot of time your data scientists spend sitting around and waiting. What if, instead of hours or days, you could create an exact clone of your gold source data set almost instantaneously, say in just a couple of seconds? Just think about how much more quickly you could arrive at an accurate and efficient model. Well, that is exactly what you can do with the NetApp DataOps Toolkit. The NetApp DataOps Toolkit is a free and open-source Python package from NetApp that enables data scientists and data engineers to near-instantaneously clone data sets in just a couple of seconds. If you're interested in learning more, just Google "NetApp DataOps Toolkit," and the GitHub repository will be at or near the top of the search results. And if you already have NetApp storage, you can get started and experience the wonders of near-instantaneous data set cloning today.