My name is David Arnett. I'm a principal technical marketing engineer with NetApp's AI solutions team, focused on AI, machine learning, and deep learning, and to some extent HPC as well. A little background for anybody who's not familiar: NetApp has obviously been a big player in enterprise IT for many years, and we've now been involved in the AI space for about five years. We've produced several hundred documents, and we have probably 30 or 40 dedicated staff members across engineering, business development, and sales who are focused on AI, all of them out there helping the rest of our field teams bring these messages to customers. We have somewhere north of 300 customers, several in the Fortune 500 and even a couple in the Fortune 100. So this is a space we've been playing in for quite some time now, and as a company we're seeing significant growth here.

With that said, let's take a look at an example. We're talking about AI, and the data science is very important, but the scale of the data is what really becomes the critical factor. This is a fun graphic I've shown a few times that illustrates the potential scale: cars on the road may collect as much as a terabyte of data an hour, and we could potentially be talking about millions of cars on the road collecting data. That may seem a little far-fetched, but when you consider that anybody who owns a Tesla is sending data back to Tesla, which Tesla uses to train the models for the next generation, a million cars out there collecting data is actually not that far-fetched. I think the actual capacity numbers are a little high; a terabyte an hour is maybe what the cars are collecting raw. What's interesting to me is that this is one of the spaces where AI is actually helping with the development of AI: the vendors doing autonomous vehicle development are building models to help them sift through all that raw data, so they don't have to save all of it, only the bits that are interesting and important. But at the end of the day we're talking about exabytes of data. The raw numbers turn into massive quantities, and over time they get even bigger. And because of the nature of what we're talking about, there are definitely regulatory and compliance issues, so a lot of this data will have to be retained for longer than data of this kind has ever been kept before. The challenges around the data are really where this becomes an issue in the overall AI software development process.

So over the next hour and a half or so, I'm hoping to show you how our solutions go beyond raw performance numbers. There's a lot of talk about speeds and feeds in this space, but that's not the whole story. There are some minimum performance requirements just to play the game, the table stakes, but it's the other parts, the processes and the management of the whole thing, that really determine success or failure for companies trying to go down this road.
Hi, my name is Max Mandy. I'm a solutions engineer and AI specialist out of NetApp's Munich, Germany office. In my role as an AI specialist, I get involved as soon as one of our account managers or solutions engineers spots an AI or ML workload at one of our customers. We then try to talk directly to the data scientists and data engineers to understand what their challenges are and how we as NetApp can help them on their AI journey. Having worked as a data scientist before, I know that the data science language and the IT infrastructure language are quite different, but it's really important to understand both in order to help data scientists and data engineers with their problems.

We get involved with many different companies at different stages of their AI journey, and although they're at different stages and working on different problems, we see them facing the same challenges. As NetApp we might be biased, but we usually see the biggest problems around the data. Early in the AI journey we tend to see challenges around data access and data availability: the best data scientist cannot build a good model without access to the right data. Later in the journey we tend to see more challenges around data gravity. We see this especially with customers who are training or running inference on their models in the public cloud while the data resides on on-premises systems; it can be quite tricky to move data back and forth between the cloud and on-premises.

Another challenge we see quite often is shadow AI and shadow IT. When we AI solution specialists get involved, we first talk to our traditional points of contact at our customers, which usually means a storage admin or an IT infrastructure specialist. When we ask them, "What are you doing with AI in your company? Where are you in your AI journey?" we often hear, "Yes, we have some AI people in our company doing fancy stuff, but we're not really in contact with them." When we talk to the data scientists and data engineers in those same companies, we tend to hear the same thing: "Of course we have an IT department, but we're not really sure what they're doing, and they're not really aware of what we're working on." That's unfortunate, because in many cases the IT department has already found solutions for the challenges the data scientists are working on. It also goes the other way around: in many scenarios the AI department is further along in its cloud journey than the IT department. Both sides would really benefit from working together as a team from an early stage.

The next two points on this slide I want to treat as one: complexity and scaling AI projects. When a customer starts its AI journey, it usually hires one or two data scientists or a couple of AI working students and gives them a workstation, or tells them to work in the public cloud. Both approaches are fine at first, but the more the project scales, the more people are working on it, and the more models they finish training, the bigger the challenges that appear.
For example, on premises, at the point where teams move from Jupyter notebooks and JupyterLab to an MLOps tool, that scaling is often not as easy as they expected. When they're working in the public cloud, we see that costs usually go up far faster than initially expected. Those challenges are not so bad if they appear in an early phase of the journey, but if they appear at a later stage they can be really difficult and costly to solve, so it's important for companies to address them early.

From my European, and especially German, perspective, most of our customers, if we look at Gartner's AI maturity model, are in an early phase: most are at level one or level two, with some reaching level three. When I talk to my American colleagues, we realize that many of their customers are on average about half a level further along than the German companies we work with. We asked ourselves how we can help our companies in Germany bridge that gap, and we're working on two strategies. The first is working together with AI consultancies. We're also working with an AI accelerator called appliedAI; they claim to be Europe's largest AI accelerator and they're part of the Technical University of Munich. Together with them and their partners we can help their customers, and ours, solve the challenges I showed on the previous slide before they turn into real problems.

When I started at NetApp I came fresh out of data science and still had a very strong data science mindset, and I was really surprised to learn how many technologies NetApp had to offer that would have greatly benefited me in my time as a data scientist. For example, data versioning is a huge challenge for many data scientists, as it was for me back then, but with snapshots you can really simplify that process. Data cloning and data copying become much easier if you have access to FlexClones, which are writable snapshots. And I really fell in love with the data fabric story, because in my opinion it's a great way to fight data gravity by making it easy to move data from the edge to the core to the cloud. But I asked myself: if those tools are so useful, why don't more data scientists actually use them? I think one of the answers is that it's not the data scientist's job to be an IT infrastructure expert. Wouldn't it be great if we could give data scientists access to those capabilities from their own working environment, with one line of code, without requiring them to be storage experts? With that thought, I'm going to hand over to my colleague Mike Oglesby.

My name is Mike Oglesby. I'm a senior technical marketing engineer at NetApp focused on MLOps solutions and tooling, and as Max mentioned, I'm going to dive into some of the tools and capabilities we're working on exposing to the data scientists and data engineers of the world. I like this quote and wanted to start with it because I feel it really encapsulates what we're trying to do on our team. Andrew Ng, who I'm sure you all know as one of the foremost thought leaders in the deep learning world, says that ML model code is only five to ten percent of an ML project, and working with our customers we've definitely found that to be the case.
So if the ML model code is only five to ten percent of a project, what's the other 90 to 95 percent? Andrew calls it the POC-to-production gap. Basically, this is everything that's not the model itself: getting the data where it needs to be so you can use it to train the model, managing the data, and all of the infrastructure, because as much as we all wish infrastructure would just go away and disappear, it still exists, it has to work, and everything has to run on it. It's everything that enables the data scientists to do what they do best. Obviously we at NetApp are not going to solve all of the world's problems, but we've been working hard to take some of our storage capabilities and present them to data scientists in an easy-to-consume format, so that we can help bridge some aspects of this gap and help them bring their models into production.

What are these capabilities? First I'd like to start with the tool we've developed that takes these capabilities and presents them to the data scientists and data engineers of the world, and then I'll jump into the capabilities themselves.

So, the NetApp DataOps Toolkit. What is it? Before we talk about what it can do: it's just a Python module, a super simple Python module. When we first started talking to data scientists four or five years ago, Dave and myself and the early members of our team found that we had a lot of capabilities already on the truck that could really help close some gaps in the data science process. But data scientists are used to working in Jupyter notebooks with Python modules, Python libraries, and Python code. They're not used to all the storage admin stuff, and that stuff was just too complicated and unapproachable for them. So we decided to take those capabilities and wrap them in a super simple Python module designed to be approachable for data scientists, developers, DevOps engineers, and MLOps engineers, so that they can actually take advantage of them. It's just like any other Python module: you can install it with pip, it's free and open source, and if you already have NetApp storage you can go download it and install it today. We've got two flavors: one that supports VM and bare-metal environments, which we call the NetApp DataOps Toolkit for Traditional environments, and one designed specifically for Kubernetes, the NetApp DataOps Toolkit for Kubernetes, which brings some additional capabilities that take advantage of the Kubernetes API, scheduler, and workload management.
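(As a rough illustration of the "it's just a Python module" point: the package and module names below follow the project's public README and PyPI listing as I recall them, and should be verified against whatever version you install.)

```python
# Install whichever flavor matches your environment (names per PyPI):
#   pip install netapp-dataops-traditional   # VM / bare-metal flavor
#   pip install netapp-dataops-k8s           # Kubernetes flavor

# The capabilities discussed in this session are then plain Python functions:
from netapp_dataops import traditional   # volume create/clone/snapshot/restore
from netapp_dataops import k8s           # JupyterLab workspaces, PVCs, etc.

# Each flavor also ships an equivalent command-line interface, so the same
# operations can be scripted outside of Python.
```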
That's enough about the toolkit itself; what are these capabilities we're trying to get into the hands of data scientists, the capabilities they're using to help fill these gaps in their process? The first is around workspace creation. What we usually find when we talk to a data science team for the first time is that they've been working in their own silo: IT didn't really know how to support them, they didn't really know how to get support from IT, and so they were forced to set everything up themselves. One of the big bottlenecks in their process is typically creating the workspaces, the development environments they use to train and validate their models. We see a lot of manual provisioning and copying of data, tedious manual processes that take hours or even days for some of the larger data sets. They're typically not using enterprise-caliber storage, so there's often no data protection; if something happens to their machine, their work is just gone. There's often no traceability, which I'll touch on later as a big challenge. They have a hard time getting from an idea to a workspace where they can actually implement that idea.

So in the DataOps Toolkit we built the ability, with one CLI command or one Python function call, to near-instantaneously provision a workspace that's backed by NetApp storage. With the VM or bare-metal version of the toolkit, they can say in one simple function call, "I need a 500-terabyte workspace to store my data and my Jupyter notebooks," and a couple of seconds later they'll have a 500-terabyte NFS share mounted on their machine at the path they specified, and they can just get to work in it. If they're running in Kubernetes, we can do something even cooler: they can say, "Give me an 800-terabyte JupyterLab workspace," and a couple of seconds later they get a URL they can use to pull up that JupyterLab workspace. It's backed by NetApp persistent storage, but they don't have to know or care about that; they just know they needed an 800-terabyte JupyterLab workspace and they got one a couple of seconds later. They can log in through their web browser, pull in all their data, save their notebooks there, and get to work.
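To make that concrete, here is a hedged sketch of the "one function call, one workspace" idea using the Kubernetes flavor of the toolkit. The function and argument names follow the toolkit's documentation as best I can reconstruct them, and the workspace name, size, namespace, and password are invented for illustration; check the README of your installed version before relying on the exact signature.

```python
from netapp_dataops.k8s import create_jupyter_lab

# One call: a JupyterLab workspace backed by NetApp persistent storage.
# The toolkit provisions the backing volume through the CSI driver, starts a
# JupyterLab instance on top of it, and reports the URL for reaching it.
create_jupyter_lab(
    workspace_name="project3-mike",      # illustrative name
    workspace_size="800Ti",              # size of the backing volume
    namespace="data-science",            # Kubernetes namespace to deploy into
    workspace_password="changeme",       # password for the JupyterLab UI
)

# Roughly equivalent CLI invocation (flags are an assumption, verify locally):
#   netapp_dataops_k8s_cli.py create jupyterlab \
#       --workspace-name=project3-mike --size=800Ti --namespace=data-science
```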
I see a lot of parallels between this and infrastructure-as-code tooling like Terraform and Ansible. Did you draw inspiration from there?

Yes. My background is actually in the DevOps world, on an application development team at a financial services company, so I've done a lot of work with Kubernetes and with Ansible automation. We drew a lot of inspiration from that world and tried to marry it with the feedback we were getting from actual data scientists, to build them something that would be simple to use and consume. We've got a bunch of Ansible modules, and it's easy to automate NetApp storage with Ansible, but data scientists don't know Ansible. They're used to Python code, not a bunch of YAML and Ansible playbooks; even Ansible is out of their wheelhouse. So we tried to apply the same concepts and deliver them in a format they could use in a self-service way.

I've talked about provisioning workspaces, but training deep neural networks is an extremely iterative process. It's not a one-and-done thing; data scientists don't just get some data, run a training job, and deploy the finished model to production. Usually there's a lot of experimentation and tweaking, and they'll run the same training job over and over again as they try to refine their models. Oftentimes this means modifying something: they'll have a workspace and need to make a copy of it for a particular experiment, so they can modify a data set while preserving the gold source, tweak some hyperparameters, what have you. Our customers have told us, and Max has told me from his own data science experience, that this is a huge, common bottleneck, and a lot of time gets spent with data scientists drinking a coffee and getting irritated while they wait for some copy job to complete, when they really just want to get on with their project.

Are the DataOps functions something you invoke directly from Jupyter, or outside of Jupyter notebooks, or both?

Both. It's packaged as a Python module, and there's a CLI interface plus a library of Python functions that can be imported into a Jupyter notebook or any Python program or workflow. So a data scientist working in Jupyter who wants to clone their workspace just calls a clone-workspace function, passing the source workspace name and the new workspace name, and that's it.
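A minimal sketch of that "clone from inside the notebook" answer, using the traditional (VM/bare-metal) flavor. The volume names and mountpoint are hypothetical, and the argument names should be checked against the toolkit version you have installed.

```python
from netapp_dataops.traditional import clone_volume

# One call from a notebook cell: a writable, space-efficient copy of the
# gold-source volume, ready to mount and modify for an experiment.
clone_volume(
    new_volume_name="project1_experiment7",
    source_volume_name="project1_gold",
    mountpoint="/mnt/project1_experiment7",  # where to mount the clone locally
)
```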
How do the toolkit calls tie in to a NetApp storage solution? How does that integration work?

On the traditional side, the VM and bare-metal side, it's built on top of our REST API, using our Python SDK under the hood. It takes those complicated API calls, the ones where, when we told data scientists "just make this API call," they said "what the heck is that?", and wraps them in a simple function. On the Kubernetes side, it's built on top of Astra Trident, our CSI driver. Same idea.

There are different versions of Jupyter now, JupyterLab versus classic Jupyter. Do you support both?

On Kubernetes, if you want to manage workspaces at the JupyterLab level, we support JupyterLab there. But our library of Python functions can be used from any Python-based interface, so it could be the classic notebook interface, it could be your laptop.

You're talking about data scientists' needs right now, but is the DataOps Toolkit for all kinds of data ops workloads, problems, and solutions?

We primarily developed it for data scientists, but we've found that customers on more traditional development and DevOps teams have started using it as well. We've worked with a couple of customers in the semiconductor space who use it to build clones into some of their EDA workloads, so they can quickly stand up a workspace for a validation job or something like that. This whole concept of data ops is just getting started in the traditional database world too. To that point, we first called it the NetApp Data Science Toolkit, and we changed the name to the NetApp DataOps Toolkit because we were finding interest outside of data science.

The other question I had: we've just heard a presentation about very large volumes of data, mind-blowing amounts, but you're talking here about copying and moving data around. Traditionally, when we're dealing with not just big data but giant data, we tend not to move or copy it, precisely because it's so big. What's the sense of data scale with this solution?

We find that our customers typically fall into one of two broad categories when we talk to data science teams. There are the more traditional HPC folks who have massive amounts of data, scale-out clusters, big file systems, and they tend not to move or copy things around as much. And there are the more traditional enterprise customers who maybe didn't do a lot of data science until four or five years ago and then started implementing deep neural networks and more cutting-edge deep learning techniques. Unlike the big scale-out folks, they're usually working with smaller data sets and doing a lot of copies and iteration, and the toolkit's cloning capability is definitely more applicable to that latter group. We've found that both groups appreciate the snapshot capability, which I'll get to in a couple of slides, but that gives me a good segue.

We've been talking about developing and training models up to this point, but there's a whole other piece of data science: once you've trained your model, you have to actually deploy it so it can do something and deliver actual value. That's where inferencing comes in, where you're actually using the model to make predictions, be it in real time or in batch. In the early days of deep learning there was a pretty big gap between having a model and deploying it. You basically had to build a custom web server for every model, build your own API on the front end, or integrate it in a custom way into your web app. A lot of tools have emerged to make that simpler, and one of my favorites is the Triton Inference Server from our partners at NVIDIA. From when I started to dabble in this space to where we are now with something like Triton, it's amazing how far things have come. If you're not familiar with it, it's basically a pre-built web server with a pre-built API: it supports all the standard frameworks like TensorFlow and PyTorch, you just drop in your model and a little config file, and then you can call the API to perform your inferencing. You don't have to develop anything except the model itself.

But there are some challenges around hooking Triton up to your storage, because it needs storage to serve as the model repository, and if you're not in a very vanilla hyperscaler environment there's some customization required that, like the other things we've talked about, tends to be outside a lot of customers' wheelhouse. So we built a very simple operation into the DataOps Toolkit that enables data scientists or MLOps engineers, with one function call or one command, to deploy a Triton Inference Server and have it hooked up to a Kubernetes persistent volume that serves as the model repository. They can just drop their models into that persistent volume, the models are automatically picked up by the Triton Inference Server, and they can start hitting the API as soon as they drop them in.
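Once a Triton instance is serving models out of that persistent-volume model repository, clients talk to it through Triton's standard HTTP or gRPC API. Here is a small sketch using NVIDIA's tritonclient package (pip install "tritonclient[http]"); the server address, model name, and tensor names are made up for illustration and have to match your deployed model's configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Point the client at the deployed Triton service (address is assumed here)
client = httpclient.InferenceServerClient(url="triton.example.internal:8000")

# Build a request: one FP32 input tensor whose name/shape match the model config
features = np.random.rand(1, 24).astype(np.float32)   # placeholder input
infer_input = httpclient.InferInput("INPUT__0", list(features.shape), "FP32")
infer_input.set_data_from_numpy(features)

# Call the pre-built inference API; no custom web server required
response = client.infer(model_name="rul_model", inputs=[infer_input])
print(response.as_numpy("OUTPUT__0"))
```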
My understanding is that a lot of models and code are managed with Git or GitHub style source control. Does the DataOps Toolkit natively interface with GitHub, or how does that play out?

No, there's no native interface; the toolkit is meant more for the development environment. What I've seen customers doing is using it to provision their development environment or their inference server, and then within that environment they'll pull things down from GitHub, pull some data, run their training, and maybe commit some code back to GitHub. And that's a good segue to my next slide: for traceability, they save the snapshot ID in GitHub when they commit the code, so they have traceability from their data set to their model. Sometimes a model repository is involved too, so the code goes to GitHub, the data goes into a snapshot, and the model goes into a model repo, and with that snapshot ID you have full traceability from the actual data set that was used to train the model, to the code that defines the model, to the actual model itself sitting in that model repo.

We have a financial services customer who basically told us, when we first started working with them, "We've got all these great ideas, we've trained all these models, but our compliance team won't let us put them into production because we didn't think about traceability." I use them as an example, but it would take more than two hands to count the number of customers who have told me the same thing. When they started using snapshots to save off a point-in-time copy of their workspace, and implemented that traceability from the snapshot ID to the code in GitHub to the model in the model repository, they were able to check that compliance box and start actually putting models into production.

I'm going to quickly touch on these next couple of slides in the interest of time. Oftentimes there's a high-performance training tier, and we're telling customers to save off snapshots for traceability, but you don't want those snapshots to fill up that expensive, powerful high-performance storage. So a lot of customers take advantage of our cold data tiering. We could spend a whole presentation on those capabilities, so I won't go into them here, but basically the snapshots get tiered off to an archive object storage tier or an archive file storage tier, so customers keep one front-end interface but they're not consuming all that high-performance storage in their high-performance environment with point-in-time backup copies that exist for compliance.

Is that all driven through the DataOps Toolkit?

The cool thing about the tiering is that you set it up once and then it just happens. It's policy-driven, so the DataOps Toolkit lets you take the snapshot, and then the snapshot automatically gets tiered off as cold data.

In terms of practical customer applications, how much data are we talking about for typical customers, data that can be tiered out to cold storage instead of kept online?

That's a great question. I know customers who are at hundreds of terabytes of scale with the data sets they're managing. Dave, you might be better positioned to answer than me, but we're talking big numbers here.

Yes, I've got a couple of customers like that. I'm working with a large pharmaceutical customer now who is talking about petabytes of raw data they need to store, but they usually only use a couple hundred terabytes of that at a time for any given training job. FabricPool tiering means data gets ingested and written into a high-performance store, and then, based on an age policy or whatever you configure, it gets moved off; if somebody calls that data back up, it gets brought back into the accelerated tier. They're snapshotting volumes that may have hundreds of terabytes of data in them, and remember that a NetApp snapshot is completely space-efficient except for any changes: if I take a snapshot of a 500-terabyte volume and then write 500 gigabytes into that volume, I'm only consuming 500 gigabytes of extra capacity.
The other option here is FlexCache, which is kind of the reverse; I noticed it on your slide, and I would call FlexCache automated hot data tiering. The idea with FlexCache is this: I've got another customer where we've provisioned a very large cluster of hard disks and a much smaller cluster of flash systems that's directly connected to the training systems. All of the data goes into the standard hard disk repository, and on a case-by-case basis, either on demand when a user references a piece of data or in advance, we can pre-populate that cache. So as a data scientist is getting ready to execute a training run against something specific, they can elevate what would otherwise be colder data up into the performance tier.

In terms of data ops, one thing I've heard is that a lot of people are starting to think about saving a copy of a data set that's consistent with a certain model at a certain point in time, so that they can go back to it if needed. Is that what people are doing with NetApp snapshots?

Yes, that's the entire point of the traceability comment Mike is making, and it goes back to some of the original questions. The concept of DevOps has been around for a long time, along with continuous integration and continuous deployment, and a lot of the DataOps Toolkit is built on those same premises. These workflows have the same processes: software developers develop software, and when they reach what they think is a done point, they run some automated testing; if that testing passes, the model, or the software, moves into the next phase of the process, where it may get deployed. The DataOps Toolkit enables all of that same automation people were already doing for traditional software development; it just adds the element of the data. So instead of just taking a snapshot of your code repository, we can simultaneously take a snapshot of the code repository and of the data that was used to train that code, and that makes a big difference for the traceability question.

And this concept of snapshots for data-set-to-model traceability has been extremely popular and generated a lot of interest with our financial services and healthcare customers especially, because of regulatory compliance. We've had so many conversations with customers who told us they were stuck in the science-project phase and really struggling with that traceability, and we've been able to help some of them get over the hump.

So in that situation, the model, the code, the data, and everything else would be on a single NetApp volume or something like that, and they would snapshot it, tag it, keep it around for as long as they want, archive it off to an S3 object store, whatever else you'd want to do to protect it from a compliance perspective. And if there's ever a question like "why did this model make a weird prediction, how did you train it?" you can pull up the exact environment.

Not unlike a container, to some extent, with the infrastructure to go along with it.

Exactly.
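A hedged sketch of the traceability pattern being described here: snapshot the data volume at training time and record the snapshot name next to the Git commit that produced the model. The volume name is hypothetical, and the toolkit function name and arguments follow its README as I recall them, so verify them against your installed version.

```python
import subprocess
from netapp_dataops.traditional import create_snapshot

volume_name = "project1_workspace"          # volume holding data + notebooks

# The exact code revision being trained against
commit_hash = subprocess.check_output(
    ["git", "rev-parse", "HEAD"], text=True
).strip()

# Point-in-time, space-efficient copy of the workspace, named after the commit
snapshot_name = f"train-{commit_hash[:12]}"
create_snapshot(volume_name=volume_name, snapshot_name=snapshot_name)

# Record the pairing wherever the model is registered (model repo metadata,
# experiment tracker, or the commit message itself) so an auditor can walk
# from the deployed model back to the exact code and data that produced it.
print(f"model trained from code={commit_hash} data_snapshot={snapshot_name}")
```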
In those two industries especially, we've found that it's really helped customers bridge that gap.

This is just a quick demo showing how, with the DataOps Toolkit, you can near-instantaneously clone a JupyterLab workspace. Let's start it. Basically, say I'm a data scientist working in a JupyterLab workspace and I need to clone it to drive an experiment. I can go into my terminal (I could have done this with a Python function call as well) and run this list-JupyterLabs command. The workspace I was in is "project-3-mike", and I want to clone it so I can modify something, maybe change the data normalization technique, to run an experiment while preserving the gold source. All I do is run this clone-JupyterLab command (I could also call the clone-JupyterLab function from a Python program), specify the source workspace name and the new workspace name, and that's all I have to do. I press enter, it calls out to the Kubernetes API, and it clones the volume behind the scenes; I don't have to know or care about any of that.

It's important to note that a clone is a writable structure; it's not a snapshot, which is read-only, right?

Exactly. The cloning is more for that experimentation where you need a read-write copy. So it calls out, clones that volume, and spins up a new JupyterLab workspace on top of it. The cool thing is that as a data scientist I don't even really have to know that a NetApp volume was involved; I just know I had this two-terabyte workspace and I got an exact copy of it. I can take this URL, go over to my browser, and I'll have an exact copy of the workspace I was working in. I had some images and a notebook in it, and if we pull up the copy and enter the password we set when we ran the command, we get the same images, the same data set, and the same notebook. It gave us an instant copy, and it takes the same amount of time whether it's two terabytes like this one or 500 terabytes, because it uses the old-school, tried-and-true NetApp cloning technology under the hood. But it presents it in a way where the data scientist doesn't have to figure out which volume it is, how to find the storage interface, whether they can even get access to the storage interface, or whether they have to submit a request; oftentimes they don't even know cloning exists in order to submit that request. It makes it super simple, and they can manage things at the JupyterLab workspace level.
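For reference, the Python equivalent of the clone step in this demo (per the "I could also call the clone function from a Python program" remark) would look roughly like this. The namespace and new workspace name are invented, and the argument names should be checked against the installed toolkit version.

```python
from netapp_dataops.k8s import clone_jupyter_lab

# Clone the demo workspace: the backing volume is cloned via the Kubernetes
# API and a new JupyterLab instance is started on top of the clone.
clone_jupyter_lab(
    source_workspace_name="project-3-mike",
    new_workspace_name="project-3-mike-exp1",
    namespace="data-science",
)
```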
I have one question about that. Not seeing the abstraction is mostly good, because I'm living in Python and I don't need to care about infrastructure. But what can I screw up with this?

That's a good question, because there are some important prerequisites to giving a data scientist access to this. Typically our customers will set up a sandboxed environment. If you're familiar with NetApp, that would be an SVM, a storage virtual machine; if you're not, the important idea is that it's a sandbox within the storage system that you can set limits around and grant specific access to. So most of our customers give the data scientists a dedicated environment with some limits that they can use for their development. They can do whatever they want in there, but they're totally sandboxed from the rest of the environment and can't stomp on anybody else's stuff. We do have customers who give the data scientists a whole dedicated cluster, but those are the more advanced customers who are further along in their journey and need the horsepower of a dedicated cluster.

Any support for other types of notebooks? Everyone's building the next best notebook.

At that workspace-management abstraction on Kubernetes we only support Jupyter, but there are basically two options: you can manage things at that level or at the NFS share level. We've got customers who are using things like RStudio, and they'll just call create-volume instead of create-workspace: "I want a 500-terabyte volume mounted at this local mount point, /home/whatever," and they'll manage it that way.

Thanks.

No problem.

I think this actually works best with a live demo. For the live demo, I want you all to imagine that I'm back in data science and we've just been handed a data set by our boss, and our job is to first clean the data, analyze it a bit, and then build a simple model. As the data set we're going to use NASA's turbofan engine degradation data set: it contains data from many different turbofan engines, like the ones on an airplane, and you try to predict how much life is left in each engine before it needs a major service. In the beginning we just import all the libraries; a shout-out to the NetApp DataOps Toolkit traditional library, but also to cuDF and CuPy from NVIDIA RAPIDS, which speed up the data wrangling massively. Next we use the NetApp toolkit to get an overview of which volumes we currently have access to, and here we see "engine data"; that sounds good, I think that's the one. But as a data scientist you should never, ever work with the golden sample, which is also what Mike just said. In my time as a data scientist, what I did was copy around all the data I wanted to work with, so that I didn't manipulate the golden sample by accident. With a FlexClone, a writable clone, it's way easier and way faster: just one line of code, we specify what the clone should be called and where it should be mounted, we wait a couple of seconds, about 12 seconds right now (yes, I ran it a couple of times before setting this up), and we're done. You can work with the clone as if it were a complete copy; you can do everything with it. My next steps I'll go through quickly: read in the data, do some data analysis and data wrangling, nothing interesting for this demo, and clean the data. Where it gets interesting again is right here: when I've finished cleaning the data, the next thing I should do is save it to a separate place, so I create a new volume using the NetApp DataOps Toolkit. One line of code: specify what it should be called, how large it should be, and where it should be mounted, and it's done; I can save my data there. What I'm going to do next is create a snapshot of the volume, and I think you can already see where I'm going with this.
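A hedged reconstruction of the first half of this notebook flow: protect the golden sample by cloning it, wrangle the clone on the GPU with RAPIDS, then land the cleaned data on its own volume and snapshot it. The volume names, file paths, and exact toolkit signatures are illustrative, not the actual demo code, and should be checked against the version you install.

```python
import cudf                                   # GPU DataFrames from NVIDIA RAPIDS
from netapp_dataops.traditional import (
    list_volumes, clone_volume, create_volume, create_snapshot,
)

print(list_volumes())                         # find the "engine_data" volume

# Never work on the golden sample: clone it and work on the clone instead
clone_volume(
    new_volume_name="engine_data_work",
    source_volume_name="engine_data",
    mountpoint="/mnt/engine_data_work",
)

# Data wrangling on the GPU (file name follows the NASA turbofan data set)
df = cudf.read_csv("/mnt/engine_data_work/train_FD001.txt", sep=" ", header=None)
# ... cleaning / feature engineering ...

# Land the cleaned data on its own volume and snapshot it before moving on
create_volume(volume_name="engine_data_clean", volume_size="1TB",
              mountpoint="/mnt/engine_data_clean")
df.to_csv("/mnt/engine_data_clean/train_clean.csv", index=False)
create_snapshot(volume_name="engine_data_clean", snapshot_name="post-clean")
```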
Because as a data scientist, or in my time as a data scientist, it happened to me, I have to admit it, that I actually deleted the data set I had just cleaned. You really have a bad time as a data scientist if you no longer have access to the cleaning script. But if you created a snapshot beforehand, it's not a bad time at all: all you do is restore the snapshot, one line of code, give it a couple of seconds, and your valuable data is back. No bad day; you can rest assured that your evening beer is saved.

Next I'm just going to train a simple model on the data using XGBoost. Wait a couple of seconds, and as soon as the model is trained we have a choice: we could use the new functionality of the NetApp DataOps Toolkit to deploy the model directly to an NVIDIA Triton Inference Server, but for the simplicity of this demo we're just going to create a new volume and save the model into it, so that we can give access to a colleague who will put it into production for us.

Where are the results? What are the results? [Laughter]

We have an RMSE of 9.5.

I bet in a typical demo you'd swap over to a dashboard that showed what you did.

Unfortunately, no flashy dashboards for this demo, just the notebooks in JupyterLab. Honestly, I was just glad the demo ran all the way through.
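And a sketch of the rest of the flow: recover from the accidental deletion with a snapshot restore, train a simple XGBoost regressor, and save the model to a volume a colleague can mount. As above, the toolkit function names and arguments are taken from its README as I recall them and should be double-checked; the training data here is a random placeholder standing in for the cleaned turbofan features.

```python
import numpy as np
import xgboost as xgb
from netapp_dataops.traditional import restore_snapshot

# Deleted the cleaned data set by accident? One line brings the volume back to
# the state captured by the snapshot taken right after cleaning.
restore_snapshot(volume_name="engine_data_clean", snapshot_name="post-clean")

# Train a simple remaining-useful-life regressor (placeholder features/labels)
X_train = np.random.rand(1000, 24).astype(np.float32)
y_train = np.random.rand(1000).astype(np.float32) * 200.0
dtrain = xgb.DMatrix(X_train, label=y_train)
model = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=100)

# Save the model onto a volume a colleague can mount for deployment, or drop
# it into the model-repo volume behind a Triton Inference Server.
model.save_model("/mnt/engine_data_clean/rul_xgboost.json")
```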
One question I have about this, and forgive me if I've missed something: it feels like there's a Python API to the storage, so as a data scientist I still actually need to know a bit about these infrastructure primitives, like the difference between a clone and a snapshot. Can I work at a higher level of abstraction, where the infrastructure team deals with that and handles policy, and I just say "I want to do this thing" and you figure out how it actually happens underneath?

Yes. One of the things we're working on is bringing this up another level and getting it into an MLOps platform. We have a couple of customers we've worked with who've built their own custom in-house MLOps platforms and integrated these capabilities into them, so the data scientist is in some dashboard, they click "clone," meaning "I want a copy of this workspace," and behind the scenes it does the clone and presents it to them. Your question hits on what our ultimate goal is; we're still on the path to getting there.

Now that we've talked about how the NetApp DataOps Toolkit can help data scientists and data engineers in their daily work, I want to talk about an architecture as we tend to see it at our customers and as we like to promote it. When we AI solution specialists get involved, the data usually resides on some sort of on-premises ONTAP system, but in many scenarios our customers want to start or resume their AI journey in the public cloud. They first get in contact with their hyperscaler of choice, and the hyperscaler recommends its own natively integrated AI working environment; that could be Google Vertex AI, that could be AWS SageMaker. No matter which one it is, we tend to see two downsides of those integrated working environments for the customer. First, it makes it quite difficult for the customer to switch to another AI working environment in the future. Second, the hyperscaler tends to recommend uploading the data to an S3 bucket. That's totally fine if the data is not overly sensitive, but we see that many customers do not feel comfortable uploading their highly sensitive data to an S3 bucket. What we propose instead is to use a cloud-based ONTAP, which is available at all three hyperscalers. With a cloud-based ONTAP you can securely encrypt your data at any point in time, and you can securely transfer the data between the on-premises ONTAP and the cloud-based ONTAP, so you don't risk losing the data or accidentally giving outside parties access to it.

What I personally really like to recommend is putting NetApp Cloud Data Sense between the on-premises ONTAP and the cloud-based ONTAP. It lets you scan the data for data privacy issues and for content you probably do not want to have in the cloud, such as information about religious beliefs. You probably shouldn't have that kind of data even on premises, but as data ages, our customers sometimes end up with it by accident, and I think it's a good safety check to be able to scan the data before transferring it to the public cloud.

As soon as the data is in the public cloud, our customers have a real choice of MLOps tool: they can choose an open-source product like Kubeflow, or go with one of our partners' MLOps tools, such as Domino Data Lab, Run:AI, or Iguazio. No matter which of those products they choose, it's easy for them to move to another hyperscaler or back on premises, because thanks to the cloud-based ONTAP it's easy to move the data to the place where it's needed, when it's needed. That's also what we heard from the AI accelerator in Germany: they really enjoy working with us because we make it that easy to move data to where their customers need it. Mike has a customer experience where an architecture close to this one was used in practice.

Yes, it's kind of an unglamorous reference, not in terms of the customer but in terms of the architecture, but I think it's a good example of some of the simple problems that something like a cloud-based ONTAP can solve for a data scientist. We were talking to a data science team at one of our customers, and basically they were on EC2 instances in AWS, running training straight from S3. They were having trouble sharing data, they were having trouble getting the performance they needed, it was taking them forever to run their jobs, and they were paying a lot of money for these expensive GPU-based EC2 instances. As a simple solution, they started using Amazon FSx for NetApp ONTAP as a shared training environment: they could all mount it on their EC2 instances, bring data sets down from S3, do whatever they needed to do to them, and then max out their GPUs. They needed shared access, which they couldn't get with EBS. A simple use case, but it really solved some big problems for them.
Cool. While we're on the subject of cloud AI, I want to look at a slightly different part of the NetApp product portfolio, and that's Spot by NetApp. I think you've all heard by now that Spot by NetApp offers a lot of different solutions to facilitate your cloud journey and make it more efficient and cheaper, but I want to look in particular at two Spot tools that we like to pitch and that we see at our customers, tools that can make the cloud AI journey better and more efficient.

First, Spot Ocean, our serverless infrastructure engine for containers. It continuously optimizes your container infrastructure in the public cloud: by right-sizing your pods and containers, by recommending and helping you find the right instance types, and, most importantly, by helping you consume the cheapest compute option, which, as we all know, is spot instances. Most companies don't want to use spot instances for their customer-facing applications, since spot instances can be terminated at any point in time, but that's exactly where Ocean comes in: it automatically reschedules your containers onto another instance, so your customers never notice that one of your spot instances was terminated. Combining those capabilities, Spot Ocean can save up to 80 percent of cloud compute costs. What we see with our AI customers in the public cloud is that they really like Ocean for running their inference: for example, they can run NVIDIA Triton Inference Servers managed by Spot Ocean and save a tremendous amount of money simply by letting Ocean manage that deployment.

The other tool I quickly want to cover is Spot Ocean for Apache Spark. It's basically fully managed Apache Spark in the public cloud, based on Spark on Kubernetes, with fully Spark-aware infrastructure scaling. It lets you deploy Apache Spark in the public cloud really easily, and, best of all in my opinion, it uses the same engine as Ocean, which makes it very cheap and allows up to 60 percent cost savings for running Apache Spark in the public cloud. Currently Spot Ocean for Apache Spark is available for AWS, with support for GCP coming soon.
Learn about customer experiences around data science and MLOps, the challenges, trends, and NetApp offerings.