BlueXP is now NetApp Console
Monitor and run hybrid cloud data services
In this demo, we want to show how simple it is to access your data through Databricks when it sits on NetApp intelligent data infrastructure storage. Whether the data lives on ONTAP or StorageGRID, it is easy to reach it and build a data solution around it. We walk through three typical streams. The first is an ETL (or ELT) pipeline: converting raw data into a gold copy and then extracting business value from that gold copy. The second is a machine learning (ML) use case built on top of the gold copy, with ONTAP and StorageGRID as back ends. The third is an LLM use case, where we point to PDF documents stored in AWS S3, StorageGRID S3, and ONTAP S3. For each stream there are three jobs, one per back end: native AWS S3, NetApp StorageGRID, and ONTAP S3. The code is identical across them; only the connection string differs. Since ONTAP S3 is the one we are most interested in, I will highlight the ONTAP S3 notebooks most of the time. Now let me get into the notebooks directly. Without further ado, I will run the first set of three jobs in parallel, converting the raw data into staging data in Parquet format. This is the native S3 job, this is the StorageGRID job, and the last job in the first set is the ONTAP one. As you can see, all the jobs completed; what we care about in this particular cell is simply whether it returns successfully. Here is the timestamp we just wrote (don't worry about the earlier entries, I ran this multiple times). Now I'm running the second set of jobs, converting the staging data to gold: first native S3, then StorageGRID, then ONTAP S3.
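Since the transcript stresses that only the connection string differs between back ends, here is a minimal sketch of how the per-backend S3 settings might be factored out for Spark's s3a connector. The endpoints, keys, and the helper name are hypothetical placeholders, not the demo's actual values:

```python
def s3a_conf(endpoint: str, access_key: str, secret_key: str,
             path_style: bool = True) -> dict:
    """Build the Spark/Hadoop s3a settings for one S3-compatible back end.

    The job code itself (e.g. spark.read.parquet("s3a://bucket/raw/"))
    stays identical; only these connection settings change per back end.
    """
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # S3-compatible back ends such as StorageGRID and ONTAP S3
        # typically use path-style addressing rather than virtual-host style.
        "fs.s3a.path.style.access": str(path_style).lower(),
    }

# Hypothetical endpoints and credentials -- substitute your own.
backends = {
    "aws-s3":      s3a_conf("s3.amazonaws.com", "AKIA...", "...",
                            path_style=False),
    "storagegrid": s3a_conf("https://sg.example.com:10443",
                            "SG_KEY", "SG_SECRET"),
    "ontap-s3":    s3a_conf("https://ontap.example.com",
                            "ONTAP_KEY", "ONTAP_SECRET"),
}
```

Each dict can then be applied to the Spark session (for example via `spark.conf.set(key, value)`), so the three jobs per stream share one notebook body and differ only in which entry of `backends` they load.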
Let's get into the job. The code is common across all three streams of jobs; only the connection string differs between the back-end storages, and the rest of the code is the same for every job. This one is still running... and now the job has finished. If you drill one level down into either dataset, you can see the output. Good, now let's move on to the AI use case. For this POC I'm running a few basic regression models, just to show the idea. Let me trigger these jobs in all three areas: native S3, StorageGRID S3, and ONTAP S3. We train three models, basic linear regression, Lasso (L1), and ridge regression, with cross-validation, and evaluate them with standard metrics: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE), across all three back ends: ONTAP, StorageGRID, and native S3. As you can see, it all works as expected. Data consumers can readily leverage data sitting on the NetApp side through Databricks: just enable your keys, set up the connection strings, start exploring the data, and start building your solutions around it. The last use case is retrieval-augmented generation (RAG) with large language models. For the RAG use case we point at a document, here the Tree of Thoughts paper, and run the last two jobs.
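The evaluation metrics mentioned above (MAE, MSE, RMSE) can be sketched in plain Python. This is illustrative only; the demo presumably computes them with a library such as scikit-learn or Spark MLlib:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: penalizes large residuals quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: MSE expressed in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))

# Toy predictions from a hypothetical regression model.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mae(y_true, y_pred))   # 0.5
print(mse(y_true, y_pred))   # 0.375
```

Reporting all three side by side, as the demo does, is useful because MAE is robust to outliers while MSE/RMSE highlight large errors.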
Basically, right now I'm running both LLM jobs together, one against StorageGRID and one against ONTAP S3. The StorageGRID one is pretty much done; in the last step it generates the answer. We asked "What is Tree of Thoughts?", and the response matches the document. The last one is our primary interest, ONTAP S3. We are pointing at the same set of documents; here are the embeddings and the retrieved context, and this is where we ask the question. Let's change it slightly: instead of Tree of Thoughts, let's ask about Chain of Thought. Let's see what comes back... and it responds accordingly, grounded in the actual document. That's pretty much it. To sum up, what we are trying to showcase in this demo is that you can access your data through Databricks wherever it sits, whether on ONTAP S3, StorageGRID, or anywhere else in the NetApp storage landscape. As a data analyst or data scientist you can do your day-to-day activities: convert raw data into a gold copy, run exploratory data analysis and AI/ML on top of it, and build RAG/LLM use cases. Once you are comfortable with your model, you can start deploying it. It's as simple as that: with the combination of Databricks and NetApp storage products, you can do it all in one place, accessing your data through ONTAP S3 or StorageGRID. That's what we wanted to showcase in this demo. Thank you.
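The retrieval step of the RAG flow shown above (embed the question, find the most similar document chunks, hand them to the model as context) can be sketched with toy vectors. The chunk texts and hand-written "embeddings" below are stand-ins; a real pipeline would produce the vectors with an embedding model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, k=2):
    """Return the top-k (score, text) chunks most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in chunks),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return scored[:k]

# Toy 3-d embeddings standing in for real model output.
chunks = [
    ("Tree of Thoughts explores branching reasoning paths.", [0.9, 0.1, 0.0]),
    ("Chain of Thought prompts step-by-step reasoning.",     [0.1, 0.9, 0.0]),
    ("Parquet is a columnar storage format.",                [0.0, 0.1, 0.9]),
]
question = [0.85, 0.2, 0.05]  # pretend embedding of "What is Tree of Thoughts?"
top = retrieve(question, chunks, k=1)
```

This also explains the behavior seen in the demo: changing the question from Tree of Thoughts to Chain of Thought shifts which chunks score highest, so the generated answer stays grounded in the matching document.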
Integrate Databricks with Amazon FSx for NetApp ONTAP to securely analyze data in place, streamline pipelines, cut costs, and accelerate AI/ML. No data migration or new cloud contracts required.