Monitor and run hybrid cloud data services
This is Shira Rubinoff, CEO of the Cybersphere Group. I'm here with Arindam Banerjee, technical fellow and VP of product and architecture for NetApp. Arindam, what a pleasure to be with you here today.
>> Thank you, Shira. It's a pleasure and a privilege to be here with you today.
>> Today we're going to talk about security, AI guardrails, and governance, which is a very important topic, and I'm really happy to dig deep with you on it. What key data guardrails has NetApp put in place to ensure the integrity and provenance of data used by AI systems? And how does your governance framework prioritize securing the AI data pipeline over the AI models themselves? A bit of a mouthful, but I'm sure you have a lot to say about these.
>> Yes, that's a lot, but we are a data company that has been managing customer data for more than three decades. And customer data means not only the actual production data but also the telemetry data we manage for all our customers worldwide. We aggregate that telemetry data to enable AI-driven analytics and predictive failure analysis. So how do we manage the data? We have very stringent data governance policies. We make sure that only the customer, or the support engineer working on the case, can work on that data. We redact confidential and personally identifiable information (PII), and we go through audits with our customers every year to make sure we are managing their data per those stringent guidelines. And keep in mind we have customers in pharma, healthcare, and finance, which are heavily regulated industries.
>> And there are different regulations for each one.
>> Exactly. And now we take that learning into our products as well.
We now implement those guardrails, those stringent access policies, into our products over and over again, and we make this an iterative process: we learn from customers and give that learning back to them. Today our products include about 86 classifiers that identify sensitive information for the customers. We make sure that the data and its permissions never separate, that the security posture of the data is maintained, and that the retrieval applications always have access to the same set of permissions as the source data. That is how we apply guardrails in our products.
>> Oh, very interesting. And you've certainly taken all the different elements into account, which is very important and very impressive. And how can organizations balance the need for rapid AI innovation with robust security measures, particularly when it comes to the challenges of managing identity and access for AI data pipelines?
>> That's a very relevant question, Shira. In today's world, the biggest challenge to increasing AI-driven innovation is making your data AI-ready: getting your organization-specific data protected. Let me give you an example. A pharma company innovating in drug discovery, for example, needs to make sure that patient data is never used inappropriately. So how do we do that? As I said, we make sure the security posture of the data never degrades, because in an AI pipeline data is moving from one system to another. You have to move data to where the GPUs are, to where all your analytics platforms are, but it is important that your access controls and permissions always travel with the data, so that the same governing principles the source specified apply everywhere the data is now being accessed from.
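The idea of permissions traveling with the data, so that a retrieval application can only surface what a user could already read at the source, can be sketched roughly as follows. This is a minimal illustration, not NetApp's implementation; the class and function names are hypothetical, and a real retriever would rank by embedding similarity rather than keyword match:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """A piece of source data that carries its ACL with it through the pipeline."""
    text: str
    allowed_groups: frozenset  # permissions inherited from the source document

def retrieve(chunks, query, user_groups):
    """Return only chunks the user could already read at the source.

    A real retriever would rank by embedding similarity; a trivial keyword
    match is used here so the permission filter stays the focus.
    """
    return [
        c for c in chunks
        if user_groups & c.allowed_groups          # ACL check travels with the data
        and query.lower() in c.text.lower()        # stand-in for similarity search
    ]

corpus = [
    Chunk("Q3 revenue forecast", frozenset({"finance", "exec"})),
    Chunk("Q3 revenue press summary", frozenset({"all-employees"})),
]

# A regular employee sees only the public summary; an exec sees both.
print(len(retrieve(corpus, "revenue", {"all-employees"})))          # 1
print(len(retrieve(corpus, "revenue", {"exec", "all-employees"})))  # 2
```

Because the ACL is a field of the chunk itself, it moves with the data through every copy and stage, which is the property described above.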
And then, as you go through the pipeline, you also add classifiers that identify sensitive data at every stage. You could have sensitive data flagged during basic classification, such as service IDs, and sensitive data gated by access. For example, my CEO and I, looking at the same financial documents for the company, may get very different outcomes when we run them through AI models. That role-based access also comes in, and it comes in only because we are maintaining the security posture of the data throughout the AI lifecycle.
>> And that's a very key element to take note of when dealing with data and AI. And when we look to the future of AI, what are the most critical emerging threats related to the data supply chain? And how is NetApp evolving its data management strategy to ensure the trustworthiness and security of AI data at scale?
>> The biggest challenge in today's AI world comes from unstructured data, the multimodal unstructured data. In the structured world, the schema told us what we needed to protect, because the schema described the data. In this unstructured world, we cannot understand the schema until AI models scan the data to understand what it contains. And that is the biggest challenge we see going ahead. You cannot move the data without security risks unless you know what the data contains. So the future we see in this unstructured world is that you will need to have all the governance, security, and access policies in place right where your data is generated. We have to make every byte of data intelligent with respect to who can access it.
>> Certainly. It's not only who can access the data or what the data contains, but who it's relevant to and whether the appropriate measures are taken around it, as you mentioned with guardrails.
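One stage of the classification described above can be sketched with a small rule-based scanner. This is a toy stand-in for production classifiers, which would cover far more categories and use trained models rather than two regexes; the patterns and labels here are purely illustrative:

```python
import re

# Illustrative patterns only; a production system would have many more categories.
CLASSIFIERS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text):
    """Return the set of sensitive-data labels found in the text."""
    return {label for label, pat in CLASSIFIERS.items() if pat.search(text)}

def redact(text):
    """Replace every match of a known sensitive pattern with its label."""
    for label, pat in CLASSIFIERS.items():
        text = pat.sub(f"[{label.upper()}]", text)
    return text

record = "Patient reachable at jane@example.com, SSN 123-45-6789."
print(sorted(classify(record)))  # ['email', 'ssn']
print(redact(record))            # Patient reachable at [EMAIL], SSN [SSN].
```

Classification output like this is what lets downstream policy decide, per stage, whether data may move at all or must be redacted first.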
>> So we are seeing in this world that all of the data structuring, all of the governance, needs to be applied at the source, leveraging things like near-data compute, so that you can now bring AI to where your data is.
>> Yes, that's very important, certainly. And what is the single most important action a C-suite executive should take to establish a data-centric governance strategy for AI initiatives? And how do leaders balance the ethical and regulatory demands of data provenance with the pressure for rapid innovation? There's a lot there to unpack.
>> There is a lot there. First, recognize the problem: governance is not an afterthought. That's the most important thing. You do not kick the can down the road; you build it into your business processes. That's number one. Then, understand your data estate. Your data estate is a mix of structured and unstructured data, with 85% of your data being unstructured. And when I say understand your data estate, I mean discover it, understand what kind of data is in it, and be able to classify it, because unless you classify the data you cannot govern it. Once you classify the data, you apply the policies you want, so that you know what access capabilities will be defined based on that governance. Now, the final thing is to implement traceability for your data. Because the data is moving across the data pipeline, you need to be able to trace an outcome back to the data, to the various versions of the data, and to the various versions of the models that were used to infer from it. So we need to ensure that the data and the model are versioned together, so that I can trace back whenever I need to. It is as simple as that. Without traceability, AI systems cannot be trusted, and without trust they are not usable.
>> Awesome.
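The traceability requirement, versioning the data and the model together so an outcome can be traced back, can be sketched as a simple lineage record. A real system would use a data catalog and a model registry; the function names and fields here are hypothetical:

```python
import hashlib
import json

def fingerprint(obj):
    """Stable content hash, so any change to the data is detectable later."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

def record_inference(dataset, model_name, model_version, outcome):
    """Bind an outcome to the exact versions of the data and the model."""
    return {
        "data_fingerprint": fingerprint(dataset),
        "model": f"{model_name}:{model_version}",
        "outcome": outcome,
    }

run1 = record_inference(["rec-a", "rec-b"], "risk-scorer", "1.0", "low risk")
run2 = record_inference(["rec-a", "rec-b", "rec-c"], "risk-scorer", "1.0", "high risk")

# Same model, different data fingerprints: the lineage record shows exactly
# why the two outcomes differ, which is what makes the result auditable.
print(run1["data_fingerprint"] != run2["data_fingerprint"])  # True
```

Storing records like these alongside every inference is one way to satisfy the "trace an outcome to versions of the data and the model" requirement described above.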
>> That is the evolution every C-suite exec has to think about as we evolve for this new world driven by AI.
>> Well, you raised a lot of important points, and I think, from all my conversations with NetApp and its executives, data governance is so critical. It needs to be given more of a spotlight so people really understand why it needs to happen, how it needs to happen, and the reasons behind it, because if it's not done the right way, there will be big problems across the board. Thank you so much for sharing your knowledge with our audience today. I really appreciated our conversation, and I look forward to speaking with you again soon.
>> Thank you, Shira. It's an honor to be here.
>> Thank you.