The importance of data in responsible and ethical AI

Hoseb Dermanilian

As enterprises expand their artificial intelligence (AI) efforts, they must address important and sometimes difficult questions. Is AI being used responsibly? Can the results that it produces be explained? Because data underpins all AI processes, this series of blog posts will look at important AI questions from the standpoint of data, data management, and data governance. This post focuses on responsible and ethical AI. Future posts in this series will examine explainable AI and federated learning.

Judging from the thousands of articles about responsible AI in the past year, it’s safe to say that the potential ethical pitfalls of AI are now widely recognized—even if the best ways to address these challenges remain unclear.

Enterprises need to ensure, to the best of their ability, that all uses of data and AI are ethical, and that customer, employee, and other data are kept protected and secure. Of course, companies should strive for responsible AI practices because it’s the right thing to do, but there are also clear benefits from responsible AI. According to IBM’s Global AI Adoption Index 2021, “over three-quarters of global IT professionals report that it is critical to their business that they can trust the AI’s output is fair, safe and reliable.”

This blog examines the fundamentals of responsible AI and the important ethical issues that must be considered. It also discusses four responsible AI principles and the role that data and data governance play in responsible AI.

What is responsible AI?

Broadly speaking, responsible AI is a method for creating AI algorithms that reduces sources of risk and bias throughout the AI lifecycle, from design to development to deployment and maintenance. Responsible AI practices encompass everything and everyone involved in your AI effort—people, methods, processes, and data.

From a data standpoint, a typical deep learning algorithm learns to make predictions or decisions based on a large set of training data. During the design phase, teams determine where the data to train the algorithm will come from, whether or not that data—assuming that it exists—was obtained legally and ethically, whether the data contains bias, and what can be done to mitigate that bias. This is a big job that shouldn’t be underestimated; 67% of companies draw from more than 20 data sources for their AI.

During development, all that data must be kept secure but accessible so that experiments can be performed and the algorithm can be optimized, trained, and validated. It’s important to be able to identify exactly what datasets were used to train and validate each version of each model. Once deployed, additional data and further effort may be needed to ensure that an algorithm is performing as expected and is not introducing error or bias. Regular retraining using the latest data is also common.
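One way to make that traceability concrete is to fingerprint each dataset and store the fingerprint alongside the model version. The sketch below is a minimal illustration, not a description of any particular product: `dataset_fingerprint`, `LineageRegistry`, and the `loan-model` name are all hypothetical, and a real system would persist this information rather than keep it in memory.

```python
import hashlib
import json


def dataset_fingerprint(records):
    """Return a SHA-256 hex digest over the records, independent of their order."""
    # Serialize each record deterministically, then sort so row order does not matter.
    lines = sorted(json.dumps(r, sort_keys=True) for r in records)
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


class LineageRegistry:
    """Hypothetical in-memory map: (model name, version) -> dataset fingerprints."""

    def __init__(self):
        self._entries = {}

    def record(self, model, version, train_records, validation_records):
        # Store one fingerprint per dataset role for this model version.
        self._entries[(model, version)] = {
            "train": dataset_fingerprint(train_records),
            "validation": dataset_fingerprint(validation_records),
        }

    def lookup(self, model, version):
        return self._entries[(model, version)]


# Illustrative, made-up records.
registry = LineageRegistry()
train_v1 = [{"id": 1, "label": 0}, {"id": 2, "label": 1}]
valid_v1 = [{"id": 3, "label": 1}]
registry.record("loan-model", "1.0", train_v1, valid_v1)
```

If any record in a dataset changes, its fingerprint changes too, so comparing a stored fingerprint with a freshly computed one shows whether the data on hand is the same data a given model version was trained on.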


Principles of responsible AI

So what does responsible AI look like? There are almost as many frameworks for thinking about responsible AI as there are AI researchers. As part of its responsible AI practices, Google enumerates four principles that work well for the purposes of this blog.

  • Fairness. Does the AI algorithm produce results that are fair and unbiased?
  • Interpretability. Can the results produced by the algorithm be explained?
  • Privacy. Does the algorithm protect the privacy of both the training data and input data?
  • Security. Is the algorithm secure against attack?

To understand how these responsible AI principles apply in practice, here are examples of each one.

Examples of ethical dilemmas associated with AI

Fairness. Fairness is almost always an issue when human data is involved. The best-publicized example is facial recognition. Because of a bias toward images of white faces in the available training data, many facial recognition algorithms do a better job identifying white people than people of color. Cases of wrongful arrest based on false identification by AI algorithms are a clear example of bias, and have led to a number of moratoriums on the use of facial recognition by law enforcement and calls for legislation in countries around the world.

Interpretability. If the Netflix algorithm recommends a movie you don’t like, you probably don’t care too much what went into the recommendation decision because the stakes are relatively low. However, AI algorithms are increasingly used to assist with decisions about far more critical things, like hiring and loan approvals. If your loan application is denied by a machine, you’re going to want to know why. The next post in this series will dive into the topic of interpretability and explainable AI.

As with facial recognition, this example isn’t just theoretical. AI bias causes mortgage applications for people of color to be rejected at higher rates, reflecting bias in the training data, which typically consists of approved and rejected mortgage applications going back several years. Interpretability can help ensure that ethical issues like this aren’t perpetuated. More than 90% of companies using AI say that their ability to explain how it arrived at a decision is critical.
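A simple first check for the kind of disparity described above is to compare approval rates across groups in historical decisions or model outputs. The sketch below is one possible approach under stated assumptions: the function names, the group labels, and the sample decisions are invented for illustration, and a ratio of rates is only one of many fairness measures.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs. Returns {group: approval rate}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {g: approved[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Lowest group approval rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved in any group, so there is no gap to measure
    return min(rates.values()) / highest


# Made-up decisions for illustration only: group A 8 of 10 approved, group B 5 of 10.
sample = ([("A", True)] * 8 + [("A", False)] * 2
          + [("B", True)] * 5 + [("B", False)] * 5)
rates = approval_rates(sample)
ratio = disparity_ratio(rates)
```

A ratio well below 1.0 does not prove that a model is unfair, but it flags where the training data and the model's explanations deserve a closer look.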

Privacy. AI researchers rely on patient data to create algorithms to identify cancer and other diseases quickly and accurately. However, there’s no field where privacy is more important than healthcare. Increasing accuracy requires more data than any single hospital, healthcare system, or research center has available, but patients don’t want copies of their imaging data distributed all over the world. One solution to this problem is federated learning, which—rather than collecting patient data from multiple research centers in one location—trains the algorithm in stages at each location. (Federated learning is the subject of the third post in this series.) Whatever your industry and the sources of your data, it’s important to ensure that data is stored securely so that privacy is protected.
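To make the idea concrete, here is a toy sketch of federated averaging, one common way to implement federated learning (it may differ from the staged approach mentioned above). Each site takes a training step on its own data, and only the resulting model weights, never the raw records, are sent back to be averaged. The tiny linear model, the function names, and the data points are all assumptions chosen for illustration.

```python
def local_update(weights, data, lr=0.1):
    """One gradient-descent step for a 1-D linear model y = w*x + b on a site's data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_average(updates):
    """updates: list of (weights, n_samples). Returns the sample-weighted mean weights."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return (w, b)


def train(sites, rounds=300):
    """Run federated rounds; raw data stays in `sites`, only weights are exchanged."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_update(global_weights, d), len(d)) for d in sites]
        global_weights = federated_average(updates)
    return global_weights


# Two hypothetical sites whose made-up data follows y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(sites)
```

In this toy setup the averaged model approaches w = 2 and b = 1 even though neither site ever sees the other's data. Production systems typically add further protections, such as secure aggregation, on top of this basic loop.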

Security. As AI algorithms make their way into autonomous systems that operate in the physical world, they may create significant risks that must be protected against. For instance, in 2019, AI researchers showed that they could use an “adversarial attack,” subtly altering lane markings to cause an autonomous car to swerve into the opposite lane.

And, because the behavior of an AI algorithm depends on its training data, you also have to protect against “training data poisoning,” in which training data is intentionally altered. An AI algorithm may be only as safe as the data used to train it.

Data and responsible AI

From the previous discussion, it should already be clear that you can’t achieve responsible AI without responsible data practices.

Fairness depends on having data that has been obtained ethically, is not biased or can be “cleaned” to eliminate bias, and is of sufficient quantity. Datasets that are too small may contain bias by chance alone.
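A rough calculation shows why size matters. The sketch below uses the standard normal approximation for the margin of error of an observed rate; the function name and the sample sizes are illustrative assumptions, not figures from any study.

```python
import math


def margin_of_error(rate, n, z=1.96):
    """Approximate 95% margin of error (normal approximation) for a rate seen in n samples."""
    return z * math.sqrt(rate * (1.0 - rate) / n)


# Hypothetical group sizes with an observed approval rate of 50%.
small = margin_of_error(0.5, 25)    # about 0.196, or roughly 20 percentage points
large = margin_of_error(0.5, 2500)  # about 0.0196, or roughly 2 percentage points
```

With only 25 records per group, an apparent gap of many percentage points between groups can be pure sampling noise, and a model trained on that data may learn the noise as if it were a real pattern.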

Interpretability is important both for explaining how a trained model reached a particular decision and for identifying sources of error to improve model accuracy. As Google notes, an “AI system is best understood by the underlying training data and training process, as well as the resulting AI model.” This understanding requires the ability to map a trained AI model to the exact dataset that was used to train it and to examine that data closely, even if it’s been years since a version of a model was trained.

Privacy and security depend on your ability to securely store and manage the huge volumes of data used for training as well as any data that must be input to your algorithm to get a result. Returning to our earlier examples, AI models in healthcare must protect patient privacy. Loan approval models must protect the data from past applications used for training as well as current applicants. When it comes to AI, there’s always tension between the need for data to be accessible to data scientists and data engineers and the need for privacy and security.

As your AI efforts expand from pilot to production and begin to affect all areas of your operations, it’s important to put in place global policies—and infrastructure—to enable good data governance across your organization.

NetApp and responsible AI

At NetApp, we recognize that data is essential to your AI efforts and your business, and we help you manage it everywhere—on your premises and in the cloud. A laser focus on cloud innovation created our industry-leading storage and data management software. It’s why the three biggest cloud providers asked for that technology to be built into their clouds. And it’s how we make data accessible, protected, and cost optimized.

NetApp AI experts can work with you to build a data fabric (a unified data management environment spanning edge devices, data centers, and public clouds) so your AI data can be efficiently ingested, collected, stored, and protected.

NetApp AI solutions give you the tools you need to expand your AI efforts, eliminate complexity, and accelerate innovation.

  • NetApp ONTAP® AI accelerates all facets of AI training and inference.
  • NVIDIA DGX Foundry with NetApp gives you world-class AI development without the struggle of building it yourself.
  • NetApp AI Control Plane pairs MLOps and NetApp technology to simplify data management and facilitate experimentation.
  • NetApp DataOps Toolkit makes it easier to manage the large volumes of voice and text data that financial services companies need to analyze.
  • NetApp Cloud Data Sense helps you discover, map, and classify data. Analyze a wide and growing range of data sources, structured and unstructured, in the cloud or on premises.

Using NetApp tools in your AI operations reduces complexity, enables teams to manage data efficiently and securely, and ensures traceability and reproducibility.

To find out how NetApp can help you deliver the data management and data governance that are critical to responsible AI, visit

Hoseb Dermanilian

Hoseb joined NetApp in 2014. In his current role, he manages and develops the AI and Digital Transformation business globally. His focus is to present NetApp's value in the AI and Digital Transformation space and to help customers build the right platform for their data-driven business strategies. As part of business development, Hoseb also builds NetApp's AI channel business by recruiting and enabling the right AI ecosystem partners and enabling go-to-market strategies with those partners. Hoseb comes from a technical background. In his previous role, he was the Consulting System Engineer for NetApp’s video surveillance and big data analytics solutions. Hoseb holds a master's degree with distinction in Electrical and Computer Engineering from the American University of Beirut and has multiple globally recognized conference and journal publications in the field of IP security and cryptography.
