Federated learning: Intelligence versus privacy

Sathish Thyagarajan

December 17, 2021

In my previous blog I wrote about AI-powered recommender systems and how they have changed our lives over the last decade. As I sat down to write this time, I reflected on problems with machine learning (ML) at scale, data privacy, and federated learning (FL), an emerging approach and a hot research topic for building AI models. FL is a distributed ML architecture that trains an algorithm across multiple decentralized data sources. Instead of being gathered on a single central server, the data stays on the server or edge device where it resides, and only the algorithm travels between the servers. FL thus rests on the principle of bringing the code to the data, instead of the data to the code.
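To make the "code to the data" idea concrete, here is a minimal sketch in Python. It is not taken from any FL framework: the `Silo` class, the `local_summary` job, and the hospital names are all hypothetical, and the task (a global mean) stands in for real model training. The point is only that a function is shipped to each data holder and that summaries, not raw records, come back.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Silo:
    """A hypothetical data holder; `records` are meant to stay inside it."""
    name: str
    records: List[float]

    def run(self, job: Callable[[List[float]], Tuple[float, int]]) -> Tuple[float, int]:
        # The job (code) executes next to the data; only its summary is returned.
        return job(self.records)


def local_summary(records: List[float]) -> Tuple[float, int]:
    """Code shipped to each silo: returns (sum, count), never the raw records."""
    return sum(records), len(records)


def federated_mean(silos: List[Silo]) -> float:
    """The coordinator combines per-silo summaries into one global mean."""
    summaries = [silo.run(local_summary) for silo in silos]
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


silos = [
    Silo("hospital_a", [1.0, 2.0, 3.0]),
    Silo("hospital_b", [10.0]),
]
global_mean = federated_mean(silos)
print(global_mean)
```

In a real deployment the job would be a training step and the summary would be a model update, but the flow of code outward and aggregates inward is the same.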

With growing concerns about data privacy, data security, data rights, and data access, techniques like federated learning are gaining popularity because they present an opportunity to reduce data risk. In this blog, I discuss some of the use cases for FL and how NetApp can help address FL challenges associated with data protection and data privacy.

Emerging industry use cases

Using FL to remove the barriers to data sharing can benefit almost every industry. In banking and financial services, FL techniques are being used to optimize pricing and expense ratios in portfolio management. FL enables asset managers, financial advisors, and robo-advisors to keep the components of their clients’ portfolios confidential. FL also allows them to connect with other investment banks that can provide a fair price when buying or selling a client’s portfolio.

Financial institutions are training neural network models on the server by sending encrypted model weights and bias coefficients back and forth. Faced with financial crime and strict regulatory requirements such as GDPR, they can use FL systems to strengthen current efforts to curb unlawful financial activity like money laundering and fraud. FL makes this possible by enabling shared machine learning without sharing data.
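The post does not say how the exchanged weights are protected. One idea that is often discussed for this purpose is additive masking: every pair of participants agrees on a random vector that one adds and the other subtracts, so the masks cancel in the sum and the server only learns the aggregate. The sketch below illustrates that idea only. It is masking, not encryption, and everything in it is an assumption made for illustration: the names, the fixed-point `SCALE`, the `MODULUS`, and the single seeded generator that stands in for secrets agreed between pairs.

```python
import random
from typing import Dict, List

MODULUS = 2**32  # all arithmetic is done modulo this value (assumption)
SCALE = 10_000   # fixed-point scale that turns floats into integers (assumption)


def quantize(weights: List[float]) -> List[int]:
    """Convert float weights to integers modulo MODULUS."""
    return [round(w * SCALE) % MODULUS for w in weights]


def dequantize(values: List[int]) -> List[float]:
    """Map values in [0, MODULUS) back to signed integers, then to floats."""
    return [(v - MODULUS if v >= MODULUS // 2 else v) / SCALE for v in values]


def pairwise_masks(client_ids: List[str], dim: int, seed: int) -> Dict[str, List[int]]:
    """For each pair (a, b), a adds a random vector and b subtracts the same one."""
    rng = random.Random(seed)  # stand-in for secrets shared between each pair
    masks = {cid: [0] * dim for cid in client_ids}
    for i, a in enumerate(client_ids):
        for b in client_ids[i + 1:]:
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            masks[a] = [(m + s) % MODULUS for m, s in zip(masks[a], shared)]
            masks[b] = [(m - s) % MODULUS for m, s in zip(masks[b], shared)]
    return masks


def mask_update(weights: List[float], mask: List[int]) -> List[int]:
    """What a client would send: its quantized update plus its mask."""
    return [(q + m) % MODULUS for q, m in zip(quantize(weights), mask)]


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """The server sums the masked updates; the pairwise masks cancel out."""
    dim = len(masked_updates[0])
    return [sum(u[i] for u in masked_updates) % MODULUS for i in range(dim)]


updates = {
    "bank_a": [0.5, -1.25],
    "bank_b": [1.5, 0.25],
    "bank_c": [-1.0, 2.0],
}
masks = pairwise_masks(list(updates), dim=2, seed=7)
masked = [mask_update(w, masks[cid]) for cid, w in updates.items()]
summed = dequantize(aggregate(masked))
print(summed)
```

A production protocol would derive the pairwise masks through key agreement and would have to handle participants that drop out mid-round; neither is shown here.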

In healthcare, for instance, training an AI model to identify damage to the hippocampus in an MRI of the brain is an important step in diagnosing patients with dementia or Alzheimer’s disease. To build robust AI algorithms, hospitals and medical research institutions often need to collaborate and bring together their research knowledge, as in the case of the Personal Genome Project (PGP). However, deciding on the allowable use of data while preserving the patient’s right to privacy is a challenging task.

The EMR CXR AI Model (EXAM), a new study led by Mass General Brigham and NVIDIA, brought together 20 institutions from around the world to train a neural network. The model predicts the future oxygen requirements of symptomatic patients with COVID-19, using vital signs, laboratory data, and chest X-rays as inputs. The results of this collaborative learning effort are published in Nature Medicine. The model, which is publicly available for research through the NVIDIA NGC hub, was trained with NVIDIA Clara, which provides federated learning capabilities along with AI-assisted annotation and transfer learning. NetApp® ONTAP® AI for diagnostic imaging with NVIDIA Clara provides guidelines on workflows used in the development of deep learning (DL) models for medical imaging. NetApp ONTAP AI is also validated with DL platforms like TensorFlow, which offers the open-source framework TensorFlow Federated for data scientists who are developing AI models on decentralized data.

The two most common types of federated learning

Cross-device federated learning is typically deployed in a single organization. It includes IoT sensors, mobile devices, or edge devices that belong to a single organization’s users.
Cross-silo federated learning typically involves multiple organizations, like the example of multiple hospitals and research institutions mentioned earlier in a healthcare scenario.

Design challenges with federated learning

Some possible solutions for privacy concerns are encryption (centralized) and federated learning (decentralized). However, recent research has demonstrated that keeping data and computation on-device is not, by itself, sufficient to guarantee privacy in FL. This is because the model parameters exchanged between parties in an FL system can still reveal sensitive information, which can be exploited in privacy attacks. FL systems are also exposed to security attacks such as Byzantine and data poisoning attacks. Therefore, FL systems need efficient data protection, data governance, and privacy preservation on AI infrastructure at scale that complies with regulations like GDPR, CCPA, and LGPD.
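As a small illustration of the poisoning side of this problem, the sketch below compares a plain average of client updates with a coordinate-wise median, one robust aggregation rule that is commonly studied as a defense against Byzantine participants. This is an illustrative technique chosen for the example, not something the post attributes to any product, and it does nothing about information leaking from the parameters themselves. The update values are made up.

```python
from statistics import mean, median
from typing import List


def coordinate_mean(updates: List[List[float]]) -> List[float]:
    """Plain averaging: one extreme update can drag every coordinate."""
    return [mean(column) for column in zip(*updates)]


def coordinate_median(updates: List[List[float]]) -> List[float]:
    """Robust alternative: take the median of each coordinate separately."""
    return [median(column) for column in zip(*updates)]


# Four honest clients send similar updates; one poisoned client sends an outlier.
honest = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9], [1.0, 1.0]]
poisoned = [[100.0, -100.0]]
updates = honest + poisoned

mean_update = coordinate_mean(updates)
median_update = coordinate_median(updates)

print("mean:  ", mean_update)    # pulled far away from the honest updates
print("median:", median_update)  # stays with the honest majority
```

The median holds up here only because the honest clients are in the majority; it is a sketch of the principle, not a complete defense.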

NetApp data privacy and AI infrastructure solutions

NetApp offers a wide variety of products, services, and tools that can support your privacy operations, such as GDPR and CCPA compliance programs. NetApp solutions address a full range of cybersecurity threats and cover both data protection and security assessment. These solutions include:

NetApp Cloud Data Sense to help you identify personal information present in your data, enact policies, meet privacy requirements, and align with data governance.
NetApp SnapCenter® technology to support backup and recovery.
NetApp FPolicy for privacy operations and policy enforcement.
NetApp StorageGRID® object store for hybrid multicloud environments.
NetApp ONTAP data management software with unified storage and S3 object access for next-generation applications like AR and VR, autonomous vehicles, and cashierless stores.

Other solutions include:

NetApp Astra™ Control to manage, protect, and move data-rich Kubernetes (K8s) workloads in both public cloud and on premises.
NetApp ONTAP AI powered with NVIDIA GPUs for cloud-connected AI/ML training.
NetApp AI inferencing for real-time event streaming that enables edge computing.
NetApp AI Control Plane and MLRun pipeline for AI and ML model versioning, A/B testing, MLOps, and serverless automation in distributed computing environments or multi-node AI and ML systems running FL aggregation algorithms like FedAvg and SCAFFOLD.
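For readers unfamiliar with the aggregation algorithms named in the last item, the core of FedAvg is a weighted average: each participant's model weights count in proportion to the number of training examples it used. The sketch below shows only that aggregation step in plain Python. The function name, the input layout, and the example numbers are assumptions for illustration, and SCAFFOLD, which adds correction terms on top of this, is not shown.

```python
from typing import List, Sequence, Tuple


def fedavg(client_results: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighted by each client's example count.

    client_results holds (weights, num_examples) pairs, one per client.
    """
    total_examples = sum(n for _, n in client_results)
    if total_examples <= 0:
        raise ValueError("at least one client must report training examples")

    dim = len(client_results[0][0])
    averaged = [0.0] * dim
    for weights, num_examples in client_results:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * num_examples / total_examples
    return averaged


# Two hypothetical silos: the second trained on three times as much data,
# so its weights count three times as much in the global model.
results = [
    ([1.0, 0.0], 100),
    ([3.0, 2.0], 300),
]
global_weights = fedavg(results)
print(global_weights)
```

In a full training loop this step would run once per round, after each participant has trained locally starting from the previous global weights.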

Figure: Example of cross-silo federated learning across multiple organizations. Storage, compute, and local FL models belong to each organization (on premises, hybrid cloud, or edge). The data is more widely distributed, for example between hospitals or banks.

If the future direction of FL focuses on containerization and on security frameworks that provide reliable storage orchestration and ease concerns about data crashes or federated learning failures, then a robust AI infrastructure at scale is imperative.

Federated learning presents an opportunity to reduce data risks. However, it also poses new risks that are yet to be fully discovered and resolved. Every new risk may have a technological solution, but without rigorous data governance, data protection, and reliable AI infrastructure, FL is not likely to be effective. NetApp is working to create advanced tools that eliminate bottlenecks and help AI engineers close some of these gaps. As a cloud-led, data-centric software company, NetApp is uniquely positioned to offer a data fabric with industry-leading data management capabilities across the edge-core-cloud ecosystem. NetApp helps customers build privacy-preserving ML and AI models that enable private, secure, and seamless data analysis.

Learn more about NetApp AI solutions.

Sathish Thyagarajan

Sathish joined NetApp in 2019. In his role, he develops solutions focused on AI at edge and cloud computing. He architects and validates AI/ML/DL data technologies, ISVs, experiment management solutions, and business use-cases, bringing NetApp value to customers globally across industries by building the right platform with data-driven business strategies. Before joining NetApp, Sathish worked at OmniSci, Microsoft, PerkinElmer, and Sun Microsystems. Sathish has an extensive career background in pre-sales engineering, product management, technical marketing, and business development. As a technical architect, his expertise is in helping enterprise customers solve complex business problems using AI, analytics, and cloud computing by working closely with product and business leaders in strategic sales opportunities. Sathish holds an MBA from Brown University and a graduate degree in Computer Science from the University of Massachusetts. When he is not working, you can find him hiking new trails at the state park or enjoying time with friends & family.
