Building a data pipeline for recommender systems

Edge to core to cloud

Sathish Thyagarajan

How do algorithms impact our day-to-day lives? When I started writing this article, I first reflected on some of the key algorithms that have changed our lives over the last two decades. I began to recognize algorithms that make hugely consequential decisions in our society, from food delivery to transportation, audio and video streaming, financial decisions, and so much more. I recall that around the start of the millennium, when the Y2K bug was no longer a major concern, PageRank was an algorithm that had just started to gain popularity. Its link-based approach to ranking web pages fundamentally transformed how search engine optimization (SEO) firms operated. Over the last decade, recommender systems have brought many more changes to our lives and have become important to businesses and consumers in global markets, from online shopping to online entertainment.

Recommender systems are among the most visible success stories of artificial intelligence (AI) in practice. In many practical applications, recommender systems are combined with conversational AI or chatbots that use natural language processing (NLP) to filter relevant information and produce useful inferences. Today, many retailers are adopting newer business models like “Buy Online and Pick Up in Store,” curbside pickup, self-checkout, scan-and-go, and more. These models gained traction during the COVID-19 pandemic by making shopping safer and more convenient for consumers. AI is crucial for these growing digital trends, which both shape and are shaped by consumer behavior. To meet the growing demands of consumers, to augment the customer experience, to improve operational efficiency, and to grow revenue, NetApp helps its enterprise customers use machine learning (ML) and deep learning (DL) algorithms to design faster and more accurate recommender systems.

The two main paradigms and challenges

Types of recommender systems. Collaborative filtering is a widely used technique based on users’ historical preferences for items, such as what they most frequently clicked, watched, or purchased. These interactions form a user-item matrix, and matrix factorization decomposes that matrix into latent factors that capture not only explicit information but also implicit dependencies. However, this model does not take into account the attributes of the user or the item. Content-based filtering, on the other hand, takes user attributes like age, sex, and nationality, or item attributes like category (books, electronics, groceries, etc.), and applies regression or classification models to deduce results. To achieve better results, data scientists often combine the two filtering methods in a hybrid design that leverages the advantages of both paradigms.
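
As a minimal sketch of the matrix factorization idea behind collaborative filtering, the following Python example factors a small, made-up user-item rating matrix into user and item latent factors with stochastic gradient descent. The rating values, the hyperparameters, and the names `factorize` and `toy_ratings` are illustrative assumptions; they are not taken from any NetApp or NVIDIA tooling.

```python
import numpy as np


def factorize(ratings, n_factors=2, n_epochs=500, lr=0.01, reg=0.02, seed=0):
    """Factor a user-item matrix into user and item latent factors with SGD.

    Zero entries in `ratings` are treated as unobserved, not as a rating of 0.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    user_factors = rng.normal(scale=0.1, size=(n_users, n_factors))
    item_factors = rng.normal(scale=0.1, size=(n_items, n_factors))
    observed = np.argwhere(ratings > 0)  # (user, item) index pairs with a rating

    for _ in range(n_epochs):
        rng.shuffle(observed)  # visit the observed ratings in random order
        for u, i in observed:
            p_u = user_factors[u].copy()
            q_i = item_factors[i].copy()
            error = ratings[u, i] - p_u @ q_i
            # Gradient step on squared error with L2 regularization.
            user_factors[u] += lr * (error * q_i - reg * p_u)
            item_factors[i] += lr * (error * p_u - reg * q_i)
    return user_factors, item_factors


# Made-up ratings: rows are users, columns are items, 0 means "not rated".
toy_ratings = np.array(
    [
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [1, 0, 0, 4],
        [0, 1, 5, 4],
    ],
    dtype=float,
)

user_factors, item_factors = factorize(toy_ratings)
predicted = user_factors @ item_factors.T  # scores for every user-item pair
print(np.round(predicted, 2))
```

The zero cells of `toy_ratings` receive predicted scores from the learned factors, and those scores are what a collaborative filter would rank to produce recommendations. A hybrid design would combine these scores with the output of a content-based model.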

Design challenges and bottlenecks in recommender systems  
Most enterprise customers experience some of the following challenges.

  • Data sparsity. Matrix factorization is an efficient technique. However, the model depends on users’ past transaction data. When users rate only a limited number of items, sparse data can lead to unreliable recommendations.
  • Scalability. The amount of data added to a recommender system grows rapidly as more users and items are included. A key challenge here is to design efficient learning algorithms that can handle large datasets distributed across regions, which involves maintaining multiple copies and snapshots of datasets for ML and DL training.
  • Cold start. When new users or items are added, data with few or no ratings can impact the user-item matrix, which can make it difficult for data scientists to bootstrap the recommender system.
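
The sparsity and cold-start challenges above can be illustrated with a short sketch. It measures how sparse a made-up user-item matrix is and falls back to item popularity when a user has no history. The popularity fallback is one common workaround chosen here for illustration, not a method described in this article, and the function names and data are hypothetical.

```python
import numpy as np


def sparsity(ratings):
    """Fraction of user-item pairs with no recorded interaction (0 = unobserved)."""
    return 1.0 - np.count_nonzero(ratings) / ratings.size


def popularity_ranking(ratings):
    """Rank items by how many users interacted with them, most popular first."""
    counts = np.count_nonzero(ratings, axis=0)
    # A stable sort keeps lower item indices first when counts tie.
    return np.argsort(-counts, kind="stable")


def recommend(user_row, personalized_scores, popular_items, min_history=1, k=2):
    """Return top-k unseen items, falling back to popularity for cold-start users."""
    unseen = user_row == 0
    if np.count_nonzero(user_row) < min_history:
        ranked = popular_items  # cold start: no history to personalize on
    else:
        ranked = np.argsort(-personalized_scores, kind="stable")
    return [int(i) for i in ranked if unseen[i]][:k]


# Made-up data: the last row is a brand-new user, and item 2 has no ratings yet.
ratings = np.array(
    [
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [0, 0, 0, 0],
    ]
)

popular = popularity_ranking(ratings)
print("sparsity:", sparsity(ratings))
print("new user:", recommend(ratings[3], np.zeros(4), popular))
print("known user:", recommend(ratings[1], np.array([0.9, 0.2, 0.7, 0.1]), popular))
```

Item 2 shows the item side of the same problem: with no ratings it always ranks last by popularity, so in practice it would need content-based signals to surface at all.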

NetApp minimizes these challenges by accelerating the AI training workflow for data scientists working with multiple copies of real-world data or synthetic datasets for deploying context-aware recommender systems. For instance, copying a 10TB dataset takes 2 seconds rather than hours. With NetApp these data copies are also stored efficiently. For example, data scientists can make 10 copies of each dataset with a reduction in storage space of up to 90%.
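
To make the storage arithmetic concrete, here is a small worked sketch. It assumes the best case stated in the text (a 90% reduction) and reuses the 10TB dataset size from the example above. The function name `capacity_for_copies` is hypothetical, and because the figure is quoted as "up to 90%," actual savings may be lower.

```python
def capacity_for_copies(dataset_tb, copies, reduction):
    """Return (logical TB, physical TB) for `copies` copies of one dataset.

    `reduction` is the fraction of logical capacity saved, e.g. 0.9 for 90%.
    """
    logical_tb = dataset_tb * copies
    physical_tb = logical_tb * (1.0 - reduction)
    return logical_tb, physical_tb


logical_tb, physical_tb = capacity_for_copies(dataset_tb=10, copies=10, reduction=0.9)
print(f"logical: {logical_tb} TB, physical: {physical_tb:.1f} TB")
```

Under these assumptions, ten copies of a 10TB dataset represent 100TB of logical data but occupy roughly the space of a single copy.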


NetApp data pipeline for recommender systems

Commercial recommenders are trained on huge datasets, often several terabytes in scale, with millions of users and products from which these systems make recommendations.

As with most AI, ML, and DL workloads, recommender systems need high-performance storage and industry-standard systems. NetApp® AFF provides all the flexibility that NetApp ONTAP® delivers to keep up with fast networking capabilities and the high-I/O demands of GPU-enabled training clusters.

  • NetApp ONTAP AI is a NetApp Verified Architecture for AI workloads that uses NetApp AFF A800 storage systems and NVIDIA DGX A100 systems, where each DGX A100 system is powered by eight NVIDIA A100 GPUs. The DGX A100 system leverages NVIDIA NGC, a cloud-based container registry for GPU-accelerated software that provides the most popular DL frameworks such as TensorFlow, PyTorch, and Merlin.
  • NVIDIA Merlin for recommender systems provides fast feature engineering and preprocessing operators that are common to recommendation datasets, and it supports models like the deep learning recommendation model (DLRM).
  • Many NetApp enterprise customers use the open-source unified analytics environment Apache Spark. Apache Spark comes with a stack of libraries and APIs like Spark ML, Spark SQL, and GraphX that are used by many companies to design ML models and recommender systems. NetApp is helping some of its enterprise customers in the retail and financial services industries to achieve high performance and model accuracy for both batch and streaming data. To learn more about Apache Spark with NetApp and how it’s helping a large bank, refer to the NetApp Verified Architecture, Apache Spark Workload with NetApp Storage Solution.
  • Kubernetes is a popular platform used to deploy modern business applications that include recommender systems. NetApp Trident is an open-source storage orchestrator developed and maintained by NetApp that greatly simplifies the creation, management, and consumption of persistent storage for platforms like Kubernetes and Docker containers. The NetApp AI Control Plane pairs NetApp data management capabilities with popular open-source tools and frameworks like Kubeflow and Kubernetes. It can be implemented on any Kubernetes cluster that is used to deploy AI-powered applications. The NetApp AI Control Plane, which leverages the NetApp DataOps Toolkit, helps data scientists and AI engineers to seamlessly replicate data across sites and regions to create a cohesive and unified AI/ML/DL data pipeline for traceability, model versioning, and A/B testing.

The NetApp DataOps Toolkit is easy to install using pip, the standard package manager for Python. Furthermore, integrating the DataOps Toolkit with NetApp® Astra™ Control Center enables users to deploy and run business-critical Kubernetes workloads with enterprise-grade data management functionality, both in public clouds and on premises. Data scientists can take advantage of a data fabric powered by NetApp for experiment management solutions and for A/B testing their recommender systems.

Why A/B testing is crucial

There seems to be no gold standard for performing matrix factorization or designing recommender systems. Data scientists rely heavily on A/B testing to quantify the impact of these recommendations on business outcomes and to evaluate their efficacy.

With A/B testing, data scientists can test multiple variations of ML and DL models until they find the one whose recommendations best improve the user experience. Therefore, it’s crucial for data scientists who are tuning algorithms and model size through A/B testing to keep a snapshot of all the data used to generate each model when developing recommender systems. A high-quality A/B test guides the design of a recommendation engine that connects customers with the right products and services, which helps ensure customer satisfaction and increase business revenue. The NetApp AI Control Plane was developed with customer pain points in mind. It enables users to clone a namespace, to seamlessly replicate datasets across regions, and to perform the ML/DL model training, versioning, and A/B testing needed for developing recommender systems.
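
One way to quantify the outcome of an A/B test is a two-proportion z-test on click-through rates, sketched below. This is a generic statistical illustration, not part of the NetApp AI Control Plane. The `Variant` class, the snapshot labels, and the traffic numbers are all hypothetical; the `dataset_snapshot` field simply shows how each model variant could be recorded alongside the data snapshot it was trained on.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One arm of an A/B test, tied to the data snapshot its model was trained on."""

    name: str
    dataset_snapshot: str  # hypothetical snapshot label, kept for traceability
    impressions: int
    clicks: int

    @property
    def click_rate(self):
        return self.clicks / self.impressions


def two_proportion_z_test(a, b):
    """Two-sided z-test for a difference in click-through rate between a and b."""
    pooled = (a.clicks + b.clicks) / (a.impressions + b.impressions)
    std_err = math.sqrt(
        pooled * (1.0 - pooled) * (1.0 / a.impressions + 1.0 / b.impressions)
    )
    z = (b.click_rate - a.click_rate) / std_err
    # Two-sided p-value from the standard normal distribution, written with erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up traffic numbers for two recommender variants.
control = Variant("model_a", "ratings_snapshot_v1", impressions=5000, clicks=200)
candidate = Variant("model_b", "ratings_snapshot_v2", impressions=5000, clicks=260)

z, p_value = two_proportion_z_test(control, candidate)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in click-through rate is unlikely to be noise, and the recorded snapshot label makes it possible to retrain or audit the winning variant against exactly the data it saw.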

Convergence of edge computing and recommender systems. Companies are increasingly generating massive volumes of data at the network edge. In the IoT domain, recommendation functionality is becoming essential. Recommender systems are expected to deeply understand users’ behavior, demands, and interests by using edge servers. Mobile edge computing and AI inferencing at the edge are novel computing paradigms that push computation and storage resources from remote servers to servers at the network edge. The NetApp AI inferencing at the edge solution combines multiple Lenovo edge servers with NetApp AFF storage systems, NVIDIA T4 GPUs, and ONTAP storage management capabilities to create recommender systems that are easy to deploy and manage.

NetApp cloud solutions. Many modern applications with recommender systems run in the cloud for model training. NetApp Cloud Volumes ONTAP is a highly available storage solution on public clouds that supports grow-as-you-go file shares over NFS or CIFS (SMB), as well as iSCSI block storage. NetApp Cloud Sync is a NetApp service for rapid and secure data synchronization. Whether you need to transfer files between on-premises NFS or SMB file shares, NetApp StorageGRID®, NetApp ONTAP S3, or cloud services like Azure NetApp Files, Azure Blob, AWS S3, AWS EFS, Google Cloud Storage, or IBM Cloud Object Storage, Cloud Sync moves the files where you need them quickly and securely.

Data fabric, edge to core to cloud. AI workloads like recommender systems can involve resource-intensive tasks, from data management during training to managing scalable real-time AI/ML-based API endpoints. At a high level, an end-to-end AI/ML model deployment consists of three stages through which the data travels: edge, core, and cloud. This movement of data is very common in applications such as mobile apps and web apps where recommender systems are deployed.

A data fabric powered by NetApp has a robust line of products and services that manage data across the three realms of the AI infrastructure and data pipeline required for the deployment of recommender systems.

Customer success story – recommender system use case. One of the largest American retail corporations, a NetApp customer, is leveraging AI with NLP and multimodal recommendations to let its customers interact with its products virtually using augmented reality. On the company’s web app or mobile app, customers can visit the retail studio to try cosmetics virtually. The customer uses a smartphone camera or a laptop webcam to let the studio application scan their face, and the application then maps high-resolution images of the products onto the customer’s face. The retail company augments the customer’s shopping experience by enabling recommender systems for product selection. This process makes purchasing decisions easier and increases the value for the customer. NetApp is working closely with this retail giant to understand its AI needs in stores, online, and in its core business, to help drive its digital transformation.

More information and resources.

As enterprises of all types embrace AI technologies, they face data challenges from the edge to the core to the cloud. As the data authority on hybrid cloud, NetApp is building a network of partners that can help with all aspects of constructing a data pipeline for AI solutions across the edge-core-cloud ecosystem. Data fabric technologies and services from NetApp can jumpstart your company on the path to success to enable accelerated AI workloads and smoother cloud integration. To learn more about NetApp AI solutions, visit

Sathish Thyagarajan

Sathish joined NetApp in 2019. In his role, he develops solutions focused on AI at edge and cloud computing. He architects and validates AI/ML/DL data technologies, ISVs, experiment management solutions, and business use-cases, bringing NetApp value to customers globally across industries by building the right platform with data-driven business strategies. Before joining NetApp, Sathish worked at OmniSci, Microsoft, PerkinElmer, and Sun Microsystems. Sathish has an extensive career background in pre-sales engineering, product management, technical marketing, and business development. As a technical architect, his expertise is in helping enterprise customers solve complex business problems using AI, analytics, and cloud computing by working closely with product and business leaders in strategic sales opportunities. Sathish holds an MBA from Brown University and a graduate degree in Computer Science from the University of Massachusetts. When he is not working, you can find him hiking new trails at the state park or enjoying time with friends & family.
