Demystifying MLOps: part 1

Muneer Ahmad Dedmari

So, your organization has decided to use machine learning (ML) and to invest resources that can deliver business value to your customers. Data scientists and data engineers are now solving problems that seemed next to impossible a few years back, and proofs of concept (POCs) are showing exceptional results.

And now it’s the moment of truth: Business executives are asking when you can deploy the trained models into production!

It shouldn’t be difficult, because most of the complex work is done, right? Unfortunately, it’s not as easy as it seems. According to a 2019 report, only 22% of companies have successfully deployed an ML model into production. The rest are stuck in the preproduction phase or have simply failed at the deployment and model management stage.

But don’t be discouraged. In this blog post, I discuss some of the challenges of ML lifecycle management in an enterprise environment—and what methodologies and tools can help productize ML solutions.

ML in production is challenging but not impossible

ML has advanced a lot, and we are privileged to have most of the resources that we need at our disposal. We have access to compute resources (on premises and in the cloud), to the necessary quantity and quality of datasets, and to state-of-the-art ML research. ML systems are also being streamlined. Data engineers together with data scientists are transforming and preparing data that’s consumed by ML models for training. And models ultimately go to production for model serving, where they’re monitored and retrained if necessary.

In the real world, only a small segment of an ML system is composed of the ML model code. The rest of the process consists of data collection, data consolidation, system configurations, model and data verification, debugging and testing, resource management and infrastructure serving, feature and metadata management, and monitoring.

An example illustrating ML model challenges

Let’s say that you have a team of data scientists and data engineers who are working on dynamic pricing for airline flight bookings. The business objective is to set ticket prices based on travel dates, seat availability, and, to increase sales, pricing relative to competitors.

You and your team work mostly independently in your own work environments, like Jupyter Notebooks, and use the dataset that’s available for training and validating the model. Maybe team members share notebooks with each other by email or they use some code versioning (GitHub, Bitbucket, etc.). They also have regular catch-up meetings to make sure that everyone is in sync and that the project is progressing as expected.

You’re all using allocated compute and storage resources (AI infrastructure) for training by executing the cells in your notebooks. After some time, your trained model is producing good enough results on your holdout test dataset, and you believe that it will work in the production environment and will predict better pricing for airline tickets. You also have data analysis and visualization reports in your notebooks that back the results and validate your model’s performance.

Finally, it’s time to deploy your best trained model and integrate it into the existing airline ticket–booking system. But there are a few unanswered questions that you need to take care of, including concerns such as:

  • How do you use the code cells of Jupyter Notebooks for a production flight-booking system and preserve the data transformation that took place during training?
  • In the production system, how are you going to continuously monitor model performance? And how are you going to compensate for deviation in the predictions that might occur due to changes in data distribution over time, which might result in model drift?
  • How is your team going to reproduce the experiments and fine-tune trained models for better performance, considering the data used at that point in time?
  • How can you effectively scale the model for retraining on a larger dataset?
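
To make the first question concrete, here is a minimal sketch of one way to keep the training-time transformation attached to the model. It assumes a single numeric feature and a linear model with placeholder weights; the function names, the JSON bundle layout, and the sample data are all hypothetical and are not taken from any particular tool.

```python
import json
import statistics


def fit_scaler(values):
    """Learn standardization parameters from the training data only."""
    # Fall back to 1.0 so a constant feature does not cause division by zero.
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values) or 1.0}


def transform(values, scaler):
    """Apply the stored parameters; the same function runs in training and serving."""
    return [(v - scaler["mean"]) / scaler["std"] for v in values]


def export_bundle(scaler, weights):
    """Serialize preprocessing parameters and model weights as one artifact."""
    return json.dumps({"scaler": scaler, "weights": weights})


def predict(bundle_json, raw_values):
    """Serving-side entry point: load the bundle, transform, then score."""
    bundle = json.loads(bundle_json)
    scaled = transform(raw_values, bundle["scaler"])
    slope = bundle["weights"]["slope"]
    intercept = bundle["weights"]["intercept"]
    return [slope * x + intercept for x in scaled]


# Hypothetical training feature: days until departure for past bookings.
days_to_departure = [1.0, 7.0, 14.0, 30.0, 60.0]
scaler = fit_scaler(days_to_departure)

# Placeholder weights standing in for a trained pricing model (assumption).
bundle = export_bundle(scaler, {"slope": -20.0, "intercept": 150.0})

# The booking system passes raw values; scaling happens inside predict().
prices = predict(bundle, [14.0])
```

Because the scaler parameters travel inside the same artifact as the weights, the serving code cannot accidentally apply a different transformation than the one used in the notebook.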

If you don’t consider and mitigate these challenges, you can face disastrous consequences near the end of your project. You might have to rebuild everything from scratch, or the project might fail to reach the production stage.

Why you need a new approach

Successful and mature AI processes require automating these phases so that new models can be trained smoothly on new data or with new implementations. Automation helps abstract away the complexity and lets you focus on the actual problem at hand. Wait a second, isn’t that very similar to what DevOps practices are known for, and can’t we use a similar concept for ML? That’s right, I’m referring to machine learning operations, or MLOps for short.

But can we really use DevOps methodologies for ML and simply call it MLOps? It does seem to be an obvious option, because an ML system is a software system (Software 2.0) at its core. But it’s a different beast altogether, and it demands a new mindset for handling AI development and workflow management. The core difference between ML (Software 2.0) and a traditional software stack (Software 1.0) is that ML is not just code and configurations. Data is also an integral component of the ML lifecycle and defines the behavior of a trained model.

What is MLOps?

MLOps is a collaborative methodology and practice that combines data engineering, ML, and DevOps. It aims to operationalize training and tracking models at scale, deploying and maintaining models in production, and the entire data pipeline that the ML system depends on. MLOps also helps ensure model performance, measures it against business objectives, and enables continuous delivery of business value. The following figure shows some of the benefits of MLOps.

Figure: Reasons to go to MLOps

Here are the general practices that you can use to achieve MLOps in your ML projects:

  • Transition from manual script-driven interactive processes to an automated ML pipeline. Usually, ML researchers and data scientists build state-of-the-art models with experimental code that’s written and executed in notebooks. This practice forces manual execution of each step and manual transitions between steps: data consolidation, data analysis, data preprocessing, training, validation, etc. This type of setup results in a disconnect between the data scientists, who are responsible for building and training models, and the engineering team, which takes care of deploying and serving the model. You can mitigate this problem by automating the ML pipeline, which enables you to package the data and training processes as modular components and to trigger retraining of models.
  • Orchestrate ML experiments. By defining the training environment requirements as code and/or configuration, you can attain environmental-operational symmetry. For example, resources like compute and storage that your team needs during the experimentation phase can be made available in a similar fashion while they work in the production environment. This practice saves data scientists from the dance of “how do we get this model into production?” And it also promotes a cloud-first approach to elastically scale the training and to pay only when you need the resources.
  • Execute and track experiments. In ML, failure is not really a failure; it’s the making of a better model in progress. You’re usually attempting to solve a problem, considering a hypothesis that’s based on business objectives, available datasets, and ML algorithms that are suitable for the use case. Each experiment drives you closer to the best possible solution. It’s crucial to track your experiments because it helps maintain continuity and helps the rest of your team understand what was tried and what went wrong.
  • Use ML versioning and source control. By allowing team members to work alongside each other, versioning of ML code makes collaboration much easier. Traditional code versioning can keep track of code, configurations, and project dependencies. In ML, however, things get more complex, and versioning the code that implements a model is not enough. The model might behave drastically differently from one input dataset to another. To capture the complete training state, you also need to version the training data and the generated models. By recording metadata about experiments, like runtime parameters passed at execution time, components executed, and model evaluation metrics, you facilitate reproducibility and comparison across multiple experiments.
  • Use ML continuous integration (CI). In ML, CI is not just for testing and validating code and configurations and for building container images. It builds packages for components and performs tests on feature engineering logic (missing values in input data, data dimension, etc.). It also starts training runs and checks that they converge (that the loss decreases with iterations).
  • Use ML continuous delivery (CD). The best trained model needs to be automatically packaged and should be easily deployed at a moment’s notice. CD enables you to test the model compatibility (for example, installed libraries that the model needs) with the platform on which it’s supposed to be deployed. With CD, you can also validate that the same feature engineering is applied at serving time to ensure input feature consistency across the training and serving setup. You can roll out models gradually, without disrupting the production environment. If a model doesn’t behave as expected in the production environment, you can use ML versioning to easily roll back to the previous best known model.
  • Evaluate model interpretability and explainability. Most trained models lack transparency, which can lead to violations of governance and accountability requirements. Model interpretability is the ability to determine cause and effect in a system; for example, if a patient has only one tumor, their chance of survival is 90%. Explainability is the extent to which the internal workings of an ML architecture can be described in human-friendly terms. For example, on average, glucose might account for 40% of a model’s diabetes prediction, blood pressure for 15%, and age for 15%. It’s important to note that without knowing what data was used to train the model, interpretability and explainability are difficult to implement in a system.
  • Monitor the model in production. Change is the only constant, and monitoring in-production models helps mitigate the risk from changes. You can watch performance drift and automatically inform the responsible team member to take the appropriate actions, like retraining on new data. The performance of a model might degrade due to a change in the data distribution (for example, data gathered from new sensors) and/or concept drift (the relationship between input and output data changes over time). Monitoring also confirms whether your hypothesis is valid on real-world data and aligns with your business objectives.
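
As an illustration of the monitoring practice, the sketch below compares the distribution of one input feature in production against the training data and flags a shift. It uses the population stability index (PSI), which is my choice of metric and is not named in this post. The 0.2 threshold is a commonly quoted rule of thumb, not a guarantee, and the fare data is synthetic.

```python
import math


def psi(reference, live, bins=5):
    """Population stability index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; 1.0 guards a constant feature.
    width = (hi - lo) / bins or 1.0

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_retraining(reference, live, threshold=0.2):
    """Flag a feature whose live distribution moved past the assumed threshold."""
    return psi(reference, live) > threshold


# Synthetic fares: training data, live data with the same spread, and live
# data shifted upward (for example, after a competitor changes its pricing).
training_fares = [100 + (i % 50) for i in range(500)]
same_fares = [100 + (i % 50) for i in range(200)]
shifted_fares = [130 + (i % 50) for i in range(200)]

alerts = {
    "same": needs_retraining(training_fares, same_fares),
    "shifted": needs_retraining(training_fares, shifted_fares),
}
```

A check like this covers only changes in the input distribution. Detecting concept drift also requires comparing predictions with ground-truth outcomes as they become available.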

When your organization uses good MLOps practices, you can ultimately produce better results while being cost-effective. You can put a platform and architecture in place to make the whole process as easy as pushing code to code versioning. The rest (packaging, preprocessing, training, ML versioning, model deployment, autoscaling, etc.) is taken care of for your team.
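
To give a feel for what the ML versioning step of such a platform might record, here is a minimal sketch of a run record that ties together the code version, a content hash of the training data, the parameters, and the metrics. The field names, the commit identifiers, and the sample rows are hypothetical; real tools store far more detail.

```python
import hashlib
import json


def fingerprint(rows):
    """Content hash of a dataset; it changes whenever any value changes."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_run_record(code_version, rows, params, metrics):
    """Capture everything needed to compare or reproduce one training run."""
    return {
        "code_version": code_version,
        "data_sha256": fingerprint(rows),
        "params": params,
        "metrics": metrics,
    }


def same_inputs(a, b):
    """Two runs are comparable reruns only if code, data, and params all match."""
    keys = ("code_version", "data_sha256", "params")
    return all(a[k] == b[k] for k in keys)


# Hypothetical booking rows and a later snapshot with one corrected price.
rows_v1 = [
    {"days_to_departure": 14, "price": 158.0},
    {"days_to_departure": 30, "price": 120.0},
]
rows_v2 = [
    {"days_to_departure": 14, "price": 161.0},
    {"days_to_departure": 30, "price": 120.0},
]

run_a = make_run_record("commit-a", rows_v1, {"learning_rate": 0.1}, {"rmse": 12.5})
run_b = make_run_record("commit-a", rows_v2, {"learning_rate": 0.1}, {"rmse": 11.9})

# Same code and parameters, but the data changed, so the runs are not reruns.
data_changed = not same_inputs(run_a, run_b)
```

Hashing the data alongside the code version is what lets you tell a genuine model improvement apart from a change in the underlying dataset.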

To learn more about MLOps and how NetApp® AI makes it easier, check out our featured video.

In part 2 of this blog, I discuss a use case and how to deploy a collection of tools (GitHub, Kubeflow, Jenkins, and NetApp AI data management) to incorporate the MLOps methodology into your projects.

To learn more about NetApp AI solutions, visit

Muneer Ahmad Dedmari

Working as an AI Solutions Architect – Data Scientist at NetApp, Muneer Ahmad Dedmari specializes in the development of machine learning and deep learning solutions and AI pipeline optimization. After working on various ML/DL projects industry-wide, he decided to dedicate himself to solutions in different hybrid multi-cloud scenarios, in order to simplify the life of data scientists. He holds a Master’s degree in Computer Science with specialization in AI and Computer Vision from Technical University of Munich, Germany.
