Data Science as a Service in Healthcare: Control Shadow AI

Esteban Rubens

February 4, 2021

Anyone involved with research in healthcare and life science knows that the funding model often has unintended consequences. In academia as well as in industry, a researcher or principal investigator (PI) writes a grant proposal describing a project and explaining why they need money for it. If the grant is approved, the PI receives funding to use as they see fit. If the research program involves data science, it will need IT infrastructure, typically a combination of the following elements:

GPU compute
Flash storage
Object storage
Networking
A software stack including hypervisors, operating systems, and containers
An orchestration layer
Free or open-source environments such as Jupyter Notebooks, Python, and Kubeflow
Automation tools such as Ansible
MLOps tools such as the NetApp® AI Control Plane and NetApp Data Science Toolkit

How shadow AI takes hold

The PI or grant recipient rightfully controls how the grant is spent. This is good in terms of shielding scientists from external pressure, but it can create redundancy and inefficiency in an institution. Imagine an organization in which several research teams are undertaking data science projects. Given the way financial support is allocated, each team can build an almost identical infrastructure stack to support its work. This is inefficient, because the compute and storage resources in each of those environments are unlikely to be fully used, and building the stacks also takes away from the time the teams have for their research.
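To make the underuse argument concrete, here is a minimal sketch that compares the GPU utilization of separate per-team stacks with that of one shared pool. The team names, GPU counts, and headroom factor are made up for illustration and do not come from any real deployment. The sketch also looks only at average demand and ignores peaks.

```python
from math import ceil

# Hypothetical numbers: GPUs each team bought for its own stack,
# and how many of them it keeps busy on average.
teams = {
    "genomics": {"owned": 8, "avg_busy": 2.5},
    "imaging": {"owned": 8, "avg_busy": 3.0},
    "pathology": {"owned": 8, "avg_busy": 1.5},
}

def utilization(busy: float, capacity: float) -> float:
    """Fraction of capacity that is actually in use."""
    return busy / capacity

total_owned = sum(t["owned"] for t in teams.values())
total_busy = sum(t["avg_busy"] for t in teams.values())

# Siloed model: every team owns its own hardware.
siloed_utilization = utilization(total_busy, total_owned)

# Shared model: one pool sized to the combined average demand
# plus an assumed 50% headroom for bursts.
HEADROOM = 1.5
shared_pool_size = ceil(total_busy * HEADROOM)
shared_utilization = utilization(total_busy, shared_pool_size)

print(f"Siloed: {total_owned} GPUs at {siloed_utilization:.0%} utilization")
print(f"Shared: {shared_pool_size} GPUs at {shared_utilization:.0%} utilization")
```

With these assumed inputs, the shared pool serves the same average demand with fewer than half the GPUs. A real sizing exercise would need measured usage data and peak-load analysis.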

Installing and configuring a data science infrastructure stack takes time. In addition to the time it takes to get these environments ready for production, someone must be ready to provide ongoing support and troubleshooting if problems arise. The local IT team may be unwilling to get involved with a project that they had no part in architecting or deploying. They may see helping as a way of getting drawn into a long, open-ended engagement that they don’t have the resources for.

This is how shadow IT (or shadow AI) starts. Research teams want to make their own decisions and control their resources, which creates islands of almost identical, underused infrastructure that may not follow the organization's data security best practices. Fortunately, this suboptimal outcome is easy to avoid. Reducing duplication, maximizing resource utilization, simplifying operations, and strengthening data security and governance are goals that everyone is interested in pursuing.

Data science as a service to the rescue

In our age of anything and everything as a service, why not apply a proven strategic and operational blueprint to data science infrastructure? You can share resources, lower costs, and give time back to data scientists so they can focus on their work.

Data scientists commonly report that they spend almost half their time, and in some cases up to 75%, on tasks that are not data science. For example:

Configuring hardware and software
Resource orchestration
Automation
Container management
Resource scheduling and assignment
Production management
Repository management
Version control
Data wrangling

Deploying a data science as a service (DSaaS) infrastructure stack and making it available to researchers through a self-service portal that includes chargeback allows them to keep their independence and to control their grant funding. It also delivers a more complete and secure solution, at the same time eliminating shadow AI silos. This is a case in which everyone wins. Researchers get easy access to the IT resources they need at a lower price than they would pay if they built the infrastructure themselves, and the organization avoids silos, increases efficiency, and promotes security and compliance.
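As a rough illustration of how chargeback lets each grant pay only for what it uses, the sketch below totals metered usage per grant. The rates, resource names, grant IDs, and the `UsageRecord` type are all hypothetical. They are not part of any NetApp product or API.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical price list for the shared platform (currency units per unit used).
RATES = {
    "gpu_hour": 2.00,
    "flash_tb_month": 50.00,
}

@dataclass(frozen=True)
class UsageRecord:
    grant_id: str    # the grant that pays for this usage
    resource: str    # a key in RATES
    quantity: float  # units consumed in the billing period

def chargeback(records: list[UsageRecord]) -> dict[str, float]:
    """Sum the cost of all usage records, grouped by grant."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.grant_id] += RATES[record.resource] * record.quantity
    return {grant: round(total, 2) for grant, total in totals.items()}

records = [
    UsageRecord("GRANT-A", "gpu_hour", 120),
    UsageRecord("GRANT-A", "flash_tb_month", 4),
    UsageRecord("GRANT-B", "gpu_hour", 30),
]

invoice = chargeback(records)
print(invoice)
```

A production system would pull usage from the platform's metering data and use a decimal type for money. Floats are used here only to keep the sketch short.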

Using the NetApp AI Control Plane and Data Science Toolkit, data scientists and engineers gain powerful tools that also alleviate the burden on IT. Researchers get an AI data- and experiment-management solution, and they also gain the ability to perform data management tasks from within the software environments they commonly use, like Jupyter Notebooks, Kubeflow and Apache Airflow pipelines, and Python. Bringing storage system API integration into the data science realm gives power to the data scientists so they can work more efficiently and avoid having to open IT tickets for common tasks.

NetApp has been helping healthcare organizations deploy data science as a service for years and has refined a process that is optimized for speed, accuracy, and customer satisfaction. For a description of our approach to DSaaS in an academic setting, read Data Science as a Service: Prototyping an integrated and consolidated IT infrastructure combining enterprise self-service platform and reproducible research.

Adding MLOps tools rounds out the offering by helping data scientists to automate, streamline, and speed up feature engineering, pipeline deployment, continuous integration/continuous deployment (CI/CD), and model monitoring. To learn more, read our technical reports TR-4834 and TR-4841. You can also visit netapp.ai to learn more about our AI solutions.

If you would like to start a conversation about data science as a service, contact your NetApp Sales team or your NetApp partner, or get in touch with us.

Esteban Rubens

Esteban joined NetApp to build a Healthcare AI practice leveraging our full portfolio to help create ML-based solutions that improve patient care and reduce provider burnout. Esteban has been in the Healthcare IT industry for 15 years, having gone from being a storage geek at various startups to spending 12 years as a healthcare-storage geek at FUJIFILM Medical Systems. He's a visible participant in the AI-in-Healthcare conversation, speaking and writing at length on the subject. He is particularly interested in the translation of Machine Learning research into clinical practice and the integration of AI tools into existing workflows. He is a competitive powerlifter in the USAPL federation, so he will try to sneak in early-morning training wherever he's traveling.
