Automating MLOps, DevOps, and DataOps for Data Scientists and ML Teams

Mike McNamara

The unprecedented promise of machine learning (ML) is still unrealized because data scientists spend most of their time on non-data-science work. In common practice, the path from ML development through deployment relies on ad hoc tools, plug-ins, scripts, and siloed tooling that keeps organizations, large and small, from streamlining ML development.

NetApp and cnvrg.io have partnered to deliver a streamlined AI/ML data science pipeline solution that drives productivity and efficiency. The solution incorporates industry-leading Kubernetes managed clusters (for example, Red Hat OpenShift), cached datasets for extreme performance, and one-click attachment of models to datasets with NVIDIA NGC integration. NetApp® ONTAP® AI provides high-performance compute and storage for any scale of operation, and cnvrg.io software streamlines data science workflows, improving resource utilization.
With NetApp and cnvrg.io, you can cache datasets (and/or their versions) and make sure that they're located on the ONTAP node attached to the GPU cluster or CPU cluster that is running the training. Once the datasets are cached, they can be used multiple times by different team members. With caching, datasets are ready to be used in seconds rather than hours, and cached datasets can be authorized and used by multiple teams in the same compute cluster connected to the NetApp cached data.
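
To make the caching behavior described above concrete, here is a minimal Python sketch of a per-cluster dataset-version cache. It is a toy model only: the `DatasetCache` class, its method names, and the path layout are hypothetical and are not the cnvrg.io or ONTAP API. It illustrates two ideas from the text: a dataset version is copied next to the compute once and then reused, and several teams can be authorized on the same cached copy.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetCache:
    """Toy per-cluster cache of dataset versions (hypothetical, illustration only)."""

    cluster: str
    # Maps (dataset, version) -> set of teams authorized to use the cached copy.
    entries: dict = field(default_factory=dict)
    # Counts how many times data had to be copied from the source store.
    copies: int = 0

    def ensure_cached(self, dataset: str, version: str, team: str) -> str:
        """Cache the dataset version if needed, authorize the team, return its path."""
        key = (dataset, version)
        if key not in self.entries:
            # First request for this version: stands in for the slow, one-time copy.
            self.copies += 1
            self.entries[key] = set()
        # Later requests only add an authorization; no new copy is made.
        self.entries[key].add(team)
        return f"{self.cluster}:/cache/{dataset}/{version}"

    def can_use(self, dataset: str, version: str, team: str) -> bool:
        """Return True if the team is authorized on a cached copy of this version."""
        return team in self.entries.get((dataset, version), set())


cache = DatasetCache(cluster="gpu-cluster-a")
first = cache.ensure_cached("images", "v2", team="vision")
second = cache.ensure_cached("images", "v2", team="nlp")
print(first == second, cache.copies)
```

In this sketch the second team's request returns the same cached location without triggering another copy, which mirrors the "cache once, use many times" pattern described in the paragraph above.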

NetApp and cnvrg.io have written a detailed technical paper, Hybrid-cloud AI Operating System with Data Caching, which presents an innovative solution that enables IT professionals and data engineers to create a truly hybrid-cloud AI platform with a topology-aware data hub. Data scientists can instantly and automatically create a cache of their datasets in proximity to the compute, wherever the compute is located. As a result, high-performance model training can be easily accomplished and multiple AI practitioners can collaborate with immediate access to the cached datasets and versions, and with the ability to create a dataset-version hub. 

To learn more, read the technical report. To experiment with cnvrg.io, download the free community version.

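Mike McNamara is a senior leader of product and solution marketing at NetApp, with more than 25 years of experience in data management and cloud storage marketing. Before joining NetApp ten years ago, Mike worked at Adaptec, Dell EMC, and HPE. Mike was a key team leader in launching a first-party cloud storage offering and the industry's first cloud-connected AI/ML solution (NetApp), unified scale-out and hybrid cloud storage systems and software (NetApp), iSCSI and SAS storage systems and software (Adaptec), and Fibre Channel storage systems (EMC CLARiiON). He previously served as marketing chair of the Fibre Channel Industry Association (FCIA), is a member of the Ethernet Technology Summit Conference Advisory Board and of the Ethernet Alliance, contributes regularly to industry journals, and speaks frequently at events. Mike also published a book through FriesenPress titled "Scale-Out Storage - The Next Frontier in Enterprise Data Management" and was listed by Kapos as one of the top 50 B2B product marketers to watch.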
