AFF Storage for AI and Machine Learning

Mike McNamara

2020-02-14


Computer vision capabilities are having a significant impact in almost every industry, from autonomous vehicles to AI-assisted medical diagnosis. Training the machine learning (ML) algorithms used for computer vision applications creates an extremely demanding workload, requiring massive quantities of data and significant computing power.

To accommodate these workloads, you can use a clustered architecture consisting of NetApp® storage systems and Fujitsu PRIMERGY servers optimized for AI. This NetApp and Fujitsu solution is designed to handle large datasets by using the processing power of GPUs alongside traditional CPUs. The combined solution of PRIMERGY servers and NetApp all-flash storage systems provides an infrastructure that delivers excellent performance and seamless scalability with industry-leading data management.

State-of-the-art NetApp AFF storage systems enable IT departments to meet enterprise storage requirements with industry-leading performance, cloud integration, and best-in-class data management. The Fujitsu PRIMERGY GX2570 server is an extremely powerful deep-learning (DL) platform that benefits from equally powerful storage and network infrastructure to deliver maximum value.

To automatically construct the system infrastructure for this solution, you can use Ansible, a DevOps-style configuration management tool developed by Red Hat. Ansible offers a variety of functional modules from NetApp and Cisco. It includes modules for the Fujitsu PRIMERGY GX2570 M5 server, for storage systems such as the NetApp AFF A800, and for the automated setup and configuration management of Cisco Nexus 3232C network switches. With Ansible, adding GPU nodes or changing the software environment on the host OS becomes a repeatable, scripted task, which greatly reduces the load on system administrators.

To validate the solution, NetApp and Fujitsu used one NetApp AFF A800 storage system, four Fujitsu PRIMERGY GX2570 servers, and two Cisco Nexus 3232C 100Gb Ethernet (100GbE) switches.

Figure: Compute, network, and storage system diagram.
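
As a rough illustration of how the validated configuration might be described to Ansible, the following Python sketch renders an INI-style inventory with one group per tier. The host counts mirror the validated setup (four servers, one storage system, two switches), but every hostname and group name here is a hypothetical placeholder, not something taken from the technical report.

```python
def build_inventory(gpu_nodes, storage_hosts, switches):
    """Render an Ansible INI-style inventory as a string.

    Group names (gpu_nodes, storage, switches) are illustrative assumptions.
    """
    groups = [
        ("gpu_nodes", gpu_nodes),
        ("storage", storage_hosts),
        ("switches", switches),
    ]
    sections = []
    for name, hosts in groups:
        # Each section is a [group] header followed by one host per line.
        sections.append("\n".join([f"[{name}]", *hosts]))
    return "\n\n".join(sections) + "\n"


# Hypothetical hostnames for the validated configuration.
gpu_nodes = [f"gx2570-{i:02d}" for i in range(1, 5)]
inventory = build_inventory(
    gpu_nodes,
    ["aff-a800-01"],
    ["nexus3232c-01", "nexus3232c-02"],
)
print(inventory)

# Adding a GPU node is then a one-line change to the host list.
scaled = build_inventory(
    gpu_nodes + ["gx2570-05"],
    ["aff-a800-01"],
    ["nexus3232c-01", "nexus3232c-02"],
)
```

Keeping the inventory generated from a single host list is one way to make "add a GPU node" a small, reviewable change; the playbooks that would consume this inventory are outside the scope of the sketch.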

We validated the solution by using the MLPerf v0.6 benchmark models and testing procedure. Each MLPerf training benchmark measures the processing time required to train a model on the specified dataset to achieve the specified quality target. The following table shows the training time involved for each of the models.

Model	Training time result
SSD	19.54 minutes
Mask R-CNN	186.22 minutes
ResNet-50	94.76 minutes
Minigo	24.97 minutes
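
The time-to-train idea behind these results can be sketched in a few lines of Python: start a timer, train until an evaluation metric reaches the quality target, and report the elapsed time. This is only a minimal illustration under stated assumptions. The function names, the `ToyModel` class, and the target value are hypothetical, and the official MLPerf rules define precisely what is included in the timed region, which this sketch does not attempt to reproduce.

```python
import time


def time_to_train(train_one_epoch, evaluate, quality_target, max_epochs=100):
    """Return (elapsed_seconds, epochs) needed to reach quality_target.

    train_one_epoch and evaluate are caller-supplied callables (assumptions).
    Raises RuntimeError if the target is not reached within max_epochs.
    """
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if evaluate() >= quality_target:
            return time.perf_counter() - start, epoch
    raise RuntimeError(f"quality target not reached in {max_epochs} epochs")


class ToyModel:
    """Stand-in for a real model: quality rises by a fixed step per epoch."""

    def __init__(self):
        self.quality = 0.0

    def train_one_epoch(self):
        self.quality += 0.25

    def evaluate(self):
        return self.quality


model = ToyModel()
elapsed, epochs = time_to_train(
    model.train_one_epoch, model.evaluate, quality_target=0.75
)
print(f"reached target after {epochs} epochs in {elapsed / 60:.4f} minutes")
```

In a real benchmark run, `train_one_epoch` and `evaluate` would wrap the framework's training and validation loops for the model in question, and the reported figure would be the wall-clock time in minutes, as in the table above.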

To learn more about the joint solution, read this technical report.

Mike McNamara

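Mike McNamara is a senior leader of product and solution marketing at NetApp with over 25 years of experience in data management and cloud storage marketing. Before joining NetApp ten years ago, Mike worked at Adaptec, Dell EMC, and HPE. Mike was a key team leader in launching a first-party cloud storage offering and the industry's first cloud-connected AI/ML solution (NetApp), unified scale-out and hybrid cloud storage systems and software (NetApp), iSCSI and SAS storage systems and software (Adaptec), and Fibre Channel storage systems (EMC CLARiiON). He has also served as marketing chairperson for the Fibre Channel Industry Association (FCIA), is a member of the Ethernet Technology Summit Conference Advisory Board and the Ethernet Alliance, regularly contributes to industry journals, and is a frequent event speaker. Mike also published a book through FriesenPress titled "Scale-Out Storage - The Next Frontier in Enterprise Data Management" and was listed by Kapos as one of the top 50 B2B product marketers to watch.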
