Accelerate Time to Insight with Fast Streaming of Data to AI applications

BeeGFS|EF-Series

Mike McNamara

2021-03-18

In the past, compute and direct-attached storage have been used to feed data to AI workflows. But scaling with traditional storage can lead to disruption and downtime for ongoing operations. Disruptions affect the productivity of data scientists and data engineers. Downtime or slow AI performance can set off a chain reaction that reduces developer productivity and causes operational expenses to spin out of control.

Advances in individual and clustered GPU computing architectures have made NVIDIA DGX systems the preferred platform for workloads such as high-performance computing (HPC), deep learning (DL), video processing, and analytics. Maximizing performance in these environments requires a supporting infrastructure, including storage and networking, that can keep the NVIDIA GPUs featured in DGX systems fed with data. Dataset access must be provided at ultralow latencies with high bandwidth.

NetApp® EF-Series AI tightly integrates DGX A100 systems, NetApp EF600 all-flash arrays, and the BeeGFS parallel file system with state-of-the-art InfiniBand networking. NetApp EF600 AI simplifies artificial intelligence deployments by eliminating design complexity and guesswork. You can start small and scale seamlessly from science experiments and proofs-of-concept to production and beyond.

EF600-powered BeeGFS building blocks have been verified with up to eight DGX A100 systems. By adding more of these building blocks, the architecture can scale to multiple racks supporting many DGX A100 systems and petabytes of storage capacity. This approach offers the flexibility to adjust the compute-to-storage ratio based on the size of the data lake, the DL models that are used, and the required performance metrics.

Investing in state-of-the-art compute demands state-of-the-art storage that can handle thousands of training images per second. You need a high-performance data services solution that keeps up with your most demanding DL training workloads.

The NetApp EF600 all-flash array gives you consistent, near-real-time access to data while supporting any number of workloads simultaneously. To enable fast, continuous feeding of data to AI applications, EF600 storage systems deliver up to 2 million cached read IOPS, response times of under 100 microseconds, and 42GBps sequential read bandwidth in one enclosure. With 99.9999% reliability from EF600 storage systems, data for AI operations is available whenever and wherever it’s needed.
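To put the bandwidth figure in context, here is a rough back-of-the-envelope sketch of how many training images per second a given sequential read bandwidth could deliver in the best case. The 42GBps value comes from the paragraph above. The average image size (110 KB) and the function name are assumptions made purely for illustration, and real pipelines will see lower rates because of decoding, preprocessing, network, and file system overhead.

```python
def images_per_second(bandwidth_gbps: float, avg_image_kb: float) -> float:
    """Best-case images/s that a sequential read bandwidth could deliver.

    bandwidth_gbps: read bandwidth in gigabytes per second (decimal, 1 GB = 1e9 bytes).
    avg_image_kb:   average image size in kilobytes (decimal, 1 KB = 1e3 bytes).
    """
    if bandwidth_gbps <= 0 or avg_image_kb <= 0:
        raise ValueError("bandwidth and image size must be positive")
    bytes_per_second = bandwidth_gbps * 1e9
    bytes_per_image = avg_image_kb * 1e3
    return bytes_per_second / bytes_per_image


# Figure quoted in the article for one EF600 enclosure.
ENCLOSURE_READ_GBPS = 42.0
# ASSUMPTION (not from the article): average compressed training image size.
ASSUMED_IMAGE_KB = 110.0

if __name__ == "__main__":
    rate = images_per_second(ENCLOSURE_READ_GBPS, ASSUMED_IMAGE_KB)
    print(f"Best-case ceiling: about {rate:,.0f} images per second")
```

The result is only an upper bound on what storage bandwidth alone would allow; substitute your own dataset's average sample size to get a figure that is meaningful for your workload.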

BeeGFS is a parallel file system that provides flexibility, which is key to meeting diverse and evolving AI workloads. NetApp EF-Series storage systems supercharge BeeGFS storage and metadata services by offloading RAID and other storage tasks, including drive monitoring and wear detection.

NetApp and NVIDIA: Innovating together

The DGX A100 system is a next-generation universal platform for AI that requires equally advanced storage and data management capabilities. By combining DGX A100 with BeeGFS building blocks based on NetApp EF600 systems, this verified architecture can be implemented at almost any scale. You could pair a single DGX A100 with a single BeeGFS building block. Or you could have up to 140 DGX A100 systems with a scalable number of BeeGFS building blocks presenting a single storage namespace.
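As a rough illustration of scaling by building blocks, the sketch below estimates how many blocks a cluster might need from read bandwidth alone. Everything here other than the 42GBps enclosure figure is an assumption: the per-DGX read demand is a hypothetical input, and treating one building block as delivering one enclosure's sequential read bandwidth is a simplification that the article does not state. Capacity, metadata performance, and network limits are ignored, so the NVA design document remains the reference for real sizing.

```python
import math


def building_blocks_needed(
    num_dgx: int,
    per_dgx_read_gbps: float,
    per_block_read_gbps: float = 42.0,
) -> int:
    """Estimate BeeGFS building blocks needed, based on read bandwidth only.

    num_dgx:             number of DGX A100 systems to feed.
    per_dgx_read_gbps:   HYPOTHETICAL sustained read demand per system (GB/s).
    per_block_read_gbps: ASSUMED deliverable read bandwidth per block (GB/s);
                         defaults to the article's single-enclosure figure.
    """
    if num_dgx < 1:
        raise ValueError("num_dgx must be at least 1")
    if per_dgx_read_gbps <= 0 or per_block_read_gbps <= 0:
        raise ValueError("bandwidth values must be positive")
    total_demand_gbps = num_dgx * per_dgx_read_gbps
    # Round up: a partially used block is still a whole block.
    return max(1, math.ceil(total_demand_gbps / per_block_read_gbps))


if __name__ == "__main__":
    # Hypothetical demand of 3 GB/s per system, for illustration only.
    for systems in (1, 8, 140):
        blocks = building_blocks_needed(systems, per_dgx_read_gbps=3.0)
        print(f"{systems} DGX systems: {blocks} building block(s)")
```

Because compute and storage scale separately in this architecture, the same calculation can be rerun with a different per-system demand whenever the models or datasets change.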

Combined with the outstanding cloud integration and software-defined capabilities of the NetApp product portfolio, NetApp storage solutions enable a full range of data pipelines that span the edge, the core, and the cloud for successful DL projects. To learn more, read our two related NetApp Verified Architecture documents, NetApp EF-Series AI with NVIDIA DGX A100 Systems and BeeGFS: NVA Design and NetApp EF-Series AI with NVIDIA DGX A100 Systems and BeeGFS: NVA Deployment.

Mike McNamara

Mike McNamara is a senior leader of product and solution marketing at NetApp with more than 25 years of data management and cloud storage marketing experience. Before joining NetApp 10 years ago, McNamara worked at Adaptec, Dell EMC, and HPE. He was a key team leader driving the launch of a first-party cloud storage offering and the industry's first cloud-connected AI/ML solution (NetApp), unified scale-out and hybrid cloud storage systems and software (NetApp), iSCSI and SAS storage systems and software (Adaptec), and Fibre Channel storage systems (EMC CLARiiON). In addition to his past role as marketing chair of the Fibre Channel Industry Association, McNamara is a member of the Ethernet Technology Summit Conference Advisory Board and the Ethernet Alliance, a regular contributor to industry journals, and a speaker at industry events. McNamara also published a book through FriesenPress, 'Scale-Out Storage - The Next Frontier in Enterprise Data Management', and was named by Kapos as one of the top 50 B2B product marketers to watch.
