The artificial intelligence (AI) survey data discussed in this blog post is based on answers from more than 100 customers at recent AI conferences. The survey gathered primary research on AI data management challenges, AI infrastructure and tools, and reasons for choosing different types of data storage.
The majority of survey respondents were software developers (24%), data scientists (18%), researchers (15%), line-of-business owners (7%), data analysts (6%), or general IT (6%). Company size of respondents ranged widely: more than 10,000 employees (45%), fewer than 1,000 employees (28%), 1,000 to 5,000 employees (18%), and 5,000 to 10,000 employees (9%). The top use cases for AI were healthcare and computer-assisted diagnosis, automotive and autonomous vehicles, manufacturing and robotics, and financial services and fraud detection.
The top three storage and data management challenges with AI were scaling storage, cloud integration, and backing up data.
The top three requirements when choosing an AI infrastructure vendor were cost, ease of deployment and management, and services and support offerings.
The three most common tools used with AI were NoSQL databases, Apache Hadoop, and Splunk. The majority of respondents either use a Hadoop cluster (data lake) with AI or plan to in the near future.
The most common file systems used with AI are HDFS and NFS. Fewer than 10% of the respondents used ZFS and GPFS.
Cloud storage is the most popular storage used for AI, followed by direct-attached storage (DAS) and then external storage.
The top reasons for using servers with internal storage or servers with JBOD are performance, cost, and management decision. As shown in the following graph, the top reasons for using external data storage are performance, reliability, and data protection.
NFS is the protocol of choice for AI with external storage, followed by Fibre Channel. The main reasons for using cloud storage are ease of use, ease of scaling, cost, and already using the cloud for compute. The most popular services used in the public cloud are Amazon Web Services, Microsoft Azure HDInsight, and Google Cloud Dataproc.
To learn more about AI and how it is transforming business processes in the digital era, read the “Infrastructure Considerations for AI Data Pipelines” report from IDC. To learn about NetApp® AI solutions, visit www.netapp.com/ai.
Mike McNamara is a senior product and solution marketing leader at NetApp with over 25 years of data management and cloud storage marketing experience. Before joining NetApp over ten years ago, Mike worked at Adaptec, Dell EMC, and HPE. Mike was a key team leader driving the launch of a first-party cloud storage offering and the industry’s first cloud-connected AI/ML solution (NetApp), unified scale-out and hybrid cloud storage system and software (NetApp), iSCSI and SAS storage system and software (Adaptec), and Fibre Channel storage system (EMC CLARiiON).
In addition to his past role as marketing chairperson for the Fibre Channel Industry Association, he is a member of the Ethernet Technology Summit Conference Advisory Board, a member of the Ethernet Alliance, a regular contributor to industry journals, and a frequent event speaker. Mike also published a book through FriesenPress titled "Scale-Out Storage - The Next Frontier in Enterprise Data Management" and was listed as a top 50 B2B product marketer to watch by Kapos.