
SCA HPC Asia 2026 report

AI Beyond Limits - The data platform AI has been waiting for

SCA/HPCA Asia 2026 banner with event details, cherry blossoms, skyline, and Mt. Fuji.

Masahiro Waki

In January 2026, at SCA HPC Asia 2026 held in Osaka, NetApp exhibited under the banner “AI Beyond Limits — The Data Platform AI Has Been Waiting For.” At the Exhibitor Forum, I presented on the theme “Empowering AI for Science with intelligent data infrastructure,” making the case for shifting HPC design philosophy from compute‑centric to data‑centric (AI for Science) and for the value of the intelligent data infrastructure that supports this transition.

Why AI for Science now: A turning point from compute‑centric to data‑centric

In the pre‑AI era, HPC systems were designed to extract maximum compute performance and minimize data movement. This approach has produced many achievements and remains an important foundation for scientific research today; however, in an era where AI runs as the core engine of research on a daily basis—AI for Science—the premises change.

The bottleneck in research is not peak FLOPS, but the ability to quickly find the necessary data, contextualize and prepare it, share it safely while preserving authorization and lineage, and deliver it to GPUs without delay—in other words, the flow of data itself. The workflow is shifting from traditional one‑shot (batch) execution to a continuous, iterative cycle in which simulation, data, and AI keep turning, and HPC is operated not as a standalone machine, but as a service securely and seamlessly integrated across on‑premises environments, external institutions, and public clouds.

Diagram: the shift from compute‑centric to data‑centric HPC design.

The real bottleneck is not compute but data. GPUs continue to advance in performance and density and have become more readily available than before. Even so, what frequently happens in the field is the opposite: ‘GPUs waiting for data.’ What is jammed is data movement, preparation, management, and reuse. In fact, Gartner predicts that by 2026, 60% of AI projects not supported by AI‑ready data will be abandoned. To boost outcomes in research that presupposes AI, the requirement that data ‘keep flowing’ without stagnation must be placed at the center of design.
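The ‘GPUs waiting for data’ effect is easy to see with a back‑of‑the‑envelope model. The toy sketch below (not NetApp code; the numbers are illustrative) computes the accelerator's idle fraction when every training step must wait on data loading:

```python
def run_training_loop(steps, load_time, compute_time):
    """Simulate a training loop where each step must wait for data.

    Returns the fraction of wall-clock time the 'GPU' sits idle
    waiting for input rather than computing.
    """
    idle = busy = 0.0
    for _ in range(steps):
        idle += load_time      # data fetch/preprocess stalls the accelerator
        busy += compute_time   # actual useful GPU work
    return idle / (idle + busy)

# If fetching and preparing a batch takes 80 ms but the GPU needs only
# 20 ms to consume it, the accelerator is idle 80% of the time.
idle_fraction = run_training_loop(steps=100, load_time=0.08, compute_time=0.02)
print(f"GPU idle: {idle_fraction:.0%}")
```

In this regime, buying faster GPUs only widens the gap: the fix is a data path that keeps batches flowing, which is exactly the design shift the article argues for.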

Four conditions that AI for Science requires of data

To truly make AI effective in science, data must satisfy the following conditions:

  • Discoverable: easy to find when needed 
  • Governed & Trusted: trustworthy, with permissions and lineage preserved 
  • Reusable: usable again across studies and teams 
  • AI‑Ready: in a form suitable for AI processing

In addition, it must support sustained operations that presuppose long‑term research, cross‑institutional collaboration, and hybrid/multi‑cloud. This is not a challenge that can be solved with short‑term symptomatic treatments, but a requirement that questions the very philosophy of the infrastructure. 
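One way to operationalize these four conditions is to record them as machine‑readable metadata and check each dataset against them. The sketch below is purely illustrative: the record fields and format list are assumptions for the example, not part of any NetApp schema.

```python
from dataclasses import dataclass, field

# Hypothetical metadata record; field names are illustrative only.
@dataclass
class DatasetRecord:
    name: str
    searchable_tags: list = field(default_factory=list)  # Discoverable
    lineage: list = field(default_factory=list)          # Governed & Trusted
    license: str = ""                                    # Reusable
    format: str = ""                                     # AI-Ready

# Assumed set of formats AI pipelines can consume directly.
AI_READY_FORMATS = {"parquet", "tfrecord", "webdataset"}

def check_ai_readiness(rec: DatasetRecord) -> dict:
    """Evaluate one record against the four conditions above."""
    return {
        "discoverable": bool(rec.searchable_tags),
        "governed":     bool(rec.lineage),
        "reusable":     bool(rec.license),
        "ai_ready":     rec.format in AI_READY_FORMATS,
    }

rec = DatasetRecord("genome-2026", ["genomics", "hg38"],
                    ["sequencer-raw", "qc-filtered"], "CC-BY-4.0", "parquet")
print(check_ai_readiness(rec))  # all four conditions satisfied
```

A real platform would enforce such checks continuously rather than at one point in time, which is why the article frames this as an infrastructure question rather than a one‑off cleanup.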

Intelligent data infrastructure: The Data Platform Required in the AI Era

Diagram: overview of NetApp’s intelligent data infrastructure.

NetApp’s intelligent data infrastructure (IDI) is built not just to store data but to understand and handle it. It brings together data distributed across laboratories, HPC systems, and the major clouds, providing a foundation that continuously and efficiently feeds data to AI and HPC. IDI consists of three pillars:

  • Any Data, Any Place: Integrate data wherever it lives to eliminate silos; access and share with consistent practices while maintaining permissions and lineage. 
  • Active Data Management: Build security, governance, and compliance into the mechanism to ensure accountability and assurance. 
  • Adaptive Operations: Autonomously follow load and environmental changes to continuously optimize the balance of performance, efficiency, and cost.

And to realize and further strengthen this intelligent data infrastructure, NetApp provides the following two latest solutions: 

NetApp AFX: High‑Performance, Ultra Scalable, Disaggregated AI Storage

To run AI‑driven science at production scale, storage must reliably keep pace with GPUs. NetApp offers NetApp AFX: a high‑performance, ultra‑scalable, disaggregated architecture that fundamentally eliminates AI/HPC I/O bottlenecks. Its key features include:

  • Scales up to 128 nodes with up to 4 TB/s parallel throughput  
  • Supports over 1 EB of capacity, with independent scaling of performance and capacity  
  • Built‑in real‑time Autonomous Ransomware Protection (over 99% accuracy), ensuring recovery even in worst‑case scenarios  
  • Secure multi‑tenancy and seamless data integration across on‑premises and cloud environments  
  • NVIDIA DGX SuperPOD certified; built on ONTAP, providing unified file/object access to simplify mixed AI–HPC operations 
NetApp AFX: High-performance, scalable AI storage with 4TB/sec throughput, over 1EB capacity, and 99%+ ransomware detection accuracy. NVIDIA DGX SuperPOD certified.

NetApp AI Data Engine (AIDE): End‑to‑end storage‑integrated AI data services

No matter how powerful the GPUs and storage are, AI will not function well if the data is not prepared in an AI‑ready state. NetApp AI Data Engine is the world’s first offering to equip storage with GPUs, integrating data discovery, curation, policy‑based guardrails, and real‑time vectorization for generative AI; by delivering AI‑ready data, it enables AI workflows to run and scale seamlessly. While unifying data assets and keeping them always up to date, it enables high‑speed data access, more efficient data transformation, and trustworthy governance. The main functions of AIDE are:

  • Metadata Engine: Organize lineage, quality, and preprocessing as machine‑readable metadata to instantly present ‘what exists where and which can be used now’. 
  • Data Sync: Detect changes and synchronize deltas to keep everything current, eliminating the waste of ‘stale training’. 
  • Data Guardrails: Automatically detect sensitive information and apply policies to ensure least privilege, compliance, and accountability. 
  • Data Curator: Integrate cross‑cutting exploration, search, vectorization, and retrieval to supply data in a form easy for AI to use as‑is.  

As a result, it greatly reduces the time required for data preparation and raises project throughput. Even in long‑term, collaborative, hybrid environments, it enables operations centered on always‑fresh, highly trustworthy AI‑ready data.
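The interaction of the Data Sync and Data Curator ideas can be sketched as a delta‑driven pipeline: only documents whose content has changed are re‑vectorized, which is what eliminates ‘stale training’ and wasted re‑processing. This is a toy illustration under my own assumptions, not AIDE’s implementation; the embedding function is a placeholder, not a real model.

```python
import hashlib

def content_hash(text: str) -> str:
    """Fingerprint a document so changes can be detected cheaply."""
    return hashlib.sha256(text.encode()).hexdigest()

def sync_and_vectorize(corpus: dict, index: dict, embed) -> list:
    """Re-embed only new or changed documents; return their ids.

    corpus: {doc_id: text} as currently stored
    index:  {doc_id: {"hash": ..., "vector": ...}} kept up to date in place
    embed:  callable mapping text to a vector (placeholder here)
    """
    changed = []
    for doc_id, text in corpus.items():
        h = content_hash(text)
        if index.get(doc_id, {}).get("hash") != h:
            index[doc_id] = {"hash": h, "vector": embed(text)}
            changed.append(doc_id)
    return changed

fake_embed = lambda t: [float(len(t))]   # stand-in embedding for the sketch
index = {}
sync_and_vectorize({"a": "v1", "b": "v1"}, index, fake_embed)  # first pass embeds both
print(sync_and_vectorize({"a": "v2", "b": "v1"}, index, fake_embed))  # only "a" changed
```

The design point is that the expensive step (embedding) is gated by the cheap step (hashing), so the vector index stays fresh at a cost proportional to change, not corpus size.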

NetApp AI Data Engine: Simplifies and secures AI data pipelines with metadata, data sync, governance, and curation for seamless AI app integration.

Implications for research: Data continuity becomes a source of competitiveness

Long‑term, collaborative, hybrid‑cloud research will become the ‘new normal.’ In Japan and across Asia, the convergence of HPC and AI is steadily progressing. To meet the complex real‑world requirements of long‑term studies, joint research, and hybrid/multi‑cloud, data that is trustworthy, reusable, consistent, and governed is indispensable. With IDI, AFX, and AI Data Engine, NetApp simultaneously achieves the integration, protection, optimization, and utilization of data, supporting sustained value creation in AI for Science.

Examples of Scientific Research Using AI for Science  

  • Genomic Analysis: Japanese research institutions analyze large‑scale genomic data to elucidate gene functions and the causes of disease, which requires efficient management of massive datasets and high‑speed analysis. Centrally managing these datasets and providing high‑speed processing capabilities supports advances in AI‑driven genomic analysis.  
  • Materials Research: AI is increasingly used to predict new materials’ properties and perform optimal materials design, which requires integrating experimental and simulation data for advanced analyses. Integrating heterogeneous data sources enables efficient data management and high‑speed analysis. 
  • Climate Change Analysis: Climate models are used to predict the impact of global warming, which requires integrating observational and simulation data to achieve high‑accuracy predictions. Centrally managing large volumes of data and providing high‑speed processing supports progress in climate change analysis.  

From compute‑centric to data‑centric: What AI for Science demands is data ‘continuity’

In the compute‑centric era, HPC was evaluated by performance; in the data‑centric era of AI for Science, what is demanded is the ‘continuity’ of data. Outcomes are not determined merely by the size of compute resources or the existence of data. Data exerts power as a research foundation only when it is organized, properly updated, protected, and kept in an AI‑ready state that can be reused without interruption.   

The NetApp intelligent data infrastructure sustains that ‘flow,’ AFX guarantees unstoppable high‑speed supply, and NetApp AI Data Engine continuously provides well‑prepared AI‑ready data through Find/Sync/Govern/Curate. Researchers are freed from the complexity of data management and can concentrate on true scientific discovery.  

NetApp Session Video Link  

Take a look at the video of the NetApp session presented by the author at the SCA HPC Asia 2026 Exhibitor Forum.

AI for Science begins with data. And data begins with NetApp.  

Notes 

  • Gartner prediction (published February 26, 2025): It is predicted that by 2026, 60% of AI projects not supported by AI‑ready data will be abandoned. Source: Gartner Newsroom, “Lack of AI‑Ready Data Puts AI Projects at Risk.” 
  • AFX throughput (TB/s class) and maximum scale of 128 nodes are maximum values dependent on reference configurations. Details of DGX SuperPOD certification and disaggregated architecture conform to product documentation. 
  • The ‘up to 99% accuracy’ of AI‑driven ransomware detection (ARP/AI) at the storage layer is based on measurements targeting file workloads. Detection rates in actual operation depend on workload characteristics and settings. 
  • AI Data Engine works with the NVIDIA AI Data Platform (NVIDIA AI Enterprise / NIM microservices) reference design to enhance semantic search, vectorization, and policy‑driven operations. 

Masahiro Waki

Masahiro Waki is responsible for strategic business development related to AI and DX in NetApp Japan. He is engaged in industry activities and alliance activities with technology partners such as academic and research institutions, HPC, IOWN, life sciences, and M&E. He has 15 years of experience working overseas in the U.K., France, India, and the U.S. during his previous position at Sony. He has promoted business development in storage, data infrastructure, broadcast media, and others. Attracted by NetApp's "Data Fabric" concept, he moved to his current position in 2021. He is a member of the board of directors of the Association of Motion Picture and Television Engineers of Japan, Vice Chairman of the RDM and Cloud Subcommittee of AXIES (Association for the Promotion of ICT in Universities). He is also a member of LLM-JP, LINK-J and the Society for Digital Archiving. 

