We live in the age of big data, and big data keeps getting bigger. You've seen the headlines: "Data Explosion!", "Data Tsunami!", "500 hours of video uploaded to YouTube every minute!" Petabytes of data used to feel like a lot, but these days that seems quaint. We now talk about exabytes, zettabytes, and yottabytes of data. Soon, we'll need more prefixes in the International System of Units. Healthcare is no exception to this phenomenon, especially given the popularity of wearables such as smartwatches, fitness bands, and other sensor-studded devices that help us make better, data-driven decisions about our own health. At NetApp we love data – the more, the better. We are a global leader in data storage and management, delighting our customers regardless of where they want their data to reside: on premises, in the cloud, or a hybrid of both.
Amassing colossal amounts of data may be interesting and potentially useful, but that’s only the beginning. Getting insights from that data is the key to better patient care, to better business decisions, and to fulfilling the Quadruple Aim. And getting to those insights is predicated on having the right tools and skills. Back in the mid-2000s, when existing business intelligence tools proved insufficient to analyze the growing amounts of data becoming available, Hadoop was created, pairing an open-source implementation of Google's recently published MapReduce programming model with a distributed file system. This breakthrough allowed the creation of clusters of CPU-based nodes with internal storage that could analyze big data much faster than had been possible before.
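As a rough illustration of the MapReduce pattern that Hadoop distributes across a cluster, here is a minimal, single-process sketch in plain Python (the word-count task and sample records are hypothetical, not drawn from any real Hadoop job): records are mapped to key-value pairs, the pairs are grouped by key, and a reduce step aggregates each group. Hadoop's contribution was running those phases in parallel across many nodes on top of a distributed file system.

```python
# Minimal, single-process sketch of the MapReduce pattern that Hadoop
# distributes across a cluster; a real job would shard the input over
# HDFS and run map and reduce tasks on many nodes in parallel.
from collections import defaultdict

def map_phase(record):
    # Emit (key, value) pairs -- here, (word, 1) for a word count.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Aggregate all values seen for a given key.
    return key, sum(values)

def run_job(records):
    grouped = defaultdict(list)          # "shuffle": group values by key
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    logs = ["sepsis alert acknowledged", "sepsis alert dismissed"]
    print(run_job(logs))
    # {'sepsis': 2, 'alert': 2, 'acknowledged': 1, 'dismissed': 1}
```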
As big data kept getting bigger, more computing power was needed to analyze it. Unfortunately, in the first decade of the 21st century, the reality of CPU development began to diverge from the pace predicted by Moore's Law, postulated by Gordon Moore in 1965, because of fundamental physical limitations and nanoscale problems such as quantum tunneling. Relying on clusters of CPU-based compute nodes was no longer the best solution: that approach could require hundreds, or even thousands, of nodes to muster enough computing power to analyze the oceans of available data and produce actionable insights in a reasonable time. Fortunately, GPU development picked up where CPUs trailed off. GPU-accelerated computing has, in a sense, extended Moore's Law and even accelerated the growth of data processing power, at least for tasks that benefit from highly parallel processing, such as data analysis at scale and artificial intelligence (AI) techniques like deep learning.
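To make the parallelism point concrete, here is a hypothetical timing sketch (assuming PyTorch is installed and a CUDA-capable GPU is available; the matrix size is arbitrary) that runs the same large matrix multiplication first on the CPU and then on the GPU. The GPU's thousands of cores work on the problem in parallel, which is exactly the shape of work that deep learning and large-scale data analysis generate.

```python
# Hypothetical timing sketch: the same matrix multiplication on CPU vs. GPU.
# Assumes PyTorch is installed; skips the GPU run if none is present.
import time
import torch

def time_matmul(device, size=4096):
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()      # make sure setup work has finished
    start = time.perf_counter()
    c = a @ b                         # the multiplication being timed
    if device.type == "cuda":
        torch.cuda.synchronize()      # wait for the GPU kernel to complete
    return time.perf_counter() - start

print(f"CPU: {time_matmul(torch.device('cpu')):.3f} s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul(torch.device('cuda')):.3f} s")
```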
Since their appearance on the technology scene in the 1990s, GPUs have made astonishing gains in performance. That has led some analysts and industry experts to call for a change in focus, from looking at the raw number of transistors in a chip to considering computing performance overall. Making that shift gives us an interesting perspective. The number of transistors in each chip is no longer the most useful proxy for how much computing power we have at our disposal. We can keep the spotlight on the business and clinical problems we want to solve and then build systems that can produce solutions on a timely basis. Petaflops (quadrillions of floating-point operations per second) are now available in standard rack-mounted servers like the NVIDIA DGX A100, which packs roughly as much computing capacity as the world's fastest supercomputers did only about a decade ago.
Big data analytics and AI model training both benefit from GPU-accelerated computing and access to as much good data as possible. Healthcare, however, suffers from a proliferation of application silos that keep data segregated. Breaking down those barriers is long overdue. Eliminating silos and bringing data together into a unified environment where it can be used for daily operations, data analysis, and AI can simplify the overall IT architecture and reduce costs. It can also make that data more useful to clinicians, patients, administrators, and payors. Running analytics on decoupled compute and storage also lets big data environments evolve beyond the Hadoop Distributed File System (HDFS), an additional benefit for IT, analytics teams, and their internal customers.
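As a hypothetical sketch of what decoupled compute and storage can look like in practice (the bucket name, endpoint, and credentials are placeholders, and it assumes PySpark with the Hadoop S3A connector available), a Spark job can read data directly from an S3-compatible object store, whether that store lives in the cloud or on premises, instead of from data first copied into HDFS:

```python
# Hypothetical sketch of decoupled compute and storage: a Spark job reads
# directly from an S3-compatible object store (bucket name, endpoint, and
# credentials are placeholders) instead of from data copied into HDFS.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("silo-free-analytics")
    # The Hadoop S3A connector lets Spark speak the S3 object protocol;
    # the endpoint could be a cloud bucket or an on-premises object store.
    .config("spark.hadoop.fs.s3a.endpoint", "https://object-store.example.org")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")   # placeholder
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")   # placeholder
    .getOrCreate()
)

# Read admissions data straight from object storage and run a simple query.
admissions = spark.read.parquet("s3a://hospital-data/admissions/")
admissions.groupBy("department").count().show()
```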
Choosing the right infrastructure is key to benefiting from the convergence of big data and AI and to unleashing the insights that your data holds. Whether it's on premises or in the cloud, you need a solution with GPU compute, fast networking, and all-flash storage that can serve block, file, and object protocols. That, however, is just the beginning. You also need that solution to extend seamlessly to both public and private clouds. It should also be surrounded by a rich software ecosystem that includes API integrations between programming environments and the data-management layer, as well as built-in protection for your data from even the most determined criminals. We offer all of this and more, blending our foundational technologies with our many years of cloud and data science development.
To learn more about our converged big data and AI solutions, visit the NetApp® ONTAP® AI site and sign up to get the latest updates on AI infrastructure and data management.
Esteban joined NetApp to build a Healthcare AI practice that leverages our full portfolio to help create ML-based solutions that improve patient care and reduce provider burnout. Esteban has been in the healthcare IT industry for 15 years, having gone from being a storage geek at various startups to spending 12 years as a healthcare-storage geek at FUJIFILM Medical Systems. He's a visible participant in the AI-in-healthcare conversation, speaking and writing at length on the subject. He is particularly interested in the translation of machine learning research into clinical practice and the integration of AI tools into existing workflows. He is a competitive powerlifter in the USAPL federation, so he will try to sneak in early-morning training wherever he's traveling.