To understand the development of data analytics, let’s use the analogy of automotive transmission development to improve control when changing car speed. We started out using manual transmission and progressed to semi-automatic for a smooth transition from manual to automatic transmission. Now most cars only have an automatic transmission. Similarly in data analytics, we started with a relational database, progressed to distributed analysis with disk operations, and then introduced in-memory for prediction and data analysis.
Apache Spark is a programming framework for writing Hadoop applications that work directly with the Hadoop Distributed File System (HDFS) and other file systems, such as NFS and object storage. Apache Spark is a fast analytics engine designed for large-scale data processing that functions best in our NetApp® data analytics playground. It is more efficient than MapReduce for data pipelines and interactive algorithms. Apache Spark also mitigates the I/O operational challenges you might experience with Hadoop.
The NetApp modern data analytics playground[/caption] Before you decide to use Apache Spark workload with NetApp storage to overcome your large-scale data processing challenges, you might need answers to questions such as:
Karthikeyan Nagalingam is a Principal Technical Marketing Engineer at NetApp for NetApp XCP, Fpolicy, Filesystem Analytics and Antivirus. His previous roles in Emerging Technology Solutions involved in Pre-Sales and Post-Sales technical activities with fields, partners and customers. He holds an Master of Science in Software Systems from Birla Institute of Technology and Science.