ONTAP is ready for streaming applications

Arindam Banerjee

July 28, 2022

Stream data processing is an integral part of big data pipelines and of artificial intelligence and machine learning workflows. A growing number of applications aim to simplify the data pipelines that connect edge and IoT devices to data centers and the cloud for data transformation and insights. NetApp® ONTAP® and ONTAP-based products are trusted to bring scale, efficiency, and simplicity to stream data processing. ONTAP offloads compute-intensive storage operations from the compute server, freeing up server resources for streaming broker activity.

NFS for streaming

Although NFS is ubiquitous for file workloads across data centers and the cloud, it is not the top choice for data storage for today’s streaming applications like Kafka. These applications are often deployed on direct-attached storage with UNIX or Linux based (POSIX-compliant) filesystems because many believe that network-attached storage adds high latency due to network bottlenecks. However, Ethernet bandwidth has increased significantly over the last few years (400Gb and 800Gb Ethernet are now under discussion), and technologies like NFS over RDMA have been adopted, so the network bottleneck is increasingly a myth.

Another common concern is the silly rename issue, which can crash the application when a Kafka cluster running on NFS is resized or repartitioned. We refused to let that happen and challenged the status quo. NetApp has made a significant contribution that lets applications like Kafka use network-attached storage: we used the “delete on last close” feature in the NFSv4.x spec to fix the long-standing silly rename issue. NetApp engineers implemented the changes on both the NFS server side (ONTAP) and the Linux NFS client side (open source) and contributed the changes upstream.

You might be thinking, how does silly rename affect me as a Kafka user?

We’ve got you covered. Although normal Kafka operations on NFS work fine even without the silly rename fix, the problem may surface when the Kafka cluster needs to be resized or repartitioned for load balancing or maintenance. As existing topics are repartitioned across new brokers, Kafka creates a new copy of the partition on the destination broker and then deletes the redundant partition on the existing broker. Without the silly rename fix, this “delete” operation could result in a crash on NFS.

In a typical Kafka workflow, the delete (unlink) operation can be performed even while the application (in this case Kafka) still holds open references to the file. A UNIX (POSIX) filesystem removes the file name from the directory immediately and frees the file’s data only after the last open reference is closed. NFS behaves differently. Because the protocol has traditionally given the server no way to keep an unlinked file alive for a client that still has it open, the NFS client intercepts the unlink of an open file and renames the file to a special hidden name instead. Then, on the last close of the open file, it removes the renamed file. This rename is called the silly rename. A directory that contains such a renamed file cannot be deleted, and this breaks the Kafka repartitioning workflow.
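The unlink-while-open pattern described above can be reproduced in a few lines. The following is a minimal Python sketch, not code from Kafka or ONTAP; the function name `unlink_then_rmdir` and the file name `segment.log` are illustrative assumptions. It unlinks a file that is still open, lists the directory, and then tries to remove the directory while the handle is still held.

```python
import os
import tempfile


def unlink_then_rmdir(directory):
    """Unlink an open file, then try to remove its parent directory.

    Returns (leftover directory entries, whether rmdir succeeded while the
    file was still open, whether the open handle could still read the data).
    """
    path = os.path.join(directory, "segment.log")  # hypothetical file name
    with open(path, "w+b") as segment:
        segment.write(b"record-1")
        segment.flush()

        # Delete the file while this process still holds it open.
        os.unlink(path)

        # On a local POSIX filesystem the directory is now empty. On an NFS
        # mount that silly-renames, a hidden ".nfs..." entry is expected here.
        leftovers = os.listdir(directory)

        # Removing the directory fails if a silly-renamed file is still in it.
        try:
            os.rmdir(directory)
            dir_removed = True
        except OSError:
            dir_removed = False

        # The open handle keeps working even though the name is gone.
        segment.seek(0)
        still_readable = segment.read() == b"record-1"

    if not dir_removed:
        # After the last close the client removes the renamed file,
        # so the directory can normally be removed at this point.
        os.rmdir(directory)
    return leftovers, dir_removed, still_readable


if __name__ == "__main__":
    print(unlink_then_rmdir(tempfile.mkdtemp()))
```

On a local Linux filesystem this should report an empty listing, a successful directory removal, and readable data. Pointing `directory` at a new, empty directory on an NFS mount without the fix would be expected to show a leftover entry and a failed removal, although the exact behavior depends on the client and server versions in use.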

With the “delete on last close” feature in NFSv4.x, the server is allowed to manage the unlink workflow and is able to orchestrate the operations in the right order without affecting the application. The protocol requires the NFS client and the server to agree on the capability to handle the unlink workflow, and therefore changes are required on both the NFS client and server side. NetApp engineers implemented the changes on the NFS server side (ONTAP) as well as the Linux NFS client side and contributed the changes upstream. The changes will be generally available in RHEL 8.7 and RHEL 9.1.
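One rough way to see whether silly-renamed files are lingering under a Kafka data directory is to scan for them. The sketch below assumes the Linux NFS client’s convention of giving such files names that start with `.nfs`; the function name `find_silly_renamed` and the example path are hypothetical, and this is a diagnostic aid rather than anything Kafka or ONTAP provides.

```python
import os


def find_silly_renamed(root):
    """Return sorted paths of files under root whose names start with '.nfs'.

    Assumption: silly-renamed files follow the Linux NFS client naming
    convention (a '.nfs' prefix followed by generated characters).
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(".nfs"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical location of a broker's log directory on an NFS mount.
    for path in find_silly_renamed("/mnt/kafka-logs"):
        print(path)
```

An empty result after a repartitioning run suggests that no renamed files were left behind. It does not by itself prove that the client and server negotiated the new unlink behavior.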

Now you’re wondering, what do I get out of NFS and ONTAP?

Using NFS as storage for Kafka lets Kafka users rely on the robust ONTAP ecosystem for data storage and resilience, with rich data management features for data retention and compliance. Offloading storage operations to the storage system frees up the Kafka compute server for compute-intensive broker functions. Because the data is persisted over the network (off the server), this could also reduce rebuild times when a broker is relaunched. With ONTAP providing high-bandwidth NFS storage, Kafka applications can run faster. And because storage and rebuild operations are offloaded to the storage system, Kafka brokers can be deployed with less compute, making Kafka deployments cheaper.

ONTAP is the first NFS server to implement the silly rename fix. ONTAP’s rich data services can reduce the compute overhead of voluminous data transfers, making streaming and analytics workflows faster and cheaper.

Learn more

Find out more about ONTAP capabilities and look for future updates.

Arindam Banerjee

Arindam is NetApp’s first Technical Fellow. He is also the Chief Architect and VP of NetApp Platforms, and he leads the technology vision, strategy, and architecture for NetApp. He is currently spearheading the architecture and design for the next generation of AI infrastructure and AI data platforms. Arindam has more than 25 years of experience in distributed storage infrastructure and data platforms. He has been at NetApp for 19 years and has championed many innovations in the areas of filesystems, distributed storage, and AI. Arindam has authored or co-authored more than 50 patents and patent publications that have received over 500 citations for reference in the field of computer data systems and technology.
