
Extreme S3 Performance with Confluent Tiered Storage + ONTAP


Joe Scott

This week we’re showcasing Confluent Tiered Storage integration with ONTAP. What do we mean by “extreme” performance? Take a look.

Figure: S3 retrieval historical query performance

Read on to see how we achieved over 31GBps with Confluent Kafka, Tiered Storage, and NetApp® ONTAP® S3 Object Server.

S3 and performance

The Amazon Simple Storage Service (Amazon S3) protocol is ubiquitous. S3 is the de facto standard for object storage access across the cloud and on-premises environments.

For me, as a performance nerd, S3 has rarely entered my lexicon. I spend my time engineering environments where microseconds are cherished, and milliseconds usually mean that I goofed. Wrapping my head around S3 past the basics often involved nonperformance considerations, trivial stuff like TCO and durability.

Then I was shown an interesting graphic.

Figure: S3 retrieval historical query performance. Source: https://www.netapp.com/blog/kafka-fall-rise-networked-storage/

Confluent Tiered Storage

In streaming data environments, such as Confluent Kafka, where data is being produced at exponentially increasing rates, tiering seemed like a no-brainer. As a storage nerd, I hear “tiering” and think “cheap + deep.” You get lots of capacity to meet retention targets or for background analytics processing. And it’s a perfect fit for cheap + deep S3 targets with minimal performance capability, “good enough” for extending the capacity of expensive-to-manage direct-attached storage (DAS) environments.

As I learned more and came to understand Kafka architecture in general—and the specifics of Confluent Kafka’s capabilities—I came to see the light. As the rate of data ingestion increases and the number of consumer applications expands, Kafka makes it easy to seamlessly add brokers and to increase cluster performance capacity.

But what about the headache (and cost) of continually adding compute, along with more hot-tier DAS, just to scale capacity and performance? Wouldn’t it be nice to separate (or disaggregate) scaling compute from scaling storage, without introducing more complexity into the system that’s being managed?

Enter NetApp® ONTAP® S3 Object Server and Confluent Tiered Storage. You can reduce or eliminate the overhead of DAS capacity and still achieve extreme performance in your Apache Kafka clusters. I know, I’m late to the party: Confluent introduced Tiered Storage in Confluent Platform 6.0 (https://www.confluent.io/blog/infinite-kafka-storage-in-confluent-platform/). But I do bring gifts with me to the party: extreme performance, powered by ONTAP, that simplifies management and enhances performance of your Kafka environments.

Extreme performance, really?

NetApp AFF all-flash storage is fast and low latency. The S3 protocol is boring to me, the perf nerd, because it’s “slow.” But is it slow?

To find out, we took the Confluent Tiered Object Store Compatibility Checker (TOCC) into the lab with an AFF A900 system with ONTAP 9.11 RC1. Spoiler alert: S3 is not always slow.

The AFF A900 is the fastest Tiered Storage target ever tested with the TOCC. A single high-availability (HA) pair handled over 31GBps throughput during a producer-consumer workload measurement.

Figure: S3 retrieval performance scaling

Why did we scale the Kafka broker count from five to eight? Because we had to add more compute to our Confluent Kafka cluster to tax the AFF A900 system. Disaggregated scaling in action!

With five Kafka broker nodes, we didn’t have the horsepower to push the AFF A900 system: we saturated resources within the Java Virtual Machine (JVM) before the storage became the bottleneck. That turned into a very real case study in how easy it is to scale Kafka compute independently of storage. We ultimately scaled to eight Kafka brokers to saturate the performance envelope of the AFF A900 with the TOCC producer-consumer workload.
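
For a rough sense of what that scaling means per broker, here is a minimal back-of-the-envelope sketch. It assumes the load was spread evenly across brokers and treats "over 31GBps" as a floor of 31. The even split and the function name are illustrative assumptions, not details from the test run.

```python
def per_broker_throughput(aggregate_gbps: float, brokers: int) -> float:
    """Average throughput each broker must sustain, assuming an even split."""
    if brokers <= 0:
        raise ValueError("brokers must be positive")
    return aggregate_gbps / brokers


# "Over 31GBps" from the TOCC run, used here as a lower bound.
AGGREGATE_GBPS = 31.0

for brokers in (5, 8):
    share = per_broker_throughput(AGGREGATE_GBPS, brokers)
    print(f"{brokers} brokers: {share:.1f} GBps per broker")
```

Under those assumptions, five brokers would each have had to sustain more than 6 GBps to saturate the array, while eight brokers bring the per-broker share under 4 GBps.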

How do the AFF A900 results stack up to the competition? Pretty well, as you can see in the following bar chart.

Figure: S3 retrieval historical query performance

Simplified lifecycle

By using Tiered Storage with ONTAP S3, you can simplify and accelerate other lifecycle processes within a Kafka cluster. For example, rebalance operations, whether triggered by node failure or by shifting partition placement, no longer rely on migrating entire copies of data across brokers. Instead, metadata references to partition segment placement on Tiered Storage are replicated.
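
To make the rebalance point concrete, here is a toy sketch of how much data has to cross the network when partitions move to a new broker. It assumes that, with tiering, only the not-yet-tiered "hot" portion held on broker-local storage is copied, and every size below is a hypothetical number chosen for illustration, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    total_gb: float  # all retained data for the partition
    hot_gb: float    # portion still held on broker-local storage


def rebalance_transfer_gb(partitions: list[Partition], tiered: bool) -> float:
    """Data copied between brokers when these partitions are reassigned."""
    # Without tiering, the full partition moves; with tiering, only the hot
    # set moves, and the rest is reached through metadata references.
    return sum(p.hot_gb if tiered else p.total_gb for p in partitions)


# Hypothetical workload: 10 partitions, 500 GB retained each, 5 GB still local.
partitions = [Partition(total_gb=500.0, hot_gb=5.0) for _ in range(10)]

print("without tiering:", rebalance_transfer_gb(partitions, tiered=False), "GB")
print("with tiering:   ", rebalance_transfer_gb(partitions, tiered=True), "GB")
```

The ratio depends entirely on how much of each partition stays local, which in turn depends on your retention and hot-set settings.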

Scale-up, scale-out

ONTAP-powered AFF is ready to scale. AFF A-Series systems, from the AFF A250 up to the AFF A900, are built for speed. At NetApp, we showcase our performance because we’re proud of it. We publish benchmarks (like this TOCC result) that highlight performance across our portfolio, such as our scale-out NAS capabilities, and our ONTAP S3 server is built on the same foundation. You get over 31GBps as a building block that is ready to scale out to 12 HA pairs in a single cluster, all of which can be harnessed through a single S3 bucket.
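
As a rough illustration of the building-block idea, the sketch below multiplies the single-pair figure by the number of HA pairs. Perfectly linear scaling is an assumption made here for simplicity; only the single-pair result was measured in this test, so treat the output as a ceiling rather than a benchmark.

```python
MAX_HA_PAIRS = 12     # cluster limit stated in the text
PER_PAIR_GBPS = 31.0  # lower bound from the single-pair TOCC result


def linear_scale_out(ha_pairs: int, per_pair_gbps: float = PER_PAIR_GBPS) -> float:
    """Aggregate throughput ceiling if every HA pair adds its full share."""
    if not 1 <= ha_pairs <= MAX_HA_PAIRS:
        raise ValueError(f"ha_pairs must be between 1 and {MAX_HA_PAIRS}")
    return ha_pairs * per_pair_gbps


for pairs in (1, 4, 12):
    print(f"{pairs} HA pair(s): up to {linear_scale_out(pairs):.0f} GBps (assuming linear scaling)")
```

Real clusters rarely scale perfectly linearly, and you would also need enough Kafka broker compute and network bandwidth to drive the load.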

Make your S3 performance extreme

The decision to use NetApp AFF as the Tiered Storage target in your performance-critical Confluent Kafka environments couldn’t be easier. AFF systems are built for speed. You can start small, scale your storage independently of compute, scale up or scale out, and reduce the heartburn of managing your data, the lifeblood of your business. All with NetApp® ONTAP® as your single storage platform, one that you can take to the cloud or consume as a service.

Wherever your data lives, NetApp is ready to make managing your data easier than ever. To get started, check out our latest best practices for Confluent Kafka and the Confluent + NetApp partner page.

Joe Scott

Joe Scott is a Senior Workload Performance Engineer in the ONTAP Performance Engineering Group. He lives and breathes application performance with ONTAP across on-premises and cloud environments.
