Thank you for joining us today for this webinar, Unlocking the Power of Apache Kafka: Scaling with a Managed Service. This topic is very timely. The need to support more streaming and real-time data continues to grow, and many companies find that their current Kafka implementations must be greatly scaled to support growing volumes of data and new use cases. Unfortunately, most find they lack the in-house resources to scale, secure, and manage Kafka. That's where managed services can help. Joining us today is Andrew Mills, senior solutions architect at NetApp Instaclustr. Andrew will talk about the challenges of scaling Kafka and the role managed services can play in helping organizations do just that. Andrew, welcome; over to you. Hey, Jesse, thank you so much. As you mentioned, I'm Andrew Mills, senior solutions architect here with NetApp Instaclustr. I've been with NetApp just about three years now. Prior to that, I spent about 15 years as a practitioner, specifically around big data pipelines and Apache Kafka, so I have quite a bit of experience with Kafka, both running it operationally and building applications in the ecosystem around it. So today, as Jesse mentioned, we're going to talk specifically about scaling Apache Kafka and how a managed service can help. On the agenda, we'll quickly hit the Kafka basics: a little bit about what it is and why you'd use it. I'm going to breeze through that pretty quickly, because there are about a thousand Kafka 101 videos out there that you can Google; Tim Berglund has an excellent one on YouTube, so I'd definitely recommend that if you're new to Kafka. But we'll go through it just to level set. We'll discuss what scaling Apache Kafka looks like from boots on the ground, and then how we help from a managed service perspective.
And then lastly, we'll open it up for a little Q&A, so the team can pop in questions from any of the audience members out there. So, in short: Apache Kafka is used by more than 80% of the Fortune 100 companies, across every vertical and every industry, and in innumerable use cases. You have IoT data pipelines, event-driven architecture, credit card companies using it for fraud detection, log aggregation, any sort of real-time analytics, and now AI workflows, where it helps power the data those workflows bring in. At the highest level (and I've grabbed these directly from the Kafka website, as you can see on the bottom right there), Kafka provides high-throughput, scalable, permanent storage and a highly available message bus. It's actually not a queue, although queues are coming as part of the Kafka ecosystem. It has built-in stream processing. In addition, you can collect just about anything with Apache Kafka; it doesn't really care what you're shoving into it, as long as it's not super big. Around a megabyte is probably where you want to stay under. It has a ton of client libraries and a very large ecosystem of open source tools. Many find Kafka to be mission critical: it is literally the lifeblood of the data in the organization and powers many of the applications they build. It's trusted by thousands of organizations, with a massive user community and lots of resources online. So before we get into scaling, it's really important that you understand a bit about how Kafka is architected. And again, I'm not getting super deep into the nuance; we're going to stay pretty high level here. Essentially, every Kafka cluster should be deployed with at least three brokers. And when you're using, let's say, AWS, which is how I've set this example up here, you're typically going to deploy within a single region.
So you'll see the region at the top left. Inside of that region, you want to spread your Kafka brokers across availability zones, which helps with the fault tolerance and high availability that Kafka has natively. You can see I've got three; in this case the green boxes are brokers deployed across the availability zones: broker one, broker two, broker three. In each case they have four vCPUs, 32 gigs of RAM, and two terabytes of attached storage. Kafka itself is built around this concept of topics and partitions, which is loosely akin to tables in a database. What you'll do is deploy a topic, and then you're going to set what's called the replication factor. The example I have on the screen is a single topic with six partitions and a replication factor of three. What that looks like here: the blue represents ownership, so broker one owns partitions one and four, broker two owns two and five, and broker three owns three and six. The yellow is what's called the replica, or hopefully the in-sync replica; essentially, that means the data is replicated. So at a high level, the way it works is you're going to have a data producer sending data to Kafka. In this particular case I picked 5,000 records per second; in hindsight, maybe 6,000 would have been easier to do the math on. Essentially, that producer is going to be sending records at some clip to the Kafka cluster, and it's going to send the data to the partition owners. Of those 5,000 records per second, they'll be divided roughly evenly (again, there's a lot of nuance there) among the blues. Once they land in the blues, they'll be replicated out to the yellows, which means you're going to have data shipped around all three availability zones within a single region. And that is how Kafka is built.
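That replication fan-out can be sketched with some quick, back-of-the-envelope Python. The 1 KB average record size is an assumed figure for illustration, not something from the slide.

```python
# Back-of-the-envelope math for the example above: 5,000 records/sec into a
# six-partition topic with replication factor 3. The 1 KB average record
# size is an assumed figure, purely for illustration.

RECORDS_PER_SEC = 5000
PARTITIONS = 6
REPLICATION_FACTOR = 3
AVG_RECORD_BYTES = 1024

# Producers spread records roughly evenly across the partition owners (blues).
per_partition_rate = RECORDS_PER_SEC / PARTITIONS

# Each record is written by its owner, then copied to RF - 1 replicas (yellows),
# so the cluster as a whole absorbs RF times the produced volume.
total_writes_per_sec = RECORDS_PER_SEC * REPLICATION_FACTOR
ingest_mib_per_sec = total_writes_per_sec * AVG_RECORD_BYTES / 1024**2

print(round(per_partition_rate, 1))   # 833.3
print(total_writes_per_sec)           # 15000
print(round(ingest_mib_per_sec, 2))   # 14.65
```

The point of the sketch is simply that the cluster absorbs three times what the producer sends, which is why replication is the first multiplier to keep in mind when sizing.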
And it's designed to handle that across the availability zones; that is the fault tolerance and high availability of Kafka. From here, you can consume those events, and consumption doesn't really have a huge impact on the Kafka brokers themselves. That is really one of the biggest things about Kafka: when you start consuming, yes, it's going to tax the CPU a little bit, and some of the IOPS, depending on how old the data is. But from the consumption perspective, it's not like you're querying a database table. That's a big mind shift from relational or even NoSQL tables: you can consume with a relatively low cost from the resource perspective on the server. There will be data transfer costs and that type of thing; every technology has its trade-offs. So at a high level, this is the architecture, and then producing and consuming the data. Inevitably, once you have Kafka in your system, you're going to want to use it more and more. You start off with kind of an MVP or a small Kafka cluster, and it really starts to balloon, because it can do a lot; like I said before, it's that lifeblood for many organizations. So you may end up having some additional producers sending data to Kafka. What you're generally going to notice first is that you start running out of storage in that cluster. But then you're also going to have additional applications consuming from it, which is what I've tried to represent here with a concept called a consumer group. That group has three consumers, but you have a large influx of data being produced and, additionally, a large fan-out of data being consumed. So if you start off with a simple three-broker cluster, it can very quickly become overwhelmed.
And so, while up and to the right is generally a good thing, when we're talking about resource utilization for a Kafka cluster, it's not always great. So what is the scaling need? There are generally three large buckets where you might first see the need to scale. The first is CPU, and the next is RAM; I'll talk about those together because they have a pretty close relationship. As you can see here, at a high level, you can have disk I/O bottlenecks, and those might start affecting your CPU first and then start backing up into your RAM. Overloaded brokers, meaning way too many partitions on a given broker, can have an effect on CPU and RAM. And then data velocity: if you have a ton of small events (we're talking a million records a second for some of the workloads we work with), that's where your CPU and RAM can really get taxed. Now, if the JVM is configured correctly, RAM may not see issues, and depending on whether or not you're using SSL/TLS, that can also affect your CPU utilization, as can the number of producers you have. Generally speaking, though, CPU and RAM are rarely the urgent thing that needs scaling with Kafka; it's almost always the disk. So when we go back to the architecture here, each one of these brokers is going to have disk attached to it, whether that's an SSD or an EBS volume. All of the cloud providers are going to have certain limitations on how much storage you can attach to a given size of virtual machine; they're not going to let you attach 16 TB of storage to a single core and two gigs of RAM. So you're going to have limitations, and that's a good thing: it helps prevent you from getting into a bad situation in the event that you lose a Kafka broker.
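Since disk is usually the binding constraint, a rough sizing sketch is useful. Every figure here (ingest rate, retention window, the utilization ceiling) is an illustrative assumption, not a measurement from a real cluster.

```python
# Rough disk sizing for a Kafka cluster, treating disk as the first
# constraint. All inputs are assumed figures for illustration.

def required_disk_gib(ingest_mib_per_sec: float, retention_days: int,
                      replication_factor: int, max_utilization: float = 0.7) -> float:
    """Total cluster disk (GiB) so retained data stays under max_utilization."""
    raw_gib = ingest_mib_per_sec * 86_400 * retention_days / 1024
    return raw_gib * replication_factor / max_utilization

# 5 MiB/s of produced data, replication factor 3:
seven_days = required_disk_gib(5, 7, 3)
thirty_days = required_disk_gib(5, 30, 3)
print(round(seven_days))    # 12656
print(round(thirty_days))   # 54241
```

Note how the total scales linearly with both retention and replication factor: a retention bump from 7 to 30 days multiplies the disk requirement by more than four, which is exactly the kind of change that catches clusters out.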
So when it comes to disk, there are three main reasons it becomes overwhelmed. First, you just need to keep the data longer. Maybe you have a log aggregation solution where you're keeping seven days, and the business came to you and said, hey, we need to keep 30 days of history, and that wasn't something you had planned for initially. Or maybe you have a lot more applications adopting Kafka, which increases the message size or the rate for the broker or for the cluster as a whole. Those are the primary things that affect the ability of your disks to keep up with the amount of data you're pushing into them. So one important consideration when you're putting Kafka into a production use case is to sit back and consider: how do I design this infrastructure for scale? Kafka is super resilient, and it's flexible in its ability to scale in certain ways. There are two primary ways you can scale. First is vertical: whenever you just add a little more CPU, RAM, or disk to a machine. That's pretty easy to do with Kafka, because Kafka is designed to tolerate a broker being offline. So if you add more CPU and RAM and then need to reboot your broker, it's not going to cause an issue at all. Vertical scaling, if you're going to scale, is generally the way you want to do it, especially in an emergency situation. Shark Tank comes to mind: you hear these companies talk about how they did a pitch on Shark Tank, it aired, and it crashed their site. So if you have a situation like that, some sort of large influx of events that maybe you didn't expect, and all of a sudden you're seeing this massive ramp, the best option for Kafka is a vertical scale.
So you want to be positioned to handle vertical scaling, so you don't run into some of the issues I mentioned earlier, like being limited on the amount of disk you can add to a given VM type. If you're already at the top, running the largest VM available from a given cloud provider, that can be a scary situation, because now you have to add brokers. Adding brokers is what's called horizontal scaling: essentially adding more machines to a Kafka cluster. Typically those will have the same exact configuration as the existing ones; you definitely want your Kafka versions to be the same, and we recommend the same hardware characteristics as well. In general, vertical scaling should be used for an emergency situation, or if you have a really big ebb and flow throughout the day, where maybe most of your traffic happens in eight hours and you need to scale up and then back down. That should be a vertical scale, not a horizontal scale, and we'll get into why in just a minute. There are definitely tools out there, like Cruise Control, that can help you scale automatically. I would just be careful with those. That's not to say they aren't great or can't be used, but there are a lot of considerations when you talk about scaling, especially horizontally, or about redistributing your partitions across your cluster. They can put you in a bad spot, specifically around network saturation from moving too much data back and forth. So you really want to be careful with tools that promise fully automated scaling, especially horizontal scaling. So, can you avoid scaling altogether? Absolutely. But it depends.
So, like I mentioned earlier, most of the issues we see are disk related, and in a disk-related scenario you usually want to keep your disk utilization under 70% of capacity. Depending on the nuance of the issue you're seeing, you may be able to get away with just changing a setting or doing some slight partition reassignment and redistribution. If you think about the architecture we looked at a minute ago, partitions are spread out across the entire cluster. If one broker owns partitions that are really active, sending tons of data or large data, and that particular broker is running hot, high on RAM or CPU, or filling up its disk quickly, there may be an opportunity to redistribute where those partitions are owned. You can move one of your hottest partitions to a broker that isn't as active. So you can use partition redistribution in that type of event; again, we're talking about an emergency situation where you're at risk of losing a broker and degrading your service. Another option could be increasing the size of the volume attached to your cluster. Back to the earlier point: most of the providers are going to limit the amount of storage you can attach to a single node, so you want to make sure you understand the overhead available to you. That way, in an emergency, you can expand the amount of storage you have for that particular node type. You may also be able to reduce the retention time, or the bytes you're keeping, on a per-topic basis; that's highly dependent on the actual solution you're serving. Let's say you're keeping four days' worth of data.
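The hot-partition redistribution just described can be sketched as a simple placement decision. The broker numbers and MB/s rates below are made up for illustration, and real reassignment tooling (e.g. kafka-reassign-partitions) also has to stream the underlying data, which this sketch does not model.

```python
# A minimal sketch of the redistribution idea: find the busiest broker's
# hottest partition and hand its ownership to the least-loaded broker.
# Loads and broker IDs are illustrative; only the placement decision is
# modeled here, not the data movement.

def plan_one_move(ownership: dict, partition_mb_s: dict):
    """ownership: {broker: [partition, ...]}; partition_mb_s: {partition: MB/s}."""
    load = {b: sum(partition_mb_s[p] for p in ps) for b, ps in ownership.items()}
    hot = max(load, key=load.get)        # most-loaded broker
    cold = min(load, key=load.get)       # least-loaded broker
    victim = max(ownership[hot], key=lambda p: partition_mb_s[p])
    ownership[hot].remove(victim)
    ownership[cold].append(victim)
    return victim, hot, cold

owners = {1: ["p1", "p4"], 2: ["p2", "p5"], 3: ["p3", "p6"]}
rates = {"p1": 9.0, "p4": 8.0, "p2": 2.0, "p5": 1.0, "p3": 1.5, "p6": 1.0}
moved, src, dst = plan_one_move(owners, rates)
print(moved, src, dst)  # p1 1 3
```

Back to the retention example.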
And the intention behind that is: if something happens on a Friday, I want to make sure that when I come in Monday morning I can replay it, that my data is still sitting there. Well, it's Tuesday and all of a sudden I'm filling up my disk. You might be okay just reducing that retention time to a day, to pull some pressure off the disk until you can understand what exactly is causing the problem and take a more permanent approach, whether that happens inside the application itself or within Kafka through other configurations. So retention time reduction is definitely an option. And then finally (I say finally, but the list I have here is not exhaustive) there's tiered storage. It's relatively new to Apache Kafka, and it is a fantastic way to reduce the amount of storage you need locally on the disk. It is not an emergency-type solution, though; tiered storage is something you need to sit down and think about. What is the latency impact of tiered storage? And it doesn't make sense for every topic; compacted topics, for example, are not compatible with tiered storage. So you just want to make sure your use case will work with it, but it can absolutely help you avoid needing to scale your cluster if disk utilization is the problem. Okay, so now we'll dive a little into the nuts and bolts of scaling itself and how it works. Like I mentioned earlier, vertical scaling is as simple as changing the resources on a broker, and usually that means a restart. Sometimes you can expand the amount of storage a broker has without a restart, but you pretty much always need a restart when you're changing the amount of CPU and RAM on a broker.
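That resize-restart-wait loop, a rolling restart, can be sketched as follows. The `resize`, `restart`, and `urp_count` callables are hypothetical stand-ins for real cloud and Kafka admin-API calls, used here only to show the sequencing.

```python
# A hypothetical sketch of a rolling vertical scale: resize one broker at a
# time, restart it, and wait for under-replicated partitions (URPs) to clear
# before touching the next broker. The callables are stand-ins for real
# cloud and Kafka admin calls.

import time

def rolling_vertical_scale(brokers, resize, restart, urp_count, poll_secs=0.0):
    completed = []
    for broker in brokers:
        resize(broker)                # e.g. 4 -> 8 vCPU, 32 -> 64 GiB RAM
        restart(broker)
        while urp_count() > 0:        # replicas catching back up
            time.sleep(poll_secs)     # real code would also enforce a timeout
        completed.append(broker)      # only then move to the next broker
    return completed

# Toy run: in this simulated environment URPs clear immediately.
done = rolling_vertical_scale(
    brokers=[1, 2, 3],
    resize=lambda b: None,
    restart=lambda b: None,
    urp_count=lambda: 0,
)
print(done)  # [1, 2, 3]
```

The key design point is the wait on under-replicated partitions: restarting a second broker before the first has rejoined its in-sync replica sets is what turns a routine restart into data-availability risk.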
And so in this example, which is from earlier as well, all three brokers had four cores and 32 gigs of RAM, but the one in AZ three has been modified to eight cores and 64 gigs of RAM with four terabytes attached. It would simply take a command to your hyperscaler to adjust the resources, restart the broker, and then move to the next one. That's called a rolling restart. Kafka is very familiar with that process, because that's how it does updates when you're moving to a new version of Kafka, among other things; it's built into the way Kafka manages partition ownership, producing, and consuming. And when you do that, you don't have to do anything on the client side. The applications using the Kafka clients to send data to Kafka and receive data from it have resiliency built in to handle a broker restarting; that's part of the architecture. So again, in an emergency scaling situation, you want to scale vertically, and you have to make good architectural choices up front to make sure you can. Next we're going to talk about horizontal scaling, which is a little more complicated and nuanced. Again, I haven't covered every possible situation here; we're just going over the broad strokes to illustrate the complexity. This is the same cluster we had earlier, with brokers one, two, and three across three availability zones, except I've introduced three new brokers. You can horizontally scale this, and in the first phase of a scaling event, you simply stand the infrastructure up, install the technology, and join it to the Kafka cluster. When you do that, it's not really doing anything at that point.
So essentially all it's doing is saying, hey, I'm here, I'm an available broker. But it doesn't own anything: it's not owning partitions, and it doesn't have any replicas or in-sync replicas it could own at that point. So once these brokers are online and healthy, you'll go through a process where you redistribute partition ownership across the cluster. We talked earlier about a replication factor of three, which means each partition will have a single owner and will be replicated to two other nodes in the cluster. That's what I've illustrated here, and there are many ways you can do it. Some people choose to change ownership and replicas in the first jump, essentially right away; other people might choose to just assign replicas to the new nodes and then go back and change ownership later. It depends: network saturation is one of the biggest things to consider, along with what exactly is going on in the cluster and the time of day. These are all reasons that horizontal scaling is generally an exercise for a planned situation, not an emergency. So once you've got some partition ownership (in this case, replica ownership) assigned, each partition is represented in each availability zone. Let's look at partition one: broker one owns the partition, it is the primary for it, and it sits in availability zone one. In availability zone two, broker five has a replica of partition one, and in availability zone three, broker three has a replica of partition one. That same pattern is repeated across every partition, every broker, and every zone. There should be no more than three copies of the data across the entire cluster, and each partition should be represented in each availability zone.
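The one-replica-per-zone layout just described can be sketched as a toy assignment function. This mirrors the layout on the slide rather than Kafka's actual assignment algorithm; the AZ names and broker IDs are from the example.

```python
# A toy version of rack-aware replica placement: every partition gets one
# replica per availability zone, and leadership rotates so each broker ends
# up owning roughly the same number of partitions. Illustrative only; this
# is not Kafka's real assignment algorithm.

def rack_aware_assign(num_partitions: int, brokers_by_az: dict):
    azs = sorted(brokers_by_az)
    assignment = {}
    for p in range(num_partitions):
        replicas = []
        for i in range(len(azs)):
            az = azs[(p + i) % len(azs)]                  # rotate the leading AZ
            nodes = brokers_by_az[az]
            replicas.append(nodes[(p // len(azs)) % len(nodes)])
        assignment[p] = replicas                          # replicas[0] = leader
    return assignment

brokers = {"az1": [1, 4], "az2": [2, 5], "az3": [3, 6]}
plan = rack_aware_assign(6, brokers)
leaders = sorted(plan[p][0] for p in plan)
print(leaders)  # [1, 2, 3, 4, 5, 6]  (each broker leads exactly one partition)
```

With six partitions and six brokers, each broker ends up leading exactly one partition and holding replicas of two others, which is the consolidated view shown on the next slide.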
That's all taken care of thanks to rack awareness and some other features within Kafka. So finally, you go through and assign or change ownership of those partitions. Now I have six brokers and six partitions in this topic, which means each broker owns one of the partitions, and two other brokers have replicas of it. Again, there's representation in each availability zone; that way, if you lose one, you're not actually losing any of your data and you still have quorum. So if your acknowledgement settings say you need more than one copy of the same data before you consider it the source of truth, you'll have that set up in this horizontal scaling model. And then finally, just to look at the consolidated view: in this model, each broker is only either in charge of, or serving as a replica for, three total partitions. So when we talk about architecting for vertical scaling, look at this model again. This is one topic, and there could be hundreds of topics across the cluster, but this might be something that could be handled on a couple of cores and eight or 16 gigs of RAM. If a lot of data starts flowing in, or your consumption increases significantly, you could bump that from two to four cores, or four to eight. So you could vertically scale this type of cluster in that type of emergency, or in an ebb-and-flow-of-your-data situation. And you can see the complexity: as we just went back through, adding nodes, joining them to the cluster, redistributing partition ownership, this whole process requires data to be moved.
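How long that data movement takes can be estimated with simple arithmetic. The link speed and throttle fraction below are assumptions for illustration; real moves are also bounded by replication throttles and disk I/O.

```python
# A back-of-the-envelope estimate of how long a reassignment takes to stream
# data between brokers. Link speed and throttle fraction are assumed figures.

def rebalance_hours(data_tib: float, link_gbps: float,
                    throttle_fraction: float = 0.3) -> float:
    """Hours to move data_tib using throttle_fraction of a link_gbps link.
    Throttling leaves bandwidth for producers and consumers."""
    total_bytes = data_tib * 1024**4
    bytes_per_sec = link_gbps * 1e9 / 8 * throttle_fraction
    return total_bytes / bytes_per_sec / 3600

# 16 TiB over a 10 Gbps link, throttled to 30% of the link:
print(round(rebalance_hours(16, 10), 1))       # 13.0
# The same move throttled to 10% takes roughly three times as long:
print(round(rebalance_hours(16, 10, 0.1), 1))  # 39.1
```

Even under generous assumptions the move runs for hours to days, which is why unthrottled reassignments are so dangerous: the obvious way to go faster is to eat the bandwidth your producers and consumers need.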
So you're relying on the network bandwidth between your availability zones and your brokers, and that can take time. If you've got 16 TB of data attached to these brokers, it could be a couple of days before it's fully streamed from one broker to another. And if you're not careful during that process, you could saturate your network, which could affect the ability of your producers, the things sending data to Kafka, to actually get data in. It could also affect your consumers' ability to get data out: maybe they get very latent, or they disconnect altogether because they can't get heartbeats. All sorts of issues can happen. Kafka has controls in place that can help you manage and monitor that, but knowing how to use them without stepping on those landmines, knowing exactly how to configure them and exactly how to monitor them, takes a lot of time. That takes experience of being in the trenches and living through those issues, and that's where the managed service comes into play. NetApp Instaclustr has been doing this for 14 years as Instaclustr, and NetApp has been in the enterprise business for more than 30 years. We have over 400 million node hours delivering industry-leading SLAs, and that's not just an uptime SLA; it's also a performance SLA. For Kafka we offer up to five nines of availability. There's some nuance there: if you're using PrivateLink or something like that, it's four nines. But in certain circumstances we can offer you best-in-class SLAs. When it comes to security and compliance, we run the gamut: GDPR, SOC 2, PCI compliance, ISO 27001. We also monitor for CVEs constantly and patch as soon as we can; in the event the community isn't moving quickly enough on a specific CVE, we can jump into the open source community and apply a patch as well.
And it's a comprehensive solution. We don't just have a managed platform; we also have 24/7 enterprise support, and the same team behind that support runs the managed platform as well. We also have consulting and professional services. So we have anything you need to make running Kafka easier for you. It's not just Kafka, though. One of the mantras of NetApp Instaclustr is that we're all about open source. What we've done is pick a curated list of technologies, where each one serves a role within the big data pipeline space: storage; streaming, which is what this is about, with Kafka and Kafka Connect; analytics; search; and orchestration, mostly Cadence, though ZooKeeper helps there too, and we've been running it for a long time as part of Cadence and as part of Kafka. We love open source because it eliminates the commercial tax you get on software, plus there is a vibrant community around the open source stack, and we can manage it anywhere, in any cloud, as well as on premises. Now, you can hear from me all day; I am a technologist, but part of my job here is to talk about our solution. Really, though, the proof is in the pudding. Pega is one of our best customers, and they leverage Kafka, Cassandra, and OpenSearch, today in Google Cloud and AWS. We have saved them a tremendous amount of time from a DevOps perspective in managing that. Essentially, if you're running Pega Cloud, or know anyone who is: when Pega stands up a Pega Cloud environment, they're leaning on NetApp Instaclustr for the underlying data services of Cassandra, Kafka, and OpenSearch. In addition to that, we're also a partner with Shad.
So Shad came to us early in the design phase for Kafka, Cassandra, and OpenSearch on a new solution they were working on. They had an entire enterprise architecture laid out, and they came to us and said, hey, we're looking for a vendor to help us with these; you cover all the bases, what can you do? So myself and several consultants sat down and talked through their architecture. We pointed out a few things they hadn't thought of, things that just come with experience running each of these technologies. They made some pretty significant changes to their architecture, and they're on our managed platform today for those technologies as well. And again, they're super happy with the service, and I know you would be too if you're running Kafka today. So really, the key takeaways here: scaling Kafka can be complicated, and at NetApp Instaclustr we are experts at it, and at other technologies. We're experts at running Kafka at a high performance level, at scaling it when you need to, and at helping you make better choices around designing Apache Kafka for your specific needs. And we have comprehensive support, not just for Kafka but for all the other open source tools as well. So before we get to the Q&A: please go give us a shot. If you're interested in Kafka, or you're managing it today, hit our website and start a free trial. You can also hit me up; we'll get our contact details in there in case you want to talk to me directly. We would love for you to give the platform a try and see what you think. And, Jesse, I think we'll kick it back to you for Q&A, if anyone's asked questions while we've been going through this. Yeah, that was great, Andrew, thanks so much. We do have some questions, so let's jump right in. The first one:
Is it possible to migrate from Kafka 3.6 to 4.0, or is there some kind of interim step? And if there is, what are the downtime implications? Well, I expected it all to be around scaling, but throw me a curveball here, Jesse; I appreciate it. It's not me! No, it's all good. Look, this is a hot topic in the community right now, so I get it. It's been about a year, I think, since the release of KRaft. When we talk about 4.0, what we're really talking about is that the quorum controller has switched from ZooKeeper to KRaft, and with that switch, 4.0 no longer supports ZooKeeper. So our recommendation here is essentially to migrate to 3.9 first; that migration is, at this point, pretty well worn, and we can definitely help with best practices and a strategy for it. So you migrate to 3.9 (I think it's 3.9.2 now, off the top of my head). Once you're on 3.9, you can switch from ZooKeeper to KRaft, either during the migration or afterwards, and then you can bump up to a 4.0 version. If you do it right, there shouldn't be any latency increase or downtime. And if you need some help, give us a call. Got it, yeah, absolutely. Of course. The next one: can your consulting or managed services, whichever you prefer to call it, set up tiered storage for Kafka? Have you done this before, and what does my team need to provide to you to get it configured? Okay, so yes, our consultants have absolutely set up tiered storage. An important caveat: again, we're open source, and so we work on Apache Kafka. If you're running Apache Kafka and you need help with tiered storage in your environment, our consulting team absolutely can help.
Essentially, there's a variety of plugins you can use to interface with remote storage; any S3-compatible one will work. Once you have an S3-compatible storage mechanism, we can come in and help you configure Kafka to work with it, help you understand how that remote storage apparatus works, and then help you configure your topics to leverage tiered storage. Tiered storage is also available as a checkbox in our managed service. There's still a little you have to do per topic, to tell it which ones you want to tier and after what time, but it's just a checkbox. Okay, very cool. So you asked for scaling questions and you got one: we've run into scaling issues with Kafka; what are some of the ways you ensure brokers can handle higher throughput? Yeah, you have come to the right place. The slide I did on architecting for scale, I think that hits it. If I recall, we're going to send this deck out, or it'll be available for download, and I've got some more nuanced details in an appendix that might help you. But in the end, it is: design your cluster to vertically scale. If you're having a performance problem, there are usually a couple of steps. First it's triage: stop the bleeding. Oftentimes that looks like a vertical scale; you double the RAM and CPU and add some more storage. Then you analyze the problem and figure out what's going on, whether it's a RAM or CPU thing or a storage thing, and we can help you with that analysis. We have emergency support contracts, so if you're in the moment, you're having a problem, and you're thinking, oh man, my Kafka guy went to Aruba and I can't find him.
You can call us, we can do an emergency support contract, and that's why we're there; we can help you through it. But I would just say: architect it so that you're prepared to vertically scale, and that gives you the time to make a better decision. In the triage analogy, you've stopped the bleeding; now you get to stitch it up and plan the long-term care. Awesome. And if you're out there and your Kafka guy is in Aruba, give me a call. Yes, Jesse. Again, I'm a technologist, right? So I... Love that. All right, last one. I know we're pushing up on our time, but we do have one more. Can you configure custom performance alerts based on the activity of the cluster, so we'll be aware of any issues that arise? And if they do, how does your solution handle it; is it automated? Okay, so there are a lot of layers there. Suffice it to say that the first layer is going to be your infrastructure layer: is my virtual machine online and healthy? Is my networking working? How is my disk; is it full, or close to full? We take care of all of that. And yes, you can create custom alerts; we have Prometheus integration, Datadog, Splunk, all the good stuff. So yes, you can create custom alerts for that, but that layer is on us. We also expose consumer group metrics and other metrics around consumer lag; that is more the application layer, and it's not really actionable by us, but those alerts can be created in your monitoring system. Most of our clients are large enterprise clients with existing monitoring systems, so we have those integration tools built in; you can just plug into yours, which makes it a little easier for the things you need to action on. Awesome. Well, thanks for that. Sorry,
I hit you with three questions in one on that last one. But listen, that's all the time we have today. Thank you, Andrew, for the great talk and for all the information. Thank you, all of you out there, for tuning in and listening. And thank you to NetApp Instaclustr for bringing this webinar to us today. We'll see you next time.
Gain insights into Kafka's architecture and learn scalable strategies to ensure peak performance. Ideal for IT professionals, developers, and system architects, this is your guide to high availability and fault-tolerant Kafka systems.