(upbeat intro music) (lively music)

Hello! I'm Phoebe Goh and I am at AWS re:Invent chatting to some people at the Spot booth. I'm joined today by Hudson Buzby, from Spot Ocean for Apache Spark. Hey, we've been talking a lot about big data at re:Invent. What are customers trying to do with big data these days?

Yeah, so I think there are two common trends that I've seen amongst customers here and outside. One is cost savings. With the environment and the economy where it's at, a lot of companies are hitting layoffs and looking for areas to save money, primarily to save jobs. And big data is traditionally one of the largest line items on customers' cloud spend. So what we do is allow customers to run their Apache Spark workloads significantly cheaper through better utilization of Spot Instances. Spot Ocean is kind of the primary backing behind that. It allows us to look at the Spot marketplace and find instances for applications that are cheaper, more available, and more reliable. And we're taking all of that and applying it to Apache Spark.

Wow. Okay. That's a lot. Let's unpack a little bit of that- Sure. Yeah. So obviously cost optimization, because big data, when you're collecting so much of it and running it on Apache Spark, can be expensive.

Yes. Absolutely. Yeah. And I think one of the other major frustration points we see with Spark in particular is that it's a really difficult and frustrating language to work with- Sure. You're frequently looking in logs that are 10, 20 thousand lines long and you're given kind of a UI with data points, 20 thousand things. You're trying to synthesize all of this together and understand what's happening in your application- Yeah. To actually just fix when your application fails. So what we do is we offer some resource utilization graphs that help you understand your applications. We store your logs for you in a nice easy dashboard so- Yeah.
Just making that developer experience a little bit easier so they can kind of focus on what's important, which is actually developing.

Yeah. Sure. So speaking to developers out there, and data scientists, and data engineers, what would you like them to know about how they can run their Spark environment better?

Yeah, a few things that we offer. We have an analysis tool that's actually open source. You can find it on our GitHub and, with a few simple settings, plug it in and actually see, like I was mentioning earlier, that graph that shows you the performance of your application. We embedded it into our platform, but it's also available as an open source tool. And really there's not another product out there that gives you that much insight and intelligence into your applications.

Well, you heard it here! Right? There is not another product out there that gives you that much insight, and I think that's what's really important. Seeing what you've got and what you can save and how you can run more efficiently. Thanks so much for joining me- Thank you, Phoebe. Today, Hudson. (Phoebe laughs) Thank you for joining me, too. And if you want to check out even more great content, stay tuned right here on NetApp TV.

Greater insight and intelligence on big data

Hear about solutions for Apache Spark big data analytics environments.

Spot for Amazon Web Services