
Kubernetes, meet BeeGFS: A tale of future-proof investment.

Joe McCormick

Kubernetes and BeeGFS

NetApp has brought support for BeeGFS to Kubernetes, providing a scalable, high-performance storage option that costs nothing to get started with. If you are a Kubernetes user, you can now use the BeeGFS container storage interface (CSI) driver from NetApp® to access existing datasets in a BeeGFS parallel file system, or to request on-demand ephemeral or persistent high-speed scratch space. And if you're already using BeeGFS, you can now keep using familiar storage while breaking into the next paradigm of workload management and orchestration with Kubernetes. The BeeGFS CSI driver is freely available as a contribution to the open-source, artificial intelligence (AI), and high-performance computing (HPC) communities.

As I help build environments to support NetApp's own data science initiatives, I've been eagerly awaiting the ability to integrate BeeGFS and Kubernetes. While the driver has only now earned its 1.0 release status, I've been using it for months to back MLOps tools like Kubeflow and data science tools like Jupyter running in our environment. This early exposure provided the perfect opportunity to ensure that the driver fits seamlessly into AI data pipelines and other workflows.

Why Kubernetes and BeeGFS?

Kubernetes and BeeGFS are both technologies used to solve some of the world's most demanding challenges around scalable compute and storage. So, it was only a matter of time before the two became acquainted. There's a wide range of uses for the Kubernetes and BeeGFS combination, but you'll find it especially valuable if you need flexible yet powerful infrastructure options for AI initiatives that take your organization into a new cloud-native era.

As interest in AI grows, especially in large enterprises, Kubernetes is emerging as the standard platform on which to develop infrastructure supporting AI initiatives. In particular, Kubernetes enables IT departments to deliver a more cloudlike experience, empowering developers and data scientists while maximizing the value of specialized (and expensive) AI hardware like GPUs. This cloudlike experience includes application portability between your cloud providers and on-premises cloudlike environments—in other words, in your hybrid cloud.

Some storage technologies used by enterprise IT departments are cost prohibitive, highly complex, or unable to scale to meet AI storage requirements when moving from pilot to production. BeeGFS, on the other hand, is a proven fit for the performance and capacity requirements of data-intensive workloads like AI. Although parallel file systems are sometimes associated with complexity, BeeGFS was designed from the ground up to give you a better user experience. The self-supported Community Edition of BeeGFS allows you to try it (indefinitely) before you buy it. Then, when you're ready to scale into production, you can purchase the Enterprise Edition along with support and features like high availability.

With proven BeeGFS deployments of up to 30PB (and, theoretically, no limits in sight), you can start small with proofs-of-concept and pilots while knowing you can scale to meet long-term storage, compute, and GPU requirements.

Introducing the BeeGFS CSI driver

The container storage interface (CSI) makes it possible to add support for new storage options like BeeGFS to container orchestration systems like Kubernetes. The new CSI driver allows directories in BeeGFS to be used as isolated persistent or ephemeral volumes in Kubernetes. Keep in mind that BeeGFS provides massive scale-out performance and capacity, so, combined with the driver, you can meet key storage requirements in Kubernetes:
  • Dynamic storage provisioning: As a user, I want quick access to on-demand high-performance scratch space or semi-temporary storage for my applications.
    • How: Administrators expose BeeGFS as storage classes in Kubernetes that allow users to request storage without having to worry about where it comes from.
    • Example: I need some space to transform my dataset before I can use it for training, and I want to avoid an extra copy. So, the scratch space should be accessible by multiple Kubernetes nodes or GPUs.
  • Static storage provisioning: As an administrator, I want to avoid multiple copies of identical datasets within my storage environment. So, I prefer that users have concurrent access to common datasets from a central read-only location for both experimentation and production use.
    • How: Administrators create a persistent volume that references a specific directory in a specified BeeGFS file system.
    • Example: Financial data is ingested daily and made available in a location that allows multiple users to use it for their applications and data pipelines.
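As a sketch of the dynamic provisioning pattern above, an administrator might define a storage class and a user might claim scratch space against it. The provisioner name matches the driver (beegfs.csi.netapp.com), but the management host address, directory path, and sizes below are illustrative placeholders; check the driver's documentation on GitHub for the exact parameter keys:

```yaml
# Administrator: expose a BeeGFS file system as a Kubernetes storage class.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-beegfs-scratch
provisioner: beegfs.csi.netapp.com
parameters:
  # Hypothetical BeeGFS management service address and base directory:
  sysMgmtdHost: 10.113.4.71
  volDirBasePath: k8s/scratch
reclaimPolicy: Delete
---
# User: request on-demand scratch space without knowing where it comes from.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scratch-space
spec:
  accessModes:
    - ReadWriteMany   # accessible from multiple Kubernetes nodes at once
  storageClassName: csi-beegfs-scratch
  resources:
    requests:
      storage: 100Gi
```

A pod that mounts scratch-space gets its own directory under volDirBasePath on the BeeGFS file system, and the ReadWriteMany access mode lets multiple nodes (and their GPUs) transform the same dataset without an extra copy.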
With the 1.0 version of the driver, you can also:
  • Create storage classes representing different tiers of storage within the same BeeGFS file system by using storage pools.
    • Example: Fast storage backed by SSDs, with archive storage backed by near-line SAS.
  • Create storage classes optimized for different workload profiles within the same BeeGFS file system by using striping.
    • Example: Optimizing by file size (large or small) and the number of nodes accessing each file (one-to-one, many-to-one, or many-to-many).
  • Easily apply global and node-specific BeeGFS client configuration to Kubernetes nodes.
    • Example: Setting preferred network interfaces and performance tuning.
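The 1.0 features above surface as additional storage class parameters. The stripePattern/ keys shown here follow the driver's GitHub examples at the time of writing, but the pool ID and values are illustrative assumptions, not recommendations:

```yaml
# A storage class for a fast, SSD-backed tier tuned for large files read by
# many nodes. Keys prefixed with stripePattern/ map to BeeGFS striping
# settings; the storage pool ID is a hypothetical example.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-beegfs-fast-large
provisioner: beegfs.csi.netapp.com
parameters:
  sysMgmtdHost: 10.113.4.71          # hypothetical management address
  volDirBasePath: k8s/fast
  stripePattern/storagePoolID: "2"   # SSD-backed pool
  stripePattern/chunkSize: 1m        # larger chunks suit large files
  stripePattern/numTargets: "4"      # stripe each file across four targets
```

Global and node-specific client settings (such as preferred network interfaces) live in a separate configuration file that is supplied when the driver is deployed, rather than in the storage class itself.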
The driver can be used to access any number of BeeGFS file systems from your Kubernetes nodes. For details on all available functionality and how to get started, see the documentation on GitHub.
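For the static provisioning case, an administrator can hand-create a persistent volume that points at an existing BeeGFS directory. The volumeHandle format shown (beegfs://<sysMgmtdHost>/<path>) follows the driver's documented convention, but verify it against the GitHub docs; the address, path, and capacity are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: finance-dataset-pv
spec:
  capacity:
    storage: 10Ti
  accessModes:
    - ReadOnlyMany   # many users, one central read-only copy
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""   # static: no dynamic provisioner involved
  csi:
    driver: beegfs.csi.netapp.com
    # Hypothetical management host and existing dataset directory:
    volumeHandle: beegfs://10.113.4.71/datasets/finance
```

A matching persistent volume claim binds users to this volume, so every application and data pipeline mounts the same central dataset instead of keeping a private copy.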

Getting Started

Deploying the BeeGFS CSI driver is simple if you use the provided deployment manifests and examples. Don't have BeeGFS yet? BeeGFS can be quickly deployed on a single server or virtual machine for experimentation, and when you're ready, NetApp has you covered with a range of enterprise solutions. The BeeGFS CSI driver is just the latest in NetApp's continued innovation with Kubernetes, AI, and HPC, all part of delivering solutions to the world's biggest challenges with data.

We are constantly working with current and future customers, along with our wide network of partners, to understand how we can continue enabling all the amazing things our customers do. If you have ideas for functionality you want in the BeeGFS CSI driver, I want to hear them.

Joe McCormick

Joe McCormick is a software engineer at NetApp with over ten years of experience in the IT industry. With nearly seven years at NetApp, Joe's current focus is developing high-performance computing solutions around E-Series. Joe is also a big proponent of automation, believing that if you've done it once, you shouldn't have to do it again.

