For nearly a decade, the NetApp® ONTAP® codebase has been seamlessly flowing through a self-healing continuous integration and continuous deployment (CI/CD) pipeline running a variety of unit, functional, integration, and systemic test cases. The pipeline runs across various NetApp® products and platforms, including AFF and FAS, Cloud Volumes ONTAP®, and Astra™ Data Store. It's our bellwether for shipping new features and products. We have one of the most state-of-the-art CI/CD pipelines in the industry, and our innovations are built on a culture where failures aren’t ridiculed but rather celebrated for being spotted early in development. We no longer use the term DevOps regularly because it’s simply what we do every day. Truly drinking our own brew!
Just like many of our customers who rely on data storage for critical infrastructure, we use ONTAP data management software to store test results, virtual machine VMDKs, container images, logs, core files, and other metadata. With our test content "shifted left," we knew we could take the next big leap and move to testing directly in production. Continuous deployment is the practice of fully automating the delivery of software into end users' hands without manual intervention. This isn't new to us; we've been "drinking our own brew" for decades, doing continuous deployment across our IT and support organizations. However, those deployments typically happen before the release of an RC or GA build, not immediately after a code submission.
Recognizing the benefits of continuous deployment, we wanted to integrate it further into our fleet. As our first target, we selected a virtualization workload built on top of an AFF A300 high-availability cluster supporting FlexVol® datastores for VMware ESX6/ESX7 over NFS. We pursued that area first because many of the pipeline tests perform fresh installations.
We implemented a mechanism that, at the time of the scheduled deployment, removes the resources from the target cluster (we refer to this process as "drain"); installs the head-of-line ONTAP build; validates that the installation succeeded and is serving data; and finally moves the resources back into production.
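The drain, install, validate, restore cycle above can be sketched as a simple orchestration loop. This is a minimal illustrative model, not NetApp's actual tooling: every name here (`Cluster`, `drain`, `install`, `validate`, `restore`, `deploy`) is a hypothetical placeholder standing in for the real deployment machinery.

```python
# Illustrative sketch of the scheduled-deployment cycle described above.
# All names are hypothetical placeholders, not NetApp APIs.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    name: str
    resources: List[str] = field(default_factory=list)
    version: str = ""
    log: List[str] = field(default_factory=list)


def drain(cluster: Cluster, staging: List[str]) -> None:
    """Move production resources off the target cluster ("drain")."""
    staging.extend(cluster.resources)
    cluster.resources.clear()
    cluster.log.append("drain")


def install(cluster: Cluster, build: str) -> None:
    """Install the head-of-line build (placeholder for the real installer)."""
    cluster.version = build
    cluster.log.append(f"install {build}")


def validate(cluster: Cluster) -> bool:
    """Confirm the installation succeeded and the cluster serves data."""
    cluster.log.append("validate")
    return bool(cluster.version)


def restore(cluster: Cluster, staging: List[str]) -> None:
    """Move the drained resources back into production."""
    cluster.resources.extend(staging)
    staging.clear()
    cluster.log.append("restore")


def deploy(cluster: Cluster, build: str) -> bool:
    """Run one full drain -> install -> validate -> restore cycle."""
    staging: List[str] = []
    drain(cluster, staging)
    install(cluster, build)
    if validate(cluster):
        restore(cluster, staging)
        return True
    return False  # a real pipeline would roll back here


cluster = Cluster("affa300", resources=["datastore1", "datastore2"])
deploy(cluster, "head-of-line")
print(cluster.log)
```

The key design point the sketch captures is that validation sits between install and restore: resources only return to production after the new build is confirmed to be serving data.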
With this ecosystem in place, we're able to install ONTAP without disruption and without impacting SLAs and SLOs. And we can push new changes and features into production in just hours, where it historically took months. After each successful code check-in, a report is sent to the developer showing how much of their submission the deployment covers. This capability took years to perfect, and we now perform these deployments daily across NetApp test engineering environments around the world.
We're now extending our deployment journey further upstream into the mission-critical tiers of the infrastructure. We're targeting the system that drives the build process for our automatic healing platform, using a blue-green deployment that leverages Automated Non-Disruptive Upgrade (ANDU) across separate AFF A800 4-node clusters. This build process is based on a daemon NetApp FlexGroup volume: a NetApp Snapshot™ copy is created for every commit, and a NetApp FlexClone® volume is then created to perform the actual build. Before every ANDU, we also use the SnapMirror® feature of ONTAP to mirror the daemon volume in case a disaster recovery procedure is required.
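The commit-to-build flow above can be modeled in a few lines. This is a hedged sketch of the behavior only: the `BuildVolume` class and its methods are hypothetical stand-ins that simulate "one Snapshot copy per commit, one writable clone per build" and are not ONTAP APIs.

```python
# Illustrative model of the Snapshot-per-commit / FlexClone-per-build flow
# described above. BuildVolume and its methods are hypothetical; they
# simulate the behavior rather than call any real ONTAP interface.
class BuildVolume:
    def __init__(self, name: str):
        self.name = name
        self.snapshots = {}  # commit id -> snapshot name

    def snapshot(self, commit_id: str) -> str:
        """Record one point-in-time copy per commit."""
        snap = f"snap_{commit_id}"
        self.snapshots[commit_id] = snap
        return snap

    def clone(self, commit_id: str) -> str:
        """Create a writable clone backed by a commit's snapshot;
        the actual build would run inside this clone."""
        if commit_id not in self.snapshots:
            raise KeyError(f"no snapshot for commit {commit_id}")
        return f"{self.name}_clone_{commit_id}"


vol = BuildVolume("build_daemon_vol")
for commit in ("a1b2c3", "d4e5f6"):
    vol.snapshot(commit)

print(vol.clone("d4e5f6"))  # build_daemon_vol_clone_d4e5f6
```

The design choice the sketch mirrors is that clones are cheap, writable views of an immutable per-commit snapshot, so each build gets an isolated workspace without copying the source tree.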
Now who wouldn’t be interested in sampling NetApp’s continuous deployment brew? Find out more about ONTAP.
Jonathon has been working at NetApp for over 15 years and is a Senior Engineering Manager in the Build, Automation, EngineeRing, and Operations (BAERO) organization, part of the Hybrid Cloud Engineering (HCE) group. Jonathon's teams focus on bringing exceptional quality to NetApp products through test tools, centralized services, and toolchains. His teams also focus on developer productivity to ensure engineers can develop and test as efficiently as possible.