Sign in to my dashboard Create an account

How to cut costs by running MySQL on Spot

Three people writing on transparent board
Table Of Contents

Share this page

Win Vahlkamp
Win Vahlkamp

MySQL is one of the most widely used database systems in the world, counting some of the world’s largest cloud-native companies as users—such as Pinterest, Netflix, and GitHub. Many customers today want a managed platform such as that provided by Azure or AWS. However, there are also customers who need to operate MySQL in an infrastructure-as-a-service (IaaS) delivery model due to the functionality and customizations they need.

By decoupling the storage from the VM instance with first-party storage services Azure NetApp Files, Amazon FSx for NetApp ONTAP, and Google Cloud NetApp Volumes, the storage capacity and performance can be scaled separately and nondisruptively from the compute. This approach simplifies database protection and recovery without costly management and storage overhead.

Other articles have discussed running MySQL in various ways, but our goal was to determine whether MySQL could be run effectively using Spot Elastigroup by NetApp® and/or Stateful Nodes with Azure NetApp Files as the back-end storage. Spot by NetApp automates and optimizes your cloud infrastructure in AWS, Azure, or Google Cloud to deliver SLA-backed availability and performance at the lowest possible cost.

Using Spot and Azure NetApp Files can add significant value to MySQL environments by speeding infrastructure deployment, reducing data protection costs, and reducing compute costs.

MySQL architecture

MySQL uses an InnoDB cluster with a primary node in a shared-nothing architecture with each replica node maintaining a copy of the primary node’s data. You can deploy a multi-primary model that requires extra considerations for writes if your workload demands it.

MySQL recommends using a MySQL router to connect clients to the cluster (Figure 2). A MySQL router—middleware installed on the application clients—provides transparent routing between the application clients and the MySQL cluster. The network ports determine which node is read/write and which nodes are reads that must be incorporated into the SQL connection used in your code. It’s possible to use an external load balancer, but that requires your code to have a try/catch function to ensure that you connect to the primary node when doing writes.

mysql load balancer
Figure 1) Load balancer
MySQL router
Figure 2) MySQL router



We deployed all MySQL components with Spot Elastigroup and Spot Stateful Nodes to deploy MySQL in a bare-metal cluster and also in a containerized deployment. It’s also possible to run MySQL components in Kubernetes by using Spot Ocean by NetApp, but we didn’t test this approach. Spot Elastigroup and Stateful Nodes make it extremely easy to deploy the equivalent of scale sets in any cloud; scale up, out, or in; complete rolling upgrades nondisruptively; and also provide telemetry to understand performance and other conditions. Terraform can easily deploy Spot resources for full automation. We had to iterate numerous deployments and changes to our testing clusters—something that Spot made trivial. The savings we realized from using Spot instances was considerable: an 80%–90% savings costs on the compute for our testing. That was incredible!


Azure NetApp Files is a high-performance Microsoft NAS-as-a-service offering that’s powered by NetApp ONTAP® data management software and NetApp AFF clusters running natively in Azure’s data centers.

We provisioned Azure NetApp Files Premium volumes for the storage. You could also use Ultra volumes if your performance density warrants it. If your throughput is constant, it’s cheaper to use Ultra over Premium to achieve the same throughput. MySQL recommends using NFSv4.1, but we tested NFSv3 with no problems.


Performance wasn’t a major focus of this testing, because other testing has proved that capability for Azure NetApp Files, NetApp Cloud Volumes ONTAP, and other NetApp based storage. MySQL defaults to 151 connections; the default read I/O transfer sizes are 16K, and the write I/O transfer sizes are 32K. Host CPU utilization can be saturated with enough connections, but we were never able to push the Azure NetApp Files volumes or even managed disks to their limit, demonstrating the efficiency of the InnoDB distributed buffer pool. With Azure NetApp Files, storage latency remained at ≤ 1ms, with an average of ~.3ms (300ms). Because we couldn’t saturate the storage, we nondisruptively resized the volumes for capacity rather than performance, yielding significant cost savings.

All NetApp storage provides tremendous performance capability, which reduces compute costs and operational costs. You can dynamically and nondisruptively adjust volume sizes to accommodate performance demands or troubleshooting needs. These adjustments can improve operational flow and resilience, which also adds outstanding value.

MySQL resilience and data protection

The Azure 99.99% uptime SLA requires zonally deployed VMs. Managed disks depend on the VM and the disk type. However, when you’re using only Spot instances, this isn’t possible, because they aren’t included in the SLA. To achieve this with Spot by NetApp, you would need at least two zonally deployed on-demand instances, and the remaining replica nodes could be Spot instances.

Azure NetApp Files offers a 99.99% uptime SLA by default with no additional deployment requirements.

In testing loss of the primary node with concurrent queries running over a MySQL router or a load balancer, we saw no disruption to the query. Loss of any replica didn’t disrupt read I/O.


NetApp Snapshot™ copies were easily integrated for application-consistent backups by a cron-scheduled Python script that triggered a Snapshot copy every hour, retaining 12 Snapshot copies. This approach adds excellent value due to faster time to completion of the backup, shorter time to recovery, and reduced storage needed for backups. Backups based on Snapshot technology are the equivalent of a MySQL Enterprise ”hot” full backup with significantly less effort, no extra licensing, and no extra infrastructure. Snapshot copies have zero impact on performance. Because they capture only the changed 4k blocks, they consume much less capacity than a mysqldump backup. By using Azure NetApp Files backups to tier off the Snapshot copies to Blob storage, you can archive them for long-term retention.


You can recover individual files instantaneously to each node by using Single File SnapRestore® with a subsequent logical database recovery completed as normal. Azure NetApp Files includes this functionality and requires no external tools. Recovery of the file is nearly instantaneous because it’s not a streaming recovery; it’s a recovery from a Snapshot copy. However, when you’re restoring to the MySQL primary node, the restoration doesn’t replicate to the replica nodes. You must restore the table file to each node and then take the cluster offline temporarily, using the recover cluster from the complete outage process to the recover the files. You can automate this MySQL process.

Copy management

In addition to Single File SnapRestore® file-level recoveries, you can clone entire volumes in seconds for fast and cost-efficient delivery of DevTest, QA, staging, and training environments. Simply restore all volumes of the cluster from a Snapshot copy, deploy a new cluster with Spot, recover the cluster from a total outage, and you’ll have an entirely new cluster with fresh data. We used Terraform to automate this entire process, leveraging a continuous integration and continuous deployment (CI/CD) process with Azure DevOps and GitHub for version control. This is an incredibly valuable capability across all NetApp based storage. By moving to an on-demand provisioning of a nonproduction environment while also having the freshest copy of the database, you reduce your infrastructure costs by the number of standing environments you used to maintain and significantly decrease defects in your code.


Infrastructure pricing is something of a dark art, with many input and factored variables. The main questions we considered were what savings would be achieved with various number of Spot instances combined with the use of managed disks versus Azure NetApp Files.

To compare managed disks to Azure NetApp Files, you need to understand the differences in their pricing models. On paper, managed disks often appear cheaper, but in operational reality, they’re at least as expensive as Azure NetApp Files. Bursting charges have caused many an unpleasant surprise in one’s bill. Transaction costs are included in Azure NetApp Files pricing, whereas managed disks have a base cost per disk plus bursting. Azure NetApp Files has a 99.99% uptime SLA with no extra modifications necessary; instead, for managed disks to achieve a 99.99% SLA, they require a zonal configuration with the VM to which they’re attached, which increases their costs.


MySQL with Spot and Azure NetApp Files (or Cloud Volumes ONTAP, FSx for ONTAP, or Google Cloud NetApp Volumes) brings tremendous operational benefits to MySQL. Performance is predictable and scalable, so you can realize efficiencies (more processing for less compute) at the compute layer. Resilience is enhanced with a default 99.99% uptime SLA at the storage layer. Azure NetApp Files volumes can be automatically provisioned to reduce deployment time, and it's also decoupled from the compute layer. Data protection, recovery, and related operations are significantly enhanced with Azure NetApp Files volumes, which can offer the added value of improving development accuracy and efficiency. Give Spot by NetApp a try.

Win Vahlkamp

Win is a Data Solutions Architect with over 25+ years of experience in systems architecture and engineering. He is focused on developing open source data solutions across the NetApp’s cloud and on-prem portfolio of products and solutions. Previously, he was an Azure Cloud Solutions Architect, Global Technology Strategist, and Senior Solutions Engineer for some of the world’s largest oil companies. In his spare time, Win reads a lot, especially about Systems of Profound Knowledge (Deming) and the Theory of Constraints (Goldratt), practices Iaido (Japanese sword), and likes to travel.

View all Posts by Win Vahlkamp

Next Steps

Drift chat loading