In today’s constantly connected global business environment, organizations must plan for varying degrees of failure. Failure can affect anything from a network, devices, or storage to an entire site. The worst-case scenario for a business is obvious: a catastrophic event, such as a natural calamity, fire, or human-made disaster, that physically destroys your entire site. Other failures include partial loss or corruption of data, security breaches, temporary service outages, and even the loss of key personnel. These failures, too, can constitute a disaster that affects your day-to-day operations.
Effective Disaster Recovery
To devise an effective data protection and disaster recovery approach, organizations must determine what needs to be protected and for how long, so that business-critical data can be recovered from any of these types of failures. The most important mechanism in disaster recovery is replication: with well-designed replication, data remains recoverable from one or more additional data centers.
Disaster recovery plans define service-level agreements (SLAs) such as:
- Zero recovery point objective (RPO), to achieve no data loss
- Near-zero recovery time objective (RTO), for faster recovery of business-critical applications if a disaster occurs
Earlier disaster recovery plans used a dual-site configuration: if an event took the primary data center down, IT operations transferred activity to a second site. However, the distance between the two sites affected how much data could be permanently lost. Because data transfer latency increases with distance, the sites should be close together; but when they are close together, a single natural calamity could make both unavailable. Recovery time also depends on factors such as the speed of the link between the local and remote recovery nodes, the amount of data to be recovered, and the complexity of the recovery process.
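The recovery-time factors mentioned above (link speed, amount of data, and process complexity) can be turned into a rough estimate. The following Python sketch is a hypothetical planning aid, not a NetApp tool or formula; the function name, the 80% link efficiency, and the fixed process-overhead allowance are assumptions you would replace with your own measurements.

```python
def estimated_recovery_hours(
    data_tb: float,
    link_gbps: float,
    efficiency: float = 0.8,
    process_overhead_hours: float = 0.0,
) -> float:
    """Rough recovery-time estimate: bulk transfer time plus a fixed allowance
    for recovery-process steps such as validation and application restart.

    All parameters are illustrative assumptions, not measured or vendor values.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits_to_move = data_tb * 1e12 * 8               # decimal terabytes -> bits
    usable_bits_per_s = link_gbps * 1e9 * efficiency
    transfer_hours = bits_to_move / usable_bits_per_s / 3600
    return transfer_hours + process_overhead_hours


# 50 TB over a 10 Gbps link at 80% efficiency, plus a 2-hour process allowance.
print(round(estimated_recovery_hours(50, 10, 0.8, 2.0), 1))  # 15.9
```

Doubling the link speed halves only the transfer portion; the process allowance stays fixed, which is one reason recovery complexity matters as much as bandwidth.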
To sustain business operations with minimal data loss when large-scale disasters occur, organizations now need a three-data-center configuration that achieves both zero data loss and geographic dispersion. This approach places two data centers close to one another and the third farther away. A further goal is to maximize return on investment and get the most out of the IT infrastructure by reusing a secondary facility for business intelligence or for development and testing.
NetApp® SnapMirror® technology allows you to replicate data at high speeds over LANs or WANs. It enables high data availability and fast data replication for business-critical applications such as Oracle and Microsoft SQL Server in both virtual and physical environments. SnapMirror replicates to one or more NetApp storage systems and continually updates the secondary data, keeping your data current and available whenever you need it.
SnapMirror Synchronous and SnapMirror Asynchronous
The three-data-center configuration for disaster recovery combines SnapMirror technologies to protect data at different distances. SnapMirror Synchronous (SM-S) provides zero data loss, but it requires a round-trip time (RTT) of 10 milliseconds or less (a distance of about 150 km); otherwise, application performance is affected. To overcome this distance limitation, you can use SnapMirror Asynchronous, which accepts a nonzero RPO in exchange for longer distances and thereby protects your business even from a large-scale disaster (for instance, an earthquake) that damages both the primary and nearby sites.
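The distance rule above amounts to a simple decision for each replication path. This minimal Python sketch assumes you already have a measured RTT for each link; the `ReplicationLink` class, the `choose_mode` function, the site names, and the RTT values are hypothetical. The 10 ms threshold is the figure stated in this article, so confirm it against the NetApp documentation for your ONTAP release.

```python
from dataclasses import dataclass

# Threshold stated in the text for SnapMirror Synchronous (planning input only).
SYNC_MAX_RTT_MS = 10.0


@dataclass(frozen=True)
class ReplicationLink:
    """A hypothetical description of a replication path between two sites."""
    source: str
    destination: str
    rtt_ms: float


def choose_mode(link: ReplicationLink) -> str:
    """Return 'sync' when the RTT is within the synchronous limit, else 'async'.

    Sync gives zero RPO; async accepts a nonzero RPO to reach longer distances.
    """
    if link.rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    return "sync" if link.rtt_ms <= SYNC_MAX_RTT_MS else "async"


links = [
    ReplicationLink("primary", "near-dr", rtt_ms=4.2),
    ReplicationLink("primary", "far-dr", rtt_ms=38.0),
]
for link in links:
    print(f"{link.source} -> {link.destination}: {choose_mode(link)}")
# primary -> near-dr: sync
# primary -> far-dr: async
```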
Because of its flexibility with synchronous and asynchronous replication, SnapMirror lends itself to various topologies. Two topologies that are used in a multitarget, three-data-center configuration are the fan-out and the cascade approaches.
In the fan-out topology, a primary data center synchronously replicates data to a nearby disaster recovery data center (over a replication network with an RTT of 10 ms or less) to achieve zero RPO. The primary also replicates data asynchronously, on a regular schedule set according to how much new data is being generated, to a disaster recovery data center that is farther away, as shown in the following figure.
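One way to reason about the fan-out layout is to model each replication relationship and estimate its worst-case RPO. In this hypothetical Python sketch, a synchronous relationship has zero RPO, while an asynchronous one can lose roughly one update interval plus the time to ship that update. The class and function names, the 30-minute schedule, and the 5-minute transfer time are illustrative assumptions, not SnapMirror defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relationship:
    """One replication relationship in a hypothetical site-level model."""
    source: str
    destination: str
    mode: str                   # "sync" or "async"
    interval_min: float = 0.0   # async update schedule in minutes (unused for sync)
    transfer_min: float = 0.0   # typical minutes to ship one async update


def worst_case_rpo_min(rel: Relationship) -> float:
    """Approximate worst-case data-loss window, in minutes."""
    if rel.mode == "sync":
        return 0.0  # in this model, a write counts as done only once the destination has it
    if rel.mode == "async":
        # Writes made just after one update's point in time are protected only
        # when the next update finishes: about one interval plus one transfer.
        return rel.interval_min + rel.transfer_min
    raise ValueError(f"unknown mode: {rel.mode!r}")


def is_fan_out(rels: List[Relationship]) -> bool:
    """Fan-out: every relationship starts at one source, each to a distinct target."""
    sources = {r.source for r in rels}
    targets = [r.destination for r in rels]
    return len(sources) == 1 and len(set(targets)) == len(targets)


fan_out = [
    Relationship("primary", "near-dr", "sync"),
    Relationship("primary", "far-dr", "async", interval_min=30, transfer_min=5),
]
print("fan-out:", is_fan_out(fan_out))
for rel in fan_out:
    rpo = worst_case_rpo_min(rel)
    print(f"{rel.source} -> {rel.destination} ({rel.mode}): worst-case RPO ~{rpo:g} min")
```

Shortening the asynchronous schedule narrows the far site's loss window, but each update then has less time to finish before the next one starts.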
If the primary site fails, you can restart the application with zero data loss from the nearby disaster recovery data center, which then also takes over asynchronous replication to the far disaster recovery data center. The following figure shows delta synchronization from the near site to the far site.
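Conceptually, delta synchronization sends the far site only what changed since the last point in time both sites share. The toy Python model below assumes each site keeps named point-in-time snapshots and represents data as numbered blocks; the function names, snapshot names, and data are hypothetical and do not describe SnapMirror's internal implementation.

```python
from typing import Dict, List, Optional

Blocks = Dict[int, str]  # toy model: block id -> block contents


def latest_common_snapshot(near: List[str], far: List[str]) -> Optional[str]:
    """Newest snapshot present on both sites; both lists are ordered oldest to newest."""
    far_names = set(far)
    for name in reversed(near):
        if name in far_names:
            return name
    return None


def delta_since(base: Blocks, current: Blocks) -> Blocks:
    """Blocks that are new or changed in `current` relative to `base`.

    Deleted blocks are ignored to keep the sketch short.
    """
    return {bid: data for bid, data in current.items() if base.get(bid) != data}


# After failover, the near site holds every synchronously replicated write,
# so it is ahead of the far site, whose last completed async update was "s2".
near_snapshots: Dict[str, Blocks] = {
    "s1": {1: "a", 2: "b"},
    "s2": {1: "a", 2: "b2", 3: "c"},
}
near_current: Blocks = {1: "a", 2: "b3", 3: "c", 4: "d"}
far_snapshot_names = ["s1", "s2"]

base = latest_common_snapshot(list(near_snapshots), far_snapshot_names)
if base is None:
    raise RuntimeError("no common snapshot: a full baseline transfer would be needed")
delta = delta_since(near_snapshots[base], near_current)
print(base, delta)  # s2 {2: 'b3', 4: 'd'}
```

Only blocks 2 and 4 need to cross the WAN rather than the full data set, which is why keeping a shared point in time on both sites matters.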
In the cascade topology, the primary site synchronously replicates to the nearby disaster recovery site, which in turn asynchronously replicates data to the far disaster recovery site. If the nearby site goes down, you can configure asynchronous replication directly from the primary site to the far site so that delta changes keep reaching it.
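The cascade layout and the recovery step for a lost nearby site can be sketched as a change to a small relationship map. In this hypothetical Python model, `cascade`, `handle_near_site_loss`, and `protected_sites` are illustrative names, not ONTAP commands; the point is that after reconfiguration the far site still receives the primary's data, now directly rather than through the nearby site.

```python
from typing import Dict, Set, Tuple

# (source, destination) -> "sync" or "async"; a hypothetical site-level view.
Topology = Dict[Tuple[str, str], str]


def cascade(primary: str, near: str, far: str) -> Topology:
    """Primary -> near synchronously, then near -> far asynchronously."""
    return {(primary, near): "sync", (near, far): "async"}


def handle_near_site_loss(topo: Topology, primary: str, near: str, far: str) -> Topology:
    """Drop relationships touching the failed near site and add an async
    primary -> far relationship so delta changes keep reaching the far site."""
    survivors = {pair: mode for pair, mode in topo.items() if near not in pair}
    survivors[(primary, far)] = "async"
    return survivors


def protected_sites(topo: Topology, primary: str) -> Set[str]:
    """Sites that receive the primary's data, directly or via cascade hops."""
    reached: Set[str] = set()
    frontier = [primary]
    while frontier:
        site = frontier.pop()
        for src, dst in topo:
            if src == site and dst != primary and dst not in reached:
                reached.add(dst)
                frontier.append(dst)
    return reached


before = cascade("primary", "near-dr", "far-dr")
after = handle_near_site_loss(before, "primary", "near-dr", "far-dr")
print(sorted(protected_sites(before, "primary")))  # ['far-dr', 'near-dr']
print(sorted(protected_sites(after, "primary")))   # ['far-dr']
print(after)                                        # {('primary', 'far-dr'): 'async'}
```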
Your choice of failover site determines what your disaster recovery methods can do, whether recovery entails restoring existing infrastructure, buying new infrastructure, or moving to a production cloud. A disaster recovery plan is a necessity for business continuity.
To learn more about implementing multitarget disaster recovery using NetApp SnapMirror technology in the fan-out topology, refer to Three-Way Disaster Recovery Using NetApp SnapMirror for ONTAP® 9.7.