
DR Criteria in the Multicloud World

  • 3 hours ago
  • 4 min read

Part I: The Criteria That Matter


Disaster Recovery products, deployments, and strategies have evolved extensively over the last 15 years or so. With Cloud technology, robust virtualization options, vastly improved wide area networking, and new products and methods to choose from, planning and implementing a modern Disaster Recovery solution can be a daunting task. Legacy backup and DR products with limited support for Hyperscalers and emerging Cloud technologies can complicate things even more.


It's tempting and common to focus on a few highly quantitative metrics such as RPO and RTO. As important as they are, other elements are equally important to evaluate for a successful DR plan and implementation. This is the first in a series of blog posts that attempt to itemize the evaluation criteria and explore the considerations for each.


Evaluation criteria should include:


  • DR site location and network infrastructure

  • Personnel plan

  • Application categorization

  • Wave planning

  • Handling production IT changes

  • Boot order and post-boot delay

  • Recovery Point Objective (RPO)

  • Recovery Time Objective (RTO) and Data Consistency

  • DR drill planning

  • Converged or separate backup and DR

  • Hybrid and Multi-Cloud

  • Product selection


Subsequent blogs will address each of these criteria in more detail, but to complete this one I'd like to discuss how RackWare concluded that its approach was the best architecture to address all of the above.


When RackWare was founded with the vision of building the most sophisticated mobility and DR solution, we evaluated several different types of solutions. When engineers build products, they very often design around their own skillset rather than around what end users really need. If you have storage expertise, you will build a storage-based solution. If you have hypervisor expertise, you will build a hypervisor virtual appliance. As the adage goes, if your tool is a hammer, every problem looks like a nail.


So, we actually built three functioning prototypes and tested and evaluated all of them against a wide range of criteria. Along the way there was some friendly debate among us as to which would prove superior.


The first prototype was a VMDK solution. At the time these were fairly new and not much was known about them. In a VMDK solution, a VMDK file designed to operate in one specific hypervisor context is transformed to run in a different hypervisor. In many instances this works well, especially for smaller, simpler Virtual Machines in isolation. This prototype was productized enough that it was even used a few times for production migration projects. These were successful, but it became clear that a VMDK solution was not optimal when considering the broader scope of criteria.


The second prototype was a storage solution in which a kernel filter driver accesses and replicates storage at the sector level. Storage solutions can work well, especially when you have identical hardware and storage on both sides and support from storage appliances. However, we quickly discovered that once the same-hardware, same-storage constraints were removed, this approach became very troublesome, particularly considering the coming Cloud age, in which hardware platforms and storage would differ. We did a couple of special projects with it but quickly deprecated this approach.


The idea behind the third prototype was to replicate and delta sync through the Operating System: among other technology, filesystem logical volumes were replicated and subsequently delta synced. The theory was that applications see the world through the eyes of the Operating System. The OS presents storage and networking to applications, as well as any other hardware or platform services. So, replicating through the OS provides the widest scope and the most consistent experience for both end users and applications. When coming from a VMDK or storage background, it's a little counterintuitive: why would you do this? And not surprisingly, this was the most difficult solution to build. But it was almost magical how it solved all the inadequacies of the other two approaches, especially for Cloud, Hybrid Cloud and Cross Cloud use cases.
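To make the delta sync idea a little more concrete, here is a minimal Python sketch of a file-level delta sync. It is only an illustration of the general technique, not RackWare's implementation: the function names `file_digest` and `delta_sync` are hypothetical, the comparison uses SHA-256 content hashes as a simplifying assumption, and deletions on the source are not propagated.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def delta_sync(source_root, target_root):
    """Copy only new or changed files; return their relative paths, sorted.

    Hypothetical helper for illustration: works on files as the OS
    presents them, not on disk sectors or hypervisor image files.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(target_root, rel)
            if os.path.exists(dst) and file_digest(src) == file_digest(dst):
                continue  # content unchanged since the last pass
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy content and basic metadata
            copied.append(rel)
    return sorted(copied)
```

The first call against an empty target behaves like an initial replication, and each later call moves only what has changed. Because every pass compares current state rather than replaying a log, an interrupted pass can simply be run again.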


Some of the advantages include:


  • Highly flexible DR policies with granularity of frequency and selected data

  • Far superior data consistency ensuring applications achieve production operational state with no or minimum intervention

  • Permits DR test without bringing up a second site

  • Converged backup features such as retention policies and single file restore

  • Flexible and better storage options on the target side

  • Supports in-memory applications (e.g., SAP HANA) as part of standard operations with no special or additional configuration or product extensions

  • Initial replication is more efficient as it only replicates used data as opposed to the entire disk

  • Not sensitive to network outages; never requires a complete re-replication of data no matter how long the outage

  • Selective sync at the drive, directory and file level

  • Not sensitive to disk defragmentation

  • Restore anywhere

  • BIOS → UEFI and UEFI → BIOS conversion
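As a rough illustration of what a policy with its own frequency and selected data could look like, here is a small Python sketch. The `SyncPolicy` class, its fields, and the example paths are all hypothetical and do not reflect any RackWare configuration format; the sketch only shows how one include list can select data at the drive, directory, or file level.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class SyncPolicy:
    """Hypothetical DR policy: how often to sync and which paths to include."""
    interval_minutes: int
    include: list = field(default_factory=list)  # mount points, directories, or files

    def selects(self, path):
        """Return True if path equals an included entry or lies beneath one."""
        p = PurePosixPath(path)
        for entry in self.include:
            e = PurePosixPath(entry)
            # Compare whole path components, so "/data" does not match "/database".
            if p == e or e in p.parents:
                return True
        return False


# Example: sync everything under /data plus one config file every 15 minutes.
policy = SyncPolicy(interval_minutes=15, include=["/data", "/etc/app/app.conf"])
```

Matching on path components instead of string prefixes keeps a directory entry from accidentally selecting a sibling with a similar name.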


The superiority of our Image Replication was rapidly proven in the Enterprise market. RackWare earned a reputation as the solution of choice when other products failed or were hard to use. Along the way we secured 4 patents and hundreds of happy customers, with over 1 million servers successfully replicated.

 
 
 
