DR Platforms Are Showing Their Age – And It’s Not a Patch Problem
By Todd Matters, Chief Technology Officer, RackWare
Every generation of enterprise technology has a moment where the platforms that defined the previous era start to strain against a world they weren’t designed for. We’ve seen it with storage. We’ve seen it with networking. We’ve seen it with virtualization itself. Disaster recovery is now having its moment.
That’s not a criticism of the platforms that got us here. The DR solutions that emerged a decade or more ago were well-designed for the problem they were solving: replicate virtualized workloads from one datacenter to another, ideally with matching hypervisors on both sides, and get the recovery point as close to zero as possible. For that problem, in that world, they worked.
The world changed. The platforms mostly didn’t.
What enterprises now call “DR” spans on-prem VMware, multiple public cloud environments, often bare metal, increasingly containers, and even private clouds and non-VMware hyperscalers – often in the same recovery plan. The assumption that production and recovery are running on identical platforms and architectures is no longer safe. And the technical choices that legacy DR platforms made early – choices about how to capture change, how to ensure consistency, how to target a recovery site – were made before any of this was real. Those choices show up today as architectural limits, not feature gaps. And you can’t patch your way out of an architectural limit. Recovery-time expectations, meanwhile, are far more aggressive than they used to be.
There are two in particular that I think deserve attention.
The Journaling Problem
Most legacy DR platforms rely on continuous journaling – capturing every write as it happens and replaying them in order at the recovery site. In theory, this gives you very low RPO. In practice, it gives you a recovery point whose consistency depends entirely on whether the journal replay hits a coherent state at the moment you need it.
I’ve written before that recovery points built on guaranteed consistency are fundamentally superior to journaling approaches, where you never quite know if the data will align properly at any given checkpoint. That’s not a marketing claim. It’s a claim about what happens the first time you actually have to recover.
Here’s the practical consequence. When a journaling-based recovery server boots after a real failover, you find out whether your checkpoint was consistent – after the boot. If it wasn’t, you’re into manual recovery steps, guessing at the correct journal point, likely applying database- and application-level repairs, and making the kind of phone calls no one wants to make during an incident. Modern in-memory applications are particularly sensitive to journaling deficiencies. Aggressive RPO targets are useful. Aggressive RPO targets against an inconsistent checkpoint are false security.
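The failure mode above can be sketched in a few lines. This is a deliberately minimal model, not any vendor’s implementation: a write journal replayed to an arbitrary position can land in the middle of a transaction, while a recovery point cut at a commit boundary cannot. All names here are illustrative.

```python
# Minimal model: replaying a write journal to an arbitrary position can
# land mid-transaction; a recovery point taken at a commit boundary can't.
# Names and structures are illustrative only, not any vendor's API.
from dataclasses import dataclass

@dataclass
class Write:
    key: str
    value: int
    txn_id: int
    commit: bool  # True only on the write that completes its transaction

journal = [
    Write("balance_a", 100, txn_id=1, commit=False),
    Write("balance_b", -100, txn_id=1, commit=True),   # txn 1 complete
    Write("balance_a", 250, txn_id=2, commit=False),   # txn 2 still in flight
]

def replay(journal, upto):
    """Replay writes in order up to an arbitrary journal position."""
    state = {}
    for w in journal[:upto]:
        state[w.key] = w.value
    return state

def is_consistent(journal, upto):
    """A recovery point is consistent only if no transaction is mid-flight."""
    open_txns = set()
    for w in journal[:upto]:
        open_txns.add(w.txn_id)
        if w.commit:
            open_txns.discard(w.txn_id)
    return not open_txns

# Replaying to position 3 lands inside transaction 2: crash-inconsistent.
print(is_consistent(journal, 3))  # False
# A checkpoint cut at a commit boundary (position 2) is consistent by construction.
print(is_consistent(journal, 2))  # True
```

The point of the sketch: with journaling, consistency is a property you discover at replay time; with snapshot-style recovery points taken at known-coherent boundaries, it is a property you guarantee at capture time.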
The platforms that made the journaling bet made it a long time ago, when the alternative tradeoffs looked different than they do now. Today, with storage-efficient snapshotting, application-aware sync, and much better tooling for consistent delta capture, the tradeoff has inverted. The journaling bet made sense in 2013 with longer RTOs. It makes much less sense in 2026.
The Matched-Hypervisor Problem
The second architectural limit is more subtle but arguably more consequential.
Most legacy DR was architected around a specific assumption: that the recovery site would look, at the infrastructure layer, like the production site. Same hypervisor. Same storage fabric. Same network abstractions. If you were running VMware in Datacenter A, your recovery site was VMware in Datacenter B – ideally with matched hardware and software generations.
That assumption was reasonable when enterprise computing was homogeneously hypervisor-centric and the clouds were still immature. It’s not reasonable now. Production might be VMware on-prem. Recovery might be IBM Cloud VPC, or OCI, or Google Cloud, or some combination. The underlying infrastructure on each side looks nothing like the other. There is no “matching hypervisor” to replicate to – there’s an entirely different abstraction, managed by the hyperscaler, exposing entirely different machine formats.
Legacy DR platforms handle this by requiring you to either build a matched environment at the recovery target (defeating most of the cloud value) or by bolting on translation layers that were not part of the original architecture. Translation layers of this kind tend to be fragile, feature-limited, and expensive to maintain. They work until they don’t.
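Why those translation layers stay fragile is easy to show in miniature. The sketch below is hypothetical on every field name: it maps a hypervisor-style VM descriptor into a generic cloud instance spec, and surfaces the attributes that simply have no equivalent on the other side – the gap a bolt-on layer has to paper over.

```python
# Illustrative sketch (all field names hypothetical): translating a
# hypervisor-specific VM descriptor into a cloud instance spec is lossy,
# because the two sides don't share a machine format.

vmware_style_vm = {
    "cpu": 8,
    "memory_mb": 32768,
    "disks": [{"format": "vmdk", "size_gb": 500, "scsi_controller": "pvscsi"}],
    "nics": [{"portgroup": "dvPortGroup-Prod", "vlan": 120}],
}

def translate_to_cloud(vm):
    """Map what has an equivalent; report what doesn't."""
    spec = {
        "vcpus": vm["cpu"],
        "memory_mb": vm["memory_mb"],
        # Cloud block storage has no notion of a vmdk file or a SCSI
        # controller model; only capacity survives the translation.
        "volumes": [{"size_gb": d["size_gb"]} for d in vm["disks"]],
    }
    dropped = []
    for d in vm["disks"]:
        dropped += [k for k in d if k != "size_gb"]
    for n in vm["nics"]:
        dropped += list(n)  # port groups and VLAN tags have no direct analogue
    return spec, sorted(set(dropped))

spec, dropped = translate_to_cloud(vmware_style_vm)
print(dropped)  # everything a bolt-on translation layer must reinvent per target
```

Each dropped attribute is a decision the translation layer has to make per target cloud, which is exactly why these layers end up fragile, feature-limited, and expensive to maintain.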
The shift that actually needs to happen is architectural: DR has to be designed from the ground up for asymmetric, filesystem-based recovery – spanning multiple clouds, heterogeneous hypervisors, and even physical servers recovering into virtual environments – where production and recovery are intentionally different, where workloads move between fundamentally different infrastructure layers, and where containers and managed services live alongside traditional VMs in the same protection plan. That’s not a feature. That’s now a design mandate.
What This Means for the Refresh Cycle
None of this is an argument that legacy DR platforms are bad. They’re aging. That’s different. They solved the problem they were built to solve, and they did it well enough that an entire generation of enterprise IT trusted them with their recovery obligations.
But the architectural assumptions they made – hypervisor-to-hypervisor symmetry, journaling-based consistency, matched-environment recovery targets – are assumptions the current landscape no longer supports. Enterprises coming up on DR platform renewals, or staring down compliance regimes like DORA and NIS2 that force a real audit of recovery posture, are finding this out the hard way.
The right response isn’t to fight the old platform into accommodating the new world. It’s to look honestly at whether the DR architecture was ever designed for the environment it’s being asked to protect today – and if not, to treat the refresh cycle as an opportunity to align architecture to reality rather than extend a design that does not meet the economic or technical challenges for the immediate future.
That’s the real generation gap. And it’s not going to close on its own.