Five Ingredients for an Effective Disaster Recovery Plan

DRP.jpg

Over the past few years, cyberattacks have been on the rise. In fact, research shows that for every hour of operational downtime, small and medium-sized businesses incur an incremental financial loss of over $8,000, while large organizations incur over $74,000.

To address this risk, businesses across the globe have gotten serious about implementing an effective, actionable Disaster Recovery (DR) plan. A DR plan provides a template for protection against the worst consequences of critical disaster scenarios.

Although every disaster recovery plan is unique, here are five of the most important components to include in yours so that your organization can keep operating effectively in the event of a disaster.

  • Define your tolerance for data loss and downtime: Defining your tolerance for data loss and downtime helps determine the type of solution needed for recovery. According to David Grimes, CTO at NaviSite, you should evaluate what an acceptable recovery point objective (RPO) and recovery time objective (RTO) is for each set of applications. “By properly identifying these two metrics, businesses can prioritize what is needed to successfully survive a disaster, ensure a cost-effective level of disaster recovery and lower the potential risk of miscalculating what they’re able to recover during a disaster.” A minimal sketch of recording these objectives per application follows this list.

  • Define key roles, responsibilities and parties involved in the DR Process: It’s important to identify key roles, responsibilities, and parties so employees know which tasks need to be completed in the event of a disaster. This ensures that the DR process operates as efficiently as possible.

  • Map out a communication plan: This is often overlooked in disaster recovery planning. It is vital to map out how you will communicate with your employees when a disaster strikes. Robert Gibbons, CTO at Datto, advises that a disaster recovery plan should include a statement that can be published on your company’s website and social media platforms in the event of an emergency, and that it should give your customers timely status updates on what they can expect from your business and when, to reassure them.

  • Provide a backup worksite for your employees: It is vital to designate a backup worksite for your employees in case of an emergency, and to inform your staff about this site and how to access it.

  • Include disasters and emergencies in your legal service agreement: If your organization leverages a Managed Service Provider or data center, it is vital that you have a binding agreement that defines what happens to your data in case of disaster and ensures that your providers work towards resolving the disruption in a timely fashion.
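To make the first ingredient concrete, here is a minimal sketch, in Python, of how a team might record RPO and RTO targets per application and map them to a protection tier. The application names, thresholds, and tiers are entirely hypothetical illustrations, not a prescribed classification.

```python
# Hypothetical RPO/RTO inventory used to pick a protection tier per application.
# Application names, thresholds, and tiers are illustrative only.

APPS = {
    # app name: (RPO in minutes, RTO in minutes)
    "order-processing": (5, 15),
    "internal-wiki": (1440, 2880),
    "reporting": (240, 480),
}

def protection_tier(rpo_minutes, rto_minutes):
    """Map recovery objectives to a (made-up) protection tier."""
    if rpo_minutes <= 15 and rto_minutes <= 60:
        return "continuous replication + automated failover"
    if rpo_minutes <= 240:
        return "periodic replication to a cloud recovery instance"
    return "nightly image backup"

for app, (rpo, rto) in APPS.items():
    print(f"{app}: RPO={rpo}m RTO={rto}m -> {protection_tier(rpo, rto)}")
```

Writing the objectives down in one place like this makes it easier to spot workloads that are over- or under-protected relative to their actual tolerance.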

Beyond the ingredients above, it is very important for every business owner to test the plan regularly and confirm that the disaster recovery (DR) process works as intended.


Software Agents and Cloud Migration

Software Agent.jpg

Agents are small pieces of software installed on servers so that another piece of software can extract information from, or control some aspect of, the server. Although software agents are often required for certain applications, they come with several disadvantages:

Security & Compliance

Many enterprises require strict adherence to rules when it comes to installing software on servers. This is because software agents may expose the company to unknown security issues, such as opening unauthorized network ports or conflicting with other pieces of software.

Resource usage

Every piece of software has processes that consume resources, be it CPU, disk, memory, or network. If left unchecked, agents can begin to affect the workload’s primary applications, including their responsiveness and ability to serve customers or end users.
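As a quick illustration of how to keep an eye on this, the sketch below uses the open-source psutil library to sample the CPU and memory footprint of an agent process. The agent process name is a hypothetical placeholder; adjust it to whatever agent is actually installed.

```python
# Check what an installed agent is actually consuming, using psutil
# (pip install psutil). "backup-agent" is a hypothetical process name.
import time
import psutil

AGENT_NAME = "backup-agent"  # hypothetical agent process name

# Prime the CPU counters, then sample after a short interval.
procs = [p for p in psutil.process_iter(["name"]) if p.info["name"] == AGENT_NAME]
for p in procs:
    p.cpu_percent(None)
time.sleep(5)

for p in procs:
    try:
        cpu = p.cpu_percent(None)                 # % of one CPU over the interval
        rss_mb = p.memory_info().rss / (1024**2)  # resident memory in MB
        print(f"{AGENT_NAME} (pid {p.pid}): cpu={cpu:.1f}% rss={rss_mb:.1f} MB")
    except psutil.NoSuchProcess:
        pass
```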

Compatibility

Sometimes applications have compatibility issues and conflict with one another for resources. For this reason, data centers often test new software rigorously before putting it into a production environment.

Installation effort and downtime

Installing agents on a few workloads usually isn’t a big deal. However, in production environments it can be an issue, especially if the agent requires a reboot. When environments get large, installing agents on hundreds or even thousands of workloads becomes a cumbersome and tedious effort. And if a one-time migration is being done, uninstalling the agents afterwards becomes yet another chore.
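The sketch below is only meant to illustrate that operational burden: even a simple scripted rollout has to loop over every workload over SSH, track failures, and leave reboots and the eventual uninstall as separate passes. The hostnames, package name, and install command are hypothetical.

```python
# Illustration of why agent rollout gets tedious at scale.
# Hostnames and the install command are hypothetical examples.
import subprocess

HOSTS = [f"web-{i:03d}.example.internal" for i in range(1, 301)]  # 300 workloads
INSTALL_CMD = "sudo yum install -y migration-agent"               # hypothetical package

failed = []
for host in HOSTS:
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, INSTALL_CMD],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        failed.append(host)

print(f"Installed on {len(HOSTS) - len(failed)} hosts, {len(failed)} failures")
# The same loop (plus any reboot windows) has to be run again to uninstall
# after a one-time migration.
```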

Software agents should only be used when there is no other choice. For workload migration and DR, it is best to go with agentless software, where nothing needs to be installed on the source or target infrastructure.

Protecting your Data Center in Public or Private Clouds

Cloud infrastructure.jpg

How Is the Cloud Changing the Face of Disaster Recovery?

By Sash Sunkara, CEO, RackWare Inc.

The advent of Cloud infrastructure, both public and private, has completely changed the way the data center operates. It has increased data center agility, reliability, and scalability while lowering costs. There is one area, however, where the Cloud can play a vital role that almost no one is taking advantage of today: availability.

Data center workloads are typically divided into two camps: (1) critical workloads and (2) non-critical workloads. Critical workloads, the ones that can’t tolerate even a few minutes of downtime, are usually protected with real-time replication solutions that require a duplicate system to act as a recovery instance should the production workload experience an outage. Non-critical workloads can tolerate a wide range of outage times and are typically unprotected, or backed up with image or tape archive solutions. Cloud technology has introduced the possibility of an intermediate solution, where non-critical workloads can get the benefits of expensive and complex high-availability solutions, such as failover, without the high cost.

There are four ways that Cloud infrastructure can be used to improve your data center’s availability today:

1.  Prevent Downtime by Reducing Resource Contention

Unplanned downtime occurs for many reasons, one of which is processes fighting over resources (resource contention). As a business grows, its demand for resources usually grows proportionally. Data center workloads are typically not architected to handle variable demand, and outages may occur when peak loads can’t be absorbed. That’s where cloud scaling and cloud bursting come into play. The Cloud gives data center managers a way to accommodate drastically changing demands on workloads by allowing additional workloads to be created easily and automatically in the Cloud without changing or customizing their applications. This is especially true of public clouds, since data centers can “burst” or extend their infrastructure into a public cloud when needed. Scaling out to the Cloud automatically when necessary alleviates resource contention, ensures that resources are available to accommodate spikes in demand, prevents downtime, and increases overall availability.
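A minimal sketch of the burst-on-contention idea follows. The cloud_client module and its functions are hypothetical placeholders for whatever API your cloud provider or orchestration layer actually exposes, and the thresholds are illustrative.

```python
# Sketch of "burst when contended": scale out to the public cloud on high
# utilization, scale back in when demand subsides. cloud_client is a
# hypothetical SDK standing in for a real provider or orchestration API.
import time
import cloud_client  # hypothetical

CPU_HIGH = 85.0   # % utilization that signals contention (illustrative)
CPU_LOW = 40.0    # % utilization at which burst capacity can be released
burst_instances = []

while True:
    cpu = cloud_client.get_onprem_cpu_utilization()   # hypothetical metric call
    if cpu > CPU_HIGH:
        # Scale out: clone the workload into the public cloud to absorb the spike.
        burst_instances.append(cloud_client.provision_burst_instance("web-tier"))
    elif cpu < CPU_LOW and burst_instances:
        # Scale back in once demand subsides, so costs stay low.
        cloud_client.release_instance(burst_instances.pop())
    time.sleep(60)
```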

2.  Replicate workloads into the Cloud to create Asymmetric “Hot Backups”

Cloud infrastructure has created the wondrous ability to clone the complete workload stack (OS, applications, data). When combined with a technology that can decouple the workload stack from the underlying infrastructure, this “portable” workload can be imported into public or private clouds. In the case of downtime on the production workload, sessions can reconnect to the Cloud instance, where the service can resume even if the production and recovery workloads are on differing infrastructures. The Cloud allows data centers to move beyond traditional “cold” backups, where only data is protected and the OS and applications must be restored manually before the data can be restored. The notion of the asymmetric “hot backup” is made possible by the Cloud because every workload stored as an image can be booted into a live, functioning virtual machine that can take over while the production server is being repaired. This differs from traditional replication solutions, whereby a duplicate set of hardware is required to take over should the production workload fail. Changes that occur on the production instance are replicated to the recovery instance on a periodic basis to keep it up to date. The Cloud also adds flexibility that saves on costs: the hot backup can be “parked” when not in use, or a smaller recovery instance can be provisioned.
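As one possible shape of that periodic replication step, the sketch below pushes incremental changes from the production workload to a parked recovery instance on a schedule. The paths and hostnames are hypothetical, and rsync stands in for whatever replication mechanism is actually in place.

```python
# Sketch of keeping an asymmetric "hot backup" current: periodically push
# changed data from the production workload to a (smaller, parked) recovery
# instance in the cloud. Paths and hostnames are hypothetical.
import subprocess
import time

PRODUCTION_PATH = "/var/lib/app/"                                 # hypothetical data path
RECOVERY_TARGET = "replica@recovery.cloud.example:/var/lib/app/"  # hypothetical target

REPLICATION_INTERVAL = 15 * 60  # seconds between incremental syncs (illustrative)

while True:
    # Incremental sync: only changes since the last run are transferred,
    # which keeps the recovery instance up to date at low cost.
    subprocess.run(
        ["rsync", "-az", "--delete", PRODUCTION_PATH, RECOVERY_TARGET],
        check=False,
    )
    time.sleep(REPLICATION_INTERVAL)
```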

3.  Introducing the concept of “Failover” and “Failback” typically reserved only for Critical Workloads

Software replication technology has existed for decades, used to protect workloads and replicate changes in “real time” between production and recovery instances. These setups are typically extremely expensive from a software and services perspective, as they often require a duplicate, identical recovery setup, doubling the cost of maintaining and running the infrastructure. Meanwhile, the rest of the workloads in the data center are under-protected, typically covered only by slow-to-restore image and tape schemes that take days and considerable manual effort to restore. By automating the switching of users or processes from production to recovery instances, downtime can be reduced by up to 80% for the majority of under-protected workloads in the data center.
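A minimal sketch of what automated failover for an under-protected workload could look like: poll a health endpoint on the production instance and, after repeated failures, repoint traffic at the cloud recovery instance. The dns_client module, endpoints, and thresholds are hypothetical.

```python
# Sketch of automated failover: poll the production endpoint and, if it stays
# unreachable, switch traffic to the recovery instance instead of waiting
# days for an image or tape restore. dns_client is a hypothetical API.
import time
import urllib.request

import dns_client  # hypothetical API for updating the service's DNS record

PRODUCTION_URL = "https://app.example.com/health"  # hypothetical health endpoint
RECOVERY_IP = "203.0.113.10"                       # hypothetical recovery instance

def healthy(url, timeout=5):
    try:
        return urllib.request.urlopen(url, timeout=timeout).status == 200
    except Exception:
        return False

failures = 0
while True:
    failures = 0 if healthy(PRODUCTION_URL) else failures + 1
    if failures >= 3:
        # Fail over after three consecutive failed checks.
        # Failback is the same step in reverse once production is repaired.
        dns_client.update_record("app.example.com", RECOVERY_IP)
        break
    time.sleep(30)
```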

4.  Using Dissimilar Infrastructure for “Off-Premises” Redundancy

For added protection, data centers should also consider dissimilar cloud infrastructure as part of their DR strategy. Cloud infrastructure itself can fail, and for data centers that require an extra level of protection, workloads should be replicated off-site to a different cloud provider. Physical-to-Physical, Physical-to-Cloud, or Cloud-to-Cloud replication can offer a level of protection robust enough to withstand site-wide Denial-of-Service attacks, hacking, or natural disasters.
