Rethinking disaster recovery for the cloud

DR practices that were designed for in-house computing are out of sync with the cloud world. If you haven't already revised your DR plan for cloud computing, now's the time.

On September 4, 2018, a cooling problem in a data center caused a Microsoft Azure outage that affected companies across the south central United States. “Azure was down for most of one business day,” said one IT professional. “Though we’re a nationwide company, all of our traffic goes through Dallas, Texas, so the entire company was affected. It caused a slowdown in many of our business processes.”

Azure is not the only leading public cloud provider to suffer outages. Google Cloud and Amazon Web Services (AWS) have also had outages that disrupted their corporate clients.

If you haven't already revised your DR plan for cloud-based computing, you need to do it now.

Rethinking DR

“We really haven’t thought about modifying our DR plan until now,” said an IT manager at a West Coast financial services firm. “When we went back over our contracts with cloud vendors, we discovered that almost all of the contracts contained disclaimer clauses saying that the cloud providers would not be responsible for service or data recovery SLAs if a disaster occurred. That really concerned us.”

The plot thickens further for companies using software as a service (SaaS) vendors that in turn rely on third-party cloud providers to host their services.

What happens when the third-party cloud provider the SaaS company is using experiences an outage in its data center? “In that case, which is highly unlikely, we would simply put the client in touch with our cloud provider,” said one California SaaS company executive.

Unfortunately, in the middle of a disaster, finding yourself face to face with a third party that you have no contract with, and that you don't even know, is not a good position to be in.

In the cloud, you have to think differently. DR practices that were designed for in-house computing are out of sync with the cloud world, where strategies such as replication of systems and data, cooperative testing with vendors, and even failover to alternate vendors need to be considered.

Here are seven recommended best practices for revising your DR plan for the cloud.


1. Regularly back up and replicate your systems and data

“There is just huge exposure now with cloud computing that companies haven’t thought about,” said Michael Flavin, director of sales at Saalex IT, a network infrastructure company. “One of the ways that companies can protect themselves against a cloud outage is by maintaining a secure backup of their systems and data offsite that they can fail over to. This can be accomplished by regularly replicating your data to this second backup data center.”
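Replication only protects you if the copy at the second data center is actually complete and intact. As a minimal sketch, assuming both the primary data and the replica can be reached as ordinary directories (the function names here are hypothetical, not part of any vendor's tooling), the following Python compares SHA-256 checksums file by file and reports anything missing from, or different in, the replica.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_replica(primary_dir: Path, replica_dir: Path) -> list[str]:
    """Report files that are missing from, or differ in, the replica copy."""
    problems = []
    for source in sorted(primary_dir.rglob("*")):
        if not source.is_file():
            continue  # directories are covered by the files inside them
        relative = source.relative_to(primary_dir)
        copy = replica_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source) != sha256_of(copy):
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems
```

A check like this would run after each replication cycle; a non-empty result means the replica is not yet a safe failover target. Files that exist only in the replica are deliberately not reported, since the primary is treated as the source of truth.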

2. Understand the order in which you restore systems during an outage

In the days of the central data center, it was relatively straightforward to determine which systems had to be restored first during an outage and which came after. It was easier because all of these systems were under your own direct control.

This isn’t the case with hybrid computing, where applications and data can move from one cloud to another, or between clouds and your in-house data center.

“When clients come to us, one of the first things we do is to sit down with them and determine which systems need to be restored first,” said Derrin Rummelt, director of cloud engineering and R&D for U.S. Signal, a hybrid IT solutions provider. “Then we perform testing to ensure that recovery actually works.”

It’s critical to know the order of restoration and also where different systems and groups of data run and are stored, because in some cases it might be necessary to reach out to another cloud or to the internal data center to complete system transactions. If even one of these resources is unavailable, your recovery is in jeopardy. The problem grows as applications and data are modified: when organizations fail to retest after those modifications, recoveries that once worked may no longer work.
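One way to make the order of restoration explicit is to record, for each system, which other systems it needs (wherever they run) and derive the restore sequence from that dependency graph. The sketch below uses the `graphlib` module from Python's standard library (Python 3.9 or later); the system names and dependencies are hypothetical placeholders for your own inventory.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each system maps to the systems it depends on,
# whether those run on premises, in one cloud, or in another.
DEPENDENCIES = {
    "identity": set(),
    "database": {"identity"},
    "message_queue": {"identity"},
    "order_service": {"database", "message_queue"},
    "web_storefront": {"order_service", "identity"},
}


def restore_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group systems into restore waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are restored
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(restore_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

For this inventory the script prints four waves: `identity`, then `database, message_queue`, then `order_service`, then `web_storefront`. Systems in the same wave do not depend on one another, so they can be restored in parallel. A circular dependency makes `prepare()` raise `CycleError`, which is far better to discover when the graph is updated after a change than in the middle of an outage.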

3. Test your DR regularly

Even if your systems and data remain relatively unchanged, there is always a risk that your cloud vendors will introduce changes into their infrastructure and platforms that affect the performance of your own systems and data. The only way to safeguard against this is to test your DR plans with your cloud vendors annually to ensure that a recovery really does work.

“A company can be using multiple SaaS, PaaS and IaaS cloud platforms in its IT,” said Saalex’s Flavin. “By regularly testing these systems, even through replication, you can assure that recovery works in each cloud scenario.”
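Regular testing across several SaaS, PaaS and IaaS platforms is easier to sustain when overdue tests are flagged automatically. The following is a minimal sketch, assuming a simple inventory of when each platform last passed a joint DR test with its vendor and an annual testing policy; the record fields, platform names and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed policy: each platform gets a joint DR test with its vendor at least yearly.
MAX_TEST_INTERVAL = timedelta(days=365)


@dataclass(frozen=True)
class DrTestRecord:
    platform: str                         # hypothetical platform label
    service_model: str                    # "SaaS", "PaaS" or "IaaS"
    last_successful_test: Optional[date]  # None means never tested


def overdue_platforms(records: list[DrTestRecord], today: date) -> list[str]:
    """Return platforms never tested, or last tested longer ago than policy allows."""
    overdue = []
    for record in records:
        last = record.last_successful_test
        if last is None or today - last > MAX_TEST_INTERVAL:
            overdue.append(f"{record.platform} ({record.service_model})")
    return overdue


if __name__ == "__main__":
    inventory = [
        DrTestRecord("HR suite", "SaaS", date(2018, 3, 1)),
        DrTestRecord("app platform", "PaaS", None),
        DrTestRecord("VM hosting", "IaaS", date(2018, 8, 15)),
    ]
    print(overdue_platforms(inventory, today=date(2019, 4, 1)))
```

Run as of April 1, 2019, this prints `['HR suite (SaaS)', 'app platform (PaaS)']`: the HR suite was last tested 396 days earlier, and the application platform has never been tested.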

Can organizations realistically take this task on?

“We recently conducted a survey of companies, and 34% said that they tested their disaster recovery plans annually,” said U.S. Signal’s EVP of products and services, Amanda Regnerus. “30% said that they tested their DR plans every six months, and 40% said that they tested their DR plans every two or more years. That 40% is concerning.”

4. Define your DR goals

The good news for companies planning DR for their hybrid computing environments is that help is available: continuous replication technology and disaster recovery specialization are driving the growth of disaster recovery as a service (DRaaS) firms. However, none of this help is very effective unless you define your disaster recovery objectives.

“What we advise companies is that they aim for a sub-30-second recovery point objective (RPO) for their data and a recovery time objective (RTO) of anywhere from several minutes to one hour, depending upon the size of their IT environment and the types of workloads they are running,” said Steve Blow, technology evangelist at Zerto, which provides virtual replication services.
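RPO and RTO become useful goals only when each DR test is measured against them. This sketch assumes you record, for each test, how much data was lost (expressed as the time between the last replicated write and the failure) and how long service took to come back, and checks those measurements against targets. The 30-second RPO and one-hour RTO are the outer bounds of Blow's advice, treated here as ceilings (a measurement equal to the target counts as met); the workload names and figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrObjectives:
    rpo: timedelta  # most data loss you can tolerate, measured as time
    rto: timedelta  # longest you can tolerate before service is restored


@dataclass(frozen=True)
class DrTestResult:
    workload: str
    data_loss_window: timedelta  # gap between the last replicated write and the failure
    recovery_time: timedelta     # time from declaring the disaster to service restored


def missed_objectives(result: DrTestResult, objectives: DrObjectives) -> list[str]:
    """Return one message for each objective the test result failed to meet."""
    misses = []
    if result.data_loss_window > objectives.rpo:
        misses.append(
            f"{result.workload}: RPO missed "
            f"({result.data_loss_window.total_seconds():.0f} s measured, "
            f"target {objectives.rpo.total_seconds():.0f} s)"
        )
    if result.recovery_time > objectives.rto:
        misses.append(
            f"{result.workload}: RTO missed "
            f"({result.recovery_time.total_seconds() / 60:.0f} min measured, "
            f"target {objectives.rto.total_seconds() / 60:.0f} min)"
        )
    return misses


if __name__ == "__main__":
    targets = DrObjectives(rpo=timedelta(seconds=30), rto=timedelta(hours=1))
    results = [
        DrTestResult("orders", timedelta(seconds=12), timedelta(minutes=20)),
        DrTestResult("billing", timedelta(seconds=45), timedelta(minutes=75)),
    ]
    for result in results:
        for message in missed_objectives(result, targets):
            print(message)
```

Here the `orders` workload meets both objectives and prints nothing, while `billing` prints `billing: RPO missed (45 s measured, target 30 s)` and `billing: RTO missed (75 min measured, target 60 min)`. Because the RTO target depends on environment size and workload type, a real plan would likely keep a separate `DrObjectives` per workload rather than one global pair.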

5. Manage your vendor relations

“In many ways, we haven't managed our vendor relations well,” acknowledged one West Coast IT manager. “We haven't read the contract, we haven’t met with the vendor about SLAs, we’ve never tested DR with them, although we know that they maintain data centers around the country.”

This manager is not alone. Unless you are a large enterprise with a dedicated contract management staff, your already overtaxed IT staff probably has trouble following up with vendors or finding the time to maintain sound relationships with vendors that could be instrumental in disaster recovery planning and execution.

“One of the things that we did with our cloud provider was to meet annually with them. We also confer with them between times and even meet onsite with them to define mutual strategies and to discuss concerns,” said Benjamin Baghdadi, CTO at Island Pacific, a SaaS company that serves the retail sector. “This has really helped us achieve a close and cooperative working relationship with our cloud provider. We know that they will be responsive in a disaster.”

6. Choose SaaS vendors that own and operate their own data centers

When you interview cloud vendors for SaaS solutions, a key question on your RFP should be whether they own and operate their own cloud data centers. SaaS providers that both own and operate the cloud their solutions run on are a much better bet in a disaster recovery scenario, because they are fully accountable for any failure on their end that disrupts service, and you have only one point of contact to worry about.

7. Manage your risk

The last element of aligning your DR plan for a hybrid cloud environment is risk management.

When one IT professional was asked how his management was assessing the risk of going to the cloud, he said, “I do think upper management is weighing risk vs. cost very carefully, but maybe leaning toward cost savings.”

The statement is borne out by a 2017 survey, which found that two out of three companies were adopting cloud computing primarily because of perceived cost savings.

This underscores why any cloud computing strategy must also include clear communication to C-level management and the board of your organization that moving compute to the cloud carries new risks that don't exist when you have total control of your own data center, especially where disaster recovery is concerned.

Your management will feel more secure about your cloud computing strategy if they know that you are cognizant of the risks, and that you have realigned your disaster recovery plan accordingly.

This story, "Rethinking disaster recovery for the cloud," was originally published by CIO.