Tag Archives: Disaster

Disaster Recovery in the Public Cloud


Find out about the options for building highly available environments using public cloud providers, along with the benefits and tradeoffs.

I’ve had the opportunity to speak with many users about their plans for public cloud adoption; these discussions frequently revolve around how to avoid being impacted by potential cloud outages. Questions come up because public cloud outages do occur, even though they happen less frequently now than they may have in the past, and customers are concerned about mitigating the risk of disruption.

Thankfully, every major public cloud vendor offers options for building highly available environments that can survive some type of outage. AWS, for example, suggests four options that leverage multiple geographic regions. These options, which are also available with the other public cloud vendors, come with different price points and deliver different recovery point objectives (RPO) and different recovery time objectives (RTO).

 

Companies can choose the option that best meets their RPO/RTO requirements and budget. The key takeaway is that public cloud providers enable customers to build highly available solutions on their global infrastructure.

Let’s take a brief look at these options and review some basic principles for building highly available environments using the public cloud. I’ll use AWS for my examples, but the principles apply across all public cloud providers.

First, understand the recovery point objective (RPO, how much recent data you can afford to lose) and recovery time objective (RTO, how long you can afford to be down) for each of your applications so you can design the right solution for each use case. Second, there’s no one-size-fits-all solution for leveraging multiple geographic regions. The right approach depends on your RPO, your RTO, the cost you are willing and able to incur, and the tradeoffs you are willing to make. Some of these approaches, using AWS as the example, include:

  • Recovering to another region from backups – Back up your environment, including EBS snapshots, RDS snapshots, AMIs, and regular file backups. By default, S3 replicates data only across availability zones within a single region, so you’ll need to enable cross-region replication on the buckets that hold your file backups. EBS snapshots, RDS snapshots, and AMIs are not covered by bucket replication; copy them to your DR region using the cross-region copy features of those services. You’ll incur the cost of transferring and storing data in a second region but won’t incur compute, EBS volume, or database costs until you need to go live in your DR region. The trade-off is the time required to launch your applications.
  • Warm standby in another region – Replicate data to a second region where you’ll run a scaled-down version of your production environment. The scaled-down environment is always live and sized to run the minimal capacity needed to resume business. Use Route 53 to switch over to your DR region as needed. Scale up the environment to full capacity as needed. With this option, you get faster recovery, but incur higher costs.
  • Hot standby in another region – Replicate data to a second region where you run a full version of your production environment. The environment is always live, and invoking full DR involves switching traffic over using Route 53. You get even faster recovery, but also incur even higher costs.
  • Multi-region active/active solution – Data is synchronized between both regions and both regions are used to service requests. This is the most complex to set up and the most expensive. However, little or no downtime is suffered even when an entire region fails. While the approaches above are really DR solutions, this one is about building a true highly available solution.
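To make the trade-off concrete, here is a minimal Python sketch that picks the cheapest of the four approaches that still meets an application’s RPO and RTO. The RPO/RTO minutes attached to each approach are illustrative placeholders I made up for the example, not AWS figures; only the relative cost ordering comes from the list above. Measure your own numbers before relying on anything like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    rpo_minutes: float  # worst-case data loss (hypothetical figure)
    rto_minutes: float  # worst-case time to recover (hypothetical figure)
    relative_cost: int  # 1 = cheapest, 4 = most expensive


# Placeholder values for illustration only; replace with measured numbers.
STRATEGIES = [
    Strategy("backup and restore", 24 * 60, 24 * 60, 1),
    Strategy("warm standby", 15, 60, 2),
    Strategy("hot standby", 5, 10, 3),
    Strategy("multi-region active/active", 1, 1, 4),
]


def cheapest_strategy(required_rpo, required_rto, strategies=STRATEGIES):
    """Return the lowest-cost strategy meeting both objectives, or None."""
    candidates = [
        s for s in strategies
        if s.rpo_minutes <= required_rpo and s.rto_minutes <= required_rto
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost)
```

For example, an application that can tolerate an hour of data loss and two hours of downtime would land on warm standby under these placeholder numbers, while one that can tolerate a full day of both would stay with backup and restore.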

One of the keys to a successful multi-region setup and DR process is to automate as much as possible. This includes backups, replication, and launching your applications. Leverage automation tools such as Ansible and Terraform to capture the state of your environment and to automate the launching of resources. Also, test repeatedly to ensure that you’re able to successfully recover from an availability zone or region failure. Test not only your tools, but your processes.
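As one small example of automating replication, the sketch below copies a list of EBS snapshots into a DR region. It assumes you pass in a boto3 EC2 client that was created for the destination region, which is how the EC2 `copy_snapshot` call determines where the copy lands. The function name, the description text, and the region names in the comments are my own placeholders, and a real job would also need error handling, tagging, and retention cleanup.

```python
def copy_snapshots_to_dr(dr_ec2_client, snapshot_ids, source_region):
    """Copy each snapshot into the region that dr_ec2_client is bound to.

    Returns a dict mapping each source snapshot ID to the ID of its copy.
    """
    copies = {}
    for snapshot_id in snapshot_ids:
        response = dr_ec2_client.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snapshot_id,
            Description=f"DR copy of {snapshot_id} from {source_region}",
        )
        copies[snapshot_id] = response["SnapshotId"]
    return copies


# Example wiring (needs boto3 and AWS credentials; regions are placeholders):
#   import boto3
#   dr_client = boto3.client("ec2", region_name="us-west-2")
#   copy_snapshots_to_dr(dr_client, ["snap-0123456789abcdef0"], "us-east-1")
```

Passing the client in, rather than creating it inside the function, keeps the logic easy to exercise in a test with a stand-in client before it ever touches a real account.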

Obviously, much more can be said on this topic. If you are interested in learning more about disaster recovery in the cloud, you can see me in person at the upcoming Interop ITX 2018 in Las Vegas, where I will present, “Saving Your Bacon with the Cloud When Your Data Center Is on Fire.” 

Get live advice on networking, storage, and data center technologies to build the foundation to support software-driven IT and the cloud. Attend the Infrastructure Track at Interop ITX, April 30-May 4, 2018. Register now!

 



5 Disaster Recovery Tips: Learning from Hurricanes


Hurricanes Irma and Harvey highlight the need for DR planning to ensure business continuity.

 

This has been an awful year for natural disasters, and we’re not even midway through a hurricane season that has already been particularly devastating. Hurricanes Irma and Harvey, and the flooding that ensued, have resulted in loss of life, extensive property damage, and crippled infrastructure.

Naturally, businesses have also been impacted. When it comes to applications, data and data centers, this is a wake-up call. At the same time, these are situations that motivate companies and individuals to introduce much-needed change. With this in mind, I’ll offer five tips any IT organization can use to become more resilient against natural disaster, no matter the characteristics of their systems and data centers. This can lead to better availability of critical data and tools when disaster strikes, continuity in serving customers, as well as peace of mind knowing preparations have been made and work can continue as expected.

1. Keep your people safe

When a natural disaster is anticipated (if there is notice), IT staffers need to focus on personal and family safety issues. Having to work late to take one more backup off-site shouldn’t be part of the last-minute process. Simply put, no data is worth putting lives at risk. If the rest of these tips are followed, IT staff won’t have to scramble in the heavy push of preparation to tie up loose ends of what already should be a resilient IT strategy.

2. Follow the 3-2-1 rule

In my role, I’ve long advocated the 3-2-1 rule, and we need to keep reiterating it: Have three different copies of important data saved, on two different media, with one of these being off-site. Embrace this rule if you haven’t already. The 3-2-1 rule has two key benefits: It doesn’t require any specific technology, and it can address nearly any failure scenario.
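Because the rule is technology-neutral, it is easy to check against an inventory. The following Python sketch is one way to do that; the `BackupCopy` fields and the media labels are hypothetical, and it assumes the production copy counts as one of the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str     # free-form name for the copy (hypothetical field)
    media: str     # e.g. "disk", "tape", "object-storage"
    offsite: bool  # True if stored away from the primary site


def check_3_2_1(copies):
    """Return a list of rule violations; an empty list means compliant."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    media_types = {c.media for c in copies}
    if len(media_types) < 2:
        problems.append(f"only {len(media_types)} media type(s), need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems
```

A production volume on disk, a local backup on tape, and a third copy in cloud object storage would pass; three disk copies in the same building would fail on both the media and off-site checks.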

3. 10 miles may not be enough

My third tip pertains to the off-site recommendation above. Many organizations believe the off-site copy or disaster recovery facility should be at least 10 miles away. This may no longer be sufficient; the path and fallout of a hurricane can be wide-reaching. Moreover, you want to avoid having personnel spend unnecessary time in a car traveling to complete the IT work. Cloud technologies can provide a more efficient and safer solution. This can involve using disaster recovery as a service (DRaaS) from a service provider or simply putting backups in the cloud.

4. Test your DR plan

Ensure that when a disaster plan is created there is particular focus on anticipating and eliminating surprises. This should involve regular testing of backups to be certain that they are completely recoverable, that the plan will function as expected, and that all data is where it needs to be (off-site, for example). The last thing you want during a disaster is to find that the plan hasn’t been completely implemented or run in months, or worse, to discover there are workloads that are not recoverable.
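One simple form of recoverability test is to restore a backup to a scratch location and compare checksums file by file. The Python sketch below does that with the standard library; the function names are my own, and it assumes the source directory has not changed since the backup was taken. In a real drill you would more likely compare against checksums recorded at backup time.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ in the restore."""
    original_dir = Path(original_dir)
    restored_dir = Path(restored_dir)
    mismatches = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty result means every file in the original tree was restored byte for byte; anything else is a list of workloads to investigate before a real disaster forces the question.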

5. Communications planning

My final recommendation is to work backwards through all required systems and with providers of all types to ensure you don’t have risks you can’t fix. Pay close attention to geography in relation to your own facilities, as well as country locations for data sovereignty considerations. This can apply to telecommunications providers, too. A critical part of the response to any disaster is that organizations are able to communicate. Given what has happened in some locations in the path of Hurricane Irma, even cellular communication can be unreliable. Consider developing a plan to ensure communications in the interim if key business systems are down.

The recent flood and hurricane damage has been significant. The truth is, when it comes to data, IT services, and more, there is a significant risk a business may never recover if it’s not adequately prepared. We live in a digitally transformed world and many businesses can’t operate without the availability of systems and data. These simple tips can bring about the resiliency companies need to effectively handle disasters, and prove their reliability to the customers they serve.

Rick Vanover is director of technical product marketing for Veeam Software.



Top 3 Disaster Recovery Mistakes


Considering the high cost of IT downtime, disaster recovery planning is critical for every enterprise. According to a 2016 IHS report, downtime costs North American companies $700 billion a year. For a typical mid-size company, the average cost was around $1 million, while a large enterprise lost more than $60 million on average, IHS found.

Yet even with the stakes so high, companies can fall into common pitfalls when it comes to disaster recovery planning to mitigate the impact of service outages. GS Khalsa, senior technical marketing manager at VMware, said that he sees organizations making the same three mistakes over and over again.

1. Not having a DR plan

In Khalsa’s opinion, by far the biggest mistake that companies make — and one of the most common — is failing to put together any sort of disaster recovery plan at all. He said that industry statistics indicate that up to 50% of organizations haven’t done any DR planning.

That’s unfortunate because preparing for a disaster doesn’t have to be as complicated or as costly as most organizations assume. “It doesn’t have to involve any purchases,” Khalsa said in an interview. “It doesn’t have to involve anything more than a discussion with the business that this is what our DR plan is.”

Even if companies decide to do nothing more than restore from their latest nightly backup, they should at least write that plan down so that they know what to expect and what to do in case of an emergency, he added.

2. Not testing the DR plan

Coming up with a plan is just the first step. Organizations also need a way to test the plan. Unfortunately, in a traditional, non-virtualized data center, there isn’t an easy, non-disruptive way to conduct a recovery test. As a result, most companies test “infrequently, if at all,” Khalsa said.

He pointed out that having a virtualized environment eases testing. Organizations can copy their VMs and test their recovery processes on an isolated network. That way they can see how long recovery will take and find potential problems without interrupting ongoing operations.

3. Not understanding the complexity of DR

Organizations also sometimes underestimate how much work it takes to recover from a backup. Khalsa explained that some organizations expect to be able to do their restores manually, which really isn’t feasible once you have more than about 10 or 20 VMs.
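A quick back-of-the-envelope calculation shows why manual restores stop scaling. The Python sketch below estimates total restore time from a VM count, a per-VM restore time, and the number of people restoring in parallel; the function name and every figure you feed it are assumptions for illustration, not numbers from Khalsa.

```python
import math


def manual_restore_hours(vm_count, minutes_per_vm, parallel_operators=1):
    """Estimate wall-clock hours to restore every VM by hand.

    Assumes each operator restores one VM at a time and that all
    restores take roughly the same number of minutes.
    """
    if parallel_operators < 1:
        raise ValueError("parallel_operators must be at least 1")
    batches = math.ceil(vm_count / parallel_operators)
    return batches * minutes_per_vm / 60
```

With a hypothetical 45 minutes per VM, 20 VMs restored by one person already take 15 hours, and 200 VMs split between two people take 75 hours, which is the kind of number worth comparing against the RTO the business expects.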

He noted that sometimes IT staff will write their own scripts to automate the recovery process, but even that can be problematic. “People forget that disasters don’t just impact systems, they also potentially impact people,” Khalsa said. The person who wrote the script may not be available to come into work following a disaster, which could hamper the recovery process.

Khalsa’s No. 1 tip for organizations involved in DR planning is for IT to communicate clearly with the business. Management and executives need to understand the recovery point objective (RPO) and recovery time objective (RTO) options and make some decisions about the acceptable level of risk.

“More communication is better,” Khalsa said.

Hear more about disaster recovery planning from GS Khalsa live and in person at Interop ITX, where he will present, “Disaster Recovery In The Virtualized Data Center.” Register now for Interop ITX, May 15-19, in Las Vegas.


