
Disaster Recovery in the Public Cloud


Find out about the options for building highly available environments using public cloud providers, along with the benefits and tradeoffs.

I’ve had the opportunity to speak with many users about their plans for public cloud adoption, and these discussions frequently revolve around how to avoid being impacted by cloud outages. The question comes up because public cloud outages, while less frequent than they once were, still occur, and customers want to mitigate the risk of disruption.

Thankfully, every major public cloud vendor offers options for building highly available environments that can survive some type of outage. AWS, for example, suggests four options that leverage multiple geographic regions. These options, which have equivalents at the other public cloud vendors, come at different price points and deliver different recovery point objectives (RPOs) and recovery time objectives (RTOs).

 

Companies can choose the option that best meets their RPO/RTO requirements and budget. The key takeaway is that public cloud providers enable customers to build highly available solutions on their global infrastructure.

Let’s take a brief look at these options and review some basic principles for building highly available environments using the public cloud. I’ll use AWS for my examples, but the principles apply across all public cloud providers.

First, understand the recovery point objective (RPO) and recovery time objective (RTO) for each of your applications so you can design the right solution for each use case. Second, there’s no one-size-fits-all solution for leveraging multiple geographic regions. There are different approaches you can take depending on your RPO and RTO targets, the cost you are willing and able to incur, and the tradeoffs you are willing to make. Some of these approaches, using AWS as the example, include:

  • Recovering to another region from backups – Back up your environment to S3, including EBS snapshots, RDS snapshots, AMIs, and regular file backups. Since S3, by default, replicates data only across availability zones within a single region, you’ll need to enable cross-region replication to your DR region (see the replication sketch after this list). You’ll incur the cost of transferring and storing data in a second region, but won’t incur compute, EBS, or database costs until you need to go live in your DR region. The trade-off is the time required to launch your applications.
  • Warm standby in another region – Replicate data to a second region where you’ll run a scaled-down version of your production environment, always live and sized for the minimal capacity needed to resume business. Use Route 53 to switch over to your DR region (see the failover sketch after this list), then scale the environment up to full capacity. With this option, you get faster recovery, but incur higher costs.
  • Hot standby in another region – Replicate data to a second region where you run a full version of your production environment. The environment is always live, and invoking full DR involves switching traffic over using Route 53. You get even faster recovery, but also incur even higher costs.
  • Multi-region active/active solution – Data is synchronized between both regions and both regions are used to service requests. This is the most complex to set up and the most expensive. However, little or no downtime is suffered even when an entire region fails. While the approaches above are really DR solutions, this one is a truly highly available solution.
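
As a concrete illustration of the first option, here’s a minimal sketch of enabling cross-region replication with boto3. The bucket names, regions, and IAM role ARN are hypothetical, and the role must separately be granted the S3 replication permissions:

    import boto3

    # Hypothetical names; replace with your own.
    SRC_BUCKET = "prod-backups"       # bucket in the primary region
    DST_BUCKET = "prod-backups-dr"    # bucket in the DR region
    CRR_ROLE = "arn:aws:iam::123456789012:role/s3-crr-role"

    s3_src = boto3.client("s3", region_name="us-east-1")
    s3_dst = boto3.client("s3", region_name="us-west-2")

    # Versioning must be enabled on both buckets before replication works.
    s3_src.put_bucket_versioning(
        Bucket=SRC_BUCKET, VersioningConfiguration={"Status": "Enabled"})
    s3_dst.put_bucket_versioning(
        Bucket=DST_BUCKET, VersioningConfiguration={"Status": "Enabled"})

    # Replicate every new object in the source bucket to the DR bucket.
    s3_src.put_bucket_replication(
        Bucket=SRC_BUCKET,
        ReplicationConfiguration={
            "Role": CRR_ROLE,
            "Rules": [{
                "ID": "dr-replication",
                "Status": "Enabled",
                "Prefix": "",  # empty prefix = replicate everything
                "Destination": {"Bucket": f"arn:aws:s3:::{DST_BUCKET}"},
            }],
        })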

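For the warm and hot standby options, the Route 53 switch-over can be pre-wired as DNS failover records, so traffic moves to the DR region automatically when a health check on the primary fails. A minimal sketch, with hypothetical zone, health check, and endpoint names:

    import boto3

    route53 = boto3.client("route53")

    def failover_record(set_id, role, target, health_check=None):
        # Build a failover CNAME; Route 53 serves the SECONDARY record
        # only while the PRIMARY's health check is failing.
        record = {
            "Name": "app.example.com",
            "Type": "CNAME",
            "TTL": 60,
            "SetIdentifier": set_id,
            "Failover": role,
            "ResourceRecords": [{"Value": target}],
        }
        if health_check:
            record["HealthCheckId"] = health_check
        return {"Action": "UPSERT", "ResourceRecordSet": record}

    route53.change_resource_record_sets(
        HostedZoneId="Z3EXAMPLE",  # hypothetical hosted zone
        ChangeBatch={"Changes": [
            failover_record("primary", "PRIMARY",
                            "app.us-east-1.example.com", "hc-primary-id"),
            failover_record("secondary", "SECONDARY",
                            "app.us-west-2.example.com"),
        ]})
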
One of the keys to a successful multi-region setup and DR process is to automate as much as possible. This includes backups, replication, and launching your applications. Leverage automation tools such as Ansible and Terraform to capture the state of your environment and to automate the launching of resources. Also, test repeatedly to ensure that you’re able to successfully recover from an availability zone or region failure. Test not only your tools, but your processes.
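
As a small example of that automation, here’s a sketch that snapshots every EBS volume opted in to backup via a tag; the tag key and region are assumptions:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Find volumes opted in via a (hypothetical) Backup=true tag.
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}])["Volumes"]

    for volume in volumes:
        # Snapshots are incremental, so frequent runs stay cheap. Copy
        # them to the DR region (ec2.copy_snapshot) to survive a
        # regional failure.
        ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"Automated DR backup of {volume['VolumeId']}")

Run a script like this on a schedule so that backups never depend on someone remembering to click a button.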

Obviously, much more can be said on this topic. If you are interested in learning more about disaster recovery in the cloud, you can see me in person at the upcoming Interop ITX 2018 in Las Vegas, where I will present, “Saving Your Bacon with the Cloud When Your Data Center Is on Fire.” 

Get live advice on networking, storage, and data center technologies to build the foundation to support software-driven IT and the cloud. Attend the Infrastructure Track at Interop ITX, April 30-May 4, 2018. Register now!

 




Data Protection in the Public Cloud: 6 Steps


While cloud security remains a top concern in the enterprise, public clouds are likely to be more secure than your private computing setup. This might seem counter-intuitive, but cloud service providers enjoy economies of scale that let them spend far more on security tools than any single enterprise, while that cost is spread across millions of users at fractions of a cent each.

That doesn’t mean enterprises can hand over all responsibility for data security to their cloud provider. There are still many basic security steps companies need to take, starting with authentication. While this applies to all users, it’s particularly critical for sysadmins: a password compromised on an admin’s mobile device could be the equivalent of handing over the corporate master keys. For admins, multi-factor authentication is essential to secure operations. Adding biometrics via smartphones is the latest wave in second and third authentication factors; there are a lot of creative strategies!
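
One common second factor is the time-based one-time password (TOTP) behind most authenticator apps. A minimal sketch using the pyotp library; the account names are hypothetical:

    import pyotp

    # Enrollment: generate a per-admin secret and load it into an
    # authenticator app (typically via a QR code of this URI).
    secret = pyotp.random_base32()
    totp = pyotp.TOTP(secret)
    print(totp.provisioning_uri(name="admin@example.com",
                                issuer_name="ExampleCorp"))

    # Login: the password alone is not enough; the 6-digit code from
    # the admin's device must also verify against the stored secret.
    code = input("Enter the 6-digit code: ")
    print("MFA passed" if totp.verify(code) else "MFA failed")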

Beyond guarding access to cloud data, what about securing the data itself? We’ve heard of major data exposures occurring when a set of instances is deleted but the corresponding data isn’t. After a while, those orphaned files get loose and can make for some interesting reading. This is pure carelessness on the part of the data owner.

There are two answers to this issue. For larger cloud setups, I recommend a cloud data manager that tracks all data and spots orphan files. That should stop the wandering buckets, but what about the case when a hacker gets in, by whatever means, and can reach useful, current data? The answer, simply, is good encryption.
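
A full cloud data manager is a product category of its own, but the underlying idea can be sketched in a few lines. For example, EBS volumes left behind by deleted instances show up as "available" (unattached), and bucket inventories can be audited the same way; the region here is an assumption:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Volumes in the "available" state are attached to no instance --
    # prime candidates for orphaned data to archive or delete.
    orphans = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"]

    for vol in orphans:
        print(f"Orphaned volume {vol['VolumeId']}: {vol['Size']} GiB, "
              f"created {vol['CreateTime']:%Y-%m-%d}")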

Encryption is a bit more involved than using PKZIP on a directory. AES-256 encryption or better is essential. Key management is crucial; having one admin hold the only key is a disaster waiting to happen, while writing it down on a sticky note goes to the opposite extreme. One option offered by cloud providers is drive-based encryption, but this fails on two counts. First, drive-based encryption usually has only a few keys to select from and, guess what, hackers can readily access a list of them on the internet. Second, the data has to be decrypted by the network storage device to which the drive is attached, then re-encrypted (or not) as it’s sent to the requesting server. There are lots of security holes in that process.

End-to-end encryption is far better, where encryption is done with a key kept in the server. This stops downstream security vulnerabilities from being an issue while also adding protection from packet sniffing.
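
A minimal sketch of that end-to-end approach with the Python cryptography library: the object is sealed with AES-256-GCM before it ever leaves the server, so the storage layer only sees ciphertext. Key storage and rotation (ideally in a KMS or HSM) are deliberately left out:

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # A 256-bit key; in practice it comes from a KMS/HSM, never from
    # a sticky note or a hard-coded constant.
    key = AESGCM.generate_key(bit_length=256)
    aesgcm = AESGCM(key)

    def encrypt_for_upload(plaintext: bytes) -> bytes:
        nonce = os.urandom(12)  # must be unique per object
        return nonce + aesgcm.encrypt(nonce, plaintext, None)

    def decrypt_after_download(blob: bytes) -> bytes:
        nonce, ciphertext = blob[:12], blob[12:]
        return aesgcm.decrypt(nonce, ciphertext, None)  # raises if tampered

    blob = encrypt_for_upload(b"quarterly financials")
    assert decrypt_after_download(blob) == b"quarterly financials"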

Data sprawl is easy to create with clouds, but it opens up another security risk, especially if much of cloud management is decentralized to departmental computing or even to users. Cloud data management tools address this much better than written policies. It’s also worth considering adding global deduplication to the storage management mix; this reduces the exposure footprint considerably.
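
The core of deduplication is content addressing: identical content hashes to the same key, so it is stored exactly once no matter how many times or places it is saved. A toy sketch (real systems deduplicate at the block or chunk level):

    import hashlib

    store = {}  # content-addressed store: hash -> data
    index = {}  # logical name -> content hash

    def save(name, data):
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # identical content stored once
        index[name] = digest

    save("hr/report.docx", b"...contents...")
    save("backup/report-copy.docx", b"...contents...")  # no new copy
    assert len(store) == 1

Fewer physical copies also means fewer places for a leak to start.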

Finally, the whole question of how to back up data is in flux today. Traditional backup and disaster recovery has moved from in-house tape and disk methods to the cloud as the preferred storage medium. The question now is whether a formal backup process is the proper strategy, as opposed to snapshot or continuous backup systems. The snapshot approach is growing, due to the value of small recovery windows and limited data loss exposure, but there may be risks in not having separate backup copies, perhaps stored in different clouds.




