Tag Archives: Recovery

SIOS Offers SAP Certified High Availability And Disaster Recovery For SAP S/4HANA Environments In The Cloud





High availability is critical to businesses that can’t afford downtime. They need redundancy built into the applications themselves so that they can automatically recover in a matter of minutes. SIOS specializes in IT resilience through intelligent application availability and was among the first providers of Linux clustering software. SIOS recently announced the latest releases, SIOS LifeKeeper 9.4 and SIOS DataKeeper 9.4, at the SAP TechEd event. SIOS integrates with SAP to deliver overall availability protection through automation of setup, monitoring, and failure management within SAP environments. (TFiR)





Swapnil Bhartiya has decades of experience covering emerging technologies and enterprise open source. His stories have appeared in a multitude of leading publications including CIO, InfoWorld, Network World, The New Stack, Linux Pro Magazine, ADMIN Magazine, HPE Insights, Raspberry Pi Geek Magazine, SweetCode, Linux For You, Electronics For You and more. He is also a science fiction writer and founder of TFiR.io.

Rewriting Disaster Recovery Plans for the Edge


Writing a disaster recovery plan has been the responsibility of IT departments for years, but now these plans must be recalibrated to cover failover for edge and cloud environments. What’s new, and how should organizations revise their plans?

Rule 1: IT does not control the edge

Given the adoption of edge computing and other distributed computing strategies, IT can’t control all of this distributed compute with a standard, centralized DR plan built around the data center. In day-to-day manufacturing using robotics and automation, for example, it is line supervisors and manufacturing staff who run the robots and are responsible for making sure that these assets are safe and secure in locked areas when they are not in use. In many cases, these manufacturing personnel might also install, monitor, and maintain the equipment themselves, or work with vendors to do so.

These personnel do not have IT’s background in security, asset protection, or maintenance and monitoring. At the same time, installing new edge networks and solutions outside of IT multiplies the number of IT assets where failures could occur. Somewhere, DR and failover plans for these assets need to be documented and trained on so they are covered, and the most logical place for that to happen is within the IT DR and business continuity plan.

To revise the plan, IT must meet and work with these different distributed computing groups. The key is getting everyone involved and committed to documenting a DR and failover plan that they then participate in and test on a regular basis.

Read the rest of this article on Information Week.





Intel’s Iris Gallium3D Driver Working On Better GPU Recovery Handling



While Intel’s Iris Gallium3D driver is not enabled by default and is still considered experimental in its support of Broadwell graphics and newer, in all of our tests thus far it has been working out very well, and we haven’t encountered any hangs in our tested OpenGL workloads. But since no OpenGL driver is immune from potential GPU hangs, a patch series is pending to improve the GPU recovery heuristics.

Longtime open-source Intel Linux graphics developer Chris Wilson sent out a set of three patches this morning for handling GPU recovery within the Iris driver. In particular, the patches opt out of the Linux kernel’s automatic GPU recovery and replay. That approach doesn’t work well for Iris, where batches are constructed incrementally, so a replay following a reset would likely cause issues due to missing state. With this patch series, the Iris driver will instead construct a fresh context for the next batch when the kernel indicates a GPU hang.

The set of patches improving the GPU recovery behavior for the Iris driver in Mesa can be found here. The Iris driver is set to make its initial debut in Mesa 19.1, due out around the end of May, which still gives Intel graphics driver developers plenty of time to make more improvements to this next-gen OpenGL driver ahead of its formal debut.


Disaster Recovery in the Public Cloud


Find out about the options for building highly available environments using public cloud providers, along with the benefits and tradeoffs.

I’ve had the opportunity to speak with many users about their plans for public cloud adoption; these discussions frequently revolve around how to avoid being impacted by potential cloud outages. Questions come up because public cloud outages do occur, even though they happen less frequently now than they may have in the past, and customers are concerned about mitigating the risk of disruption.

Thankfully, every major public cloud vendor offers options for building highly available environments that can survive some type of outage. AWS, for example, suggests four options that leverage multiple geographic regions. These options, which are also available with the other public cloud vendors, come with different price points and deliver different recovery point objectives (RPO) and different recovery time objectives (RTO).

 

Companies can choose the option that best meets their RPO/RTO requirements and budget. The key takeaway is that public cloud providers enable customers to build highly available solutions on their global infrastructure.

Let’s take a brief look at these options and review some basic principles for building highly available environments using the public cloud. I’ll use AWS for my examples, but the principles apply across all public cloud providers.

First, understand the recovery point objective (RPO) and recovery time objective (RTO) for each of your applications so you can design the right solution for each use case. Second, there’s no one-size-fits-all solution for leveraging multiple geographic regions. There are different approaches you can take depending on RPO, RTO, the cost you are willing and able to incur, and the tradeoffs you are willing to make. Some of these approaches, using AWS as the example, include:

  • Recovering to another region from backups – Back up your environment to S3, including EBS snapshots, RDS snapshots, AMIs, and regular file backups. Since S3, by default, replicates data only across availability zones within a single region, you’ll need to enable cross-region replication to your DR region (a minimal sketch of that setup follows this list). You’ll incur the cost of transferring and storing data in a second region, but won’t incur compute, EBS, or database costs until you need to go live in your DR region. The trade-off is the time required to launch your applications.
  • Warm standby in another region – Replicate data to a second region where you’ll run a scaled-down version of your production environment. The scaled-down environment is always live and sized to run the minimal capacity needed to resume business. Use Route 53 to switch traffic over to your DR region when needed, then scale the environment up to full capacity. With this option, you get faster recovery, but incur higher costs.
  • Hot standby in another region – Replicate data to a second region where you run a full version of your production environment. The environment is always live, and invoking full DR involves switching traffic over using Route 53. You get even faster recovery, but also incur even higher costs.
  • Multi-region active/active solution – Data is synchronized between both regions and both regions are used to service requests. This is the most complex to set up and the most expensive. However, little or no downtime is suffered even when an entire region fails. While the approaches above are really DR solutions, this one is about building a true highly available solution.
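Below is a minimal sketch, using boto3, of the cross-region replication setup referenced in the backup-based option above. The bucket names, region, and IAM role ARN are hypothetical placeholders, and the replication role and destination bucket are assumed to already exist with the appropriate permissions; treat this as an illustration rather than a complete solution.

```python
# Minimal sketch: enable S3 cross-region replication to a DR-region bucket.
# All names and ARNs below are hypothetical placeholders.
import boto3

SOURCE_BUCKET = "prod-backups-us-east-1"                         # hypothetical source bucket
DEST_BUCKET_ARN = "arn:aws:s3:::dr-backups-us-west-2"            # hypothetical DR bucket (versioning assumed enabled)
REPLICATION_ROLE_ARN = "arn:aws:iam::123456789012:role/s3-crr"   # hypothetical IAM role for replication

s3 = boto3.client("s3", region_name="us-east-1")

# Versioning must be enabled on both buckets before replication can be configured.
s3.put_bucket_versioning(
    Bucket=SOURCE_BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Replicate every new object in the source bucket to the DR-region bucket.
s3.put_bucket_replication(
    Bucket=SOURCE_BUCKET,
    ReplicationConfiguration={
        "Role": REPLICATION_ROLE_ARN,
        "Rules": [
            {
                "ID": "replicate-to-dr-region",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = apply to all objects
                "Destination": {"Bucket": DEST_BUCKET_ARN},
                "DeleteMarkerReplication": {"Status": "Disabled"},
            }
        ],
    },
)
```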

One of the keys to a successful multi-region setup and DR process is to automate as much as possible. This includes backups, replication, and launching your applications. Leverage automation tools such as Ansible and Terraform to capture the state of your environment and to automate the launching of resources. Also, test repeatedly to ensure that you’re able to successfully recover from an availability zone or region failure. Test not only your tools, but also your processes.
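As a concrete illustration of the Route 53 switchover step that the warm and hot standby options rely on, here is a minimal boto3 sketch that repoints an application record at the DR region. The hosted zone ID, record name, and DR endpoint are hypothetical placeholders, and in practice this call would be wrapped in the same automation (Ansible, Terraform, or scripts) used for the rest of the failover.

```python
# Minimal sketch: fail application traffic over to the DR region by updating DNS.
# The zone ID, record name, and DR endpoint are hypothetical placeholders.
import boto3

HOSTED_ZONE_ID = "Z0000000EXAMPLE"                         # hypothetical Route 53 hosted zone
RECORD_NAME = "app.example.com."                           # record that fronts the application
DR_ENDPOINT = "dr-alb-123.us-west-2.elb.amazonaws.com"     # hypothetical DR-region load balancer

route53 = boto3.client("route53")

# Point the application's CNAME at the standby environment in the DR region.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "Fail over to DR region",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "CNAME",
                    "TTL": 60,  # short TTL so clients pick up the change quickly
                    "ResourceRecords": [{"Value": DR_ENDPOINT}],
                },
            }
        ],
    },
)
```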

Obviously, much more can be said on this topic. If you are interested in learning more about disaster recovery in the cloud, you can see me in person at the upcoming Interop ITX 2018 in Las Vegas, where I will present, “Saving Your Bacon with the Cloud When Your Data Center Is on Fire.” 

Get live advice on networking, storage, and data center technologies to build the foundation to support software-driven IT and the cloud. Attend the Infrastructure Track at Interop ITX, April 30-May 4, 2018. Register now!

 



