
How Spectre and Meltdown Impact Data Center Storage

IT news over the last few weeks has been dominated by stories of vulnerabilities found in Intel x86 chips and almost all other modern processors. The two exposures, Spectre and Meltdown, are a result of speculative execution, which modern CPUs use to anticipate the flow of code execution and keep their internal instruction pipelines as full as possible. It’s been reported that the Spectre/Meltdown fixes can have an impact on I/O, which means storage products could be affected. So, what are the impacts, and what should data center operators and storage pros do?

Speculative execution

Speculative execution is a performance-improvement technique used by modern processors in which instructions are executed before the processor knows whether they will be needed. Imagine some code that branches on the result of a logic comparison. Without speculative execution, the processor would have to wait for that comparison to complete before continuing to read ahead, resulting in a drop in performance. With speculative execution, the processor predicts which way the branch will go and keeps executing along that path. If the prediction turns out to be wrong, the speculative results are simply discarded, and in the meantime the processor has been kept busy.
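To make the predict-then-discard idea concrete, here is a toy Python model. It is an illustration only, not how any real CPU is built: a 2-bit saturating-counter branch predictor guesses each branch outcome, the "processor" runs ahead on the predicted path, and work done on a mispredicted path is counted as squashed. The counter scheme, the `SPEC_WINDOW` size and the branch pattern are all hypothetical.

```python
from dataclasses import dataclass

SPEC_WINDOW = 8  # hypothetical: instructions issued down the predicted path per branch


@dataclass
class TwoBitPredictor:
    # States 0-1 predict "not taken", states 2-3 predict "taken".
    state: int = 2

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Saturating counter: move one step toward the actual outcome.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def simulate(outcomes: list[bool]) -> dict[str, int]:
    """Run each branch speculatively and tally committed vs. squashed work."""
    predictor = TwoBitPredictor()
    kept = squashed = mispredicts = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            kept += SPEC_WINDOW      # guess was right: speculative results are kept
        else:
            squashed += SPEC_WINDOW  # wrong path: results are thrown away
            mispredicts += 1
        predictor.update(taken)
    return {"kept": kept, "squashed": squashed, "mispredicts": mispredicts}


# A loop branch: taken 9 times, then falls through once, repeated 3 times.
pattern = ([True] * 9 + [False]) * 3
print(simulate(pattern))  # → {'kept': 216, 'squashed': 24, 'mispredicts': 3}
```

The security problem that Spectre and Meltdown exploit is that squashed work is not entirely invisible: it can leave measurable traces, for example in the CPU cache, which a carefully timed attack can read back.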

Both Spectre and Meltdown pose the risk of unauthorized access to data through this speculative execution process. A more detailed breakdown of the problem is available in the two research papers that describe the vulnerabilities. Vendors have released OS and BIOS workarounds for the exposures. Meltdown fixes have noticeably impacted performance on systems with high I/O activity, because of the extra code needed to isolate user and kernel memory on every transition into the kernel (system calls). Reports range from 5% to 50% additional CPU overhead, depending on the specific platform and workload.

Storage repercussions

How could this impact storage appliances and software? Over the last few years, almost all storage appliances and arrays have migrated to the Intel x86 architecture. Many are now built on Linux or Unix kernels, so they are directly exposed to the processor vulnerabilities, and patching them results in increased system load and higher latency.

Software-defined storage products are also potentially impacted, as they run on generic operating systems like Linux and Windows. The same applies to virtual storage appliances running in VMs, to hyperconverged infrastructure, and of course to public cloud storage instances and I/O-intensive cloud applications. Quantifying the impact is difficult, as it depends on the number of system calls the storage software has to make. Some products may be more affected than others.
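A rough way to see why the impact depends on the number of system calls is to multiply the syscall rate by the extra cost each syscall pays after patching. The sketch below does only that arithmetic. The per-syscall penalty, the core count and the workload rates are hypothetical placeholders rather than measured figures, and the linear model deliberately ignores cache and TLB side effects.

```python
def patch_overhead(syscalls_per_sec: float, extra_us_per_syscall: float,
                   cores: int = 1) -> float:
    """Fraction of total CPU time consumed by the added per-syscall cost.

    Simplified model (an assumption): overhead scales linearly with the
    syscall rate; cache/TLB side effects are ignored.
    """
    extra_cpu_seconds = syscalls_per_sec * extra_us_per_syscall / 1e6
    return extra_cpu_seconds / cores


# Hypothetical workloads on a 16-core controller, assuming +0.5 µs per syscall.
for name, rate in [("light", 50_000), ("heavy", 2_000_000)]:
    print(f"{name}: {patch_overhead(rate, 0.5, cores=16):.1%} of CPU")
```

The point of the model is the shape, not the numbers: the same per-syscall penalty that is negligible for a lightly loaded system becomes a visible share of CPU for software issuing millions of syscalls per second.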

Vendor response

Storage vendors have had mixed responses to the CPU vulnerabilities. For appliances or arrays deemed to be “closed systems” that cannot run user code, the vendors’ stance is that these systems are unaffected and won’t be patched.

Where appliances can run external code, as Pure Storage’s FlashArray can via a feature called Purity Run, there will be a need to patch. Similarly, end users running SDS solutions on generic operating systems will need to patch. HCI and hypervisor vendors have already started to make announcements about patching, although the results have been mixed. VMware, for instance, released a set of patches and then recommended not installing them because of customer issues. Intel’s advisory earlier this week warning of problems with its patches has added to the confusion.

Some vendors, such as Dell EMC, haven’t made public statements about the impact of the vulnerabilities on all of their products. For example, information on Dell legacy storage products is openly available, while information about Dell EMC products is only available behind support firewalls. I guess if you’re a user of those platforms, you will have access; however, for wider market context, it would have been helpful to see a consolidated public response in order to assess the risk.


So far, the patches released don’t seem to be very stable. Some have been withdrawn; others have crashed machines or made them unbootable. Vendors are in a difficult position because the details of the vulnerabilities weren’t widely circulated in the community before they were made public. Some storage vendors only found out about the issue when the news broke in the press. This means some of the patches may have been rushed to market without full testing of their impact.

To patch or not?

What should end users do? First, it’s worth evaluating the risk and impact of either applying or not applying patches. Computers that are regularly exposed to the internet, like desktops and public cloud instances (including virtual storage appliances running in a cloud instance), are likely to be most at risk, whereas storage appliances behind a corporate firewall on a dedicated storage management network are at the lowest risk. Weigh this risk against the impact of applying the patches and what could go wrong. Applying patches to a storage platform supporting hundreds or thousands of users, for example, is a process that needs thinking through.

Action plan

Start by talking to your storage vendors. Ask them why they believe their platforms are exposed or not. Ask what testing of patching has been performed, from both a stability and performance perspective. If you have a lab environment, do some before/after testing with standard workloads. If you don’t have a lab, ask your vendor for support.
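For the before/after lab testing described above, one crude but useful extra data point is the average cost of a cheap system call, since that is where the Meltdown fixes add work. This is a minimal Python sketch. It uses `os.getppid()` only as a stand-in syscall (an assumption that holds on Linux, where the call goes to the kernel), and Python’s interpreter overhead inflates the absolute numbers. Compare only runs on the same host before and after patching, and treat this as a complement to your standard storage workloads, not a replacement.

```python
import os
import statistics
import time


def syscall_cost_ns(iterations: int = 200_000, repeats: int = 5) -> float:
    """Median per-call time in nanoseconds for a cheap syscall wrapper.

    os.getppid() is a stand-in syscall; the figure includes Python
    interpreter overhead, so only compare runs on the same host.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            os.getppid()
        elapsed = time.perf_counter_ns() - start
        samples.append(elapsed / iterations)
    return statistics.median(samples)  # median damps one-off scheduling noise


if __name__ == "__main__":
    # Run once before patching and once after, then compare the two numbers.
    print(f"~{syscall_cost_ns():.0f} ns per call")
```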

As there are no known exploits in the wild for Spectre/Meltdown, a wise approach is probably to wait a little before applying patches. Let the version 1 fixes be tested in the wild by other folks first. Invariably issues are found that then get corrected by another point release. Waiting a little also gives time for vendors to develop more efficient patches, rather than ones that simply act as a workaround. In any event, your approach will depend on your particular set of circumstances.


8 Ways Data Center Storage Will Change in 2018

The storage industry was on a roller coaster in 2017, with the decline of traditional SAN gear offset by enterprise interest in hyperconverged infrastructure, software-only solutions, and solid-state drives. We have seen enterprises shift from hard disks to solid-state as the boost in performance with SSDs transforms data center storage.

2018 will build on these trends and also add some new items to the storage roadmap. SSD is still evolving rapidly on four fronts: core technology, performance, capacity and price. NVMe has already boosted flash IOPS and GB per second into the stratosphere, and we stand on the brink of mainstream adoption of NVMe over Ethernet, with broad implications for how storage systems are configured going forward.

Vendors are shipping 32TB SSDs, leaving the largest HDD far behind at 16TB. With 3D die technology hitting its stride, we should see 50TB and 100TB drives in 2018, especially if 4-bit storage cells hit their goals. Much of the supply shortage in flash die is behind us, and prices should begin to drop again, though demand may grow faster than expected and slow the price drop.

Outside of the drives themselves, RAID arrays are in trouble. With an inherent performance bottleneck in the controller design, handling more than a few SSDs is a real challenge. Meanwhile, small storage appliances, which are essentially inexpensive commercial off-the-shelf (COTS) servers, meet the needs of object stores and hyperconverged nodes. This migration is fueled by startups like Excelero, which connect drives directly to the cluster fabric at RDMA speeds using NVMe over Ethernet.
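The controller bottleneck shows up with simple back-of-the-envelope numbers. The sketch below computes how many SSDs it takes to saturate a RAID controller and how much of the drives’ combined bandwidth is stranded behind it. The 2 GB/s per-drive and 6 GB/s controller figures are hypothetical round numbers for illustration, not specs of any product.

```python
import math


def drives_to_saturate(controller_gb_s: float, drive_gb_s: float) -> int:
    """Smallest drive count whose combined throughput meets the controller limit."""
    return math.ceil(controller_gb_s / drive_gb_s)


def usable_fraction(n_drives: int, controller_gb_s: float, drive_gb_s: float) -> float:
    """Share of the drives' aggregate bandwidth the controller can actually pass."""
    aggregate = n_drives * drive_gb_s
    return min(1.0, controller_gb_s / aggregate)


# Hypothetical: SSDs at 2 GB/s each behind a 6 GB/s controller.
print(drives_to_saturate(6.0, 2.0))   # 3: three drives already saturate it
print(usable_fraction(12, 6.0, 2.0))  # 0.25: three quarters of the flash bandwidth is stranded
```

Past the saturation point, adding drives adds capacity but no throughput, which is why the controller design struggles with more than a few SSDs.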

A look at recent results reflects the industry’s shift to COTS. With the exception of NetApp, traditional storage vendors are experiencing single-digit revenue growth, while original design manufacturers, which supply huge volumes of COTS to cloud providers, are collectively seeing growth of 44%. Behind that growth is the increasing availability of unbundled storage software. The combination of cheap storage platforms and low-cost software is rapidly commoditizing the storage market. This trend will accelerate in 2018 as software-defined storage (SDS) begins to shape the market.

SDS is a broad concept, but it inherently unbundles control and service software from hardware platforms. The concept has been very successful in networking and in cloud servers, so extending it to storage is not only logical, but required. We’ll see more SDS solutions and competition in 2018 than we’ve had in any year of the last decade.

NVMe will continue to replace SAS and SATA as the interface for enterprise drives. Over and above the savings in CPU overhead that it brings, NVMe supports new form-factor drives. We can expect 32TB+ SSDs in the 2.5-inch form factor in 2018, as well as servers using M.2 storage variants.

This has massive implications. Intel has showcased a “ruler” form-factor blade drive with 33+ TB capacities that can be mounted in a 1U server with 32 slots. That gives us a 1 petabyte, ultra-fast 1U storage solution. Other vendors are talking up similar densities, signaling an important trend. Storage boxes will get smaller, hold huge capacities, and, due to SSD speed, outperform acres of HDD arrays. You’ll be able to go to the CIO and say, “I really can shrink the data center!”
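The petabyte figure follows directly from the numbers quoted above (32 slots, 33 TB per drive), using decimal units as drive vendors do:

```python
SLOTS = 32
TB_PER_DRIVE = 33  # the "33+ TB" ruler-drive figure quoted above

raw_tb = SLOTS * TB_PER_DRIVE
raw_pb = raw_tb / 1000  # decimal units: 1 PB = 1,000 TB
print(raw_tb, raw_pb)   # 1056 1.056
```

That is raw capacity in a single 1U server, before any deduplication or compression.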

There’s more, though! High-performance SSDs enable deduplication and compression of data as an invisible background job. The services doing this use the excess bandwidth of the storage drives. For most commercial use cases, the effective capacity is 5X or more the raw capacity. Data reduction therefore cuts the number of small appliances needed, making SSD storage much cheaper overall than hard drives.
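Deduplication and compression are two separate reductions that stack. The sketch below illustrates both with Python’s standard library: fixed-size chunks are fingerprinted with SHA-256 so that duplicate chunks are stored only once, and each unique chunk is then compressed with zlib. Real arrays typically use variable-size chunking and their own algorithms. The 4 KiB chunk size and the sample data are assumptions for illustration, and the reduction ratio you get depends entirely on the data.

```python
import hashlib
import zlib

CHUNK = 4096  # hypothetical fixed chunk size (4 KiB)


def reduce_data(data: bytes) -> tuple[dict[str, bytes], list[str]]:
    """Deduplicate fixed-size chunks, then compress each unique chunk.

    Returns the chunk store (fingerprint -> compressed bytes) and the
    ordered list of fingerprints needed to rebuild the original data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()  # content fingerprint
        if fp not in store:                     # duplicate chunks are stored once
            store[fp] = zlib.compress(chunk)    # unique chunks are compressed
        recipe.append(fp)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original byte stream from the chunk store."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)


# Hypothetical sample: ten identical copies of an 8 KiB file (e.g. backups).
file_8k = bytes(i % 251 for i in range(2 * CHUNK))
data = file_8k * 10
store, recipe = reduce_data(data)
stored = sum(len(c) for c in store.values())
print(f"raw {len(data)} B, stored {stored} B, ratio {len(data) / stored:.0f}x")
```

Here the 20 chunks collapse to 2 unique ones before compression even starts; on real mixed workloads, both effects are far smaller, which is why effective-capacity claims are quoted as averages across use cases.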

Let’s delve into the details of all these storage trends we can expect to see in the data center this year.

(Image: Olga Salt/Shutterstock)


Software-Defined Data Centers: VMware Designs

This section presents best practices and proven practices for how a design covering all components in the SDDC might look. It highlights a possible cluster layout, including a detailed description of what needs to be put where, and why a certain configuration needs to be made.

Typically, every design should have an overview to quickly understand what the solution is going to look like and how the major components are related. For the SDDC, one could start by drawing the vSphere clusters, including their functions.

Logical overview of the SDDC clusters

The following image describes an SDDC that is going to be run on the three-cluster approach:


The three clusters are as follows:

  • The management cluster for all SDDC managing services
  • The NSX edge cluster, through which all north-south network traffic flows
  • The actual payload cluster where the production VMs get deployed

Tip: Newer best practices from VMware, as described in VMware Validated Designs (VVD) version 3.0, also propose a two-cluster approach. In this case, the edge cluster is no longer needed and all edge VMs are deployed directly onto the payload cluster. This can be a better choice from a cost and scalability perspective. However, it is important to choose the model according to the requirements and constraints found in the design.

The overview should be only as complex as necessary, since its purpose is to give a quick impression of the solution and its configuration. Typically, there are a few of these overviews for each section.

This forms a basic SDDC design where the edge and the management clusters are separated. According to the latest VMware best practices, payload and edge VMs can also run on the same cluster. This is basically a decision based on the scale and size of the entire environment. Often it is also driven by a limit or a requirement; for example, edge hosts may need to be physically separated from management hosts.
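The cluster-layout choice described above can be captured as a small decision rule, which is also a handy way to record it in the design document. This Python sketch is hypothetical: the two inputs (`edge_must_be_separate`, `prefer_lower_cost`) are simplified stand-ins for the requirements and constraints discussed above, not VMware terminology.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical, simplified design inputs.
    edge_must_be_separate: bool  # e.g. edge hosts must be physically separated
    prefer_lower_cost: bool      # favour fewer clusters where allowed


def cluster_layout(req: Requirements) -> dict[str, list[str]]:
    """Map each vSphere cluster to the roles it hosts."""
    if req.edge_must_be_separate or not req.prefer_lower_cost:
        # Three-cluster approach: dedicated NSX edge cluster.
        return {
            "management": ["SDDC management services"],
            "edge": ["NSX edge VMs (north-south traffic)"],
            "payload": ["production VMs"],
        }
    # Two-cluster approach: edge VMs share the payload cluster.
    return {
        "management": ["SDDC management services"],
        "payload": ["production VMs", "NSX edge VMs (north-south traffic)"],
    }


print(list(cluster_layout(Requirements(edge_must_be_separate=False, prefer_lower_cost=True))))
```

A hard requirement such as physical separation always wins over cost, which mirrors the point that the model must follow the requirements and constraints found in the design.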

Logical overview of solution components

This is as important as the cluster overview and should describe the basic structure of the SDDC components, including possible connections to third-party integrations such as IPAM.

It should also provide a basic understanding of the relationships between the different solutions.


It is important to understand these components and how they work together. This will matter during the deployment of the SDDC, since none of these components should be left out or misconfigured. For the vRealize Log Insight connections, this is especially important.

Note: If not all components are configured to send their logs to vRealize Log Insight, there will be gaps, which can make troubleshooting very difficult or even impossible. A plan that describes these relationships can be very helpful during this step of the SDDC configuration.

These connections should also be reflected in a table to show the relationships and confirm that everything has been set up correctly. The more detailed the design, the lower the chance that something is misconfigured or forgotten during the installation.
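A connection table like this can also be checked mechanically. The sketch below keeps the component-to-log-target relationships as data and reports every component that is not forwarding to vRealize Log Insight. The component list and the `log_target` field are hypothetical examples, not an export format of any VMware tool.

```python
from dataclasses import dataclass
from typing import Optional

LOG_INSIGHT = "vRealize Log Insight"


@dataclass(frozen=True)
class Connection:
    component: str
    log_target: Optional[str]  # None: no log forwarding configured yet


def logging_gaps(connections: list[Connection]) -> list[str]:
    """Components whose logs would be missing from vRealize Log Insight."""
    return sorted(c.component for c in connections if c.log_target != LOG_INSIGHT)


# Hypothetical design table.
table = [
    Connection("vCenter Server", LOG_INSIGHT),
    Connection("ESXi hosts", LOG_INSIGHT),
    Connection("NSX Manager", LOG_INSIGHT),
    Connection("vRealize Automation", None),
]
print(logging_gaps(table))  # ['vRealize Automation']
```

An empty result is the "everything has been set up correctly" confirmation; anything listed is a troubleshooting gap waiting to happen.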

The vRealize Automation design

Depending on the use case, vRealize Automation 7 supports two installation designs.

Small: Small stands for a very dense and easy-to-deploy design. It is not recommended for any enterprise workloads or even for production. But it is ideal for a proof of concept (PoC) environment, or for a small dev/test environment to play around with SDDC principles and functions.

The key to the small deployment is that all the IaaS components can reside on a single Windows VM. Optionally, additional DEMs (Distributed Execution Managers) can be attached, which eases future scaling. However, this setup has one fundamental disadvantage: there is no built-in resilience or HA for the portal or DEM layer. This means that any glitch in one of these components will affect the entire SDDC.

Enterprise: Although this is a more complex way to install vRealize Automation, this option will be ready for production use cases and is meant to serve big environments. All the components in this design will be distributed across multiple VMs to enable resiliency and high availability.


In this design, the vRealize Automation OVA (vApp) is deployed twice. To enable true resilience, a load balancer needs to be configured. Users access the load balancer and get forwarded to one of the portals. VMware has good documentation on configuring NSX as a load balancer for this purpose, as well as the F5 load balancer. Basically, any load balancer can be used, as long as it supports HTTP(S) health checks.

Note: A DNS alias or Microsoft load balancing should not be used for this, since these methods cannot verify whether the target server is still alive. According to VMware, the load balancer needs health checks to determine whether each vRA appliance is still available. If these checks are not implemented, users will get an error when they are forwarded to a failed vRA instance.
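The difference between a DNS alias and a real load balancer is the health check: traffic is only sent to nodes that actively answer. The sketch below shows that selection logic in Python. The node URLs and the `/health` path are hypothetical placeholders (VMware’s vRA load-balancing documentation lists the actual monitor URLs and expected responses), and the probe is injectable so the logic can be exercised without a network.

```python
import http.client
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Healthy means a 2xx answer within the timeout; anything else is down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def choose_backend(nodes: list[str], request_no: int,
                   probe: Callable[[str], bool] = http_probe) -> str:
    """Round-robin over the nodes that currently pass the health check."""
    live = [n for n in nodes if probe(n)]
    if not live:
        raise RuntimeError("no healthy vRA node available")
    return live[request_no % len(live)]


# Hypothetical health-check URLs for the two vRA appliances.
NODES = [
    "https://vra-01.example.local/health",
    "https://vra-02.example.local/health",
]
```

If vra-02 stops answering its health check, every request goes to vra-01. A DNS alias, by contrast, would keep handing out vra-02’s address, which is exactly the error scenario the note warns about.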

In addition to the vRealize Automation portal, the web server components also need a load balancer. These components are installed on a separate Windows VM. The load balancer for them has the same requirements as the one for the vRealize Automation instances.

The active web server must contain only one vRA web component, while the second (passive) web server can host components 2, 3, and more.

Finally, the DEM workers have to be doubled and put behind a load balancer to ensure that the whole solution is resilient and can survive an outage of any one of the components.

Tip: If this design is used, the VMs for the different solutions need to run on different ESXi hosts in order to guarantee full resiliency and high availability. Therefore, VM anti-affinity rules must be used to ensure that the DEMs, web servers, or vRA appliances never run on the same ESXi host. It is very important to set these rules; otherwise, a single ESXi outage might affect the entire SDDC.
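Whether the anti-affinity intent actually holds can be checked against the current VM placement. The Python sketch below takes a VM-to-host map and groups of VMs that must not share a host, and reports any violations. The VM and host names are hypothetical; in practice the placement would be pulled from vCenter (for example with PowerCLI or pyVmomi) rather than written as a literal dict.

```python
from collections import defaultdict


def anti_affinity_violations(
    placement: dict[str, str], groups: dict[str, list[str]]
) -> dict[str, dict[str, list[str]]]:
    """For each rule, list hosts that run more than one VM from the same group."""
    violations: dict[str, dict[str, list[str]]] = {}
    for rule, vms in groups.items():
        by_host: dict[str, list[str]] = defaultdict(list)
        for vm in vms:
            by_host[placement[vm]].append(vm)
        shared = {host: sorted(v) for host, v in by_host.items() if len(v) > 1}
        if shared:
            violations[rule] = shared
    return violations


# Hypothetical placement: both IaaS web servers ended up on esx-01.
placement = {
    "vra-app-01": "esx-01", "vra-app-02": "esx-02",
    "iaas-web-01": "esx-01", "iaas-web-02": "esx-01",
    "dem-01": "esx-02", "dem-02": "esx-03",
}
groups = {
    "vra-appliances": ["vra-app-01", "vra-app-02"],
    "web-servers": ["iaas-web-01", "iaas-web-02"],
    "dems": ["dem-01", "dem-02"],
}
print(anti_affinity_violations(placement, groups))
# {'web-servers': {'esx-01': ['iaas-web-01', 'iaas-web-02']}}
```

A single reported host is exactly the single ESXi outage that would take down a whole layer of the design.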

This is one of VMware’s suggested reference designs for ensuring vRA availability for users requesting services. Although it is only a suggestion, it is highly recommended for a production environment. Despite all the complexity, it offers the highest grade of availability and ensures that the SDDC stays operational even if parts of the management stack run into trouble.

Tip: vSphere HA cannot deliver this grade of availability, since it restarts a failed VM (powering it off and on again). This can be harmful in an SDDC environment. Also, to restore operations, the startup order is important. Since HA can’t really take care of that, it might power the VM back on at a surviving host, but the SDDC might still be unusable due to connection errors (wrong order, stalled communication, and so on).
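Because the startup order matters, it helps to write the dependencies down explicitly and derive the order from them. The sketch below uses Python’s standard `graphlib` module (Python 3.9+) to compute a valid startup sequence. The dependency graph is a hypothetical simplification for illustration, not VMware’s documented boot sequence.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each component maps to what must be up first.
depends_on = {
    "SQL cluster": set(),
    "vRA appliances": set(),
    "IaaS web servers": {"SQL cluster", "vRA appliances"},
    "Manager Service": {"IaaS web servers"},
    "DEM workers": {"Manager Service"},
}

order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

If the recorded dependencies ever contain a cycle, `static_order()` raises `graphlib.CycleError`, which is itself a useful sanity check on the design.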

Once the decision is made for one of these designs, it should also be documented in the setup section. Also, take care that none of the limits, assumptions, or requirements are violated by that decision.

Another resiliency mechanism is to ensure that the required vRA SQL database is configured on an SQL cluster. This would ensure that no single point of failure could affect this component. Typically, big organizations already have some form of SQL cluster running, where the vRA database could be installed. If this isn’t possible, it is strongly recommended to set up such a cluster in order to protect the database as well. This should be documented in the design as a requirement for the vRA installation.

This tutorial is a chapter excerpt from “Building VMware Software-Defined Data Centers” by Valentin Hamburger.


10 Silly Data Center Memes

[Security Breach Report] Overall Impact of & Steps to Prevent Breaches

Despite the escalation of cybersecurity staffing and technology, enterprises continue to suffer data breaches and compromises at an alarming rate. How do these breaches occur? How are enterprises responding, and what is the impact of these compromises on the business? This report offers new data on the frequency of data breaches, the losses they cause, and the steps that organizations are taking to prevent them in the future.



Big Data Storage: 7 Key Factors

Defining big data is actually more of a challenge than you might think. The glib definition talks of masses of unstructured data, but the reality is that it’s a merging of many data sources, both structured and unstructured, to create a pool of stored data that can be analyzed for useful information.

We might ask, “How big is big data?” The answer from storage marketers is usually “Big, really big!” or “Petabytes!”, but again, there are many dimensions to sizing what will be stored. Much big data becomes junk within minutes of being analyzed, while some needs to stay around. This makes data lifecycle management crucial. Add to that globalization, which brings foreign customers to even small US retailers. The requirements for personal data lifecycle management under the European Union General Data Protection Regulation (GDPR) go into effect in May 2018, and penalties for non-compliance are draconian, even for foreign companies: up to €20 million or 4% of global annual revenue, whichever is higher.
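The GDPR penalty ceiling is simple arithmetic. For the upper tier of fines (Article 83(5)), the maximum is the greater of €20 million or 4% of total worldwide annual turnover for the preceding financial year. This minimal sketch computes only that ceiling, using integer euros; the turnover figures are hypothetical, and actual fines are decided case by case below the cap.

```python
def gdpr_max_fine_eur(annual_turnover_eur: int) -> int:
    """Upper-tier GDPR ceiling in whole euros: the greater of EUR 20 million
    or 4% of total worldwide annual turnover (rounded down here)."""
    return max(20_000_000, annual_turnover_eur * 4 // 100)


print(gdpr_max_fine_eur(100_000_000))    # 20000000: the fixed EUR 20M figure applies
print(gdpr_max_fine_eur(2_000_000_000))  # 80000000: 4% of EUR 2 billion
```

For a small retailer, the fixed €20 million figure, not the percentage, is what sets the ceiling, which is why the exposure is serious even for companies with modest revenue.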

For an IT industry just getting used to the term terabyte, storing petabytes of new data seems expensive and daunting. This would most definitely be the case with RAID storage arrays; in the past, an EMC salesman could retire on the commissions from selling the first petabyte of storage. But today’s drives and storage appliances have changed all the rules about the cost of capacity, especially where open source software can be brought into play.

In fact, there was quite a bit of buzz at the Flash Memory Summit in August about appliances holding one petabyte in a single 1U chassis. With 3D NAND and new form factors like Intel’s “Ruler” drives, we’ll reach the 1 PB goal within a few months. It’s a space, power, and cost game changer for big data storage capacity.

Concentrated capacity requires concentrated networking bandwidth. The first step is to connect those petabyte boxes with NVMe over Ethernet, which runs at 100 Gbps today, while vendors are already in the early stages of 200 Gbps deployment. This is a major leap forward in network capability, but even that isn’t enough to keep up with drives designed with massive internal parallelism.
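Some quick arithmetic shows why network bandwidth becomes the constraint for these boxes. Assuming an idealized link running at full line rate with no protocol overhead (real throughput will be lower), moving a full 1 PB appliance’s worth of data takes:

```python
def transfer_hours(petabytes: float, link_gbps: float) -> float:
    """Hours to move the data over one link at full line rate (idealized)."""
    bits = petabytes * 1e15 * 8          # decimal PB -> bits
    seconds = bits / (link_gbps * 1e9)   # Gbps -> bits per second
    return seconds / 3600


for gbps in (100, 200):
    print(f"{gbps} Gbps: {transfer_hours(1, gbps):.1f} h")
# 100 Gbps: 22.2 h
# 200 Gbps: 11.1 h
```

Even at 200 Gbps, a single link needs about half a day to drain one 1U box.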

Compression of data helps in many big data storage use cases, from removing repetitive images of the same lobby to repeated chunks of Word files. New methods of compression using GPUs can handle tremendous data rates, giving those petabyte 1U boxes a way of quickly talking to the world.

The exciting part of big data storage is really a software story. Unstructured data is usually stored in a key/data format, on top of traditional block IO, which is an inefficient method that tries to mask several mismatches. Newer designs range from extended metadata tagging of objects to storing data in an open-ended key/data format on a drive or storage appliance. These are embryonic approaches, but the value proposition seems clear.
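To illustrate the metadata-tagging idea in the simplest possible terms, here is a toy in-memory key/data store in Python in which each object carries a dictionary of tags that can be queried directly, with no file system or block layer in between. Everything here (class name, methods, tag names) is hypothetical and only sketches the concept; it is not the interface of any particular drive or product.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedObjectStore:
    # key -> (payload, metadata tags)
    _objects: dict[str, tuple[bytes, dict[str, str]]] = field(default_factory=dict)

    def put(self, key: str, data: bytes, **tags: str) -> None:
        """Store the payload together with its metadata tags."""
        self._objects[key] = (data, dict(tags))

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def find(self, **criteria: str) -> list[str]:
        """Keys whose tags match every given tag=value pair."""
        return sorted(
            key for key, (_, tags) in self._objects.items()
            if all(tags.get(k) == v for k, v in criteria.items())
        )


objects = TaggedObjectStore()
objects.put("cam7/0001.jpg", b"jpeg-bytes", source="lobby-cam", retention="30d")
objects.put("cam7/0002.jpg", b"jpeg-bytes", source="lobby-cam", retention="30d")
objects.put("sales/q3.csv", b"csv-bytes", source="erp", retention="7y")
print(objects.find(source="lobby-cam"))  # ['cam7/0001.jpg', 'cam7/0002.jpg']
```

Tags such as `retention` also show how metadata can drive lifecycle decisions directly, instead of being reconstructed from file paths after the fact.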

Finally, the public cloud offers a home for big data that is elastic and scalable to huge sizes. This has the obvious value of always being right-sized to enterprise needs, and AWS, Azure and Google have all added a strong list of big data services to match. With huge instances and GPU support, cloud virtual machines can effectively emulate an in-house server farm and make a compelling case for a hybrid or public cloud-based solution.

Suffice it to say, enterprises have a lot to consider when they map out a plan for big data storage. Let’s look at some of these factors in more detail.

(Images: Timofeev Vladimir/Shutterstock)
