
Object Storage: 8 Things to Know

Object storage is one of the hottest technology trends, but it isn’t a particularly new idea: The concept surfaced in the mid-90s and by 2005 a number of alternatives had entered the market. Resistance from the entrenched file (NAS) and block (SAN) vendors, coupled with a new interface method, slowed adoption of object storage. Today, with the brilliant success of Amazon Web Services’ S3 storage system, object storage is here to stay and is making huge gains against older storage methods.

Object storage is well suited to the new data environment. Unstructured data, which includes large media files and so-called big data objects, is growing at a much faster rate than structured data and, overall, data itself is growing at a phenomenal rate.

Experience has taught us that traditional block systems become complex to manage at a relatively low scale, while the concept of creating a single pool of data breaks down as the number of appliances increases, especially if the pool crosses the boundaries of different equipment types. Filers have hierarchies of file folders that become cumbersome at scale, while today’s thousands of virtual instances make file-sharing systems clumsy.

An inherent design feature of object stores is the distribution of objects across all of the storage devices, or at least across subsets when the cluster contains a large number of devices. This removes a design weakness of the block/file approach, where the failure of an appliance, or of more than a single drive, could cause a loss of data availability or even loss of the data itself.

Object stores typically use an algorithm such as CRUSH to spread chunks of a data object out in a known and predictable way. Coupling this with replication, and more recently with erasure coding, means that several nodes or drives can fail without materially impacting data integrity or access performance. The object approach also effectively parallelizes access to larger objects, since a number of nodes will all be transferring pieces of the object at the same time.
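The placement idea can be sketched in a few lines of Python. This is not the actual CRUSH algorithm, which adds hierarchical buckets, device weights, and failure-domain rules; it is a toy illustration of the core property: any client can recompute an object's chunk-to-node mapping from the object ID alone, with no central lookup table. The function and node names below are hypothetical.

```python
import hashlib

def place_chunks(object_id: str, num_chunks: int, nodes: list, replicas: int = 2):
    """Deterministically map each chunk of an object to a set of nodes.

    A toy stand-in for CRUSH-style placement: the mapping is a pure
    function of the object ID and the node list, so every client
    computes the same answer without consulting a central table.
    """
    placement = {}
    for chunk in range(num_chunks):
        # Hash the (object, chunk) pair to pick a starting node.
        digest = hashlib.sha256(f"{object_id}:{chunk}".encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(nodes)
        # Place replicas on consecutive distinct nodes.
        placement[chunk] = [nodes[(start + r) % len(nodes)] for r in range(replicas)]
    return placement

# Example: spread three chunks of one object across four nodes.
nodes = ["node-a", "node-b", "node-c", "node-d"]
layout = place_chunks("video-1234", num_chunks=3, nodes=nodes)
```

Because the mapping is deterministic, reads and writes of a large object's chunks can proceed against several nodes in parallel, which is exactly the access-parallelism benefit described above.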

There are a good number of software-only object storage vendors today, whose products install on a wide variety of COTS hardware platforms. These include the popular open source Ceph, backed by Red Hat. The combination of any of these software stacks with low-cost COTS gear makes object stores attractive on a price-per-terabyte basis compared with traditional proprietary NAS or SAN gear.

Object storage is evolving to absorb the other storage models by offering a “universal storage” model, where object, file, and block access portals all talk to the same pool of raw object storage. Likely, universal storage will deploy as object storage, with the other two access modes used to present file or block secondary storage in place of, say, all-flash arrays or filers. In the long term, universal storage looks to be the converging solution for the whole industry.
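One way to picture the universal storage idea: the file and block portals are thin translation layers that map paths or logical block addresses onto keys in a single object pool. The classes below are a hypothetical, in-memory sketch of that layering, not any vendor's API.

```python
class ObjectPool:
    """A single pool of raw objects, keyed by ID; the one real store."""
    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes):
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

class FilePortal:
    """Presents the pool as a flat file namespace: paths become object keys."""
    def __init__(self, pool: ObjectPool):
        self.pool = pool

    def write(self, path: str, data: bytes):
        self.pool.put("file:" + path, data)

    def read(self, path: str) -> bytes:
        return self.pool.get("file:" + path)

class BlockPortal:
    """Presents the pool as fixed-size blocks: LBAs become object keys."""
    BLOCK_SIZE = 4096

    def __init__(self, pool: ObjectPool):
        self.pool = pool

    def write_block(self, lba: int, data: bytes):
        assert len(data) == self.BLOCK_SIZE
        self.pool.put(f"block:{lba}", data)

    def read_block(self, lba: int) -> bytes:
        return self.pool.get(f"block:{lba}")
```

The design point this sketch illustrates is that file and block semantics live entirely in the portals; the pool itself stores nothing but objects, so every access mode shares the same capacity, replication, and placement machinery.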

This trend is enhanced by the growth of software-defined storage (SDS). Object stores all run natively in a COTS standard server engine, which means the transition from software built onto an appliance to software virtualized into the instance pool is in most cases trivial. This is most definitely not the case for older proprietary NAS or SAN code. For object stores, SDS makes it possible to scale services such as compression and deduplication easily. It also opens up rich services such as data indexing.

Continue on to get up to speed on object storage and learn how it’s shaking up enterprise storage.

(Image: Kitch Bain/Shutterstock)


Big Data Storage: 7 Key Factors

Defining big data is actually more of a challenge than you might think. The glib definition talks of masses of unstructured data, but the reality is that it’s a merging of many data sources, both structured and unstructured, to create a pool of stored data that can be analyzed for useful information.

We might ask, “How big is big data?” The answer from storage marketers is usually “Big, really big!” or “Petabytes!”, but again, there are many dimensions to sizing what will be stored. Much big data becomes junk within minutes of being analyzed, while some needs to stay around. This makes data lifecycle management crucial. Add to that globalization, which brings foreign customers to even small US retailers. The requirements for personal data lifecycle management under the European Union General Data Protection Regulation go into effect in May 2018, and penalties for non-compliance are draconian, even for foreign companies: up to 4% of global annual revenue.
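In practice, lifecycle management of this kind boils down to a retention policy per data class. As a minimal illustration (the tier names and retention periods below are invented for the example), a policy engine needs only each object's tier and creation time to decide expiry:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lifecycle rules: tier name -> retention period.
RULES = {
    "scratch": timedelta(minutes=30),     # analysis intermediates, junk in minutes
    "warm": timedelta(days=90),           # recent data kept for re-analysis
    "archive": timedelta(days=365 * 7),   # long-term / compliance retention
}

def expired(tier: str, created: datetime, now: datetime = None) -> bool:
    """Return True when an object has outlived its tier's retention period."""
    now = now or datetime.now(timezone.utc)
    return now - created > RULES[tier]
```

A real system would layer legal-hold and deletion-on-request rules (the kind GDPR demands) on top of simple age-based expiry, but the policy-table shape stays the same.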

For an IT industry just getting used to the term terabyte, storing petabytes of new data seems expensive and daunting. This would most definitely be the case with RAID storage arrays; in the past, an EMC salesman could retire on the commissions from selling the first petabyte of storage. But today’s drives and storage appliances have changed all the rules about the cost of capacity, especially where open source software can be brought into play.

In fact, there was quite a bit of buzz at the Flash Memory Summit in August about appliances holding one petabyte in a single rack unit (1U). With 3D NAND and new form factors like Intel’s “Ruler” drives, we’ll reach the 1 PB goal within a few months. It’s a space, power, and cost game changer for big data storage capacity.

Concentrated capacity requires concentrated networking bandwidth. The first step is to connect those petabyte boxes with NVMe over Fabrics on Ethernet, running today at 100 Gbps, but vendors are already in the early stages of 200 Gbps deployment. This is a major leap forward in network capability, but even that isn’t enough to keep up with drives designed with massive internal parallelism.

Compression of data helps in many big data storage use cases, from removing repetitive images of the same lobby to repeated chunks of Word files. New methods of compression using GPUs can handle tremendous data rates, giving those petabyte 1U boxes a way of quickly talking to the world.
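Chunk-level deduplication plus compression, the technique hinted at above for repetitive lobby images and repeated pieces of Word files, can be sketched simply: split the data into chunks, store each unique chunk once in compressed form, and keep an ordered list of chunk hashes as a "recipe" for reassembly. The function names are hypothetical, and real systems typically use content-defined (variable-size) chunking rather than the fixed offsets used here.

```python
import hashlib
import zlib

def dedup_compress(data: bytes, chunk_size: int = 4096):
    """Store each unique chunk once (compressed); return the chunk store
    plus an ordered recipe of chunk hashes for reassembly."""
    store = {}   # hash -> compressed chunk, each unique chunk kept once
    recipe = []  # ordered hashes describing the original byte stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in store:
            store[h] = zlib.compress(chunk)
        recipe.append(h)
    return store, recipe

def reassemble(store, recipe) -> bytes:
    """Rebuild the original data by decompressing chunks in recipe order."""
    return b"".join(zlib.decompress(store[h]) for h in recipe)
```

On repetitive data the store holds far fewer chunks than the recipe references, which is where the bandwidth savings for those 1U petabyte boxes comes from.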

The exciting part of big data storage is really a software story. Unstructured data is usually stored in a key/data format, on top of traditional block IO, which is an inefficient method that tries to mask several mismatches. Newer designs range from extended metadata tagging of objects to storing data in an open-ended key/data format on a drive or storage appliance. These are embryonic approaches, but the value proposition seems clear.
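A native key/data layout can be illustrated with a minimal append-only log: records are variable-length and indexed directly by key, with no fixed-size block translation layer in between. This is a hypothetical sketch of the storage model, not any product's on-drive format.

```python
import struct

class KeyValueLog:
    """A minimal append-only key/value store.

    Sketches the 'native key/data' idea: values are written as
    variable-length records and located by key through an index,
    rather than being forced through fixed-size block IO.
    """
    def __init__(self):
        self._log = bytearray()   # append-only record area
        self._index = {}          # key -> (value offset, value length)

    def put(self, key: bytes, value: bytes):
        offset = len(self._log)
        # Record format: 4-byte big-endian length prefix, then the value.
        self._log += struct.pack(">I", len(value)) + value
        self._index[key] = (offset + 4, len(value))

    def get(self, key: bytes) -> bytes:
        offset, length = self._index[key]
        return bytes(self._log[offset:offset + length])
```

Updates simply append a new record and repoint the index, so no read-modify-write of a surrounding block is ever needed; that is the mismatch the block-IO approach has to mask.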

Finally, the public cloud offers a home for big data that is elastic and scalable to huge sizes. This has the obvious value of being always right-sized to enterprise needs, and AWS, Azure, and Google have all added a strong list of big data services to match. With huge instances and GPU support, cloud virtual machines can emulate an in-house server farm effectively, and make a compelling case for a hybrid or public cloud-based solution.

Suffice it to say, enterprises have a lot to consider when they map out a plan for big data storage. Let’s look at some of these factors in more detail.

(Images: Timofeev Vladimir/Shutterstock)


Choosing a Cloud Provider: 8 Storage Considerations

Amazon Web Services, Google, and Azure dominate the cloud service provider space, but for some applications it may make sense to choose a smaller provider specializing in your app class and able to deliver a finer-tuned solution. No matter which cloud provider you choose, it pays to look closely at the wide variety of cloud storage services they offer to make sure they will meet your company’s requirements.

The big cloud providers offer two major classes of storage: local instance storage available with selected instances, and a selection of network storage options for permanent storage and for sharing data between instances.

As with any storage, performance is a factor in your decision-making process. There are many shared network storage alternatives, including storage tiers ranging from really hot to freezing cold; within the top tiers, there are differences depending on the choice of replica count, as well as variations in prices for copying data to other spaces.

The very hot tier is moving to SSD and even here there are differences between NVMe and SATA SSDs, which cloud tenants typically see as IOPS levels. For large instances and GPU-based instances, the faster choice is probably better, though this depends on your use case.

At the other extreme, cold and “freezing” storage, the choices are disk or tape, which impacts data retrieval times. With tape, retrieval can take as much as two hours, compared with just seconds for disk.

Data security and vendor reliability are two other key considerations when choosing a cloud provider that will store your enterprise data. Continue on to get tips for your selection process.

(Image: Blackboard/Shutterstock)


Building a New Storage Roadmap

It’s safe to say that we haven’t had this much happening in enterprise data storage for three decades. Things are evolving on every front, from software to networks, from drive interfaces to the drives themselves. The vendor landscape is rapidly changing too, with many new players and some signs of struggle among the old leaders.

The Dell-EMC merger is one of the waves of change flooding through what had once been the steadiest and slowest-evolving segment of IT. Here was the giant of the storage industry recognizing that business fundamentals such as hardware platforms were becoming commodities and that failing to adopt a software and services worldview was a recipe for disaster.

Who would have thought the mighty RAID array would begin to lose market share so quickly? Likewise, even leading-edge pundits are surprised at the growth of public clouds, while the 100 TB 2.5 inch solid-state drives projected to arrive in 2018 have hard-drive makers a bit panicked, especially Seagate.

You might be taken aback if I say all these changes are just a taste of what’s ahead. The next three or four years will see a much wider restructuring of storage, impacting what and how we store data in ways that will astonish and perhaps even scare you. The idea of fixed-size blocks of data has been in place for so long that it is a pillar of the storage faith. With some of the new technology, storage becomes byte-addressable, and everything we know about storing an entry changes, from hardware to operating systems, compilers, and applications.

Byte-addressability is on Intel’s Optane roadmap, so it’s real. Remember, Intel can do the CPU tweaks for space management; it owns the leading compilers and link editors, so that code can be created to a standard, and it has storage devices in the pipeline. The result will be blindingly fast data storage. Instead of moving 4K bytes using a driver and SCSI software stack, data can be permanently stored with a single CPU command!
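The byte-addressable programming model can be approximated today with an ordinary memory-mapped file: instead of pushing a 4K block through a driver and SCSI stack, the program writes directly to a memory address and flushes. On real persistent memory the flush becomes a cache-line writeback and fence rather than a file sync, but the sketch below (hypothetical function names, with Python's `mmap` standing in for pmem) shows the shape of it.

```python
import mmap
import os

def persist_counter(path: str, value: int):
    """Update an 8-byte counter in place through a memory mapping.

    With true persistent memory the store instruction itself is durable
    after a cache-line flush; here mmap + flush approximates that
    byte-addressable model on an ordinary file, with no 4K block
    read-modify-write through a storage stack.
    """
    # Ensure the backing file is at least 8 bytes long.
    with open(path, "a+b") as f:
        if os.path.getsize(path) < 8:
            f.truncate(8)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 8) as m:
            m[0:8] = value.to_bytes(8, "little")  # byte-addressed store
            m.flush()  # durability point (a CLWB + fence on real pmem)

def read_counter(path: str) -> int:
    with open(path, "rb") as f:
        return int.from_bytes(f.read(8), "little")
```

The point of the sketch is the absence of any block abstraction: the application updates exactly the bytes it cares about, which is why compilers, operating systems, and applications all have to change to exploit it.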

If all of this isn’t enough, servers and storage appliances are converging on a common design, where the storage bays of a server are sufficient for a set of very fast SSDs, which then can be accessed across the cluster as a pool of storage.

But there’s more! Instead of the hierarchical architectural model that has been around from the start of the computing era, new fabric architectures such as Gen-Z place memory and storage at the same peer level as CPUs, GPUs, and network devices on a shared fabric. Now all of these super-blocks can reach out over the fabric to other computers and read and write directly to their memory or storage. This is indeed a “pool of resources,” but managing it requires a new view of how resources are accessed and allocated.

Software-defined infrastructure is the new mantra for these virtualized systems. All the resources are virtual elements in a shared pool, with policy-driven orchestration managing the virtual resources and tying them to physical gear as needed.

Part of the SDI concept is the use of chainable microservices, with instances being created to host more copies of any service as needed to meet demand. With software services so divorced from the base hardware, the value of the system shifts to the services, and the hardware becomes standardized and very inexpensive. This underscores the wisdom of the Dell-EMC merger.

Let’s take a closer look at the changes ahead for enterprise data storage.

(Image: WIRACHAIPHOTO/Shutterstock)



14 Storage Startups Breaking New Ground

The advent of very high-performance, high-capacity SSDs, coupled with new interfaces such as NVMe over Fabrics and software-defined networking, gives storage a major creative boost. We are migrating from RAID systems to compact appliances that deliver storage and, in the form of hyperconverged infrastructure (HCI) systems, compute as well.

A number of innovative storage startups are helping drive this evolution. The list of startups is growing fast and new companies appear out of stealth mode on a regular basis. These are not companies chasing existing business with a better mousetrap. Many of them have game-changing approaches to how enterprises will implement and manage storage in the future.

For some startups, the appliance and HCI models provide a standard COTS-based platform, which is essential to economies of scale and time to market. The resulting software is “portable” between platforms, with a consequently wider market opportunity, but also a tougher competitive environment.

Portability and scalability are enhanced by software-defined storage (SDS), which abstracts the code from underlying hardware platforms and operating systems. As SDS evolves (it's still in its early days), the agility provided by encapsulating storage microservices will create new ways to build storage software stacks, resulting in overall lower costs due to a competitive environment for each type of service.

Another focus of the current batch of storage startups is data management. Addressing data sprawl and migration between memory tiers will radically reduce storage costs, which will become critical as we add flash/Optane tiers on the memory bus and move to all solid-state storage. Making data in disparate silos or clouds look like one pool is another issue that startups are addressing.

The challenge of replacing traditional SCSI-based networked storage, both to match the speed demands of solid-state drives and to handle the performance and complexity of shared memory in the HCI model, is also spawning startups. Here, the focus is on removing bottlenecks in the system and providing a mechanism for accessing the distributed storage pool. Given its complexity, this market is just emerging, but we can expect a good deal of future activity, especially with byte-addressable persistent memory on the horizon.

All in all, this is a good time to be a storage startup. Click ahead to check out some of the new companies worth watching as the storage industry continues its remarkable transformation.

(Image: ESB Professional/Shutterstock)
