Tag Archives: Factors

Big Data Storage: 7 Key Factors


Defining big data is actually more of a challenge than you might think. The glib definition talks of masses of unstructured data, but the reality is that it’s a merging of many data sources, both structured and unstructured, to create a pool of stored data that can be analyzed for useful information.

We might ask, “How big is big data?” The answer from storage marketers is usually “Big, really big!” or “Petabytes!”, but again, there are many dimensions to sizing what will be stored. Much big data becomes junk within minutes of being analyzed, while some needs to stay around; that makes data lifecycle management crucial. Add to that globalization, which brings foreign customers to even small US retailers. The requirements for personal data lifecycle management under the European Union General Data Protection Regulation go into effect in May 2018, and penalties for non-compliance are draconian, even for foreign companies: up to 4% of global annual revenue.
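To make the lifecycle point concrete, here is a minimal sketch of a retention-policy check. The retention classes, periods, and function names are illustrative assumptions, not anything prescribed by the regulation or by any particular product.

```python
import datetime

# Minimal data-lifecycle sketch: tag each dataset with a retention class and
# decide whether it should be retained or expired. The classes and retention
# periods are illustrative assumptions, not legal guidance on the GDPR.

RETENTION_DAYS = {
    "scratch": 1,        # raw feed that is junk once analyzed
    "operational": 90,   # kept for near-term reporting
    "personal": 365,     # personal data kept only while a documented purpose exists
}

def lifecycle_action(record_class, created, today=None):
    """Return 'retain' or 'expire' for a dataset based on its retention class."""
    today = today or datetime.date.today()
    age_days = (today - created).days
    return "expire" if age_days > RETENTION_DAYS[record_class] else "retain"

print(lifecycle_action("scratch", datetime.date(2017, 10, 1), datetime.date(2017, 11, 1)))   # expire
print(lifecycle_action("personal", datetime.date(2017, 10, 1), datetime.date(2017, 11, 1)))  # retain
```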

For an IT industry just getting used to the term terabyte, storing petabytes of new data seems expensive and daunting. This would most definitely be the case with RAID storage arrays; in the past, an EMC salesman could retire on the commissions from selling the first petabyte of storage. But today’s drives and storage appliances have changed all the rules about the cost of capacity, especially where open source software can be brought into play.

In fact, there was quite a bit of buzz at the Flash Memory Summit in August about appliances holding one petabyte in a single 1U rack unit. With 3D NAND and new form factors like Intel’s “Ruler” drives, we’ll reach the 1 PB goal within a few months. It’s a space, power, and cost game changer for big data storage capacity.
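As a rough back-of-the-envelope check on that claim, the sketch below shows how dense flash adds up; the per-drive capacity and slot count are assumptions for illustration, not vendor specifications.

```python
# Back-of-the-envelope check: how much raw flash fits in a single 1U appliance?
# The drive capacity and slot count below are assumptions, not vendor specs.

DRIVE_CAPACITY_TB = 32   # assumed capacity of one high-density 3D NAND drive
SLOTS_PER_1U = 32        # assumed number of front-loading drive slots in a 1U chassis

raw_capacity_tb = DRIVE_CAPACITY_TB * SLOTS_PER_1U
print(f"Raw capacity per 1U: {raw_capacity_tb} TB ({raw_capacity_tb / 1000:.2f} PB)")
```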

Concentrated capacity requires concentrated networking bandwidth. The first step is to connect those petabyte boxes with NVMe over Ethernet, running today at 100 Gbps, but vendors are already in the early stages of 200 Gbps deployment. This is a major leap forward in network capability, but even that isn’t enough to keep up with drives designed with massive internal parallelism.
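A quick calculation shows why even those link speeds struggle against a petabyte-scale box; this sketch ignores protocol overhead and assumes the network link is the only bottleneck.

```python
# How long does it take just to move one petabyte over a single network link?
# Ignores protocol overhead; assumes the link itself is the only bottleneck.

PETABYTE_BITS = 1e15 * 8   # 1 PB (decimal) expressed in bits

for link_gbps in (100, 200):
    seconds = PETABYTE_BITS / (link_gbps * 1e9)
    print(f"{link_gbps} Gbps link: {seconds / 3600:.1f} hours to transfer 1 PB")
```

Even at 200 Gbps, draining a single petabyte appliance takes roughly half a day, which is why compression and parallel links matter so much.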

Compression of data helps in many big data storage use cases, from removing repetitive images of the same lobby to deduplicating repeated chunks of Word files. New methods of compression using GPUs can handle tremendous data rates, giving those petabyte 1U boxes a way of quickly talking to the world.
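The effect is easy to demonstrate. This minimal sketch uses Python’s standard zlib on synthetic repetitive data, standing in for the GPU-accelerated codecs a real appliance would use, purely to illustrate the ratio.

```python
import zlib

# Compress highly repetitive data, e.g. near-identical camera frames or
# repeated document chunks. zlib stands in here for the faster (possibly
# GPU-accelerated) codecs a real system would use.
chunk = b"frame: lobby camera 01, no motion detected\n"
repetitive_data = chunk * 10_000

compressed = zlib.compress(repetitive_data, 6)
print(f"Original:   {len(repetitive_data):,} bytes")
print(f"Compressed: {len(compressed):,} bytes")
print(f"Ratio:      {len(repetitive_data) / len(compressed):.0f}:1")
```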

The exciting part of big data storage is really a software story. Unstructured data is usually stored in a key/data format layered on top of traditional block IO, an inefficient approach that tries to mask the mismatches between the two models. Newer designs range from extended metadata tagging of objects to storing data in an open-ended key/data format directly on a drive or storage appliance. These are embryonic approaches, but the value proposition seems clear.
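A hypothetical sketch of the contrast helps: with block IO the host must map everything onto fixed-size blocks, while a key/data device stores variable-length values with their metadata directly. The class and method names below are illustrative, not any vendor’s API.

```python
# Illustrative contrast between block-style and key/data-style storage access.
# These classes and method names are hypothetical, not a real drive interface.

class BlockDevice:
    """Traditional block IO: the host maps objects onto fixed-size blocks."""
    BLOCK_SIZE = 4096

    def __init__(self):
        self.blocks = {}

    def write_block(self, lba, data):
        # Short values are padded, wasting space; long values must be split
        # across blocks by software layered above the device.
        self.blocks[lba] = data.ljust(self.BLOCK_SIZE, b"\0")

    def read_block(self, lba):
        return self.blocks[lba]


class KeyDataDevice:
    """Key/data interface: variable-length values stored and tagged by key."""

    def __init__(self):
        self.store = {}

    def put(self, key, value, metadata=None):
        self.store[key] = (value, metadata or {})

    def get(self, key):
        return self.store[key][0]


kv = KeyDataDevice()
kv.put("sensor/2017-11-01/0001", b'{"temp": 21.4}', metadata={"source": "line-3", "ttl_days": 30})
print(kv.get("sensor/2017-11-01/0001"))
```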

Finally, the public cloud offers a home for big data that is elastic and scalable to huge sizes. This has the obvious value of being always right-sized to enterprise needs, and AWS, Azure, and Google have all added a strong list of big data services to match. With huge instances and GPU support, cloud virtual machines can emulate an in-house server farm effectively, making a compelling case for a hybrid or public cloud-based solution.

Suffice to say, enterprises have a lot to consider when they map out a plan for big data storage. Let’s look at some of these factors in more detail.

(Images: Timofeev Vladimir/Shutterstock)




Software-Defined Storage: 4 Factors Fueling Demand


As organizations look for cost-effective ways to house their ever-growing stores of data, many of them are turning to software-defined storage. According to market researchers at ESG, 52% of organizations are committed to software-defined storage (SDS) as a long-term strategy.

Some vendor-sponsored studies have found even higher rates of SDS adoption; while the findings are self-serving, they’re still noteworthy. For example, a SUSE report published in 2017 found that 63% of enterprises surveyed planned to adopt SDS within 12 months, and in DataCore Software’s sixth annual State of Software-Defined Storage, Hyperconverged and Cloud Storage survey, only 6% of respondents said they were not considering SDS.

What’s driving this interest in SDS? Let’s look at four important reasons why enterprises are considering the technology.

1. Avoid vendor lock-in

In an interview, Camberley Bates, managing director and analyst at Evaluator Group, who spoke about SDS at Interop ITX, said, “The primary driver of SDS is the belief that it delivers independence, and the cost benefit of not being tied to the hardware vendor.”

In fact, when DataCore asked IT professionals about the business drivers for SDS, 52% said that they wanted to avoid hardware lock-in from storage manufacturers.

However, Bates cautioned that organizations need to consider the costs and risks associated with integrating storage hardware and software on their own. She said that many organizations do not want the hassle of integration, which is driving up sales of pre-integrated appliances based on SDS technology.

2. Cost savings

Of course, SDS can also have financial benefits beyond avoiding lock-in. In the SUSE study, 72% of respondents said they evaluate their storage purchases based on total cost of ownership (TCO) over time, and 81% of those surveyed said the business case for SDS is compelling.

Part of the reason SDS can deliver low TCO is its ability to simplify storage management. The DataCore study found that the top business driver for SDS, cited by 55% of respondents, was “to simplify management of different models of storage.”
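As a hedged illustration of that kind of evaluation, the sketch below totals up-front and recurring costs over a planning horizon; every figure is a made-up placeholder, not vendor pricing or a finding from the surveys cited above.

```python
# Toy TCO comparison over a multi-year horizon. All figures are illustrative
# placeholders, not vendor pricing or numbers from the surveys cited above.

def tco(acquisition, annual_support, annual_admin, years):
    """Total cost of ownership: up-front cost plus recurring costs over the period."""
    return acquisition + (annual_support + annual_admin) * years

YEARS = 5
traditional_array = tco(acquisition=500_000, annual_support=75_000, annual_admin=60_000, years=YEARS)
sds_on_commodity = tco(acquisition=300_000, annual_support=50_000, annual_admin=40_000, years=YEARS)

print(f"{YEARS}-year TCO, traditional array:         ${traditional_array:,}")
print(f"{YEARS}-year TCO, SDS on commodity hardware: ${sds_on_commodity:,}")
```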

3. Support IT initiatives

Another key reason why organizations are investigating SDS is because they need to support other IT initiatives. In the SUSE survey, IT pros said that key technologies influencing their storage decisions included cloud computing (54%), big-data analytics (50%), mobility (47%) and the internet of things (46%).

Organizations are looking ahead to how these trends might change their future infrastructure needs. Not surprisingly, in the DataCore report, 53% of organizations said a desire to help future-proof their data centers was driving their SDS move.

4. Scalability

Many of those key trends that are spurring the SDS transition are dramatically increasing the amount of data organizations need to store. Because it offers excellent scalability, SDS appeals to enterprises experiencing fast data growth.

In the SUSE study, 96% of companies surveyed said they like the business scalability offered by SDS. In addition, 95% found scalable performance and capacity appealing.

As data storage demands continue to grow, this need to increase capacity while keeping overall costs down may be the critical factor in determining whether businesses choose to invest in SDS.

 


