
Big Data Storage: 7 Key Factors

Defining big data is more of a challenge than you might think. The glib definition talks of masses of unstructured data, but the reality is that it’s a merging of many data sources, both structured and unstructured, to create a pool of stored data that can be analyzed for useful information.

We might ask, “How big is big data?” The answer from storage marketers is usually “Big, really big!” or “Petabytes!”, but again, there are many dimensions to sizing what will be stored. Much big data becomes junk within minutes of being analyzed, while some needs to stay around. This makes data lifecycle management crucial. Add to that globalization, which brings foreign customers to even small US retailers. The requirements for personal data lifecycle management under the European Union General Data Protection Regulation go into effect in May 2018, and penalties for non-compliance are draconian, even for foreign companies, at up to 4% of global annual revenues.

For an IT industry just getting used to the term terabyte, storing petabytes of new data seems expensive and daunting. This would most definitely be the case with RAID storage arrays; in the past, an EMC salesman could retire on the commissions from selling the first petabyte of storage. But today’s drives and storage appliances have changed all the rules about the cost of capacity, especially where open source software can be brought into play.

In fact, there was quite a bit of buzz at the Flash Memory Summit in August about appliances holding one petabyte in a single 1U rack. With 3D NAND and new form factors like Intel’s “Ruler” drives, we’ll reach the 1 PB goal within a few months. It’s a space, power, and cost game changer for big data storage capacity.

Concentrated capacity requires concentrated networking bandwidth. The first step is to connect those petabyte boxes with NVMe over Ethernet, running today at 100 Gbps, but vendors are already in the early stages of 200 Gbps deployment. This is a major leap forward in network capability, but even that isn’t enough to keep up with drives designed with massive internal parallelism.

Compression of data helps in many big data storage use cases, from removing repetitive images of the same lobby to repeated chunks of Word files. New methods of compression using GPUs can handle tremendous data rates, giving those petabyte 1U boxes a way of quickly talking to the world.
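As a rough illustration of why repetitive content shrinks so dramatically, here is a CPU-side sketch, with stdlib zlib and SHA-256 standing in for the GPU codecs the article mentions, that deduplicates identical chunks before compressing them:

```python
import hashlib
import zlib

def dedupe_and_compress(chunks):
    """Store each unique chunk once, compressed; repeats become references."""
    store = {}      # chunk digest -> compressed bytes
    manifest = []   # ordered digests, enough to rebuild the original stream
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = zlib.compress(chunk)
        manifest.append(digest)
    return store, manifest

# Ten copies of the same "lobby camera" frame deduplicate to one stored chunk.
frames = [b"lobby-camera-frame" * 1000] * 10
store, manifest = dedupe_and_compress(frames)
raw = sum(len(c) for c in frames)
stored = sum(len(c) for c in store.values())
print(f"{raw} raw bytes -> {stored} stored bytes, {len(store)} unique chunk(s)")
```

The manifest is what makes the scheme lossless: replaying it and decompressing each referenced chunk reproduces the original byte stream exactly.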

The exciting part of big data storage is really a software story. Unstructured data is usually stored in a key/data format, on top of traditional block IO, which is an inefficient method that tries to mask several mismatches. Newer designs range from extended metadata tagging of objects to storing data in an open-ended key/data format on a drive or storage appliance. These are embryonic approaches, but the value proposition seems clear.
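A toy sketch of that key/data idea, with every name and method invented for illustration: objects live under keys with extended metadata tags that can be queried directly, with no block-IO translation layer anywhere in the model:

```python
class ObjectStore:
    """Illustrative key/data store with extended metadata tagging."""

    def __init__(self):
        self._objects = {}  # key -> (value bytes, metadata dict)

    def put(self, key, value, **tags):
        """Store a value under a key with arbitrary metadata tags."""
        self._objects[key] = (value, dict(tags))

    def get(self, key):
        return self._objects[key][0]

    def find(self, **tags):
        """Return keys whose metadata matches every given tag."""
        return [k for k, (_, meta) in self._objects.items()
                if all(meta.get(t) == v for t, v in tags.items())]

store = ObjectStore()
store.put("cam1/frame-0042", b"...jpeg bytes...", source="lobby-cam", retain="30d")
store.put("pos/txn-9917", b"...receipt...", source="register-3", retain="7y")
print(store.find(retain="7y"))  # only the long-retention transaction record
```

Lifecycle management falls out of the tagging for free here: a `retain` tag makes expiry a metadata query rather than a filesystem crawl.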

Finally, the public cloud offers a home for big data that is elastic and scalable to huge sizes. This has the obvious value of being always right-sized to enterprise needs and AWS, Azure and Google have all added a strong list of big data services to match. With huge instances and GPU support, cloud virtual machines can emulate an in-house server farm effectively, and make a compelling case for a hybrid or public cloud-based solution.

Suffice to say, enterprises have a lot to consider when they map out a plan for big data storage. Let’s look at some of these factors in more detail.



Is the Network Part of Your Data Backup Strategy?

Make sure to include the network in your data protection planning.

A data backup strategy is the backbone of any enterprise IT shop. Businesses need to protect their data from application or server failures, as well as improper data manipulation, deletion or destruction through accidental or nefarious methods such as ransomware.

In planning their backup strategy, companies can overlook the network as part of the overall design. Distributed and server-to-cloud backups rely on the underlying network to move data from point A to B in a timely and secure manner. Therefore, it makes sense to include the network as an integral part of any data backup and recovery strategy. I’ll discuss four ways to do that.

Network redundancy

The first and most obvious step is to verify that your network maintains a proper level of end-to-end resiliency. Whether you are talking about local, off-site or backups to cloud service providers, the network should be designed so that there are no single points of failure that could potentially render a data backup or restore useless. A single point of failure refers to a device or link that, if it fails, will bring down all or a large portion of a LAN.
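One way to make the "no single point of failure" check concrete is to model devices and links as a graph and find its articulation points. A minimal stdlib-only sketch, with a topology invented for illustration:

```python
def single_points_of_failure(links):
    """Articulation points of an undirected graph: devices whose failure
    disconnects part of the network (classic DFS low-link method)."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    disc, low, spofs = {}, {}, set()

    def dfs(node, parent, time):
        disc[node] = low[node] = time
        children = 0
        for nbr in graph[node]:
            if nbr not in disc:
                children += 1
                time = dfs(nbr, node, time + 1)
                low[node] = min(low[node], low[nbr])
                # A non-root node is a SPOF if a subtree can't reach above it.
                if parent is not None and low[nbr] >= disc[node]:
                    spofs.add(node)
            elif nbr != parent:
                low[node] = min(low[node], disc[nbr])
        if parent is None and children > 1:
            spofs.add(node)
        return time

    for node in graph:
        if node not in disc:
            dfs(node, None, 0)
    return spofs

# The backup server reaches everything only through core1, so core1 is a SPOF.
links = [("backup", "core1"), ("core1", "wan"),
         ("core1", "core2"), ("core2", "wan")]
print(single_points_of_failure(links))
```

Running this over an inventory of links is a quick audit: any device it reports needs a redundant path added before it can silently take a backup or restore down with it.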

Also, consider how automated your network failover prevention mechanisms are. Traditional network redundancy techniques include dynamic routing protocols, HSRP/VRRP, VPN and WAN carrier diversity. More recently, SDN, SD-WAN and multi-cloud management are beginning to be included as part of a forward-thinking data backup roadmap.

Network baselining

Data backups have the potential to consume a tremendous amount of throughput. The major concern is that certain links along the way will become congested to the point that it negatively impacts other applications and users on the network. Avoiding network congestion by using a separate network that’s purpose-built for backups is cost prohibitive. Most enterprises perform backups using the same network hardware and links as their production traffic.

Consequently, a key step in any backup strategy is to properly baseline traffic across the network to determine how backups will impact link utilization. Understanding data flows and throughput requirements of data backups along with current utilization baselines over time allows engineers to design a backup strategy that will not impact daily operations. In some cases, this means that the timing of backups occur outside of network peak hours. In other situations, it will require upgrading the throughput capacity of certain network links along a backup path.
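A back-of-the-envelope sketch of that sizing exercise, with all figures hypothetical: given a link's baseline utilization, will the backup fit in the off-peak window, or does the link need an upgrade?

```python
def backup_window_fits(link_gbps, busy_utilization, backup_tb, window_hours):
    """Estimate whether backup_tb terabytes can move in the window using
    only the headroom left over after baseline traffic."""
    headroom_gbps = link_gbps * (1.0 - busy_utilization)
    backup_gbits = backup_tb * 8 * 1000          # TB -> gigabits (decimal)
    hours_needed = backup_gbits / headroom_gbps / 3600
    return hours_needed, hours_needed <= window_hours

# 10 Gbps link baselining at 40% busy overnight, 20 TB to move in 8 hours.
hours, fits = backup_window_fits(link_gbps=10, busy_utilization=0.40,
                                 backup_tb=20, window_hours=8)
print(f"needs {hours:.1f} h -> {'fits' if fits else 'upgrade the link'}")
```

With these numbers the job squeaks in at roughly 7.4 hours; grow the dataset to 30 TB and the same link no longer fits the window, which is exactly the kind of finding a baseline is meant to surface before the backup goes live.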

Once a backup plan is in place, it’s necessary to continue to monitor link utilization using NetFlow and SNMP tools to ensure that bottlenecks don’t creep up on you over time.


Quality of service

Another way to mitigate the impact backups can have on shared network links is to leverage quality of service (QoS) techniques. Using QoS, we can identify, mark and ultimately prioritize traffic flows as they traverse a network. Large companies with highly complex networks and backup strategies often opt to mark and prioritize data backups at a lower class, so that more critical, time-sensitive applications, such as voice and streaming video, take priority and freely traverse the network when link congestion occurs.

Backup packets are queued or dropped according to policy and automatically transmit when the congestion subsides. This allows for round-the-clock backups without the need for strict off-hours backup windows and alleviates concern that the backup process will impair production traffic that shares the same network links.

Data security

No conversation about backups is complete without discussing data security. From a network perspective, this includes a plan for extending internal security policies and tools out to the WAN and cloud where off-site backups will eventually reside.

Beyond these data protection basics, network and security administrators must also battle shadow IT. Shadow IT is becoming a serious problem that affects the safety and backup/restore capabilities of corporate data. Backups are only useful when they collect all critical data. Shadow IT is preventing this from happening because data is increasingly being stored in unauthorized cloud applications.

Tools such as NetFlow and cloud access security broker (CASB) platforms can help track down and curb the use of shadow IT. A CASB can monitor traffic destined to the Internet and control what cloud services employees can use.


Data Center Architecture: Converged, HCI, and Hyperscale

A comparison of three approaches to enterprise infrastructure.

If you are planning an infrastructure refresh or designing a greenfield data center from scratch, the hype around converged infrastructure, hyperconverged infrastructure (HCI) and hyperscale might have you scratching your head. In this blog, I’ll compare and contrast the three approaches and consider scenarios where one infrastructure architecture would be a better fit than the others.

Converged infrastructure

Converged infrastructure (CI) incorporates compute, storage and networking in a pre-packaged, turnkey solution. The primary driver behind convergence was server virtualization: expanding the flexibility of server virtualization to storage and network components. With CI, administrators could use automation and management tools to control the core components of the data center. This allowed for a single admin to provision, de-provision and make any compute, storage or networking changes on the fly.

Converged infrastructure platforms use the same silo-centric infrastructure components as traditional data centers. They’re simply pre-architected and pre-configured by the manufacturers. The glue that unifies the components is specialized management software. One of the earliest and most popular CI examples is Virtual Computing Environment (VCE). This was a joint venture by Cisco Systems, EMC, and VMware that developed and sold various-sized converged infrastructure solutions known as Vblock. Today, Vblock systems are sold by the combined Dell-EMC entity, Dell Technologies.

CI solutions are a great choice for infrastructure pros who want an all-in-one solution that’s easy to buy and pre-packaged direct from the factory. CI is also easier from a support standpoint. If you maintain support contracts on your CI system, the manufacturer will assist in troubleshooting end-to-end. That said, many vendors are shifting their focus towards hyperconverged infrastructures.

Hyperconverged infrastructure

HCI builds on CI. In addition to combining the three core components of a data center together, hyperconverged infrastructure leverages software to integrate compute, network and storage into a single unit as opposed to using separate components. This architecture design offers performance advantages and eliminates a great deal of physical cabling compared to silo- and CI-based data centers.  

Hyperconverged solutions also provide far more capability in terms of unified management and orchestration. The mobility of applications and data is greatly improved, as is the setup and management of functions like backups, snapshots, and restores. These operational efficiencies make HCI architectures more attractive from a cost-benefit analysis when compared to traditional converged infrastructure solutions.

In the end, a hyperconverged solution is all about simplicity and speed. A great use case for HCI would be a new virtual desktop infrastructure (VDI) deployment. Using the orchestration and automation tools available, you have the ideal platform to easily roll out hundreds or thousands of virtual desktops.


Hyperscale

The key attribute of hyperscale computing is the de-coupling of compute, network and storage software from the hardware. That’s right: while HCI combines everything into a single chassis, hyperscale decouples the components.

This approach, as practiced by hyperscale companies like Facebook and Google, provides more flexibility than hyperconverged solutions, which tend to grow in a linear fashion. For example, if you need more storage on your HCI system, you typically must add a node blade that includes both compute and built-in storage. Some hyperconverged solutions are better than others in this regard, but most fall prey to linear scaling problems if your workloads don’t scale in step.
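The linear-scaling problem can be put in toy numbers (node sizes here are invented for illustration): when capacity only comes in fixed node increments, buying for a storage-heavy workload overshoots on compute.

```python
import math

def nodes_needed(storage_tb, compute_cores, node_storage_tb=20, node_cores=32):
    """HCI grows in whole-node increments: you must buy enough nodes to
    satisfy the larger of the two requirements, overshooting the other."""
    return max(math.ceil(storage_tb / node_storage_tb),
               math.ceil(compute_cores / node_cores))

# Storage-heavy workload: 400 TB needed, but only 64 cores of compute.
n = nodes_needed(storage_tb=400, compute_cores=64)
print(f"{n} HCI nodes -> {n * 32} cores bought for a 64-core workload")
```

With these hypothetical node sizes, 400 TB forces 20 nodes and therefore 640 cores, ten times the compute the workload asked for. A hyperscale design would instead grow the storage pool alone.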

Another benefit of hyperscale architectures is that you can manage both virtual and bare metal servers on a single system. This is ideal for databases that tend to operate in a non-virtualized manner. Hyperscale is most useful in situations where you need to scale-out one resource independently from the others. A good example is IoT because it requires a lot of data storage, but not much compute. A hyperscale architecture also helps in situations where it’s beneficial to continue operating bare metal compute resources, yet manage storage resources in elastic pools.


Data Center Transformation at ConocoPhillips

IT leaders at ConocoPhillips were already working on a major data center consolidation initiative before oil prices plummeted. The company couldn’t keep adding storage and servers; it just wasn’t sustainable, especially for a company that was looking to get serious about the cloud. The industry downturn added urgency to their efforts.

That meant taking some dramatic action in order to cut IT operating costs and save jobs, according to Scott Duplantis, global IT director of server, storage and data center operations at ConocoPhillips. The transformation, which focused on two data centers in the US, included fast-tracking adoption of newer technology like all-flash arrays with full-time data reduction, and refreshing compute platforms with a control-plane software that manages virtual CPU and memory allocations.

All the hard work combined with a fearless approach to data center modernization paid off: The company reduced its data center footprint by more than 50%, slashed its SAN floor space consumption by 80%, cut its power and cooling costs by $450,000 a year, improved reliability, and saved jobs along the way, all in about 30 months.

“We have fewer objects under management, which means not having to add staff as we continue to grow,” Duplantis said. “Our staff can do a better job of managing the infrastructure they have, and it frees them up to pursue cloud initiatives.”

ConocoPhillips’ data center transformation initiative earned first place in the InformationWeek IT Excellence Awards infrastructure category.

Reducing the storage footprint

For its storage-area network, network-attached storage, and backup and recovery, ConocoPhillips traditionally relied on established storage vendors. The SAN alone had 62 racks of storage between the two data centers.

ConocoPhillips decided that flash storage was the way to go, and conducted a bakeoff between vendors that had the features it wanted: ease of management, data deduplication and compression, replication, and snapshotting. The company wound up choosing a relatively new vendor to supply all-flash storage for its SAN, and buying AFAs from one of its incumbent vendors for its NAS. The company also focused on buying larger controllers, which when combined with the flash, provided better performance and reduced the number of objects the staff has to manage.

The work reduced raw SAN storage from 5.6 to 1.8 petabytes. Altogether, the consolidation cuts down on object maintenance and support contracts tied to storage hardware.

Improved power and cooling efficiency from the flash storage adoption has ConocoPhillips reevaluating how its data centers are cooled. “We have to do some reengineering in our data centers to accommodate for almost half of the power footprint they had, and a significant drop in heat because these all-flash arrays don’t generate much heat at all,” he said.

The company also is relearning how to track and trend storage capacity needs; with full-time data reduction, measuring capacities has become a bit tricky.

While some argue that flash has a limited lifecycle, ConocoPhillips has experienced improved SAN storage reliability, Duplantis said. 

Server consolidation

On the compute side, ConocoPhillips deployed faster, more powerful servers, along with the control-plane technology that automates the management of CPU and memory workloads. Virtual server densities shot up dramatically, from 20:1 to 50:1.

The control-plane technology, from a startup, provides a level of optimization that goes beyond human scale, according to Duplantis. Combined with the flash storage, it’s helped cut performance issues to near zero.

“You really can’t just stick with the mainstream players,” he advised. “In the industry today, a lot of the true innovation is coming out of the startup space.”

Lessons learned

While the data center modernization project went smoothly for the most part, without disrupting end users, there were some hiccups with the initial flash deployment. Duplantis said the company was pleased with the support they received from the vendor, which was especially important given that the vendor was newer.

Internally, the data center transformation did require a culture shift for the IT team. IT administrators become attached to the equipment they manage, so they need to see a lot of proof that the new technology is reliable and easy to manage, Duplantis said.

“Today, we understand mistakes are made and technology can fail,” he said. “Once they saw they could take a chance and wouldn’t be in trouble if it didn’t work perfectly, they could breathe easy.”

The fact that jobs were saved amid the economic downturn, despite all the cost-cutting measures, turned employees into champions for the new technology, he said. “They see they’re part of the process that helped save jobs, save costs, and increase reliability,” he said.

Looking ahead

ConocoPhillips plans to continue to right-size its storage and virtual server environments; the process is now just part of the corporate DNA. On the virtual side, the team examines the number of hosts every month and decides to either keep them on premises or put them in a queue for the cloud, Duplantis said.

The team also is working to build up its cloud capability to ensure it’s ready when the economy picks up and the company increases its drilling activity. “We want to be nimble and agile when the business needs it,” he said.
