
Enterprise Data Storage Shopping Tips

Enterprise data storage used to be an easy field. Keeping up meant just buying more drives from your RAID vendor. With all the new hardware and software available today, that strategy no longer works. In fact, the radical changes in storage products affect not only storage purchases, but ripple through to server choices and networking design.

This is actually good news. In data storage, we spent the better part of three decades with gradual drive capacity increases as the only real excitement. The result was a stagnation of choice, which made storage predictable and boring.

Today, the cloud and solid-state storage have revolutionized thinking and are driving much of the change in the industry. The cloud brings low-cost storage-on-demand and simplified administration, while SSDs make server farms much faster and drastically reduce the number of servers required for a given job.

Storage software is changing rapidly, too. Ceph is the prime mover in open-source storage code, delivering a powerful object store with universal storage capability, providing all three mainstream storage modes (block, file and object) in a single storage pool. Separately, there are storage management solutions for creating a single storage address space spanning NVDIMMs to the cloud, compression packages that typically shrink raw capacity needs by 5X, virtualization packages that turn server storage into a shared clustered pool, and tools to solve the “hybrid cloud dilemma” of where to place data for efficient and agile operations.

A single theme runs through all of this: Storage is getting cheaper and it’s time to reset our expectations. The traditional model of a one-stop shop at your neighborhood RAID vendor is giving way to a more savvy COTS buying model, where the interchangeability of component elements is so good that integration risk is negligible. We are still not all the way home on the software side, but hardware is now like Legos, with the parts always fitting together. The rapid uptake of all-flash arrays has demonstrated just how easily COTS-based solutions come together.

The future of storage is “more, better, cheaper!” SSDs will reach capacities of 100 TB in late 2018, blowing away any hard-drive alternatives. Primary storage is transitioning to all-solid-state as we speak and “enterprise” hard drives are becoming obsolete. The tremendous performance of SSDs has also replaced the RAID array with the compact storage appliance. We aren’t stopping here, though. NVDIMM is bridging the gap between storage and main memory, while NVMe-over-Fabric solutions ensure that hyperconverged infrastructure will be a dominant approach in future data centers.

With all these changes, what storage technologies should you consider buying to meet your company’s needs? Here are some shopping tips.

(Image: Evannovostro/Shutterstock)


How To Shrink Your Data Storage Footprint

I remember a few years ago parsing through all the files on a NAS box. I was amazed at all the duplicate files, but a bit more investigation revealed that we had a mix of near duplicates in with the genuine replicas. All had the same name, so it was hard to tell the valid files from the trash. I asked around and the responses I got were mostly along the lines of, “Why are we keeping that? No one uses it!”

This raises the question: Do we throw any data away any more? Laws and regulations like the Sarbanes-Oxley Act (SOX) and HIPAA stipulate that certain data should be kept safe and encrypted. The result is that data subject to the law tends to be kept carefully forever, but then, so does most of the rest of our data, too.

Storing all this data isn’t cheap. Even on Google Nearline or Amazon Glacier, there is cost associated with all of the data, its backups and replicas. In-house, we go through the ritual of moving cold data off primary storage to bulk disk drives, and then on into the cloud, in almost a mindless manner.

Excuses range from “Storage is really cheap in the cloud” and “It’s better to keep everything, just in case” to “Cleaning up data is expensive” or too complicated. Organizations often invoke big data as another reason for their data stockpiling, since there may be nuggets of gold in all that data sludge. The reality, though, is that most cold, old data is just that: old and essentially useless.

As I found with the NAS server, analyzing a big pile of old files is not easy. Data owners are often no longer with the company, and even when they are, remembering what an old file is all about is often impossible. The relationship between versions of files is hard to recreate, especially for desktop data from departmental users. In fact, it’s mainly a glorious waste of time. Old data is just a security blanket!

So how can companies go about reducing their data storage footprint? Continue on to learn about some data management best practices and tools that can help.

(Image: kentoh/Shutterstock)


Docker Data Security Complications

Docker containers represent a real sea change in the way applications are written, distributed and deployed. The aim of containers is to be flexible and to allow applications to be spun up on demand, whenever and wherever they are needed. Of course, wherever we use our applications, we need data.

There are two schools of thought on how data should be mapped into containers. The first says we keep the data only in the container; the second says we have persistent data outside of the container that extends past the lifetime of any individual container instance. In either scenario, the issue of security poses big problems for data and container management.

Managing data access

As discussed in my previous blog, there are a number of techniques for assigning storage to a Docker container. Temporary storage capacity, local to the host running the container, can be assigned at container run time. Assigned storage volumes are stored within the host in a specific subdirectory mapped to the application. Volumes can be created at the time the container is instantiated, or in advance using the “docker volume” command.
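As a quick sketch of the named-volume approach (the volume and container names here are illustrative, and the commands assume a running Docker daemon):

```shell
# Create a named volume ahead of time:
docker volume create app-data

# Inspect where Docker keeps its backing directory on the host
# (by default under /var/lib/docker/volumes/<name>/_data):
docker volume inspect app-data

# Or simply name the volume at run time; Docker creates it
# on the fly if it doesn't already exist:
docker run -d --name app1 -v app-data:/var/lib/app/data nginx
```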

Alternatively, local storage can be mapped as a mount point into the container. In this instance, the “docker run” command specifies a local directory as the mount point within the container. The third option is to use a storage plugin that directly associates external storage with the container.
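The latter two methods look like this on the command line; note that “example-driver” stands in for whichever volume plugin your external array vendor supplies, and the paths and names are illustrative:

```shell
# Bind-mount a host directory into the container:
docker run -d --name app2 -v /srv/app-data:/data nginx

# Use a volume plugin so external storage backs the volume
# (driver name and options depend on the vendor's plugin):
docker volume create --driver example-driver --opt size=10GB ext-vol
docker run -d --name app3 -v ext-vol:/data nginx
```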

Open access

In each of the methods described, the Docker framework provides no inherent security model for data. For example, any host directory can be mounted into a container, including sensitive system folders like /etc. It’s possible for a container to then modify those files, as permissions are granted using standard, simple Unix permission settings. An alternative, and possibly better, practice is to use non-root containers, which involves running containers under a different Linux user ID (UID). This is relatively easy to do; however, it does mean building a methodology to secure each container with either a UID or a group ID (GID), as permissions checking is done on UID/GID numbers.
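Running a container as non-root is a single flag at run time, or a directive baked into the image; the UID/GID pair and the image name below are illustrative:

```shell
# Run the container process under UID 1001 / GID 1001 instead of root:
docker run -d --name app4 --user 1001:1001 registry.example.com/myapp

# Or bake the user into the image itself (Dockerfile excerpt):
#   RUN useradd --uid 1001 --no-create-home appuser
#   USER appuser
```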

Here we run into another problem: Using non-root containers with local volumes doesn’t work, unless the UID used to run the container has permissions to the /var/lib/docker/volumes directory. Without this, data can’t be accessed or created. Opening up this directory would be a security risk; however, there’s no inherent method to set individual permissions on a per-volume basis without a lot of manual work.
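The manual workaround is to change ownership of the volume’s backing directory on the host before starting the non-root container, which is exactly the kind of per-volume busywork described above (path is the Docker default; the image name is illustrative; requires root on the host):

```shell
docker volume create app-data

# The backing directory is root-owned by default, so a container
# running as UID 1001 can't write to it until the host fixes this:
sudo chown 1001:1001 /var/lib/docker/volumes/app-data/_data

docker run -d --user 1001:1001 -v app-data:/data registry.example.com/myapp
```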

If we look at how external storage is mounted to a container, many solutions simply present a block device (a LUN) to the host running the container and format a file system onto it. This is then presented into the container as a mount point. At that point, the security on directories and files can be set from within the container itself, reducing some of the issues we’ve discussed. However, if this LUN/volume is reused elsewhere, there are no security controls over how it is mounted or used by other containers, as there is no security model built directly into the container/volume mapping relationship. Everything depends on trusting the commands run on the host.
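The typical LUN workflow illustrates the trust problem: every step is an ordinary host command, and nothing binds the device to a particular container (device path and image name are illustrative; requires root):

```shell
# Format and mount the LUN the array presented to this host:
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /mnt/lun0
sudo mount /dev/sdb /mnt/lun0

# Hand the file system to a container as a bind mount. Nothing here
# ties the LUN to this container; another host could mount the same
# device and expose it to a completely different application:
docker run -d --name app5 -v /mnt/lun0:/data registry.example.com/myapp
```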

This is where we have yet another issue: a lack of multi-tenancy. When we run containers, each container instance may run for a separate application. As in traditional storage deployments, storage assigned to containers should have a degree of separation to ensure data can’t be inadvertently or maliciously accessed cross-application. There’s currently no easy way to do this at the host level, other than to trust the orchestration tool running the container and mapping it to data.

Finding a solution

Obviously, some of the issues presented here are Linux/Unix specific. For example, the abstraction of the mount namespace provides different entry points for our data, but there is no abstraction of permissions: I can’t map user 1,000 to user 1,001 without physically updating the ACL (access control list) data associated with each file and directory. Making large-scale ACL changes could potentially impact performance. For local volumes, Docker could easily set the permissions of the directory on the host that represents a new volume to match the UID of the container being started.
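To make the “no abstraction of permissions” point concrete: remapping user 1,000 to user 1,001 means rewriting ownership and ACL data on every file and directory, one entry at a time (paths are illustrative; requires root):

```shell
# Rewrite ownership file-by-file; on a volume with millions of
# files this walk is the performance concern mentioned above:
sudo find /var/lib/docker/volumes/app-data/_data \
  -uid 1000 -exec chown -h 1001 {} +

# Any POSIX ACL entries must be rewritten the same way:
sudo setfacl -R -m u:1001:rwX /var/lib/docker/volumes/app-data/_data
```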

External volumes provide a good opportunity to move away from the permissions structure on the host running containers. However, this means that a mechanism is required to map data on a volume to a known trusted application running in a specific container instance. Remember that containers have no inherent “identification” and can be started and stopped at will. This makes it hard to determine whether any individual container is the owner of a data volume.

Today the main solution is to rely on the orchestration platform that manages the running of the containers themselves. We put the trust into these systems to map volumes and containers accurately. In many respects, this isn’t unlike traditional SAN storage or the way virtual disks are mapped to virtual machines. However, the difference for containers is the level of portability they represent and the need to have a security mechanism that extends to the public cloud.

There’s still some work to be done here. For Docker, its acquisition of storage startup Infinit may spur ideas about how persistent data is secured. This should hopefully mean the development of an interface that all vendors can work towards — storage “batteries included” but optional.

Learn more about containers at Interop ITX, May 15-19 in Las Vegas. Container sessions include “Managing Containers in Production: What You Need To Think About,” and “The Case For Containers: What, When, and Why?” Register now!


Hot Storage Skills For The Modern Data Center

The world of data storage is evolving faster than dinosaurs after the asteroid struck. Much of the old storage “theology” is on the chopping block as we move to a world of solid-state, software-defined, open source, cloudy appliances and leave RAID arrays behind. That inevitably means that the skills needed to be a successful storage administrator are changing, too.

Let’s first look at some timelines. Solid state is already mainstream, and 2017 will see a massive jump in usage as 3D NAND hits its stride. With the industry promising 100 TB 2.5-inch SSDs in 2017, even bulk storage is going to move away from hard-disk drives. Software-defined storage (SDS) is really just getting started, but if its networking equivalent (SDN) is a guide, we can expect it to gain traction quickly.

Open source code, such as Ceph and OpenStack, is already a recognized business alternative. Cloud storage today is mainstream as a storage vehicle for cold data, but still emerging for mission-critical information. This year, we can expect OpenStack hybrid clouds to transition to production operations with the arrival of new management tools and approaches to storage.

Coupled with these storage changes are several transitions under way in servers and networking. The most important is the migration of virtual instances to the container model. Not only do containers impact server efficiency, but the ability to manage them and to integrate data and network storage resources across the hybrid environment is going to be an in-demand skill in the next-generation data center.

One poorly understood but important issue is how to tune performance in the new environment. We are still getting the wheels to turn in so much of this new stuff, but at some point the realization will hit that a well-tuned data management approach will prevent many of the ills that could arise in performance and security.

In this environment, demand for many traditional storage skills will decline. With cloud backup and archiving rapidly becoming standard, anything to do with traditional backup and tape libraries has to top the list of skills on the way out. Tape has been declared dead regularly for decades, but now the low prices and built-in disaster recovery benefits of the cloud make any tape-based approach impractical.

RAID-based skills are in the same boat. Array sales are dropping off as small Ethernet appliances make for more flexible solutions. In fact, the block-IO model, which struggles to scale, is in decline, replaced by REST and object storage. Skills ranging from building Fibre Channel SANs to managing LUNs and partitions will be less needed as the traditional SAN declines, though IT is conservative and the SAN will fade away rather than instantly disappear.

NAS access is in many ways object storage with a different protocol to ask for the objects. While the file model will tend to stick around, just as block-IO will take time to go away, increasingly it will be offered on an object platform, which means that a NAS admin will need to become skilled with object storage approaches.

Continue on to find out what data storage skills will be in demand in the years ahead.

(Image: Mark Agnor/Shutterstock)


Toward the Self-Driving Data Center

Like self-driving cars, the data center that runs itself, manages itself and calls for help when needed is not far away. Even complex IT infrastructure that is notoriously difficult to upgrade and maintain is being automated, converged, made into building blocks or stacks, and managed via software, not hardware.

Visionaries have suggested that the self-driving data center is as inevitable as the self-driving vehicle, as IT staffs admit that machines can do almost anything better than a human and start putting the machines to work. Doing so enables agility, that most essential IT building block, so that leaders can respond to a changing business universe. Even for those “humanists” who disagree that machines can perform IT manager tasks better, the efficiency gained from offloading repetitive functions, or from making connections between often unrecognized, disparate events, frees organizations to serve customers at a higher level.

Similar to vehicles, data centers are well along their march toward full self-driving capability, and a continuum of automation and analytics-based solutions is in place to save time, hassle and costs. Using the storage industry as an example, the continuum of new devices, virtualization, alerting, and orchestration technologies enable successively greater machine direction of resources, and less overt IT staff involvement.

Six key technologies are moving that most challenging IT domain, data storage, further along the continuum toward fully autonomous operations. They are turning IT managers into business agility agents whose work enables their organizations to achieve higher aims.

(Image: Timofeev Vladimir/Shutterstock)
