
Machine Learning on Telemetry Data: Mining Value from Chaos


Data volumes continue to grow across the globe, driven by accelerating adoption of technologies like Big Data, IoT, and artificial intelligence. Naturally, this growth is accompanied by an increase in data centers, including traditional, cloud, and edge facilities. And as modern data center networks become more complex – supporting the data-intensive applications associated with these technologies – they are beginning to produce massive amounts of telemetry data.

Network telemetry data is raw, highly granular information about how a network is performing. For example, it tells you the throughput or latency of application flows and packets throughout the network. Given how busy hyperscale and service provider networks get, this detailed data can accrue to huge volumes in a short time. Network operators generally regard these massive quantities of telemetry data as a burden and essentially worthless. After all, they already have sophisticated proprietary solutions to help manage their networks, so why bother combing through a seemingly incomprehensible mess of raw telemetry data?

But this data holds tremendous value that most organizations haven’t considered. By performing machine learning on telemetry data, these organizations can unlock hidden insights that will drastically improve performance and bolster security. In short, with machine learning, network operators can find tremendous value in all the chaos. 

To begin, you need uncompromised visibility into the network you wish to monitor. Traffic probes, sampled traffic, or polled statistics and counters generally do not give you enough information to observe and detect the issues that can impact cloud-scale network environments. It would be like running through a dark and dangerous forest at night with only a small flashlight: the odds of losing your way or missing an obstacle are high. Thankfully, emerging technologies like In-band Network Telemetry (INT) solve this problem, giving network operators the ground truth they need about their network.

Next, you need to know your network data. That means processing and organizing the different types of telemetry data – such as path and latency information, hop-by-hop delay, jitter, and packet loss rate – so you can identify the interesting data and detect anomalies and events.
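
As a rough illustration of what "organizing" that data can look like, the sketch below (in Python, with a hypothetical record schema not tied to any particular telemetry format) types each flow sample and derives jitter and loss rate from it:

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

@dataclass
class FlowSample:
    """One telemetry observation for a flow (hypothetical schema)."""
    flow_id: str
    path: List[str]            # switch/router hops traversed
    latency_ms: float          # end-to-end latency for this sample
    hop_delay_ms: List[float]  # hop-by-hop delay along the path
    packets_sent: int
    packets_lost: int

def jitter_ms(samples: List[FlowSample]) -> float:
    """Jitter as the mean absolute difference between consecutive latency samples."""
    lat = [s.latency_ms for s in samples]
    if len(lat) < 2:
        return 0.0
    return mean(abs(a - b) for a, b in zip(lat, lat[1:]))

def loss_rate(samples: List[FlowSample]) -> float:
    """Packet loss rate aggregated across the given samples."""
    sent = sum(s.packets_sent for s in samples)
    lost = sum(s.packets_lost for s in samples)
    return lost / sent if sent else 0.0
```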

Now that you have an ordered dataset, machine learning techniques can help determine a baseline of how your network is currently performing – providing a comprehensive view of latency, bandwidth, packet drops, and other metrics. This sort of baselining makes it clear where a network is running smoothly and where it's struggling, so operators can identify the problem areas where they need to focus. Again, these insights are possible because telemetry data is so rich, providing specific information on each packet in the network.
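
A minimal sketch of that kind of baselining, assuming latency samples are available as simple (flow, latency) pairs: compute a mean and standard deviation per flow so later observations can be judged against "normal."

```python
from collections import defaultdict
from statistics import mean, stdev

def build_baseline(samples):
    """samples: iterable of (flow_id, latency_ms) pairs.
    Returns flow_id -> (mean latency, standard deviation): a simple per-flow baseline."""
    by_flow = defaultdict(list)
    for flow_id, latency_ms in samples:
        by_flow[flow_id].append(latency_ms)
    return {
        flow: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for flow, vals in by_flow.items()
    }
```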

Machine learning also correlates phenomena across latency, paths, switches, routers, events, and more. This intelligence uncovers things that were previously invisible. For example, it may tell you that network events X and Y are closely related, and that when one is seen, the other is likely to be observed too. It can also tell you that when you make a network change, you get a particular behavior. This makes it possible to detect and eliminate bottlenecks. Maybe there's a network policy causing a packet drop or a slowdown. Maybe there's a problem with the application deployment or the network provisioning that needs to be remedied by the application or network administrators. Machine learning provides the clues that point you toward an answer.
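
One simple, hedged way to surface such relationships is to bucket events into time windows and count how often pairs of events land in the same window; the event names and window size here are placeholders, and production systems would use far more sophisticated correlation models.

```python
from collections import defaultdict
from itertools import combinations

def event_cooccurrence(events, window_s=60):
    """events: iterable of (timestamp_seconds, event_name) pairs.
    Returns (event_a, event_b) -> number of time windows in which both occurred."""
    windows = defaultdict(set)
    for ts, name in events:
        windows[int(ts // window_s)].add(name)
    counts = defaultdict(int)
    for names in windows.values():
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return counts
```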

This leads to another major advantage of this approach: predictive analytics. As your machine learning models train to understand correlations and patterns in the present, they eventually gain the ability to predict the future as well. Network operators come to understand how each action they take correlates with certain behaviors, down to the packet level. Armed with this knowledge, they can anticipate and prevent network outages, forwarding-plane delays, and application slowdowns.
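
A toy forecasting sketch along those lines, assuming scikit-learn is available: fit a regression on recent latency history so the next value can be anticipated. A real model would draw on far richer features (paths, queue depth, time of day).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forecast_next_latency(latencies, lags=5):
    """Predict the next latency sample from the previous `lags` samples.
    Requires at least lags + 1 historical values."""
    X = [latencies[i:i + lags] for i in range(len(latencies) - lags)]
    y = latencies[lags:]
    model = LinearRegression().fit(np.array(X), np.array(y))
    return float(model.predict(np.array([latencies[-lags:]]))[0])
```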

In addition to improving performance in the present and future, machine learning on telemetry data also boosts network security. After detailed baselining, it's easy to spot anomalous behavior. Anomalies often just signify poor performance somewhere in the network, but sometimes they indicate a breach. Once identified, anomalous behavior can be investigated further, potentially allowing network operators to expose significant security vulnerabilities.
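
With a baseline like the one sketched earlier, flagging anomalies can be as simple as a z-score test: anything that strays several standard deviations from its flow's normal latency is worth a closer look. This is only one illustrative approach; dedicated anomaly-detection models go much further.

```python
def is_anomalous(flow_id, latency_ms, baseline, threshold=3.0):
    """True if an observation strays more than `threshold` standard deviations
    from its flow's baseline (as produced by a function like build_baseline)."""
    mu, sigma = baseline.get(flow_id, (None, 0.0))
    if mu is None or sigma == 0.0:
        return False  # no usable baseline for this flow yet
    return abs(latency_ms - mu) / sigma > threshold
```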

To gain accurate network insights from machine learning, an organization needs a massive volume of telemetry data to draw from. But getting the right quantity of data is easy; it's getting the right quality of data that can be tricky. To train ML models correctly, telemetry data must include flow reports, congestion reports, and drop reports so that networks can be properly baselined, correlations found, and future outcomes predicted.
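
A small validation sketch in that spirit (the report categories are taken from the sentence above, not from any specific telemetry specification): refuse to train on a batch unless all three report types are represented.

```python
REQUIRED_REPORT_TYPES = {"flow", "congestion", "drop"}

def batch_is_trainable(reports):
    """reports: iterable of dicts with a 'type' key.
    True only if every required report category appears at least once."""
    seen = {r.get("type") for r in reports}
    return REQUIRED_REPORT_TYPES.issubset(seen)
```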

In an era when networks are becoming more scattered and chaotic, using machine learning on telemetry data is a way to make the network seem whole again. This method provides critical new insights and establishes a holistic way to manage the network rather than troubleshooting individual components. Advanced technologies like 5G are only going to put more pressure on network performance. The best way to meet this rising challenge is to take a proactive, AI-based approach. For growing networks, telemetry data may seem extraneous, but when properly harnessed, that data provides solutions to the network’s biggest problems.




Why CIOs Are Betting on Cloud for Their Modern Data Programs


Enterprise infrastructures are changing rapidly as the management and visibility requirements of modern, data-driven applications outpace legacy data storage functionality. Gartner confirms that, with artificial intelligence and machine learning driving an explosion in data volume and variety, IT operations are outgrowing existing frameworks. Although insights from today's vast amounts of structured, semi-structured, and unstructured data can deliver superior value, organizations are currently unable to adequately monitor or analyze this information; between 60 percent and 73 percent of all data within an enterprise goes unused.

Cloud has been the buzz for more than a decade, and it is now seeing mass adoption among enterprises. Similarly, over the past several years, the size and scope of data pipelines have grown significantly. Just a few years ago, Fortune 500 companies were still experimenting with and testing the efficacy of 'big data' as they moved toward digital transformation. Yet today, the majority of those organizations have moved from big data pilots to large-scale, full production workloads with enterprise-level SLAs. Now, these organizations are most interested in maximizing the return on their big data investments and developing new use cases that create new revenue streams.

Data is staying put: Why Big Data needs the cloud

According to recent research from Sapio Research, which surveyed more than 300 IT decision makers ranging from directors to the C-suite, enterprises are overwhelmingly embracing the cloud to host their big data programs. As of January of this year, 79% of the respondents have data workloads currently running in the cloud, and 83% have a strategy to move existing data applications into the cloud. Why?

Modern data applications create processing workloads that require elastic scaling, meaning compute and storage needs change frequently and independently of each other. The cloud provides the flexibility to accommodate this elasticity, making sure the compute and storage resources are available to keep data pipelines performing optimally under any circumstances. Many new-generation data applications need their data workflows to process heavy traffic at certain times and very little at others – think of social media, video streaming, or dating sites. For the many organizations that see this kind of fluctuation monthly, weekly, or even daily, the cloud provides an agile, scalable environment that helps future-proof against unpredictable increases in data volume, velocity, and variety.
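
To make the "independently of each other" point concrete, here is a deliberately simplified sketch of an elasticity policy in which compute and storage are evaluated separately; the thresholds are illustrative only.

```python
def scale_decision(cpu_util, storage_util,
                   cpu_high=0.75, cpu_low=0.25, storage_high=0.80):
    """Return independent scaling actions for compute and storage,
    each driven only by its own utilization (0.0 to 1.0)."""
    if cpu_util > cpu_high:
        compute = "scale_out"
    elif cpu_util < cpu_low:
        compute = "scale_in"
    else:
        compute = "hold"
    storage = "expand" if storage_util > storage_high else "hold"
    return {"compute": compute, "storage": storage}
```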

As an example, e-commerce retailers use data processing and analytics tools to provide targeted, real-time shopping suggestions for customers as well as to analyze their actions and experiences. Every year, these organizations experience spiking website traffic on major shopping days like Cyber Monday – and in a traditional big data infrastructure, a company would need to deploy physical servers to support this activity. These servers would likely not be required the other 364 days of the year, resulting in wasted expenditures. With the cloud, however, online retailers have instant access to additional compute and storage resources to accommodate traffic surges, and they can scale back down during quieter times. In short, cloud computing avoids the headaches of manual configuration and troubleshooting that come with on-premises infrastructure, and it saves money by eliminating the need to physically grow that infrastructure.

Lastly, for organizations that handle highly sensitive personal information (think social security numbers, health records, financial details, etc.) and worry about cloud-based data protection, adopting a hybrid cloud model allows enterprises to keep sensitive workloads on-premises while moving additional workloads to the cloud. Organizations realize they don't have to be all in or out of the cloud. Sapio's survey revealed that most respondents (56 percent) are embracing a hybrid cloud strategy for this reason.

The rapid increase in data volume and variety is driving organizations to rethink enterprise infrastructures, particularly cloud strategies, and to focus on longer-term data growth, flexibility, and cost savings. Over the next year, we will see an increase in modernized data processing systems, run partially or entirely in the cloud, to support advanced data-driven applications and their emerging use cases.




How AIOps Can Improve Data Center Management


Today’s data center management professionals face a unique challenge. Technologies like the Internet of Things (IoT) and cloud computing are elevating a new generation of IT applications, powering everything from smart cities to data-driven crisis response. However, these capabilities have made digital environments more complex by several orders of magnitude, making it increasingly difficult to effectively manage modern data centers.

Thankfully, an emerging trend known as AIOps – artificial intelligence for IT operations – offers IT professionals the support they desperately need. By bringing artificial intelligence and visualization technologies to bear on a wide range of data center challenges, AIOps enables data center management professionals to automate administrative tasks, reduce unnecessary alerts, and identify anomalies before they cause wider issues.
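
As a rough sketch of the "reduce unnecessary alerts" piece, duplicate alerts from the same source can be collapsed within a time window before they reach an operator. The field names and window size are hypothetical, not tied to any particular AIOps product.

```python
def dedupe_alerts(alerts, window_s=300):
    """alerts: list of dicts with 'source', 'message', and 'ts' (epoch seconds).
    Keeps only the first alert per (source, message) within each window."""
    last_seen = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["source"], alert["message"])
        if key not in last_seen or alert["ts"] - last_seen[key] > window_s:
            kept.append(alert)
            last_seen[key] = alert["ts"]
    return kept
```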

AIOps tools are already helping IT teams in multiple scenarios, data center management included. In fact, a new report from OpsRamp shows that 87% of technology professionals say their AIOps solutions are delivering the value they had expected prior to implementation. Coupled with analyses showing that the market for AIOps platforms will grow from $2.6 billion in 2018 to $11.0 billion in 2023, these positive early results underscore the transformative potential of AIOps in the IT space.

For data center management professionals interested in learning more about AIOps, it’s important to understand the approach’s range of possible use cases, as well as the requirements for successful implementation. By doing so, data center teams can ensure they’ll reap the rewards of a technology that promises to revolutionize the IT industry at large.

Read the rest of this article on InformationWeek.

Read more Network Computing articles on this topic:

AI-Driven Wireless Is Key to the Digital Workplace

How Is AI Affecting Infrastructure Pros?

Why IT War Rooms Fail, and Why Failure is No Longer an Option

 




Combining Data Center Innovations to Reduce Ecological Footprints


The big tech companies are vying for positive coverage of their environmental initiatives. Microsoft just promoted its achievements in renewable energy, which will comprise 60 percent of the company’s electricity usage by the end of the year. Facebook made headlines for a forthcoming 100 percent renewable-powered facility in Los Lunas, New Mexico, while both Apple and Google claim 100 percent carbon neutrality.

These green milestones are important, but renewables represent only one environmental solution for the data center industry. Energy-intensive technologies, such as AI and blockchain, complicate the quest for clean, low-impact electricity generation. Additionally, the sector remains a large consumer of the planet’s other resources, including water and raw materials. Unfortunately, the search for energy efficiency can negatively affect other conservation efforts.

Current State of Play on the Search for Energy Efficiency

A case in point is adiabatic cooling, which evaporates water to ease the burden on HVAC systems. At a time when 2.7 billion people suffer from water scarcity, this approach can lead to intense resource competition, such as in Maharashtra, India, where drinking water had to be imported as thirsty colocation facilities proliferated.

Bolder strategies will be necessary to deliver the compute power, storage capacity, and network connectivity the world demands with fewer inputs of fossil fuels, water, rare earth metals, and other resources. Over the long range, there is hope for quantum computing, which has the potential to slash energy usage by more than 20 orders of magnitude over conventional technologies. This could cut Google's annual burn rate, for instance, from gigawatt-hours to the nanowatt-hour range, reducing the need to produce more solar panels, wind turbines, and hydropower stations along the way.

Commercial launches – such as IBM’s Q System One – notwithstanding, the quantum moonshot still lies at least a decade away by most accounts, and the intervening barriers are significant. Quantum calculations remain vulnerable to complex errors, new programming approaches are required, and the nearest-term use cases tend toward high-end modeling, not replacing the standard web server or laptop.

Green Technology Solutions Closer to Earth

Fortunately, there are other technologies nearer at hand and more accessible for the average data center, colocation provider, or even regional office. For example, AI-based tools are being trained as zombie killers, using machine learning to improve server allocation and power off the 25% of physical servers and 30% of virtual servers that are currently running but doing nothing. Repurposing underutilized IT assets not only helps realize energy savings, it can also delay new equipment purchases.
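
A minimal sketch of how such a "zombie" check might work, assuming utilization metrics are already being collected; the 5 percent threshold is an arbitrary illustration, not a recommendation.

```python
def find_zombie_servers(utilization, threshold=0.05):
    """utilization: dict of server name -> list of CPU utilization samples (0.0 to 1.0).
    Returns the servers whose average utilization suggests they are doing nothing."""
    return [
        server for server, samples in utilization.items()
        if samples and sum(samples) / len(samples) < threshold
    ]
```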

 

Then there is liquid cooling, well known from the industry's mainframe origins. Although many companies won't be able to redesign facilities a la Facebook's designs, hardware manufacturers are delivering off-the-shelf liquid-cooled products. Rear-door heat exchangers and direct-to-chip cooling can help lower PUE (power usage effectiveness) from 1.5 or more down toward 1.1, and immersion cooling can deliver power savings of up to 50 percent. These technologies also enable greater density, which means doing more with less space – a good thing, as land, too, is a natural resource.
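
For reference, PUE is simply total facility power divided by the power delivered to IT equipment, so the improvement cited above can be sanity-checked in a few lines (the wattage figures below are made up for illustration):

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power (ideal = 1.0)."""
    return total_facility_kw / it_equipment_kw

# Illustrative only: a site with 1,000 kW of IT load.
before = pue(total_facility_kw=1500, it_equipment_kw=1000)  # 1.5
after = pue(total_facility_kw=1100, it_equipment_kw=1000)   # 1.1
overhead_saved_kw = (before - after) * 1000                  # 400 kW less non-IT overhead
```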

Consolidation trends will shift more of the environmental burden to the few outfits with pockets deep enough to do the seemingly impossible: sink data centers in the ocean for natural cooling, launch them into space, and “accelerate” workloads with the earliest, sure to be exorbitantly expensive, quantum computers ready for mission critical applications.

What’s Next for the “Green” Data Center

None of today’s available technologies, from AI-driven DCIM systems to advanced load balancers, is a panacea. With blockchain’s intense processing demands and consumers’ insatiable appetite for technology, among other pressures, the IT industry faces numerous forces working against its efforts to shrink resource consumption and carbon emissions.

While we await a breakthrough with the exponential impact of quantum computing, we will have to combine various solutions to drive incremental progress. In some cases, that will mean a return of cold storage to move rarely accessed information off powered storage arrays in favor of tape backups and similar “old school” methods. In others, it will mean allowing energy efficiency and component recyclability to tip the balance during hardware acquisition decisions. And in still others, newer edge computing applications may integrate small, modular pods that work on solar-wind hybrid energy systems.

Hopefully, the craving these dominant tech players display for positive environmental headlines, paired with a profit motive rewarding tiny efficiency gains achieved at hyperscale, will continue to propel advances in green solutions that can one day be implemented industry-wide.




5 Things You Need to Know About Data Lakes


Still waters run deep, the old proverb tells us. The same can be said for data lakes, storage repositories that hold vast amounts of raw data in native format until required by an application, such as predictive analytics.

Like still water, data lakes can be dark and mysterious. This has led to several misconceptions about the technology, some of which can prove damaging or even fatal to new data lake projects.

Before diving in, here are five key things you need to know about data lakes.

1. Data lakes and data warehouses are not the same thing

A data warehouse contains data that has been loaded from source systems based on predefined criteria. “A data lake, on the other hand, houses raw data that has not been manipulated in any way prior to entering the lake and enables multiple teams within an organization to analyze the data,” noted Sue Clark, senior CTO and architect at Sungard Availability Services.

Although separate entities, data lakes and data warehouses can be packaged into a hybrid model. “This combined approach enables companies to stream incoming data into a data lake, but then move select subsets into relational structures,” said Ashish Verma, a managing director at Deloitte Consulting. “When data ages past a certain point or falls into disuse, dynamic tiering functionality can automatically move it back to the data lake for cheaper storage in the long term.”
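
A hedged sketch of that tiering idea: route each dataset to the warehouse or back to the lake based on age and access frequency. The thresholds and field names are illustrative, not drawn from any particular product.

```python
from datetime import datetime, timedelta

def choose_tier(created, last_accessed, access_count,
                hot_days=90, min_accesses=10):
    """Return 'warehouse' for young or frequently queried datasets, 'lake' otherwise."""
    now = datetime.utcnow()
    is_young = now - created < timedelta(days=hot_days)
    recently_used = now - last_accessed < timedelta(days=hot_days)
    if is_young or (recently_used and access_count >= min_accesses):
        return "warehouse"
    return "lake"
```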

2. Don’t treat a data lake like a digital dump

Although a data lake can store structured, unstructured, and semi-structured data in raw form, it should never be regarded as a data dumping ground. “Since data is not processed or analyzed before entering the lake, it’s important that the data lake is maintained and updated on a routine basis, and that all users know the sources of the data in the lake to ensure it’s analyzed appropriately,” Clark explained.

From a data scientist's point of view, the most important component of creating a data lake is the process of adding data while ensuring the accompanying catalogs are updated, current, and accessible, observed Brandon Haynie, chief data scientist at Babel Street, a data discovery and analysis platform provider. Otherwise, potentially useful datasets may be set adrift and lost. “The catalog will provide the analyst with an inventory of the sources available, the data’s purpose, its origin, and its owner,” he said. “Knowing what the lake contains is critical to generating the value to support decision-making and allows data to be used effectively instead of generating more questions surrounding its quality or purpose.”
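
A minimal sketch of the kind of catalog entry Haynie describes, recording source, purpose, origin, and owner alongside each dataset; the schema here is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict

@dataclass
class CatalogEntry:
    """Metadata recorded when a dataset lands in the lake (hypothetical schema)."""
    dataset_name: str
    source_system: str   # where the raw data came from
    purpose: str         # why it was collected
    owner: str           # who is accountable for it
    ingested_at: datetime = field(default_factory=datetime.utcnow)

catalog: Dict[str, CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    """Add or refresh a dataset's catalog entry so analysts can discover it."""
    catalog[entry.dataset_name] = entry
```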

3. A data lake requires constant management

It’s important to define management approaches in advance to ensure data quality, accessibility, and necessary data transformations. “If a data lake isn’t properly managed from conception, it will turn into a ‘data swamp,’ or a lake with low-quality, poorly cataloged data that can’t be easily accessed,” Verma said.

It’s important for IT leaders to know that data governance is critical for ensuring data is consistent, accurate, contextualized, accessible, and protected, noted Jitesh S. Ghai, vice president and general manager of data quality, security, and governance, at software development company Informatica. “With a crystal-clear data lake, organizations are able to capitalize on their vast data to deliver innovative products and services, better serve customers, and create unprecedented business value in the digital era,” he explained.

4. Don’t become a data hoarder

Many organizations feel they must store everything in order to create an endless supply of valuable data. “Unless someone decides to keep reprocessing all of the data continuously, it is sufficient to create a ‘digestible’ version of the data,” observed Dheeraj Ramella, chief technologist at VoltDB, a firm that offers an in-memory database to support applications requiring real-time decisions on streaming data. “This way, you can refine the model with any new training data.” Once training is complete and the information that’s meaningful to the enterprise has been captured, you should be able to purge any data that falls outside compliance and regulatory retention timeframes.
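
One way to picture that retention rule, with purely illustrative (not legal) timeframes: keep anything still inside its compliance window and purge the rest once training is complete.

```python
from datetime import datetime, timedelta

# Illustrative retention windows per data category; not legal or compliance guidance.
RETENTION = {
    "financial": timedelta(days=7 * 365),
    "clickstream": timedelta(days=90),
}

def purgeable(category, ingested_at, now=None):
    """True if a record has aged out of its retention window and can be purged."""
    now = now or datetime.utcnow()
    return now - ingested_at > RETENTION.get(category, timedelta(days=365))
```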

5. A data lake is not a “prophet-in-a-box”

The truth is that gaining meaningful insights or creating accurate forecasts still requires a significant amount of analytical work and problem-solving using a tool that’s capable of accessing and working the stored data, Haynie advised. “The data lake is just a step in the overall problem-solving process.”

Takeaway

Staying competitive in today’s data-driven world requires a modern analytics platform that can turn information into insight, and both data lakes and data warehouses have an essential role to play, Verma said. “By developing a clear understanding of where they each make sense, IT leaders can help their organizations invest wisely and maximize the value of their information assets.”

 

 


