Tag Archives: Discussion

You Can Be at Interop, Even When You Aren’t | IT Infrastructure Advice, Discussion, Community

Travel budget tight? Staff shortage? Other business commitments? For whatever reason, you might not be able to make it to the lights of Las Vegas and the educational programs at Interop19 next week.

Well, you can be there in the virtual realm. The Interop team is offering a live stream of the keynote programs for both Wednesday and Thursday, May 22 and 23. So, you can catch the presentations by some of the top experts in their IT fields while sitting in the comfort of your home or office.

Here’s the lineup:

On Wednesday at 10 am (Pacific) the program kicks off with the Shaping the Future of IT: CIO Lightning Series. Three top IT executives, Renee McKaskle (CIO, Hitachi Vantara), Beth Niblock (CIO, City of Detroit) and Judd Williams (CIO, NCAA), will present 5-minute lightning talks on what they think is the most critical (and positive) thing happening right now in their IT organization.

Those 5-minute talks will be followed by a 15-minute moderated discussion on what’s impacting and influencing the future of their IT organizations.

Read the rest of this article on InformationWeek


3 Net Monitoring Metrics to Deal with Performance Degradation | IT Infrastructure Advice, Discussion, Community

Network performance monitoring using flow data (NetFlow) is an approach to isolating the root cause of performance issues related to network traffic by measuring a set of characteristics across layers 2 through 7.

Three basic metrics help pinpoint performance issues: round trip time, server response time, and jitter. Degradation in any of them can contribute to low performance and downtime. Let’s examine each one.

1. Round trip time

Also called network delay, round trip time (RTT) is the time a packet takes to travel from client to server and back. It is a single value that models the performance of the network itself, calculated by observing the time needed to establish a TCP session. A typical value within a single enterprise location is less than 1 ms (even tens of microseconds), as the traffic stays on the local network. An application has no impact on the TCP handshake, as this is part of the TCP/IP stack implemented in the operating system itself. It would require an operating system malfunction to influence this metric, which won’t happen in practice. Here are some typical root causes of network delays.

Overload of network devices: High packet rates fill the buffers in network devices, where packets must wait to be dispatched. QoS can help to prioritise critical services to a certain extent, but a DDoS attack may still lead to network congestion and increased RTT values.

Clients working from remote locations: Complaints about slow application responses do not always point to the application. An RTT of 500 ms when connecting from home through a VPN to a company data centre means that transmitting the packet alone takes half a second, so any application will look slow from the user’s perspective.

Cloud applications: To lower the delay, SaaS providers use CDNs and proxy servers to host the application as close to customers as possible. For the same reason large companies purchase dedicated lines to connect their infrastructure directly to cloud providers.

Ethernet vs. Wi-Fi: In my practical experience, the usual performance difference between a wired Ethernet connection and Wi-Fi is around 10 ms. So 10 ms is the average penalty you pay for going through Wi-Fi instead of a wired Ethernet connection. And that is still under ideal conditions.

Performance bottleneck caused by heterogeneous port speeds: Imagine a 10G backbone while servers are connected through 1G, especially when multiple servers share such a 1G uplink. Numerous clients can easily generate traffic that spikes above the 1G port capacity, saturating switch buffers, which leads to packet drops. Such packets need to be retransmitted, and consequently users experience a network delay.
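To make the handshake-based measurement described at the start of this section concrete, here is a minimal Python sketch. It assumes a monitoring probe that records the observation times of the three handshake packets (SYN, SYN-ACK, ACK); the `Handshake` class and its field names are hypothetical and not taken from any particular NetFlow product.

```python
from dataclasses import dataclass


@dataclass
class Handshake:
    """Observation times (in seconds) of the three TCP handshake packets.

    Hypothetical structure: a real probe would export these values
    in its own flow record format.
    """
    syn: float
    syn_ack: float
    ack: float


def round_trip_time(h: Handshake) -> float:
    """Estimate RTT as the time needed to establish the TCP session."""
    server_side = h.syn_ack - h.syn   # probe -> server -> probe
    client_side = h.ack - h.syn_ack   # probe -> client -> probe
    return server_side + client_side


# Illustrative values only: a local network vs. a remote VPN user.
lan = Handshake(syn=0.0, syn_ack=0.0002, ack=0.0005)
vpn = Handshake(syn=0.0, syn_ack=0.3, ack=0.5)

print(f"LAN RTT: {round_trip_time(lan) * 1000:.2f} ms")
print(f"VPN RTT: {round_trip_time(vpn) * 1000:.2f} ms")
```

Because only handshake packets are involved, the result reflects the network path and the operating system's TCP/IP stack, not the application.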

2. Server response time

This metric represents the request processing time on the server side, and therefore the delay caused by the application itself. The measured server response time (SRT) is the difference between the predicted observation time of the server’s ACK packet (a prediction based on the observation time of the client request and the previously measured RTT value) and the actual observation time of the server’s response. The measurement can’t rely on observing an ACK packet from the server, since the ACK packet might be merged with the server’s response.

SRT enables performance measurement of the whole application, per application server, per client network range or even per individual client. This makes it possible to find correlations between application performance and the number of clients or a specific time of day. Using this metric together with RTT answers the ultimate question: is it a network issue or an application issue?
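The SRT definition above can be sketched in a few lines of Python. This is only an illustration under one simplifying assumption: the predicted ACK observation time is taken to be the request observation time plus the previously measured RTT. The function and parameter names are hypothetical.

```python
def server_response_time(request_seen: float, response_seen: float, rtt: float) -> float:
    """Estimate server response time in seconds.

    request_seen  -- observation time of the client request
    response_seen -- observation time of the server's response
    rtt           -- previously measured round trip time for this connection
    """
    # Assumption: the server's ACK would be observed one RTT after the request.
    predicted_ack = request_seen + rtt
    # Anything beyond the predicted ACK time is attributed to the application.
    return max(0.0, response_seen - predicted_ack)


# Illustrative values: 50 ms of network delay, response seen 250 ms after the request.
srt = server_response_time(request_seen=10.0, response_seen=10.25, rtt=0.05)
print(f"SRT: {srt * 1000:.0f} ms")
```

In this example most of the 250 ms is spent on the server, which points to the application rather than the network.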

3. Jitter – variance of delay between packets

Jitter reveals irregularities in packet flow by calculating the variance of the individual delays between packets. In an ideal case, the delay between individual packets is constant, which means that jitter is 0. In reality, a jitter value of 0 doesn’t occur, as a variety of parameters influence the data stream. Why should we measure jitter anyway? Jitter is most valuable for assessing the quality of real-time applications, such as conference calls and video streaming. But even when downloading, for example, a Linux distribution ISO file from a mirror, jitter may indicate an unstable network connection.
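As a rough illustration of this definition, the sketch below computes jitter as the population variance of the gaps between consecutive packet arrival times. The arrival times are made-up values in milliseconds, and monitoring tools may use a different estimator, so treat this as one possible reading of "variance of delay between packets".

```python
from statistics import pvariance


def jitter(arrival_times):
    """Variance of the delays between consecutive packets."""
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    if not gaps:
        return 0.0  # fewer than two packets: no delay to measure
    return pvariance(gaps)


steady = [0, 20, 40, 60]   # a packet every 20 ms
bursty = [0, 20, 70, 80]   # gaps of 20, 50 and 10 ms

print(jitter(steady))
print(jitter(bursty))
```

The steady stream has identical gaps, so its jitter is zero, while the bursty stream produces a clearly non-zero value.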


Continuous monitoring and baselining of network performance metrics using flow data helps network administrators identify whether an issue lies in the network itself, in specific connections or in applications. It’s valuable to reveal problems before users do and to prevent complaints about performance degradation. Long-term monitoring of network performance metrics (RTT, SRT, jitter) can help to predict future needs (capacity planning) and incidents.
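One simple way to picture baselining is to compare each new measurement with the mean and standard deviation of recent history. The sketch below is an assumption-laden illustration, not a description of how any particular monitoring product works: the function name and the threshold `k` are hypothetical choices.

```python
from statistics import mean, stdev


def exceeds_baseline(history, sample, k=3.0):
    """Return True if sample is more than k standard deviations above the baseline mean."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    return sample > baseline + k * spread


# Illustrative RTT history in milliseconds.
rtt_history = [1.0, 1.2, 0.9, 1.1, 1.0]

print(exceeds_baseline(rtt_history, 1.1))
print(exceeds_baseline(rtt_history, 5.0))
```

The same check could be applied to SRT or jitter, with the threshold tuned to how noisy each metric is in a given environment.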

Network performance monitoring metrics can considerably improve the performance of the network and also contribute to improvements on the application side.




NVMe: Lower Prices, More Features Expand Use Cases | IT Infrastructure Advice, Discussion, Community

The wide adoption of Non-Volatile Memory Express (NVMe) over the last few years has completely revolutionized the storage industry in no small part due to lower prices and better performance. With the introduction of more features, such as management, more enterprises and hyperscale data centers are migrating to NVMe. The introduction of NVMe over Fabrics (NVMe-oF) promises to accelerate this trend for enterprises using a variety of infrastructures.

NVMe is architected with a layered approach, which enables NVMe data to be carried over a variety of fabric transport technologies such as RDMA (RoCE, iWARP, InfiniBand), Fibre Channel, and now, TCP/IP.

NVMe technologies and use cases

NVMe/FC: Fibre Channel has a long legacy as a reliable storage networking technology and has a home in many enterprise datacenters. While the Fibre Channel (FC) community has consolidated over the years, the technology is still moving forward with plans for 128G FC. The recent release of the FC-NVMe specification extends the capability of Fibre Channel SANs to carry the NVMe protocol, and therefore to attach NVMe SSDs efficiently.

This is an extremely important point for existing Fibre Channel customers. For enterprise datacenters with investments in Fibre Channel infrastructure, a software upgrade can enable FC-NVMe traffic to be sent alongside FCP traffic (Fibre Channel Protocol, which has its roots in the SCSI protocol) on the same network, using the same infrastructure. This extends the life of those infrastructure investments and creates an easy path to upgrading backend storage media to NVMe.

NVMe/RoCE: Another transport technology being leveraged for NVMe-oF is RoCE, which stands for RDMA (Remote Direct Memory Access) over Converged Ethernet. RDMA is a good choice for carrying NVMe: it is designed with memory access in mind, so it maps well to NVMe, which is similarly designed for accessing flash memory. RDMA over Converged Ethernet is a collection of protocols that add congestion management and robustness capabilities to Ethernet. However, this isn’t free. To deploy RoCE properly, users need RoCE-capable NICs and switches, which can cost more than regular Ethernet NICs and switches because of their increased capability.

Most NVMe/RoCE solutions are focused on single-rack deployments that need the absolute lowest possible latency. Naturally, keeping the data as physically close as possible to the compute resources, and minimizing the hops in between, is essential to keeping latency low. However, those physical constraints also limit scalability.

There is some debate about whether RoCE or FC will deliver the absolute lowest latency. Each user will need to examine their own workload characteristics, as well as existing infrastructure to determine which is the right choice for their deployment.

NVMe/TCP: TCP is the newest transport protocol to be adopted by NVMe-oF. It is an important new transport type for NVMe because it can be used on regular datacenter Ethernet switches, an important distinction from RoCE. While a TCP fabric may not offer the same ultra-low latency as a RoCE fabric, it does have advantages in scalability.

One issue that NVMe/TCP is well poised to alleviate is stranded flash. Flash storage can be expensive, so users naturally want to ensure that they are getting the best utilization possible. Many early NVMe deployments were servers with NVMe SSDs directly attached via PCIe. It was difficult to share these storage resources between servers while maintaining the low latency that justified the investment in NVMe. Thus, many users were dealing with low utilization due to stranded storage.

NVMe/TCP allows storage to be shared across a TCP network with low latency in a manner that allows for better sharing of storage resources, which can eliminate the issues around stranded storage. From a cost perspective, users will be getting much more out of their investment in flash storage, and this will drive NVMe adoption even further in the datacenter.

There is a lot of well-deserved excitement around NVMe/TCP. Its ability to bring the benefits of flash storage to TCP networks will have huge implications for the adoption of NVMe-oF, and for its use in datacenter scale composed infrastructure systems. However, each NVMe-oF transport has its own strengths and ideal use cases. Anyone deploying NVMe-oF would do well to examine their existing infrastructure investments, IT roadmaps, and expected workloads in order to choose the flavor of NVMe-oF that will work for them.


30x Faster Modern Segmentation for Enterprises | IT Infrastructure Advice, Discussion, Community

“No more soft chewy centers.” With this quote, John Kindervag of Forrester introduced the world to the Forrester Zero Trust model. More importantly, he exposed the reality that modern data centers, whether they be on premises, in clouds, or a combination of both, are open, vulnerable, and easy targets of attack and exploit.

By far the biggest problem enterprise administrators face is that data centers lack tools to easily implement and manage segmentation techniques. Because these environments are dynamic and built on heterogeneous platforms, legacy firewalls, VLANs, ACLs, and security groups are no longer effective means of segmenting the data center. The fluid nature of these environments has left enterprise networks with coarse, flat segments, because traditional network security has been unable to keep up.

Furthermore, the lack of segmentation best practices in these environments is made even worse by several trends. IoT and VDI initiatives have added devices and users to data centers, but these have not been segmented or isolated, which creates additional risk. And data centers, often opened up to business partners, distributors, customers, contractors, and vendors, are at risk from these third parties, who can be the weakest links and introduce their own security risks into the supply chain. One can look at several recent examples of “cross-contamination,” where attackers breached an enterprise by first targeting a weaker, easier-to-exploit third party, compromising a VDI user, or taking advantage of an IoT device. Beyond the risk of attack, segmentation is also often required for compliance with industry regulations and standards such as SWIFT, PCI, HIPAA, and others. Facing potential regulatory penalties, enterprises need to be able to demonstrate that they are taking appropriate measures to be compliant by isolating particular workloads, assets, and applications.

For all these reasons, operators of these enterprise environments are taking a closer look at modern, software-defined segmentation techniques. Advances in modern segmentation have made it a viable option for all types of companies. Because it addresses key portions of the people, workloads, and network elements of the Zero Trust model, modern segmentation is arguably the optimal choice for achieving zero-trust security. Of equal importance, with the right tools and a little thoughtful planning, modern segmentation can be implemented more quickly and easily than the aforementioned methods and is easier to manage and maintain as well. In fact, recent testing has demonstrated that modern segmentation can reduce time to deployment by as much as 30 times compared to a traditional firewall implementation. Those time savings and efficiencies translate to significantly lower costs over the deployment lifecycle.

The limits to legacy methods of segmentation

To understand the advantages of modern segmentation, it is useful to look at some of the drawbacks and limitations of the standard techniques employed both on premises and in the cloud. These might include some combination of physical or virtualized firewalls, VLANs, ACLs, and virtual private cloud (VPC) security groups. In general, these methods are resource and labor intensive. Creating security policies is a cumbersome process. Moves, adds, changes, and deletes need to be performed manually, creating a drag on ongoing operational efficiency and raising the risk of vulnerability.

Firewalls, even when virtualized, are expensive to acquire and complex to set up. They also create circuitous “hairpins” that ultimately impede system performance. As the industry is learning, firewalls are not intended for segmentation within the data center, and, in fact, some providers will readily admit that firewalls simply don’t belong there.

Perhaps the greatest drawback, however, is that conventional security controls (firewalls, VLANs, ACLs, VPC, security groups) do not reduce the attack surface sufficiently. Cloud-based security groups, hypervisor firewalls and other traditional techniques focus only on the machine and port level rather than providing protection at the application process level. This means any processes, including malicious ones, can easily get by port-based rules, thereby exposing applications to threats that have successfully breached the perimeter.
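The difference between port-level and process-level rules can be illustrated with a small Python sketch. Everything here is hypothetical and simplified for illustration (the `Connection` class, the rule sets and the process names); it does not model any specific firewall or segmentation product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    process: str    # name of the process opening the connection
    dst_port: int   # destination port


# Port-based policy: anything talking to port 443 is allowed.
ALLOWED_PORTS = {443}

# Process-aware policy: only a known process may use port 443.
ALLOWED_PROCESS_PORTS = {("nginx", 443)}


def port_rule_allows(conn: Connection) -> bool:
    return conn.dst_port in ALLOWED_PORTS


def process_rule_allows(conn: Connection) -> bool:
    return (conn.process, conn.dst_port) in ALLOWED_PROCESS_PORTS


legitimate = Connection(process="nginx", dst_port=443)
malicious = Connection(process="cryptominer", dst_port=443)

for conn in (legitimate, malicious):
    print(conn.process, port_rule_allows(conn), process_rule_allows(conn))
```

Both connections pass the port-based rule, but only the legitimate process passes the process-aware rule, which is the gap the paragraph above describes.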


DevOps Influence on Infrastructure Management | IT Infrastructure Advice, Discussion, Community

How a person defines DevOps often depends on their scope of interest/responsibility within an IT environment. Those with an infrastructure management background are going to lean towards an Infrastructure as Code (IaC) definition. Application developers typically focus on application development processes and agility. There are also people who tell you that DevOps is an end-to-end solution combining infrastructure management and application development.

I see infrastructure management and application development as two separate disciplines, where infrastructure is used to build a platform consumed by application developers. Application developers can configure the platform as required within the constraints of said platform.

There is no doubt that DevOps principles have influenced how infrastructure is managed at scale using IaC strategies. However, there is a point where DevOps principles may no longer be relevant for IaC, especially when it comes to the management of bare metal.

Continuous integration/continuous delivery (CI/CD) pipelines play a core role in shipping code from development to production. Within the pipeline, unit and integration tests are run against the code to ensure reliable delivery and reduce the risk of faults.

Many hardware vendors supply emulators to replicate the behaviour expected from their hardware platforms. An emulator’s replication of a physical hardware platform should typically be considered a best effort and not necessarily an entirely accurate representation. A VM running the Network Operating System (NOS) for a white box switching solution would not be able to test how a change impacts the performance of an ASIC contained within physical switches.

Some companies can afford to purchase enough hardware dedicated to testing new workflows or the impact of updates; others cannot. How closely the test environment matches production determines how reliable the testing in a CI/CD pipeline is.

Introducing version control to track changes is one of the most significant value propositions that IaC provides to operational teams. Issues caused by manual infrastructure changes can be incredibly challenging to troubleshoot as sometimes the intended change isn’t exactly what was changed. Increasing the number of changes made using IaC reduces the amount of time it takes to find which settings were changed.
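To show why version-controlled configuration shortens troubleshooting, here is a small Python sketch that compares a committed configuration with the one actually running and lists only the lines that differ. The configuration text is invented for illustration and does not follow any specific vendor's syntax; `difflib` is part of the Python standard library.

```python
import difflib


def changed_settings(committed: str, running: str) -> list[str]:
    """Return the lines removed from ("- ") or added to ("+ ") the committed config."""
    diff = difflib.ndiff(committed.splitlines(), running.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical switch configuration, before and after a manual change.
committed_config = "hostname sw1\nmtu 1500\nvlan 10\n"
running_config = "hostname sw1\nmtu 9000\nvlan 10\n"

for line in changed_settings(committed_config, running_config):
    print(line)
```

With every change committed, the same comparison can be run between any two revisions, so the question "what exactly was changed?" has a direct answer.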

CI/CD pipelines are frequently integrated with version control systems, enabling automatic execution of pipelines when a repo receives a new commit. If the hardware vendor provides emulators for their hardware platform, the pipeline should build a virtual environment to represent the current environment to run integration testing.

Test results can provide more than simple pass/fail validation. Tests can also be used to determine the impact of a change, which in turn shows whether additional steps are required before, during or after the application of a new configuration.

Many environments and application services have a state that can impact how seamless a failover is. In a virtualised environment, higher workload density increases the number of applications potentially impacted by an interruption caused by a change, even if that impact is only a blip.

Using a CI/CD pipeline to detect interruptions caused by the change allows for better change planning or the incorporation of steps to perform workload migrations and clean failovers. The use of emulators might be adequate for this level of testing; however, physical reproduction is always a better option.

Continuous iterations required

Working towards a high degree of test coverage requires continual iterations that incorporate lessons from previous successes and failures. Agile project management strategies provide a practical framework for managing iterative work in progress.

Physical infrastructure isn’t ephemeral. Unless you live in the Twilight Zone, physical devices do not suddenly appear and disappear from racks. There are configuration changes which can be performed on demand and those which cannot.

Storage platforms have supported storage nodes as individual nodes within a cluster, allowing for the addition and removal of nodes as required. However, adding or removing storage nodes places additional load on the storage solution while data is rebalanced or evacuated. Some changes may require that some protection features are disabled or tuned down to prevent unneeded load on the system. Typically, these are the types of changes which build the foundation of a storage solution provided for consumption.

There are many areas where DevOps principles influence and improve IaC strategies; however, physical hardware management is different from software management, and the suitability of different principles varies between environments and goals.
