Tag Archives: Scheduler

Scheduler Changes For Linux 5.15 – Still No Sign Of Any Intel Thread Director Optimizations


LINUX KERNEL --

Ingo Molnar began sending in his pull requests bright and early as usual for the just-opened Linux 5.15 merge window. With the scheduler changes for this next kernel version there are some improvements worth mentioning but also worth mentioning is what hasn’t found its way to the kernel yet: any software optimizations around Intel Thread Director for upcoming Alder Lake processors.

The new scheduler material for Linux 5.15 includes changes for dealing with asymmetric scheduling affinity. This work is initially focused on the handling of 32-bit tasks on AArch64 systems, where some SoCs have AArch64-only cores incapable of 32-bit (AArch32) execution. The scheduler changes allow such tasks to be given their own CPU possible mask, ensuring the scheduler will place a given task only on a CPU that supports it. Again, for now this is all focused on the Arm front, covering legacy 32-bit tasks on SoCs where some cores are 64-bit-only.

The scheduler changes for Linux 5.15 also add cgroup SCHED_IDLE support, deadline scheduler improvements, enhanced CPU node-distance determination, and various fixes. The full list of scheduler patches for the Linux 5.15 merge window can be found via this PR.

Notably, what isn’t part of this pull request, nor have I seen it elsewhere on the kernel mailing list or in any prominent public staging Git repositories, is any Linux support/optimizations around Intel Thread Director. With the upcoming Alder Lake processors, Thread Director is the new Intel hardware-based functionality for trying to determine the best placement of a given task between the mix of E (energy-efficient) and P (performance) cores.

Thread Director is hardware-based for trying to determine the most appropriate task placement among Alder Lake and future Intel hybrid processor designs, but there is a software element at play too. Intel made clear back during Architecture Day that Windows 11 will carry optimizations for Thread Director but wasn’t too clear on the specifics. Intel has also been mum on any Linux software support/optimizations around Thread Director. Well, with no patches queued up for Linux 5.15 that in turn will be out as stable this autumn and with the first Alder Lake processors due out later this year, it doesn’t look like Intel will have any launch-day Linux optimizations in place.

The Linux kernel has long catered to Arm’s big.LITTLE designs, supporting energy-aware scheduling and other software improvements on that front, including the work in 5.15 around proper scheduling of tasks when certain cores have reduced capabilities. But we haven’t seen anything comparable on the Intel front in the scheduler or power management areas. The Intel P-State driver has been prepared for Alder Lake / hybrid CPU designs, but again there has been no kernel activity around Thread Director, even in early patch form on the LKML.

Thread Director should work without any OS engagement, given that Microsoft Windows 10 appears to handle Alder Lake without any kernel changes, but in any case we’ll see what comes up in the weeks/months ahead and how the Alder Lake Linux performance looks out-of-the-box later this year.


AMD Is Hiring More Linux Engineers For The Scheduler, Memory Management, Net I/O


AMD --

It looks like AMD’s rising marketshare in the data center is paying off as AMD is hiring more Linux kernel engineers.

On top of hiring more Linux engineers earlier this year as part of a client-focused push, it’s been brought to my attention that they are now looking to hire several more Linux kernel engineers. This time around they appear to be focused on Linux kernel work in the server / EPYC space, though some of that work also carries over to benefit AMD desktop/mobile efforts.

AMD’s latest round of job postings pertaining to Linux are based out of a mix of Austin, Texas and Bangalore, India. Among their many Linux job openings at the moment:

Linux Kernel – Scheduler Development Lead – Great to see AMD working on kernel scheduler improvements with an emphasis on performance. Hopefully this will also carry over to CPUFreq / CPPC with Schedutil and related work there. There is apparently a “small scheduler focused team” being built up by AMD in India with a focus on optimizations/features for EPYC.

Linux Kernel – Memory Management Development/Performance Lead – Again, great seeing AMD working more on low-level Linux kernel infrastructure and more performance optimizations.

Networking I/O Lead – In Austin, it’s great to see AMD hiring for multiple networking I/O related Linux positions to focus on performance and low-latency I/O and scalability.

Linux Virtualization Performance Lead – One of several positions around Linux virtualization for EPYC with a focus on KVM/QEMU.

See more of AMD’s open Linux-related positions via jobs.amd.com.

While they have worked to ensure sufficient launch-day Linux support across their product portfolio, they have rather neglected core infrastructure improvements for Linux in the years since they shut down the OSRC. So it’s great to see them seemingly bringing in more talent to work on upstream core kernel improvements beyond just new product enablement, and working to tune their hardware for optimal performance/functionality on Linux. This is one area where Intel has long held an advantage over other vendors: its vast teams of experienced open-source/Linux engineers work not only on hardware bring-up but also on improving the Linux kernel and related components around power management, scheduler enhancements, and more, along with ensuring other key software is well tuned for Intel hardware.


Looking At An Early Performance Regression In Linux 5.13 – Scheduler Related


LINUX KERNEL --

Since the Linux 5.13 merge window began settling down, and especially now with 5.13-rc1 out the door, I’ve been ramping up performance testing of the Linux 5.13 kernel. So far I’ve been seeing one area where the kernel regresses, and it stems from the scheduler changes this cycle.

I’m still early in the benchmarking process of testing a range of systems with Linux 5.13 compared to 5.12 stable, but testing on an Intel Core i9 11900K “Rocket Lake” system in particular left me intrigued by one of the performance drops on 5.13, which led me to look closer at it yesterday…

On the Rocket Lake system as well as a Core i5 9400F system (among the few systems tested so far with preliminary benchmarks), the context switching performance as measured by the well-known Stress-NG stress-testing tool took a nose dive…

But when running ctx-clock for measuring the context switching time, the performance with 5.13 was unchanged. So on the i9-11900K box, Git and the Phoronix Test Suite were used to bisect this dramatic difference in performance on Linux 5.13…

The difference is very noticeable and thus easy to bisect… Linux 5.13-rc1 with Stress-NG context switching is at roughly 70% the performance of 5.12 stable.

Well, up until the very end of bisecting there was some fluctuation in the outcome, but those remaining tests were all well off their Linux 5.12 performance.

# possible first bad commit: [c722f35b513f807629603bbf24640b1a48be21b5] sched/fair: Bring back select_idle_smt(), but differently

# possible first bad commit: [6db12ee0456d0e369c7b59788d46e15a56ad0294] psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files

# possible first bad commit: [0a2b65c03e9b47493e1442bf9c84badc60d9bffb] sched/topology: Remove redundant cpumask_and() in init_overlap_sched_group()

The commits in question leading to the drop in Stress-NG context switching performance all point back to scheduler changes introduced for Linux 5.13. That though is where I am at for the moment. Given my limited time and resources, for now I am firing up more (Phoronix Test Suite automated) benchmarks on more systems to see what other real-world workloads may be seeing Linux 5.13 performance changes that could be attributed to those commits, or whether the scheduler alterations are fairly isolated in their negative impact. Those that appreciate my relentless Linux benchmarking and daily content on Phoronix can show their support by joining Phoronix Premium to help facilitate further testing and bisecting.


The Linux Kernel’s Scheduler Apparently Causing Issues For Google Stadia Game Developers


LINUX KERNEL --

Among the issues game developers have been facing in bringing their games to Linux for Google’s Stadia cloud gaming service are, apparently, kernel scheduler shortcomings. We’ve long known the Linux kernel scheduler could use some improvements, and independent developers like Con Kolivas with BFS / MuQSS have pushed for such, but hopefully in 2020 we’ll see some real action.

Game/C++ developer Malte Skarupke wrote a post about how bad the Linux kernel scheduler is and how solutions like MuQSS are an improvement but not complete. Malte noted, “I found that most mutex implementations are really good, that most spinlock implementations are pretty bad, and that the Linux scheduler is OK but far from ideal. The most popular replacement, the MuQSS scheduler has other problems instead. (the Windows scheduler is pretty good though).”

The latest kernel scheduler woes appear to be game/engine developers hitting issues in readying their software for Google Stadia. “So this all started like this: I overheard somebody at work complaining about mysterious stalls while porting Rage 2 to Stadia. The only thing those mysterious stalls had in common was that they were all using spinlocks. I was curious about that because I happened to be the person who wrote the spinlock we were using. The problem was that there was a thread that spent several milliseconds trying to acquire a spinlock at a time when no other thread was holding the spinlock. Let me repeat that: The spinlock was free to take yet a thread took multiple milliseconds to acquire it. In a video game, where you have to get a picture on the screen every 16 ms or 33 ms (depending on if you’re running at 60hz or 30hz) a stall that takes more than a millisecond is terrible. Especially if you’re literally stalling all threads. (as was happening here) In our case we were able to make the problem go away by replacing spinlocks with mutexes.”

Responding to a comment by MuQSS lead developer Con Kolivas, Malte wrote, “I know that we were not the only developers who had problems with the scheduler on Stadia. And Google is very aware of the problem. They care a lot about latency because latency is super important for the Stadia experience. And one of the ways they’ve reduced latency is to run games at 60hz that run at 30hz on console. But that means you only have 16ms to get a frame on the screen, and if the scheduler gives you a random hitch of a millisecond, you’re screwed. So this might be an opportunity for you to get [the MuQSS] scheduler used by more people and to maybe eventually get it into the mainline kernel. If you can solve the problem that ticket_spinlock ran into, I would recommend your scheduler over the default scheduler unreservedly. And maybe you can reach out to Google and see if they want to use your scheduler for Stadia.”

He also posted some mutex benchmark code that I’m now looking at for possible Phoronix Test Suite usage in comparing kernels. Read more on the spinlock vs. mutex performance particulars via this blog post.

Let’s hope for scheduler improvements to the Linux kernel in 2020, and maybe even for MuQSS to be mainlined if enough support materializes.


Chromebooks Switching Over To The BFQ I/O Scheduler


GOOGLE --

With the latest Chrome OS moving Chromebooks over to a Linux 4.19 based kernel, BFQ has become the default I/O scheduler.

BFQ has been maturing nicely and as of late there’s been an uptick in interest around this I/O scheduler with some also calling for it to be used by default in distributions. Google has decided BFQ is attractive enough to enable by default for Chromebooks to provide better responsiveness.

In our own tests, particularly with slower storage mediums, BFQ delivers good results on recent kernel releases. BFQ aims for low latency on interactive and soft real-time tasks while still being capable of achieving high throughput, among other benefits.

Below is a demo by BFQ developer Paolo Valente showing the responsiveness of BFQ on Chromebooks.