Tag Archives: Performance

Firefox 71 Linux Performance Isn’t Looking All That Great


With each new release of Firefox we set out to see how the performance is looking on the Linux desktop. One discovery we’ve made is that when using Intel’s Clear Linux, Firefox performance is much more competitive with Google Chrome than we traditionally see on Ubuntu Linux. But with Firefox 71 we’re seeing the performance trend lower compared to Firefox 69 and 70.

Here are some benchmarks of Firefox 69 / 70 / 71 builds using the official Mozilla binaries along with Chrome 78. All of the benchmarks were freshly done on the same system, which this time around was running Clear Linux.

Considering the end of the year is quickly approaching, I’m also working on a much larger Firefox Linux performance comparison going back many more releases. Stay tuned for that soon.

In some of the JavaScript benchmarks, Chrome continues to win over Firefox by a landslide. In the case of ARES-6, the performance is unchanged with Firefox 71.

With Octane, the Firefox 71 performance is pulling back slightly while Chrome remained faster than Firefox on Linux.

WebXPRT was also trending lower with Firefox 71 but at least here the Mozilla browser outperformed Chrome.

Basemark was also slower with the newly-minted Firefox 71 while Chrome 78 is much faster.

JetStream regressed with Firefox 71.

The HTML5 Canvas benchmark CanvasMark was about the same between releases, with Firefox doing much better than Chrome, at least on the Intel graphics.

Firefox 71 tended to be either the same speed or slower compared to Firefox 70.

Firefox continues running faster than Chrome at least with WebAssembly. Overall, though, Firefox 71 wasn’t too exciting on the Linux performance front, being either the same speed as Firefox 70 or slower. At least Firefox 71 does bring a few new features.

Linux 5.5 Seeing Some Wild Swings In Performance – Improvements But Also Regressions


While there is still a week to go in the Linux 5.5 merge window with more feature code still landing, with the scheduler changes and other work already having landed I have already started running some Git benchmarks. Linux 5.5 at this stage appears quite volatile with some really nice improvements in some workloads but also regressions in others.

I started off some Linux 5.5 Git benchmarks a few days ago after seeing the scheduler changes, which are rather heavy this cycle, and other work land. Plus I wanted to test out some new features like the NVMe hwmon thermal reporting.

I started off with a Cascade Lake system since it was available and I was curious to see the impact on a large server.

Much to my surprise, there were far more swings than we are used to seeing between kernel revisions, especially with no change in mitigations for any security vulnerabilities between Linux 5.4 and 5.5 Git… Some workloads like Rodinia and Parboil were seeing real improvements with Linux 5.5. The Facebook-developed Hackbench Linux kernel scheduler benchmark was also seeing significant improvements with Linux 5.5, likely indicating that at least some of the 5.5 benefits come from the new scheduler code. But there were also cases like PostgreSQL, Memcached, Pennant, and others seeing regressions ranging from small to severe, where Linux 5.4 was running faster. (See all of this system’s data via this OpenBenchmarking.org result file.)

Not used to seeing such large shifts in Linux kernel performance especially with no mitigation changes, I fired up a completely separate system and ran largely the same benchmarks while again comparing Linux 5.4.0 to Linux Git this week.

On this Skylake server, Hackbench was again much faster on Linux 5.5, likely pointing to the scheduler changes as explaining the boosts. Parboil and Rodinia continued to see improvements while Memcached, PostgreSQL, and the Stress-NG System V message passing tests were among the cases seeing lower performance. So many of the same workload swings reproduced on this different Linux server. (All the data via this OpenBenchmarking.org result file.)

Curiosity got the best of me so I also fired up an AMD EPYC 7601 2P Dell PowerEdge server.

Wild swings again. Stress-NG System V Message Passing, PostgreSQL, and some other micro-benchmarks also showed better performance on Linux 5.4 than 5.5 Git… Meanwhile, Hackbench, Rodinia, and Parboil all showed improvements on Linux 5.5 Git. So at least from a quick examination, this AMD server is seeing behavior out of Linux 5.5 Git similar to the Intel Xeon boxes. (The EPYC result file.)

So while we are only half-way through the merge window, there do appear to be some compelling performance improvements to be found with Linux 5.5, at least for larger systems. But at the same time there are some workloads seeing clear pull-backs in performance compared to Linux 5.4 stable. As the Linux 5.5 merge window settles down I’ll be carrying out more benchmarks, as I hadn’t expected to see such swings and normally don’t between kernel releases at this stage. For those enjoying all the Linux benchmarking found at Phoronix, you can show your support this holiday.

Intel Core i9 10980XE Linux Performance Benchmarks Review

Intel today is rolling out the Core i9 10980XE as their new Cascade Lake X-Series processor featuring 18 cores / 36 threads with a maximum turbo frequency of 4.6GHz and a Turbo Boost Max 3.0 frequency of 4.8GHz. Following a last minute change, Intel moved up the embargo lift time of the Core i9 10980XE, so here are the results we can share with you right now.

The Intel Core i9 10980XE Cascade Lake processor features the same core / thread count as the previous Core i9 9980XE and i9 7980XE but now with a 3.0GHz base frequency, 4.6GHz peak turbo frequency, 4.8GHz Turbo Boost Max 3.0 frequency, DDR4-2933 quad channel memory support rather than DDR4-2666, and the L1TF/Meltdown hardware mitigations in place. The cache size remains the same at 24.75MB and the processor has a 165 Watt TDP.

Besides this X-Series upgrade to Cascade Lake and the technical improvements, the biggest change is much more aggressive pricing out of these Intel HEDT processors. While previous top-end X-Series processors have retailed for $1800~1900 USD, in order to be competitive with AMD Ryzen Threadripper, the Core i9 10980XE is launching at just $979 USD. Basically the processor pricing is halved in order to fend off Threadripper.

Yes, AMD is launching today their new Threadripper 3960X/3970X processors too. Originally the embargo launch time was the same for the Core i9 10980XE and Threadripper, but a few days ago Intel decided to move up the embargo lift time… So right now we can share the i9-10980XE performance numbers but you will need to wait a few hours for our AMD Linux review before you can see how this Intel 18-core CPU compares to the 24-core Threadripper 3960X and Threadripper 3970X.

Zombieload V2 TAA Performance Impact Benchmarks On Cascade Lake

While this week we have posted a number of benchmarks on the JCC Erratum and its CPU microcode workaround that introduces new possible performance hits, also announced this week as part of Intel’s security disclosures was “Zombieload Variant Two,” the TSX Async Abort (TAA) vulnerability, which received same-day Linux kernel mitigations. I’ve been benchmarking the kernel’s TAA mitigations since the moment they hit the public Git tree, and here are those initial benchmark results on an Intel Cascade Lake server.

While Intel’s latest-generation Cascade Lake server processors have hardware protections against other MDS vulnerabilities like RIDL and Fallout, they require software mitigations for Zombieload V2 / TAA. Researchers had disclosed this Zombieload variant to Intel earlier in the year, but it was placed under an extended embargo and not revealed during the original May disclosures.

Besides Cascade Lake, other Intel CPUs requiring the extra TAA mitigations are Whiskey Lake and Coffeelake-R processors — at least those where Intel TSX (Transactional Synchronization Extensions) is supported. Those wanting to learn more about all of the intricacies of Zombieload V2 / TSX Async Abort can see ZombieloadAttack.com and the Intel Deep Dive. For your viewing pleasure in this article are the initial Cascade Lake benchmarks following Linux’s TAA mitigations landing. Details on the Linux kernel’s TAA mitigations can be found via this documentation.

For this Cascade Lake testing, believed to be the first public benchmarks of the TAA Linux mitigations anywhere, tests were done on a dual Intel Xeon Platinum 8280 server. The server platform in use was the Gigabyte S451-3R0 Xeon Scalable, kindly provided by Gigabyte.

During this benchmarking the server was running Ubuntu 19.10 with the Linux 5.4 Git kernel. Being compared in this article were the new TAA mitigations by default when TSX is enabled, the performance impact when disabling the mitigation (using the new tsx_async_abort=off switch), and the performance when simply disabling Intel TSX using the new tsx=off switch.
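As a quick sanity check before benchmarking a setup like this, the kernel’s reported mitigation state can be read out of sysfs. Here is a minimal Python sketch, assuming the standard /sys/devices/system/cpu/vulnerabilities layout; the helper name is just for illustration:

```python
from pathlib import Path

SYSFS_VULNS = "/sys/devices/system/cpu/vulnerabilities"

def read_mitigation(name, sysfs_root=SYSFS_VULNS):
    """Return the kernel's reported status for a vulnerability entry
    (e.g. 'tsx_async_abort'), or 'unknown' if the entry is absent,
    as on kernels predating the TAA mitigation."""
    try:
        return (Path(sysfs_root) / name).read_text().strip()
    except OSError:
        return "unknown"

print(read_mitigation("tsx_async_abort"))
```

On a kernel with the mitigation active this typically reports something like “Mitigation: Clear CPU buffers; SMT vulnerable”, while tsx=off is reflected as TSX being disabled.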

This article isn’t comparing the combined impact of the other speculative execution mitigations, the JCC Erratum, or any other combinations. Follow-up articles will be looking at the different combinations, while today’s focus is just on what this new TSX Async Abort code in the kernel presents. Also keep in mind that for all of these tests SMT/HT was left enabled, but again the no-HT performance is something that will be revisited in the future.

When firing up different benchmarks found to be impacted by the TAA mitigations, the geometric mean of those results pointed to the Cascade Lake server running just under 8% slower from the new kernel mitigation this week on affected workloads. Meanwhile disabling TSX and running TSX without any mitigations yielded similar performance.
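For reference, summarizing affected workloads by geometric mean can be reproduced with a short Python sketch; the ratios below are hypothetical placeholders, not the actual numbers from the result files:

```python
from statistics import geometric_mean  # Python 3.8+

# Hypothetical per-test performance ratios (mitigated / default),
# standing in for the real values in the OpenBenchmarking.org data.
ratios = [0.95, 0.90, 0.93, 0.88]

overall = geometric_mean(ratios)
slowdown_pct = (1.0 - overall) * 100.0
print(f"Geomean slowdown: {slowdown_pct:.1f}%")
```

The geometric mean is preferred over the arithmetic mean here since it averages multiplicative ratios without letting one outlier benchmark dominate.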

Now let’s look at the individual benchmark results.

Netflix Optimized FreeBSD’s Network Stack More Than Doubled AMD EPYC Performance


Drew Gallatin of Netflix presented at the recent EuroBSDcon 2019 conference in Norway on the company’s network stack optimizations to FreeBSD. Netflix has been working to deliver 200Gb/s network performance for video streaming out of Intel Xeon and AMD EPYC servers; they are now at 190Gb/s+, and in the process more than doubled the potential of EPYC Naples/Rome servers while also delivering very hefty upgrades for Intel.

Netflix has long been known to use FreeBSD in their data centers, particularly where network performance is concerned. But wanting to deliver 200Gb/s throughput from individual servers led them to make NUMA optimizations to the FreeBSD network stack. Allocating NUMA-local memory for kernel TLS crypto buffers and for backing files sent via sendfile were among their optimizations. Changes to network connection handling and dealing with incoming connections to Nginx were also made.

For those just wanting the end result, Netflix’s NUMA optimizations to FreeBSD resulted in their Intel Xeon servers going from 105Gb/s to 191Gb/s while the NUMA fabric utilization dropped from 40% to 13%.

The AMD EPYC performance is even more impressive in going from 68Gb/s to 194Gb/s. So while EPYC started out much slower than Xeon, the Netflix AMD EPYC servers are now closer than the Intel ones to achieving 200Gb/s performance.
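Working those reported figures out as ratios (a quick sketch using the throughput numbers quoted above from the talk):

```python
# Throughput figures (Gb/s) as reported in the EuroBSDcon talk
xeon_before, xeon_after = 105, 191
epyc_before, epyc_after = 68, 194

xeon_speedup = xeon_after / xeon_before  # ~1.82x
epyc_speedup = epyc_after / epyc_before  # ~2.85x
print(f"Xeon: {xeon_speedup:.2f}x, EPYC: {epyc_speedup:.2f}x")
```

That is roughly a 1.8x gain for Xeon versus nearly 2.9x for EPYC, which is why the headline calls it more than doubled.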

Not only is EPYC faster, but thanks to the 128 PCIe lanes per socket, they are able to do with one socket what otherwise requires two Intel Xeon CPUs. One area where AMD was critiqued is Netflix’s inability to monitor the Infinity Fabric saturation, as “AMD’s tools are lacking (even on Linux).”

In the end they are now effectively at 200Gb/s encrypted video streaming from FreeBSD per server. More details via this interesting slide deck.