Tag Archives: Benchmarks

GCC 10 Link-Time Optimization Benchmarks On AMD Threadripper

Following the recent news that Fedora 32 may build its packages with LTO by default for better performance, and since I had not yet checked the Link-Time Optimization performance of the in-development GCC 10, here is a fresh look at the possible performance gains from using link-time optimizations to generate faster binaries. This round of testing was done on the AMD Ryzen Threadripper 3960X and complements the recent Profile Guided Optimization benchmarks.

The Threadripper 3960X system was running Ubuntu 19.10 with the Linux 5.4 kernel. GCC 10.0 was built in release mode from a December snapshot, the newest available at the time of testing.

GCC 10 was used to build a variety of C/C++ software packages with the Phoronix Test Suite. The baseline run used “-O3 -march=native”, the second run tested link-time optimization with “-O3 -march=native -flto”, and the final run used “-O3 -march=native -flto -fwhole-program”. Per the GCC documentation on the whole-program option: “Assume that the current compilation unit represents the whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers.”
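As a rough illustration of how the three configurations differ, the sketch below expresses them as flag sets and turns them into build settings. The helper names (`CONFIGS`, `build_env`, `gcc_command`) are hypothetical, and it assumes a build that honors the usual CFLAGS/CXXFLAGS/LDFLAGS variables; it is not how the Phoronix Test Suite is wired internally. The one detail it encodes deliberately is that with `-flto` the flags must also reach the final link, because that is where GCC performs the link-time optimization.

```python
import shlex

# The three flag sets compared in the article.
BASE = ["-O3", "-march=native"]
CONFIGS = {
    "baseline": BASE,
    "lto": BASE + ["-flto"],
    "lto-whole-program": BASE + ["-flto", "-fwhole-program"],
}


def build_env(config: str) -> dict:
    """Environment overrides for a build that honors CFLAGS/CXXFLAGS/LDFLAGS (assumed)."""
    flags = " ".join(CONFIGS[config])
    # -flto must also be present at the final link, where GCC performs
    # the link-time optimization, so the same flags go into LDFLAGS.
    return {"CFLAGS": flags, "CXXFLAGS": flags, "LDFLAGS": flags}


def gcc_command(sources, config, output="a.out"):
    """Single compile-and-link invocation, so the flags apply to both steps."""
    return ["gcc", *CONFIGS[config], *sources, "-o", output]


if __name__ == "__main__":
    for name in CONFIGS:
        print(f"{name:18} {shlex.join(gcc_command(['main.c'], name))}")
```

Running it prints one gcc command line per configuration; for builds that compile and link in separate steps, `build_env` keeps the link step consistent with the compile step.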

Sabrent Rocket 4.0 NVMe Gen4 Linux Benchmarks Against Other SATA/NVMe SSDs

When it comes to PCIe 4.0 NVMe SSDs, the drives we have been using are the Corsair Force MP600 that have been working out great for pairing with the newest AMD Ryzen systems. But a Black Friday deal had the Sabrent 1TB Rocket NVMe 4.0 Gen4 PCIe M.2 SSD on sale, so I decided to pick one up to see how it was performing on Ubuntu Linux. Here are benchmarks of the Sabrent Gen4 NVMe SSD, which in the 1TB capacity can be found for $150~170 USD.

The Sabrent 1TB Rocket NVMe 4.0 Gen4 (SB-ROCKET-NVMe4-1TB) features Toshiba BiCS4 96L BGA132 TLC NAND flash memory and a Phison PS5016-E16 flash controller, and Sabrent rates its performance at up to 5000MB/s for sequential reads and up to 4400MB/s for sequential writes. To hit those peak figures, this solid-state drive obviously needs to be installed in a PCI Express 4.0 M.2 slot.
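The reason a PCIe 4.0 slot is needed can be checked with some quick arithmetic. This is a back-of-the-envelope sketch, assuming the usual x4 link for an M.2 NVMe drive: PCIe 3.0 signals at 8 GT/s per lane and PCIe 4.0 at 16 GT/s, both with 128b/130b encoding. It ignores packet/protocol overhead, so real-world ceilings are somewhat lower than these numbers.

```python
# Per-lane signalling rates in gigatransfers per second (GT/s).
GT_PER_LANE = {"3.0": 8.0, "4.0": 16.0}
# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_mb_s(gen: str, lanes: int = 4) -> float:
    """Approximate one-direction link bandwidth in MB/s, before protocol overhead."""
    gbit_per_s = GT_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes
    return gbit_per_s * 1000 / 8  # Gbit/s -> MB/s (decimal units)


RATED_SEQ_READ_MB_S = 5000  # Sabrent's rated sequential read speed

for gen in ("3.0", "4.0"):
    bw = link_bandwidth_mb_s(gen)
    verdict = "enough" if bw >= RATED_SEQ_READ_MB_S else "not enough"
    print(f"PCIe {gen} x4: ~{bw:.0f} MB/s -> {verdict} for {RATED_SEQ_READ_MB_S} MB/s")
# PCIe 3.0 x4: ~3938 MB/s -> not enough for 5000 MB/s
# PCIe 4.0 x4: ~7877 MB/s -> enough for 5000 MB/s
```

In other words, a PCIe 3.0 x4 M.2 slot caps the drive below its rated 5000MB/s read speed before any other overhead is counted.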

There were no Linux compatibility issues with the Sabrent Rocket 4.0 NVMe SSD, which is to be expected since such issues with NVMe drives are rare. The drive worked fine on Ubuntu 19.10 and other recent Linux distributions with modern kernel releases.

To gauge its performance potential, a variety of other NVMe and SATA 3.0 SSDs were also benchmarked, including the Corsair Force MP600 1TB, Intel 760p 256GB, Intel Optane 900p 280GB, Micron 9300 4TB, Samsung 970 EVO 500GB, and Samsung 970 PRO 512GB. This allows comparing the Sabrent Rocket 4.0 directly against the same-capacity Corsair Force MP600, with the other solid-state drives included for reference.

Benchmarking was done on Ubuntu 19.10 with the Linux 5.4 kernel and each drive was freshly formatted to EXT4. Via the Phoronix Test Suite a wide range of storage tests were carried out.

Intel Core i9 10980XE Linux Performance Benchmarks Review

Intel today is rolling out the Core i9 10980XE as its new Cascade Lake X-Series processor, featuring 18 cores / 36 threads with a maximum turbo frequency of 4.6GHz and a Turbo Boost Max 3.0 frequency of 4.8GHz. Following a last-minute change, Intel moved up the embargo lift time for the Core i9 10980XE, so here are the results we can share with you right now.

The Intel Core i9 10980XE Cascade Lake processor features the same core / thread count as the previous Core i9 9980XE and i9 7980XE, but now with a 3.0GHz base frequency, 4.6GHz peak turbo frequency, 4.8GHz Turbo Boost Max 3.0 frequency, DDR4-2933 quad-channel memory support rather than DDR4-2666, and the L1TF/Meltdown hardware mitigations in place. The cache size remains 24.75MB and the processor has a 165 Watt TDP.

Besides this X-Series upgrade to Cascade Lake and its technical improvements, the biggest change is much more aggressive pricing for these Intel HEDT processors. While previous top-end X-Series processors have retailed for $1800~1900 USD, the Core i9 10980XE is launching at just $979 USD in order to be competitive with AMD Ryzen Threadripper. Basically, the processor pricing has been roughly halved to fend off Threadripper.

Yes, AMD is also launching its new Threadripper 3960X/3970X processors today. Originally the embargo lift time was the same for the Core i9 10980XE and Threadripper, but a few days ago Intel decided to move up the embargo lift time… So right now we can share the i9-10980XE performance numbers, but you will need to wait a few hours for our AMD Linux review to see how this Intel 18-core CPU compares to the 24-core Threadripper 3960X and the Threadripper 3970X.

Zombieload V2 TAA Performance Impact Benchmarks On Cascade Lake

This week we have posted a number of benchmarks on the JCC Erratum and its CPU microcode workaround, which introduces new potential performance hits. Also announced this week as part of Intel’s security disclosures was “Zombieload Variant Two”, the TSX Async Abort (TAA) vulnerability, which received same-day Linux kernel mitigations. I’ve been benchmarking the Linux kernel’s TAA mitigations since the moment they hit the public Git tree, and here are those initial benchmark results on an Intel Cascade Lake server.

While Intel’s latest-generation Cascade Lake server processors have hardware protections against other MDS vulnerabilities like RIDL and Fallout, they require software mitigations for Zombieload V2 / TAA. Researchers had disclosed this Zombieload variant to Intel earlier in the year, but it was placed under an extended embargo and not revealed during the original May disclosures.

Besides Cascade Lake, other Intel CPUs requiring the extra TAA mitigations are Whiskey Lake and Coffee Lake-R processors, at least those where Intel TSX (Transactional Synchronization Extensions) is supported. Those wanting to learn more about all of the intricacies of Zombieload V2 / TSX Async Abort can see ZombieloadAttack.com and the Intel Deep Dive. For your viewing pleasure, this article presents the initial Cascade Lake benchmarks following the landing of Linux’s TAA mitigations. Details on the Linux kernel’s TAA mitigations can be found in the kernel documentation.

For this Cascade Lake testing, which is also believed to be the first public benchmark of the TAA Linux mitigations anywhere, tests were done on a dual Intel Xeon Platinum 8280 server. The server platform in use was the Gigabyte S451-3R0 Xeon Scalable, kindly provided by Gigabyte.

During this benchmarking the server was running Ubuntu 19.10 with the Linux 5.4 Git kernel. This article compares three configurations: the new TAA mitigations in their default state with TSX enabled, TSX enabled but with the mitigation disabled (using the new tsx_async_abort=off switch), and Intel TSX disabled entirely (using the new tsx=off switch).
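Both switches are kernel boot parameters, so a quick way to confirm which of the three setups a machine booted into is to look at `/proc/cmdline`, and the kernel also reports its TAA state in `/sys/devices/system/cpu/vulnerabilities/tsx_async_abort`. The sketch below is a hypothetical helper, not something used for the article's testing; it only recognizes the exact switches named above (plus `mitigations=off`, which also turns off the TAA mitigation) and treats everything else as the kernel's built-in default.

```python
from pathlib import Path

# Linux reports the TAA mitigation state in this sysfs file (kernels with TAA support).
SYSFS_TAA = Path("/sys/devices/system/cpu/vulnerabilities/tsx_async_abort")


def taa_configuration(cmdline: str) -> str:
    """Hypothetical helper: map boot switches to the three setups being compared."""
    params = cmdline.split()
    if "tsx=off" in params:
        return "TSX disabled"
    # mitigations=off also disables the TAA mitigation.
    if "tsx_async_abort=off" in params or "mitigations=off" in params:
        return "TAA mitigation off"
    # No TSX/TAA switch given: the kernel's built-in defaults apply.
    return "kernel default"


def read_taa_status(path: Path = SYSFS_TAA) -> str:
    """Return the kernel's reported TAA status, or a note when the file is missing."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return "unavailable (older kernel or non-Linux system)"


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    cmdline = proc_cmdline.read_text() if proc_cmdline.exists() else ""
    print("boot configuration:", taa_configuration(cmdline))
    print("kernel TAA status: ", read_taa_status())
```

Checking `tsx=off` first reflects that once TSX is disabled, the TAA mitigation setting no longer matters for this comparison.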

This article isn’t comparing the combined impact of the other speculative execution mitigations, the JCC Erratum, or any other combinations. Follow-up articles will look at the different combinations; today is just about the impact of the new TSX Async Abort code in the kernel. Also keep in mind that SMT/HT was left enabled for all of today’s tests; the performance with HT disabled is something that will be revisited in the future.

Across the benchmarks found to be impacted by the TAA mitigations, the geometric mean of the results showed the Cascade Lake server running just under 8% slower with this week’s new kernel mitigation. Meanwhile, disabling TSX and running with TSX enabled but without any mitigation yielded similar performance to one another.
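For readers unfamiliar with how a single “just under 8%” figure is derived from many benchmarks, here is a minimal sketch of a geometric-mean summary. It assumes each test is expressed as a ratio of mitigated to unmitigated run time (for higher-is-better metrics such as throughput, the ratio would be unmitigated / mitigated instead). The input ratios are made up for illustration and are not the article’s results; Python 3.8+ also ships `statistics.geometric_mean` for the same calculation.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x)))."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need one or more positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def slowdown_percent(ratios):
    """Overall slowdown (%) from per-test ratios of mitigated / unmitigated run time."""
    return (geometric_mean(ratios) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical ratios for illustration only, not the article's measurements.
    example = [1.02, 1.10, 1.15, 1.05]
    print(f"overall slowdown: {slowdown_percent(example):.1f}%")
```

The geometric mean is used instead of an arithmetic mean of percentages so that one test with an unusually large swing does not dominate the summary, and so that the result is the same regardless of which configuration is taken as the reference.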

Now let’s look at the individual benchmark results.

Fedora 31 Performance Is Still Sliding In The Wrong Direction – Benchmarks Against Ubuntu 19.10 + Clear Linux

The performance of Fedora 30 on multiple systems has generally been coming up short compared to the likes of Ubuntu, Clear Linux, and openSUSE Tumbleweed. With this week’s release of Fedora 31 I was hopeful that the performance would be more competitive with other prominent Linux distributions, but sadly that doesn’t appear to be the case. Here are some initial benchmarks of Fedora Workstation 31 compared to Fedora Workstation 30, Clear Linux 31450, and Ubuntu 19.10.

The performance of Fedora on recent releases has frankly not been too impressive. While Red Hat has been doing a lot to add more features to the Linux desktop and other new functionality throughout the stack, performance has seemingly not been a major focus for them in recent times. On many different AMD and Intel systems, the performance of Fedora has generally lagged behind the likes of Ubuntu, openSUSE Tumbleweed, and Debian Buster. And, of course, it also trails Intel’s Clear Linux, which tends to be the gold standard for x86_64 Linux performance.

While Fedora 31 has lots of new/improved features, performance doesn’t seem to be one of them. I’m still running Fedora 31 tests on more systems, but so far the performance across dozens of workloads is either on par with Fedora Workstation 30 or regressed. Fedora 30 itself has seen some slowdowns from its stable release updates, as shown by today’s tests, which include both stock Fedora 30 and Fedora 30 with all of its liberal updates applied (taking it to newer kernel versions, etc.).

For this initial benchmarking of Fedora 31, tests were done on an Intel Core i9 7980XE with ASUS PRIME X299-A motherboard, 4 x 4GB DDR4 memory, Samsung 970 EVO NVMe SSD, and NVIDIA GeForce GTX TITAN X graphics.

All of the Linux distributions were freshly installed on the same system and tested with their out-of-the-box settings. All of the benchmarks were carried out in a fully-automated and reproducible manner using the Phoronix Test Suite.