Since the article is more than 5 years old, it would be interesting to find out what became of their efforts. The GitHub repo linked in the article (https://github.com/jplozi/wastedcores) was last active in December 2017, and apparently contains some bugfixes, albeit with the caveat "The provided patches fix the issues encountered with our workloads, but they are not intended as generic bug fixes. They may have unwanted side effects and result in performance loss or energy waste on your machine." Did this result in any scheduler bugfixes that actually made it into the Linux kernel?
Are there any standard benchmarks for POSIX systems, facilitating objective comparisons between schedulers? When I was reading the dinosaur book, I was continually impressed by the elegance of solutions used in Solaris, the scheduler being one of the highlights. However, a sense of elegance may be poorly calibrated, and I’d like to look at some hard data about how Solaris (today Illumos) stacks up against the BSDs and Linux. Sadly I’m not knowledgeable enough about operating systems to write a good set of such benchmarks myself, so I’d prefer to lean on the expertise of smarter people.
From years of reading Phoronix articles, scheduling is generally one area where Linux really shines compared to other OSs. There are particular workloads where some other OS does better, but not overall. And many of the problems described in this article are complaints about Linux trading off what's best for HPC users against approaches that are better on servers or user devices. Like, the overload-on-wakeup behavior is absolutely what you want on anything battery-powered, even if it hurts in TPC-H.
> the kernel has (always) offered various schedulers but you have to pick one
Umm, the mainline kernel has had only the CFS scheduler available for the past 15 years. Sure, there are some out-of-tree options available, but with those come the usual problems of using out-of-tree patchsets.
Actually the history of how pluggable schedulers came to be in the kernel is a fascinating one, and one I recall watching unfold in the mid-2000s. There were out of tree schedulers and a pluggable scheduler implementation put forward by Con Kolivas before Ingo introduced the CFS patch, and a lot of frustration that pluggable scheduler patch sets were rejected up until that point.
'So it's not just "better", it's "Better" with a capital 'B'. Nothing else out there comes even close. The Linux dcache is simply in a class all its own.' -Linus Torvalds
I do realize that the dcache is not directly related to the scheduler (though it will certainly impact it), but I trust that performance enthusiasts will go to great lengths to extend Linux's lead in TPC and other benchmarks.
It has also not been widely reported that a) Oracle posted a top TPC-C score shortly after acquiring Sun, running 11g on SPARC/Solaris 10, and b) OceanBase has now beaten that by an order of magnitude.
To see both the OceanBase and Oracle 11g/Solaris scores, historical benchmarks must be enabled:
If you're looking at the results I am, I see a system with 28x the CPUs being 23x faster, after 10 years of CPU development. And substantially more expensive in total cost too?
Are we looking at the same thing? Did I get the math wrong? ( always a possibility ).
Yes, it's a much bigger topline number, but it doesn't seem very impressive given all the infrastructure differences?
Individually, every performance benchmark will test against a defined/repeatable workload. If you think about it, if you don't benefit from the performance improvements, does it really matter to you? And if it is noticeable to you, what metrics are you using to determine that? Once you narrow that down, it will be easy to come up with a workload to compare them.
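To make that concrete with a sketch (not a standard benchmark, just a minimal example of a defined, repeatable scheduler-heavy workload, with an arbitrary iteration count): a pipe ping-pong between a parent and child process, where the round-trip rate is dominated by wakeup and context-switch cost. Being plain POSIX, it should build the same way on Illumos, the BSDs and Linux:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    #define ITERATIONS 100000  /* arbitrary; pick whatever gives stable numbers */

    int main(void) {
        int ptoc[2], ctop[2];          /* parent->child and child->parent pipes */
        char buf = 'x';

        if (pipe(ptoc) || pipe(ctop)) { perror("pipe"); return 1; }

        pid_t pid = fork();
        if (pid == 0) {                /* child: echo every byte straight back */
            for (int i = 0; i < ITERATIONS; i++) {
                if (read(ptoc[0], &buf, 1) != 1) break;
                if (write(ctop[1], &buf, 1) != 1) break;
            }
            _exit(0);
        }

        struct timeval start, end;
        gettimeofday(&start, NULL);
        for (int i = 0; i < ITERATIONS; i++) {   /* parent: ping, wait for pong */
            write(ptoc[1], &buf, 1);
            read(ctop[0], &buf, 1);
        }
        gettimeofday(&end, NULL);
        waitpid(pid, NULL, 0);

        double secs = (end.tv_sec - start.tv_sec)
                    + (end.tv_usec - start.tv_usec) / 1e6;
        printf("%d round trips in %.3fs (%.0f/s)\n",
               ITERATIONS, secs, ITERATIONS / secs);
        return 0;
    }

One number like this obviously doesn't capture a whole scheduler, but it is the kind of narrow, repeatable workload the comparison would have to start from.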
I'm now wondering how many decades are still being lost because of similar bugs in other OSes that don't get as much scrutiny, like OpenBSD or even FreeBSD.
I'm not going to say scheduling is better or worse on different platforms, but it is clearly different.
When I tried to port the (at that time) new, open-source version of .NET Core to FreeBSD, one of the things I simply couldn't fix in the .NET framework code itself was threading. For one, I had to (for some reason I don't remember now) use non-POSIX threading functions to make it compile. But even with that in place, things weren't behaving as expected.
I mean... threading worked, but .NET had a fairly big test suite which was very opinionated about what sort of behaviour and performance characteristics different kinds of threading scenarios and threading primitives should have.
On FreeBSD I was forced to extend time-outs and outright disable some tests to make the build pass.
Not necessarily; the problem is similar to what can be seen with garbage collection (latency vs. throughput).
For example, if you give threads more, smaller time slices, you get better latency but worse throughput, since it means more work switching between time slices and more cache invalidation.
.NET's test suite is tuned for Windows. Windows focuses more on desktop use cases and is tuned for lower latency rather than throughput; FreeBSD, on the other hand, is mainly for servers, so its scheduler is tuned more for throughput. This difference could very well explain the failures in the test suite (independent of whether there is a bug or not). To test what I think it tests, you have to be very tight about the expected latencies, tight enough to make the test suite fail on a more throughput-optimized system.
Similarly, on Linux some distros offer an alternative official kernel for media applications (e.g. gaming) which changes kernel parameters to be a bit more latency-focused, e.g. linux-zen in the case of Arch Linux.
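To illustrate that latency-vs-throughput lever from the application side (a sketch only, and not what linux-zen itself does; zen works at the kernel config/tunable level): on Linux a program can declare a CPU-bound thread as throughput-oriented via the SCHED_BATCH policy, which hints the scheduler to preempt it less aggressively:

    /* Sketch: opt a CPU-bound worker into Linux's SCHED_BATCH policy.
     * SCHED_BATCH is Linux-specific; on other OSes this call will fail. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        struct sched_param param = { .sched_priority = 0 };  /* must be 0 for SCHED_BATCH */
        if (sched_setscheduler(0, SCHED_BATCH, &param) == -1) {
            perror("sched_setscheduler");
            return 1;
        }
        /* ... run the throughput-oriented work here ... */
        return 0;
    }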
Similarly, I know DragonFly BSD focuses on speed (making the kernel as non-blocking as possible, thread-per-core type stuff), but is there a comparison of its scheduler with FreeBSD's?