Generally, the reason C++ can be so stupidly fast, even compared to C, is that a lot is pushed to compile time via templates. You can avoid passing pointers and doing indirection, and you can even inline functions altogether. Flattening objects and methods to encode as much information as you can in the type at compile time will almost always be much faster than doing dynamic redirection at runtime.
For example, compare the speed and implementation of std::sort and qsort (it's almost an order of magnitude difference in run time for big N!)
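To make the comparison concrete, here is a minimal sketch of the two call styles. The helper names (compare_ints, sort_with_qsort, sort_with_std_sort, demo_sorts_agree) are made up for illustration, and nothing here measures speed; it only shows where the type information goes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort comparator: receives untyped pointers and is called through a
// function pointer, so the compiler usually cannot inline it into the sort.
int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);  // avoids the overflow of lhs - rhs
}

// Hypothetical helper: sort with the C library routine.
void sort_with_qsort(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

// Hypothetical helper: sort with the C++ template. The lambda's type is part
// of the instantiated std::sort, so the comparison can be inlined.
void sort_with_std_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
}

// Usage sketch: both routines should produce the same ordering.
void demo_sorts_agree() {
    std::vector<int> a{5, -1, 3, 3, 0};
    std::vector<int> b = a;
    sort_with_qsort(a);
    sort_with_std_sort(b);
    assert(a == b);
}
```

Any timing difference you see will depend on the libc, the C++ standard library, the compiler flags and the input, so treat this as a starting point for your own benchmark rather than as evidence either way.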
Sure, but note that, unlike with the aliasing overhead, the C programmer can just specialise by hand to get the same results.
Also, sorting is something where algorithmic improvement makes a sizeable difference, so you need to be sure you're either measuring apples vs apples or that you've decided up front what your criteria are (e.g. lazy people will use the stdlib so only test that; or nobody sorts non-integer types so I only test those).
For some inputs, if you're willing to use a specialist sort, the best option today is C. If you care enough to spend resources on specialising the sort for your purpose, that's a real option. Alternatively, if you can't be bothered to do more than reach for the standard library, Rust has a significantly faster sort (stable and unstable) than any of the three C++ stdlibs. Or maybe you want a specialized vector sort that Intel came up with, and they wrote it for C++. Hope portability wasn't an issue, 'cos unsurprisingly Intel only cares if it works on Intel CPUs.
> can just specialise by hand to get the same results
Sure, if you write all the code. If you're writing a library or more generic functions, you don't have that power.
And, even then, while you can do this, it's going to be much more code and more prone to bugs. C++ is complex, but that complexity can often bring simplicity. I don't need to specialize for int, double, float, etc. because the compiler can do it for me. And I know the implementation will be correct. If I specialize by hand, I can make mistakes.
In addition, this isn't something where C "shines". You can do the exact same thing in C++, if you want. Many templates have hand-rolled specializations for some types.
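A small sketch of both halves of that point: one generic template the compiler instantiates per type, plus one hand-rolled specialization where the generic version would do the wrong thing. The name same_value is hypothetical, picked just for this example.

```cpp
#include <cassert>
#include <cstring>

// Generic version: the compiler generates same_value<int>,
// same_value<double>, ... on demand, one per type actually used.
template <typename T>
bool same_value(T a, T b) {
    return a == b;
}

// Hand-rolled specialization: for C strings, == would compare addresses,
// so this one type gets its own implementation.
template <>
bool same_value<const char*>(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Usage sketch.
void demo_same_value() {
    assert(same_value(3, 3));       // instantiates same_value<int>
    assert(!same_value(1.5, 2.5));  // instantiates same_value<double>
    const char* x = "sort";
    const char y[] = "sort";        // different address, same contents
    assert(same_value<const char*>(x, y));
}
```

In C you would write each of these by hand (or generate them with macros), which is the "more code, more room for mistakes" trade-off.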
> apples vs apples
It is, they're both qsort. When every single comparison requires multiple dereferences + a function call it adds up.
> For some inputs if you're willing to use a specialist sort the best option today is C
I don't understand how. Even if this is the case, which I doubt, you could just include the C headers in a C++ application. So C++ is just as good a choice, and you get whatever else you want/need.
> Rust has significantly faster sort (stable and unstable) than any of the three C++ stdlibs
Maybe, but there's a new std::sort implementation in LLVM 17. Regardless, the Rust implementations are very fast for the same reason the C++ implementations are fast - encoding information in types at compile-time and aggressively inlining the comparison function. Rust has a very similar generic methodology to C++.
Oh! No, that's not a thing. What's happened there is you saw that the libc function was named qsort and you went "I am smart, I know that means Tony Hoare's Quicksort algorithm from the 1960s", but that's not what it means. It is named that way, but it's only defined as an unstable sort; libc does not promise any particular algorithm.
Over in C++ land they also don't specify the sort algorithm used, but in C++11 they mandated that the provided function must have worst-case O(n log n) performance. This is awkward for Quicksort because, although Tony's algorithm is very fast on average, its worst case is O(n^2), which is very slow.
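If you want to see that worst case for yourself, here is a sketch that counts comparisons in a textbook Quicksort (Lomuto partition, last element as pivot). This is a deliberately naive variant chosen for illustration, not what any libc or C++ standard library ships, and the function names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Textbook quicksort, instrumented to count element comparisons.
void naive_quicksort(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi,
                     std::size_t& comparisons) {
    if (lo >= hi) return;
    const int pivot = v[hi];  // naive choice: always the last element
    std::ptrdiff_t store = lo;
    for (std::ptrdiff_t i = lo; i < hi; ++i) {
        ++comparisons;
        if (v[i] < pivot) std::swap(v[i], v[store++]);
    }
    std::swap(v[store], v[hi]);
    naive_quicksort(v, lo, store - 1, comparisons);
    naive_quicksort(v, store + 1, hi, comparisons);
}

// Sorts a copy of the input and returns how many comparisons it took.
std::size_t count_comparisons(std::vector<int> v) {
    std::size_t comparisons = 0;
    naive_quicksort(v, 0, static_cast<std::ptrdiff_t>(v.size()) - 1, comparisons);
    return comparisons;
}

// Usage sketch: already-sorted input makes every partition maximally lopsided,
// so n elements cost n * (n - 1) / 2 comparisons.
void demo_worst_case() {
    std::vector<int> sorted(1000);
    std::iota(sorted.begin(), sorted.end(), 0);
    assert(count_comparisons(sorted) == 1000u * 999u / 2u);
}
```

Real implementations pick pivots more carefully, but a careful pivot choice alone only makes the bad inputs harder to hit; it does not remove the quadratic case.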
Thus a conforming C++ std::sort cannot be a plain Quicksort. Now, conformance to the C++ ISO standard is basically a minor curiosity and nobody cares, so libc++ (the standard library that ships with Clang), for example, just didn't bother and shipped a Quicksort anyway until relatively recently. But already we can see that we're by no means guaranteed these are "both qsort", nor that they're both anything in particular.
The thing you should do is an introspective sort, or "Introsort": start out as a Quicksort and switch to something with a guaranteed O(n log n) bound when the partitioning goes badly. There are a lot of these; for some time the best general-purpose algorithm was PDQsort, the Pattern-Defeating Quicksort by Orson Peters. But even though the word "Quicksort" is in there, this is not just "well, it's qsort, so it's the same anyway", any more than a Cayenne is the same as a road-legal 911 or Porsche's 963 track car.
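For a rough idea of the shape of an introsort, here is a minimal sketch for std::vector<int>. It is not PDQsort and not any standard library's code; the names, the cutoff of 16 and the depth budget of 2 * log2(n) are assumptions picked for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Iter = std::vector<int>::iterator;

void introsort_loop(Iter first, Iter last, int depth_budget) {
    while (last - first > 16) {
        if (depth_budget == 0) {
            // Too many lopsided partitions: heapsort bounds the worst case.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_budget;
        // Median of first, middle and last element as the pivot value.
        Iter mid = first + (last - first) / 2;
        const int pivot = std::max(std::min(*first, *mid),
                                   std::min(std::max(*first, *mid), *(last - 1)));
        // Three-way split: [first, lt) < pivot, [lt, ge) == pivot, [ge, last) > pivot.
        Iter lt = std::partition(first, last, [pivot](int x) { return x < pivot; });
        Iter ge = std::partition(lt, last, [pivot](int x) { return !(pivot < x); });
        introsort_loop(ge, last, depth_budget);  // recurse on the right part
        last = lt;                               // loop on the left part
    }
    // Small ranges: insertion sort.
    for (Iter i = first; i != last; ++i)
        std::rotate(std::upper_bound(first, i, *i), i, i + 1);
}

void introsort_sketch(std::vector<int>& v) {
    if (v.size() < 2) return;
    const int depth_budget =
        2 * static_cast<int>(std::log2(static_cast<double>(v.size())));
    introsort_loop(v.begin(), v.end(), depth_budget);
}

// Usage sketch: the result should match std::sort.
void demo_introsort() {
    std::vector<int> v;
    for (int i = 0; i < 5000; ++i) v.push_back((i * 7919) % 1009);
    std::vector<int> expected = v;
    std::sort(expected.begin(), expected.end());
    introsort_sketch(v);
    assert(v == expected);
}
```

Production versions differ in pivot selection, partitioning scheme and how they detect bad patterns, which is exactly why "it has Quicksort in the name" tells you so little.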
I am skeptical about this. The optimizer can also specialize functions, and programmers can too. The excessive specialization you get with templates always looks beautiful in microbenchmarks but may not be ideal on a larger scale. There was a recent report analyzing the performance of Rust drivers vs C drivers, and code bloat caused by monomorphization was an issue on the Rust side; in my experience (although I do not have a reference) it is the same in C++.
> Optimizer can also specialize functions and programmers can do too
Yes, but not if you pass in void *. For libraries this matters. If you're writing both the producer and the consumer then sure, you can do it manually.
> code bloat caused by monomorphization
This is true and a real problem, but I would argue that in most scenarios extra codegen will be more performant than dynamic allocation + redirection. Because that's the alternative, like how Swift or C# or Java do it.
Java does not monomorphize; it has no true generics - it's objects all the way down. It does, however, perform guarded devirtualization, since all methods are virtual by default. So performance lives and dies by OpenJDK HotSpot emitting guards for fast, often multiple, dispatch, as well as optimizing "megamorphic" call sites with vtable-ish dispatch (which is about the default cost of interface dispatch in .NET, somewhat slower than virtual dispatch).
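For anyone unfamiliar with the term, guarded devirtualization can be written out by hand. This is only a C++ analogy with made-up types (Shape, Triangle, Square); it is not how HotSpot or .NET implement it internally, where the JIT inserts the guard based on runtime profiling.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle final : Shape {
    int sides() const override { return 3; }
};
struct Square final : Shape {
    int sides() const override { return 4; }
};

// Assumes profiling showed that almost every receiver here is a Triangle.
int sides_guarded(const Shape& s) {
    if (typeid(s) == typeid(Triangle)) {
        // Guard passed: exact type known, so bind the call statically
        // (the qualified name bypasses the vtable and can be inlined).
        return static_cast<const Triangle&>(s).Triangle::sides();
    }
    // Guard failed: fall back to ordinary virtual dispatch.
    return s.sides();
}

// Usage sketch: both paths return the same answer as a plain virtual call.
void demo_guarded_call() {
    Triangle t;
    Square q;
    assert(sides_guarded(t) == 3);  // fast path
    assert(sides_guarded(q) == 4);  // fallback path
}
```

A "megamorphic" call site is the case where too many different types show up for a short chain of guards like this to pay off.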