But at the same time, straightforward C++ code with no tricks is still orders of magnitude faster than Python or PHP, and usually faster than Java and C#. So most of the time you still don't need custom allocators, etc., for it to be "good enough".
However, for Java or modern C#, in my experience the performance is often fairly close. When using either of them, very often one doesn't need C++ for the result to be good enough.
Here's an example: a video player library for the Raspberry Pi 4, https://github.com/Const-me/Vrmac/tree/master/VrmacVideo As written on that page, only a few parts are in C++ (GLES integration, audio decoders, and a couple of SIMD utility functions); the majority is in C#.
Still, compared with the VLC player running on the same hardware, this code uses the same CPU time and less memory.
> However, for Java or modern C#, in my experience the performance is often fairly close.
Aren't you contradicting yourself? You started off complaining that malloc isn't good enough, but now it's suddenly OK to tolerate the performance drop of Java and C# compared to C++?