No, compared to not doing so many allocations that freeing them becomes time-consuming in the first place. If allocations are slowing a program down, there are almost certainly far too many of them, usually because they're too granular and sitting in a hot loop. On top of that, it means everything lives behind a pointer, and that loss of locality slows things down even further. The difference between allocating many millions of objects and chasing their pointers versus doing a single allocation of a vector and running through it can easily be 100x.
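A minimal sketch of that difference in OCaml (the element count and benchmark shape are my own illustration, not from the thread): a float array is one contiguous, unboxed allocation, while a list of the same floats is a million separate heap cells chained by pointers.

    (* Sum 1 million floats stored two ways. *)
    let n = 1_000_000

    (* One allocation, contiguous memory: OCaml float arrays are unboxed,
       so this loop streams sequentially through cache lines. *)
    let sum_array () =
      let a = Array.make n 1.0 in
      let s = ref 0.0 in
      for i = 0 to n - 1 do
        s := !s +. a.(i)
      done;
      !s

    (* Many small allocations: each cons cell and each boxed float is its
       own heap block, so this loop chases pointers instead. *)
    let sum_list () =
      let l = List.init n (fun _ -> 1.0) in
      List.fold_left ( +. ) 0.0 l

    let time name f =
      let t0 = Sys.time () in
      let r = f () in
      Printf.printf "%s: %.3fs (sum = %.0f)\n" name (Sys.time () -. t0) r

    let () =
      time "array" sum_array;
      time "list" sum_list

On most machines the array version wins by a wide margin, even before you count the cost of building and then collecting the list.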
Probably? Locality becomes fairly important at scale. That’s why there’s a strong preference for array-based data structures in high-performance code.
If I were them I’d be using OCaml to build up functional “kernels” which can be run in a way that requires zero allocation. Then you dispatch requests to these kernels and let the fast modern generational GC clean up the minor cost of dispatching: most of the work happens in the zero-allocation kernels.
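To make that shape concrete, here's a toy sketch; the order-book type and message format are hypothetical, not from any real system:

    (* All kernel state is preallocated; the hot path only mutates it. *)
    type book = {
      prices : float array;   (* fixed-size, allocated once at startup *)
      sizes  : float array;
      mutable best : int;
    }

    (* The "kernel": writes into preallocated arrays and touches only
       ints and floats, so its body does essentially no heap allocation
       (with flambda, the float arguments get unboxed as well). *)
    let apply_update book ~level ~price ~size =
      book.prices.(level) <- price;
      book.sizes.(level) <- size;
      if price > book.prices.(book.best) then book.best <- level

    (* Dispatch is allowed to allocate small short-lived values, e.g. a
       parsed message; the generational GC reclaims those cheaply from
       the minor heap while the kernel itself stays allocation-free. *)
    let dispatch book raw =
      Scanf.sscanf raw " %d %f %f" (fun level price size ->
        apply_update book ~level ~price ~size)

The design point is that the minor heap makes short-lived dispatch garbage nearly free, so you only have to fight for zero allocation inside the kernels themselves.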
I think it is, but to be clear, my sense (from my very limited experience, just a couple of years before leaving finance, plus the people with more experience I've talked with) is that C++ is still a lot more common than any GC language (typically Java, since OCaml is even rarer). So it is possible, and some firms seem to take that approach, but I'm not sure exactly how, beyond turning off the GC or very specific GC tuning.
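For what it's worth, on the OCaml side that kind of tuning is exposed directly through the stdlib Gc module. A sketch, with made-up numbers rather than recommendations:

    (* Enlarge the minor heap so short-lived garbage is collected rarely
       and cheaply, and trade memory for fewer major collections.
       Values are illustrative only; minor_heap_size is in words. *)
    let () =
      Gc.set
        { (Gc.get ()) with
          minor_heap_size = 8 * 1024 * 1024;
          space_overhead = 200;
        }

The JVM version is the usual zoo of -XX flags (or, for literally turning the GC off, the Epsilon no-op collector), but the idea is the same: size the young generation so the hot path's garbage dies cheaply.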
Here is a JVM project I saw a few years back. I'm not sure how successful the creators are, but they seem to use it in actual production. It's super rare to get even a glimpse of HFT infra from the outside, so it's still useful.