I too knew someone in HFT ages ago who spoke of using Java even in the hot path. He told me they disabled GC altogether, in combination with not generating garbage in the hot path, with large enough machines to not OOM before markets closed for the day.
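One way to check the "no garbage in the hot path" half of that claim is to count collections around a piece of code. This is only a sketch, not what that shop actually did: `GcWatch` and its method names are made up for illustration, and it relies on the standard `java.lang.management` beans.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports whether a task triggered any garbage collections. */
final class GcWatch {
    /** Sum of collection counts across all collectors the JVM exposes. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector reports no count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Runs the task and returns how many collections were recorded while it ran. */
    static long collectionsDuring(Runnable task) {
        long before = totalCollections();
        task.run();
        return totalCollections() - before;
    }
}
```

A result of zero only says no collection happened during that run, not that nothing was allocated, so it is a smoke test rather than proof. Other threads can also trigger a collection while the task runs.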
From what I saw poking at the development trading boxes, they over-provisioned both CPU clock speed and RAM to a ridiculous level. The machines looked nearly idle at their busiest.
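Disabling GC is wildly inefficient unless you also make sure you don't produce garbage, because allocation will slow the application down to human scale. The magic is more in the no-allocation code than in the disable-GC side. It is very annoying in Java because anything that isn't a primitive is a heap-allocated object, and primitives get boxed as soon as they pass through generics or collections. So you end up with object pools, preallocated buffers, etc. The problem is that it's easy to start in Java, but then you hit a wall, and JNI calls carry a lot of overhead, so dropping to native code is not a cheap way out. Exasperating. But it's hard to argue with: sometimes if you don't start in Java you never get the problem, because you never manage to bootstrap in the first place.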
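For anyone who hasn't seen the object-pool pattern, here is a minimal sketch. `Order` and `OrderPool` are hypothetical names, the pool is single-threaded, and a real system would need to decide what to do when the pool runs dry.

```java
/** Hypothetical mutable message type; primitive fields only, so reuse allocates nothing. */
final class Order {
    long id;
    long priceTicks;
    int quantity;

    void set(long id, long priceTicks, int quantity) {
        this.id = id;
        this.priceTicks = priceTicks;
        this.quantity = quantity;
    }
}

/** Fixed-size pool: every Order is allocated once at startup, then recycled. Not thread-safe. */
final class OrderPool {
    private final Order[] free; // stack of currently unused instances
    private int top;            // number of instances on the stack

    OrderPool(int capacity) {
        free = new Order[capacity];
        for (int i = 0; i < capacity; i++) {
            free[i] = new Order(); // all allocation happens here, outside the hot path
        }
        top = capacity;
    }

    /** Returns a recycled Order, or null if the pool is exhausted. Never allocates. */
    Order acquire() {
        return top == 0 ? null : free[--top];
    }

    /** Hands an Order back; the caller must not use it afterwards or release it twice. */
    void release(Order order) {
        free[top++] = order;
    }
}
```

The hot path then calls `acquire()`, fills the object with `set(...)`, and calls `release(...)` when done, so the steady state creates no new objects. The cost is that you take on manual lifetime management, which is exactly what the GC was supposed to spare you.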
I worked with low-latency trading with C# in the hot path; they had started with C++ and eventually migrated to C# and just kept tight control of allocations.
Later they went back to C++ for training deep neural nets.
> "large enough machines to not OOM before markets closed for the day"
Reminded of that missile defense system that was designed to be software-reset once a day, except the design assumptions changed (a war started) and it was ordered to be left running nonstop, as an emergency measure; after being left on for about 100 hours it failed to track an incoming missile, and 28 people died. That one (the Patriot battery at Dhahran in 1991, if I remember right) kept time as a count of tenths of a second, and converting that count to seconds in 24-bit fixed point truncated 0.1, so a small error accumulated the longer the system stayed up.
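To put numbers on the drift, here is a rough sketch that assumes the commonly cited account of the Patriot incident: a clock ticking every 0.1 s, with 0.1 stored truncated to 23 fractional bits. `ClockDrift`, `FRACTION_BITS`, and `driftSeconds` are names invented for this example, not anything from the real system.

```java
/** Hypothetical model of clock drift caused by storing 0.1 in truncated fixed point. */
final class ClockDrift {
    // Assumption: 23 bits after the binary point, per the commonly cited account.
    static final int FRACTION_BITS = 23;

    /** Accumulated timing error in seconds after the given number of hours of uptime. */
    static double driftSeconds(double hoursUp) {
        long scale = 1L << FRACTION_BITS;
        long truncatedTenth = (long) (0.1 * scale);           // cast drops the fractional part
        double storedTenth = (double) truncatedTenth / scale; // the value the register really holds
        double errorPerTick = 0.1 - storedTenth;              // lost on every 0.1 s tick
        long ticks = Math.round(hoursUp * 3600 * 10);         // ten ticks per second
        return ticks * errorPerTick;
    }
}
```

Under these assumptions `ClockDrift.driftSeconds(100)` comes out at roughly a third of a second, while a system rebooted every 8 hours stays under 0.03 s. That gap is why a periodic restart hid the problem for as long as it did.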
I interpret that write-up to mean the daily reboot was a temporary, user-suggested workaround for a bug discovered in the field, rather than something in the product specs from the beginning. It also seems more plausible to me that no one realized the errors would accumulate, or thought to test long-running operation at all, than that someone explicitly decided it wasn't important.
> Reminded of that missile defense system that was designed to be software-reset once a day, except the design assumptions changed (a war started) and it was ordered to be left running nonstop, as an emergency measure
I'm sure you've simplified the story, but it seems like a bit of a process failure for a missile defense system to assume peacetime. There's writing software that implements the requirements, and then there's making sure the requirements are right, both up front and when you really rely on them in a critical way.