They don't call the *system* low latency; it is the processing inside the single-threaded main loop that has to be low latency. Of course they want the end-to-end system to be low latency too, but here the term describes the inner work loop, because their approach breaks down if the main loop contains higher-latency work items.
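
To make that concrete, here is a toy sketch of why one slow work item hurts a single-threaded loop. This is not their code: the function name `run_main_loop`, the assumption that all items arrive at time 0, and the microsecond figures are all made up for illustration.

```python
from collections import deque


def run_main_loop(service_times_us):
    """Process items in FIFO order on one thread.

    Hypothetical model: every item is already queued at time 0, so an item's
    latency is its own service time plus that of everything ahead of it.
    Returns the completion latency of each item in microseconds.
    """
    queue = deque(service_times_us)
    clock_us = 0
    latencies = []
    while queue:
        # The single thread is busy for the item's whole service time,
        # so nothing behind it can make progress.
        clock_us += queue.popleft()
        latencies.append(clock_us)
    return latencies


fast_only = run_main_loop([10, 10, 10, 10])
with_slow_item = run_main_loop([10, 5000, 10, 10])
print(fast_only)
print(with_slow_item)
```

In this model the second run's last two items each need only 10 µs of work, yet they finish more than 5 ms late because they sit behind the slow item. That is the sense in which the inner loop, not just the system, has to be low latency.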

If you apply "low latency" to the inner loop only, that resolves your second critique too: they won't support anything CPU-intensive, since that isn't low latency. For the same reason, all state has to fit on a single node.