You are right that predictable latency is key, and that is one of the big wins we have found with the Disruptor and our surrounding infrastructure. The Disruptor is not the only thing we have worked on to achieve this. For example, we have spent time experimenting with journalling and clustering techniques to give us high-throughput, predictable mechanisms for protecting our state.
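For readers who have not seen the Disruptor, the sketch below shows the shape of the pattern: a producer claims a slot in a pre-allocated ring buffer and a single handler thread consumes the events. It is written against the open-source 3.x DSL, and the OrderEvent type and handler body are hypothetical stand-ins rather than a picture of our actual code.

```java
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public final class DisruptorSketch {
    // Hypothetical event type; slots in the ring buffer are pre-allocated.
    static final class OrderEvent {
        long orderId;
    }

    public static void main(String[] args) {
        // One ring buffer; the size must be a power of two.
        Disruptor<OrderEvent> disruptor = new Disruptor<>(
                OrderEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // All business logic runs on this single handler thread.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println("processing order " + event.orderId));
        disruptor.start();

        RingBuffer<OrderEvent> ring = disruptor.getRingBuffer();
        // Claim a slot, fill it in place, publish; no locks on the hot path.
        ring.publishEvent((event, seq, id) -> event.orderId = id, 42L);
    }
}
```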
As Martin said in his reply, our system is implemented as a number of services that communicate via pub-sub messaging, so it is not quite as naive as you assume; we still have a distributed system. The high-throughput, low-latency functions of the system (e.g. order matching in our exchange, or risk evaluation in our broker) are each serviced on a single business-logic thread, on separate servers.
Our latency measurements include these multiple hops within our network: they represent the time from an inbound instruction arriving at our network to a fully processed, outbound response reaching the edge of our network. As Martin pointed out, modern networking can be quite efficient when used well.
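As a rough illustration of that kind of measurement, one might stamp each instruction on arrival at the edge, stamp the response as it leaves, and record the delta into a histogram. The sketch below uses the open-source HdrHistogram recorder purely as an example; the class and method names around it are assumptions, not a description of our measurement infrastructure.

```java
import java.util.concurrent.TimeUnit;
import org.HdrHistogram.Histogram;

public final class EdgeLatencyRecorder {
    // Track values up to 10 seconds with 3 significant digits of precision.
    private final Histogram histogram =
            new Histogram(TimeUnit.SECONDS.toNanos(10), 3);

    // inboundNanos: stamped when the instruction arrives at the network edge;
    // outboundNanos: stamped when the finished response leaves it.
    public void record(long inboundNanos, long outboundNanos) {
        histogram.recordValue(outboundNanos - inboundNanos);
    }

    // 99th percentile of the recorded round trips, in nanoseconds.
    public long p99Nanos() {
        return histogram.getValueAtPercentile(99.0);
    }
}
```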
We have designed the system so that we can shard it when the need arises, but we are actually a long way from that need. Even though these high-performance services keep their entire working set in memory, that working set is still relatively small compared to the amount of memory available in modern commodity servers. We currently have lots of headroom!
We think this is a very scalable and under-used approach. Keeping the working set in memory is pretty straightforward for many business applications, and it has the huge benefit of being very simple to code: much more straightforward than the shuffling of data around and translation from one form to another that is such a common theme in more conventional systems.
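As a toy illustration of that simplicity, here is what a service core can look like when one thread owns an in-memory working set: plain, unsynchronized collections mutated directly, with the pub-sub layer stood in for by a queue. All of the names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// One service core: a single thread owns the in-memory working set, so
// a plain HashMap is safe and there is no mapping layer to a database.
final class AccountService implements Runnable {
    record Deposit(String account, long amount) {}

    // Stand-in for the pub-sub subscription delivering instructions.
    private final BlockingQueue<Deposit> inbound = new LinkedBlockingQueue<>();
    private final Map<String, Long> balances = new HashMap<>(); // touched by run() only

    void publish(Deposit d) { inbound.add(d); }

    @Override public void run() {
        try {
            while (true) {
                Deposit d = inbound.take();
                balances.merge(d.account(), d.amount(), Long::sum); // direct mutation, no locks
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```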
The sharding decision is simple: for our problem domain we have two obvious dimensions for sharding, accounts and order-books. Each instance (shard) would continue to have its business logic processed on a single thread. I think this is normal for most business problems; it is a matter of designing the solution to avoid shared state between shards.
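A minimal sketch of what such sharding could look like, with the account ID as the shard key and one dedicated thread per shard; the class and method names are hypothetical, not our design:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Route each instruction to a shard keyed by account ID; each shard owns
// its slice of state on a single dedicated thread.
final class ShardedRouter {
    private final ExecutorService[] shards;

    ShardedRouter(int shardCount) {
        shards = new ExecutorService[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = Executors.newSingleThreadExecutor(); // one thread per shard
        }
    }

    // All work for a given account lands on the same thread, so that
    // account's state is never touched by two threads at once.
    void submit(long accountId, Runnable businessLogic) {
        int shard = (int) Math.floorMod(accountId, (long) shards.length);
        shards[shard].execute(businessLogic);
    }
}
```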
We are not advocating "no parallelism"; rather, we advocate that any given piece of state should be modified by a single thread, to avoid contention. Our measurements have shown that the cost of contention almost always outweighs the benefit. So avoid contention, not parallelism.
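To make the contention point concrete, here is a toy micro-comparison (not our benchmark, and the names are hypothetical): several threads CAS-incrementing one shared AtomicLong versus each thread incrementing a counter only it writes. On most hardware the single-writer variant is dramatically faster, because there is no cache-line ping-pong between cores.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

public final class ContentionDemo {
    static final int THREADS = 4;
    static final long PER_THREAD = 10_000_000L;

    // Run the same work on THREADS threads and time the whole batch.
    static long timed(IntConsumer work) throws InterruptedException {
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int id = t;
            threads[t] = new Thread(() -> work.accept(id));
        }
        long start = System.nanoTime();
        for (Thread th : threads) th.start();
        for (Thread th : threads) th.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        // Contended: every increment is a CAS on the same cache line.
        AtomicLong shared = new AtomicLong();
        long contended = timed(id -> {
            for (long i = 0; i < PER_THREAD; i++) shared.incrementAndGet();
        });

        // Single writer: each thread owns its counter and nobody else
        // writes it. (Separate objects usually land on different cache
        // lines; production code would pad them to make sure.)
        AtomicLong[] own = new AtomicLong[THREADS];
        for (int t = 0; t < THREADS; t++) own[t] = new AtomicLong();
        long singleWriter = timed(id -> {
            AtomicLong mine = own[id];
            for (long i = 0; i < PER_THREAD; i++) mine.incrementAndGet();
        });

        System.out.printf("contended: %d ms, single-writer: %d ms%n",
                contended / 1_000_000, singleWriter / 1_000_000);
    }
}
```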
Dave Farley
(Head of Software development at LMAX)