
As usual, it depends on what you define as a "transaction".

If you don't need ACID, I can probably jerry-rig something in C that'll give you hundreds of thousands of "transactions" per second. It may have problems with retrieving the data, but hey, look at that sucker go!

OK. Time to be more reasonable.

1. Relaxed Guarantees

There's no actual durability in the conventional spinning-rust sense. Instead you're hoping that a fleet of identical instances all receive the same event objects in the same order. If they don't, you're going to have mysterious problems when some of the systems get out of sync.

According to Fowler, business transactions are broken into events which are processed individually. This essentially means that some transactions aren't atomic. LMAX can credit A and fail to debit B because the two halves are done independently.

Consistency is palmed off to the input queues.

2. Smart data structures

They profiled the code and found places where they could swap out standard Java collections for their own problem-specific variants.

What they refer to as "Disruptors" are smart ring buffers; or as they are sometimes called, "Queues". I realise that this is not as cool-sounding as "Disruptor" (they should have called the "Business Logic Processor" the "Phaser"!), but it seems to more or less describe the interface, which is that things go in a given order and come out in approximately that order.

Actually, it's possible I've misunderstood how that structure works. It might also be that the system is skipping over elements that aren't yet ready, in which case this looks more like an active priority queue, interface-wise.
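To make what I mean concrete, here's roughly the kind of structure I'm picturing -- a pre-allocated ring with sequence counters, one producer and one consumer. This is just my sketch, not the actual Disruptor API:

    // Minimal single-producer/single-consumer ring buffer sketch (not the
    // Disruptor's real code): a pre-allocated array plus two sequence counters.
    import java.util.concurrent.atomic.AtomicLong;

    final class RingBufferSketch<T> {
        private final Object[] slots;
        private final int mask;                                   // capacity must be a power of two
        private final AtomicLong published = new AtomicLong(-1);  // last slot made visible
        private final AtomicLong consumed  = new AtomicLong(-1);  // last slot read

        RingBufferSketch(int capacityPowerOfTwo) {
            slots = new Object[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        void publish(T value) {
            long seq = published.get() + 1;
            while (seq - consumed.get() > slots.length) { Thread.onSpinWait(); } // ring full, wait
            slots[(int) (seq & mask)] = value;
            published.set(seq);                                   // make the slot visible
        }

        @SuppressWarnings("unchecked")
        T take() {
            long seq = consumed.get() + 1;
            while (published.get() < seq) { Thread.onSpinWait(); } // nothing published yet, wait
            T value = (T) slots[(int) (seq & mask)];
            consumed.set(seq);
            return value;
        }
    }

Things go in at one sequence and come out in exactly that sequence, which is why "queue" still seems like a fair description of the interface to me.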

3. Conclusion

Impressive work from the LMAX team. But let's remember to keep things in perspective. It has always been the case that ACID exacts a high toll on performance; to the extent that you relax it, you can always go faster.

Too often we in our industry see the shiny big performance number and forget that it isn't free. Like everything in life there is an opportunity cost. Choose carefully.



Re: 1. There is no "hoping"; the input disruptor guarantees a total ordering of events as they are written to a journal. This journal is then processed by the BLPs.

Re: atomicity -- the BLP processes one message at a time, which ensures that you do not have multiple threads clashing, but you do not have MVCC or any form of rollback, which he addresses thusly:

>LMAX's in-memory structures are persistent across input events, so if there is an error it's important to not leave that memory in an inconsistent state. However there's no automated rollback facility. As a consequence the LMAX team puts a lot of attention into ensuring the input events are fully valid before doing any mutation of the in-memory persistent state. They have found that testing is a key tool in flushing out these kinds of problems before going into production.

I am more concerned with the hand-waving around the failure case -- failing over to an alternate BLP does not prevent duplicate instructions. If processing an event would create multiple output events for the output disruptor, but the BLP is terminated before all of those output events are sent, you must either a) have some kind of multi/exec on output events, b) write code that is able to resume processing of a message from an intermediate point, or c) otherwise prevent or accommodate duplicate output events from the same source event.

This is a result of the lack of "transactionality" that you are referring to, and I would love to read more about how they address this particular sticky wicket when a system fails.
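For what it's worth, option (c) could be as unglamorous as deduplicating downstream. This is entirely my invention, not anything LMAX has described: the consumer remembers, per source event sequence, the highest output index it has already applied, and drops replays from a replacement BLP.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical dedup filter for output events; all names are made up.
    final class DuplicateOutputFilter {
        // key: source event sequence; value: highest output index already applied
        private final Map<Long, Integer> applied = new HashMap<>();

        /** Returns true if this output event has not been applied before. */
        boolean shouldApply(long sourceSequence, int outputIndex) {
            Integer last = applied.get(sourceSequence);
            if (last != null && outputIndex <= last) {
                return false;              // replayed by the replacement BLP -- skip it
            }
            applied.put(sourceSequence, outputIndex);
            return true;
        }
    }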


Single nodes can die in the system without issue. They often do! Since we use IP multicast the network failure is transparent as a replica takes up the primary role.

The one issue to be managed with this type of system is exceptions in the business logic thread. This can be handled via a number of prevention techniques. First, apply very strict validation on all input parameters. Second, take a test-driven approach to development; at LMAX we have tens of thousands of automated tests. Third, code so that methods are either idempotent, or changes are only applied at the end, using local variables, once the business logic is complete. With this combination of approaches we have not seen a production outage due to business logic thread failure in over a year of live operation.
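As a rough illustration of the third technique (a simplified sketch, not our production code): do all validation and computation into local variables first, and only touch the long-lived in-memory state once nothing can fail any more.

    import java.util.HashMap;
    import java.util.Map;

    final class TransferHandler {
        private final Map<String, Long> balances = new HashMap<>();

        void onTransfer(String from, String to, long amount) {
            // 1. Validate against current state without mutating it.
            long fromBalance = balances.getOrDefault(from, 0L);
            long toBalance   = balances.getOrDefault(to, 0L);
            if (amount <= 0 || fromBalance < amount) {
                throw new IllegalArgumentException("rejecting invalid transfer");
            }

            // 2. Compute new values in locals; an exception here leaves state untouched.
            long newFrom = fromBalance - amount;
            long newTo   = toBalance + amount;

            // 3. Apply at the end, when the whole event has succeeded.
            balances.put(from, newFrom);
            balances.put(to, newTo);
        }
    }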


As far as I understand three of the key features are:

1. Reduced queue contention: queues are typically implemented with a list, e.g. a linked list. Because queues spend a lot of time either empty or very full, this introduces contention for the head and tail of the queue, which are often the same dummy node. The ring buffer removes this contention.

2. Machine sympathy vis-à-vis cache striding, ensuring concurrent threads are not invalidating each other's level 1/2 caches (sketched below, after point 3).

3. Pre-allocation of queue data structures to ensure GC is not a factor.
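For point 2, the classic trick (my sketch, not the Disruptor's source, and modern JVMs may reorder fields) is to pad the hot counter so that two counters written by different threads never end up on the same ~64-byte cache line:

    // Illustrative padded counter to avoid false sharing between threads.
    final class PaddedCounter {
        private long p1, p2, p3, p4, p5, p6, p7;   // padding before the hot field
        private volatile long value;               // the counter actually being updated
        private long q1, q2, q3, q4, q5, q6, q7;   // padding after the hot field

        long get()            { return value; }
        void set(long newVal) { value = newVal; }
    }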

Personally I think the LMAX team have done well in advancing the state of the art in what is often a key component in event-driven, high-throughput, low-latency systems such as those used in banks for trading, exchanges and market data.


In regards to your comments:

1. Relaxed guarantees: "As a consequence the LMAX team puts a lot of attention into ensuring the input events are fully valid before doing any mutation of the in-memory persistent state." It looks like they deal with inconsistency at the business logic level instead of delegating it to the persistence store. Probably they don't have very complex transactional logic.

2. Smart data structures: here http://disruptor.googlecode.com/files/Disruptor-1.0.pdf you have a deeper technical view of the Disruptor and the rationale behind "mechanical sympathy".


> It looks like they deal with inconsistency at the business logic level instead of delegating to the persistence store.

Actually it looks the other way around. As I read it, the "Input Disruptor" does the validation.

Do the three components all run in the same thread, or are they on different JVMs?


The Disruptor components are not part of a single-threaded process, as far as I understand:

"Also these three tasks are relatively independent, all of them need to be done before the Business Logic Processor works on a message, but they can done in any order. So unlike with the Business Logic Processor, where each trade changes the market for subsequent trades, there is a natural fit for concurrency."

I think that the Disruptor can run on the same JVM as the business logic, also to avoid remoting-related bottlenecks.
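Not the Disruptor's actual API, but roughly how I picture the gating between those three tasks and the BLP: each task keeps its own cursor over the shared input ring, and the business logic only advances to sequence n once all three cursors have passed n.

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch only; names are mine, not LMAX's.
    final class GatingSketch {
        final AtomicLong journalled   = new AtomicLong(-1);
        final AtomicLong replicated   = new AtomicLong(-1);
        final AtomicLong unmarshalled = new AtomicLong(-1);

        /** Highest sequence the Business Logic Processor may process. */
        long availableForBusinessLogic() {
            return Math.min(journalled.get(),
                   Math.min(replicated.get(), unmarshalled.get()));
        }

        /** Busy-wait until the given sequence has cleared all three upstream tasks. */
        void waitFor(long sequence) {
            while (availableForBusinessLogic() < sequence) {
                Thread.onSpinWait();
            }
        }
    }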


The BLP is single-threaded, and it is the component that modifies the application's current state. It is my understanding that "events" in event sourcing are immutable and that state is just a memoized computation over the total journal of events -- so the layer of atomicity that is provided is at the event level (as the BLP only processes one event at a time).
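In other words (a toy sketch of my own, not LMAX code), the current state is just a fold over the journal, so any replica that replays the same events in the same order ends up in the same state:

    import java.util.List;

    final class EventSourcedState {
        private long balance;            // the memoized result of the fold so far

        void apply(long delta) {         // one immutable event = one deterministic mutation
            balance += delta;
        }

        static EventSourcedState replay(List<Long> journal) {
            EventSourcedState state = new EventSourcedState();
            for (long event : journal) {
                state.apply(event);      // same events, same order => same state on every node
            }
            return state;
        }

        long balance() { return balance; }
    }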


So the system is as "weak as its strongest link".

Hmm.


The hand-off between producer and consumer can be configured to be either synchronous (single-threaded) or asynchronous (one producer thread, many consumer threads).


In the footnotes, he lists "what's in a transaction" (http://martinfowler.com/articles/lmax.html?t=1319912579#foot...).


All transactions processed by the LMAX system are ACID. The application is modelled such that each input event represents one transaction. Where business transactions span multiple nodes, rather than handle the complexity of retaining ACID properties across multiple messages, our reliable messaging system ensures delivery to a remote system, and queuing a message is included in the same transaction context as the business logic. This is true of services that use a high-speed journal (event-sourced) and those that perform more traditional database operations. Long-running transactions become steps in a state model.
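To illustrate what "queuing a message is included in the same transaction context as the business logic" means for the more traditional database-backed services, here is a simplified sketch (illustrative only, with made-up table names, not our actual implementation): the state change and the outgoing message commit or roll back together.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    final class TransactionalEnqueue {
        void debitAndNotify(Connection conn, long accountId, long amount) throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement enqueue = conn.prepareStatement(
                     "INSERT INTO outbound_messages (payload) VALUES (?)")) {

                debit.setLong(1, amount);
                debit.setLong(2, accountId);
                debit.executeUpdate();

                enqueue.setString(1, "debited:" + accountId + ":" + amount);
                enqueue.executeUpdate();

                conn.commit();     // state change and outgoing message become visible together
            } catch (SQLException e) {
                conn.rollback();   // neither the debit nor the message is visible
                throw e;
            }
        }
    }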



