
As usual, it depends on what you define as a "transaction".

If you don't need ACID, I can probably jerry-rig something in C that'll give you hundreds of thousands of "transactions" per second. It may have problems with retrieving the data, but hey, look at that sucker go!

OK. Time to be more reasonable.

1. Relaxed Guarantees

There's no actual durability in the conventional spinning-rust sense. Instead you're hoping that a fleet of identical instances all receive the same event objects in the same order. If they don't, you're going to have mysterious problems when some of the systems get out of sync.

According to Fowler, business transactions are broken into events which are processed individually. This essentially means that some transactions aren't atomic. LMAX can credit A and fail to debit B because the two halves are done independently.

Consistency is palmed off to the input queues.

2. Smart data structures

They profiled the code and found places where they could swap out standard Java collections for their own problem-specific variants.

What they refer to as "Disruptors" are smart ring buffers; or as they are sometimes called, "Queues". I realise that this is not as cool-sounding as "Disruptor" (they should have called the "Business Logic Processor" the "Phaser"!), but it seems to more or less describe the interface, which is that things go in a given order and come out in approximately that order.

Actually, it's possible I've misunderstood how that structure works. It might also be that the system is skipping over elements that aren't yet ready, in which case this looks more like an active priority queue, interface-wise.
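To make what I mean concrete, here's roughly the kind of structure I'm picturing -- a pre-allocated ring with sequence counters, one producer and one consumer. This is just my sketch, not the actual Disruptor API:

    // Minimal single-producer/single-consumer ring buffer sketch (not the
    // Disruptor's real code): a pre-allocated array plus two sequence counters.
    import java.util.concurrent.atomic.AtomicLong;

    final class RingBufferSketch<T> {
        private final Object[] slots;
        private final int mask;                                   // capacity must be a power of two
        private final AtomicLong published = new AtomicLong(-1);  // last slot made visible
        private final AtomicLong consumed  = new AtomicLong(-1);  // last slot read

        RingBufferSketch(int capacityPowerOfTwo) {
            slots = new Object[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        void publish(T value) {
            long seq = published.get() + 1;
            while (seq - consumed.get() > slots.length) { Thread.onSpinWait(); } // ring full, wait
            slots[(int) (seq & mask)] = value;
            published.set(seq);                                   // make the slot visible
        }

        @SuppressWarnings("unchecked")
        T take() {
            long seq = consumed.get() + 1;
            while (published.get() < seq) { Thread.onSpinWait(); } // nothing published yet, wait
            T value = (T) slots[(int) (seq & mask)];
            consumed.set(seq);
            return value;
        }
    }

Things go in at one sequence and come out in exactly that sequence, which is why "queue" still seems like a fair description of the interface to me.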

3. Conclusion

Impressive work from the LMAX team. But let's remember to keep things in perspective. It has always been the case that ACID exacts a high toll on performance; to the extent that you relax it, you can always go faster.

Too often we in our industry see the shiny big performance number and forget that it isn't free. Like everything in life there is an opportunity cost. Choose carefully.



Re: 1. There is no "hoping"; the input disruptor guarantees a total ordering of events as they are written to a journal. This journal is then processed by the BLPs.

Re: atomicity -- the BLP processes one message at a time, which ensures that you do not have multiple threads clashing, but you do not have MVCC or any form of rollback, which he addresses thusly:

>LMAX's in-memory structures are persistent across input events, so if there is an error it's important to not leave that memory in an inconsistent state. However there's no automated rollback facility. As a consequence the LMAX team puts a lot of attention into ensuring the input events are fully valid before doing any mutation of the in-memory persistent state. They have found that testing is a key tool in flushing out these kinds of problems before going into production.

I am more concerned with the hand-waving around the failure case -- failing over to an alternate BLP does not prevent duplicate instructions. If processing an event would create multiple output events for the output disruptor, but the BLP is terminated before all of those output events are sent, you must either a) have some kind of multi/exec on output events, b) write code that is able to resume processing of a message from an intermediate point, or c) otherwise prevent or accommodate duplicate output events from the same source event.

This is a result of the lack of "transactionality" that you are referring to, and I would love to read more about how they address this particular sticky wicket when a system fails.
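For what it's worth, option (c) could be as unglamorous as deduplicating downstream. This is entirely my invention, not anything LMAX has described: the consumer remembers, per source event sequence, the highest output index it has already applied, and drops replays from a replacement BLP.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical dedup filter for output events; all names are made up.
    final class DuplicateOutputFilter {
        // key: source event sequence; value: highest output index already applied
        private final Map<Long, Integer> applied = new HashMap<>();

        /** Returns true if this output event has not been applied before. */
        boolean shouldApply(long sourceSequence, int outputIndex) {
            Integer last = applied.get(sourceSequence);
            if (last != null && outputIndex <= last) {
                return false;              // replayed by the replacement BLP -- skip it
            }
            applied.put(sourceSequence, outputIndex);
            return true;
        }
    }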


Single nodes can die in the system without issue. They often do! Since we use IP multicast the network failure is transparent as a replica takes up the primary role.

The one issue to be managed with this type of system is exceptions in the business logic thread. This can be handled via a number of prevention techniques. First, apply very strict validation on all input parameters. Second, take a test-driven approach to development; at LMAX we have tens of thousands of automated tests. Third, code so that methods are either idempotent, or changes are only applied at the end, using local variables, once the business logic is complete. With this combination of approaches we have not seen a production outage due to business logic thread failure in over a year of live operation.
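As a rough illustration of the third technique (a simplified sketch, not our production code): do all validation and computation into local variables first, and only touch the long-lived in-memory state once nothing can fail any more.

    import java.util.HashMap;
    import java.util.Map;

    final class TransferHandler {
        private final Map<String, Long> balances = new HashMap<>();

        void onTransfer(String from, String to, long amount) {
            // 1. Validate against current state without mutating it.
            long fromBalance = balances.getOrDefault(from, 0L);
            long toBalance   = balances.getOrDefault(to, 0L);
            if (amount <= 0 || fromBalance < amount) {
                throw new IllegalArgumentException("rejecting invalid transfer");
            }

            // 2. Compute new values in locals; an exception here leaves state untouched.
            long newFrom = fromBalance - amount;
            long newTo   = toBalance + amount;

            // 3. Apply at the end, when the whole event has succeeded.
            balances.put(from, newFrom);
            balances.put(to, newTo);
        }
    }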


As far as I understand three of the key features are:

1. Reduced queue contention: queues are typically implemented with a list, e.g. a linked list. Because queues spend a lot of time either empty or very full, this introduces contention for the head and tail of the queue, which are often the same dummy node. The ring buffer removes this contention.

2. Machine sympathy vis-à-vis cache striding, ensuring concurrent threads are not invalidating each other's level 1/2 caches (sketched below, after point 3).

3. Pre-allocation of queue data structures to ensure GC is not a factor.
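For point 2, the classic trick (my sketch, not the Disruptor's source, and modern JVMs may reorder fields) is to pad the hot counter so that two counters written by different threads never end up on the same ~64-byte cache line:

    // Illustrative padded counter to avoid false sharing between threads.
    final class PaddedCounter {
        private long p1, p2, p3, p4, p5, p6, p7;   // padding before the hot field
        private volatile long value;               // the counter actually being updated
        private long q1, q2, q3, q4, q5, q6, q7;   // padding after the hot field

        long get()            { return value; }
        void set(long newVal) { value = newVal; }
    }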

Personally I think the LMAX team have done well in advancing the state of the art in what is often a key component in event-driven, high-throughput, low-latency systems such as those used in banks for trading, exchanges and market data.


In regards to your comments:

1. Relaxed guarantees: "As a consequence the LMAX team puts a lot of attention into ensuring the input events are fully valid before doing any mutation of the in-memory persistent state." It looks like they deal with inconsistency at the business logic level instead of delegating it to the persistence store. Probably they don't have very complex transactional logic.

2. Smart data structures: here http://disruptor.googlecode.com/files/Disruptor-1.0.pdf you have a deeper technical view of the Disruptor and the rationale behind "mechanical sympathy".


> It looks like they deal with inconsistency at the business logic level instead of delegating to the persistence store.

Actually it looks the other way around. As I read it, the "Input Disruptor" does the validation.

Do the three components all run in the same thread, or are they on different JVMs?


The Disruptor components are not part of a single-threaded process, as far as I understand:

"Also these three tasks are relatively independent, all of them need to be done before the Business Logic Processor works on a message, but they can done in any order. So unlike with the Business Logic Processor, where each trade changes the market for subsequent trades, there is a natural fit for concurrency."

I think that the Disruptor can run on the same JVM as the business logic, also to avoid remoting-related bottlenecks.
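Not the Disruptor's actual API, but roughly how I picture the gating between those three tasks and the BLP: each task keeps its own cursor over the shared input ring, and the business logic only advances to sequence n once all three cursors have passed n.

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch only; names are mine, not LMAX's.
    final class GatingSketch {
        final AtomicLong journalled   = new AtomicLong(-1);
        final AtomicLong replicated   = new AtomicLong(-1);
        final AtomicLong unmarshalled = new AtomicLong(-1);

        /** Highest sequence the Business Logic Processor may process. */
        long availableForBusinessLogic() {
            return Math.min(journalled.get(),
                   Math.min(replicated.get(), unmarshalled.get()));
        }

        /** Busy-wait until the given sequence has cleared all three upstream tasks. */
        void waitFor(long sequence) {
            while (availableForBusinessLogic() < sequence) {
                Thread.onSpinWait();
            }
        }
    }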


The BLP is single-threaded, and it is the component that modifies the application's current state. It is my understanding that "events" in event sourcing are immutable and that state is just a memoized computation over the total journal of events -- so the layer of atomicity that is provided is at the event level (as the BLP only processes one event at a time).
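In other words (a toy sketch of my own, not LMAX code), the current state is just a fold over the journal, so any replica that replays the same events in the same order ends up in the same state:

    import java.util.List;

    final class EventSourcedState {
        private long balance;            // the memoized result of the fold so far

        void apply(long delta) {         // one immutable event = one deterministic mutation
            balance += delta;
        }

        static EventSourcedState replay(List<Long> journal) {
            EventSourcedState state = new EventSourcedState();
            for (long event : journal) {
                state.apply(event);      // same events, same order => same state on every node
            }
            return state;
        }

        long balance() { return balance; }
    }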


So the system is as "weak as its strongest link".

Hmm.


The hand-off between producer and consumer can be configured to be either synchronous (single-threaded) or asynchronous (one producer thread, many consumer threads).


In the footnotes, he lists "what's in a transaction" (http://martinfowler.com/articles/lmax.html?t=1319912579#foot...).


All transactions processed by the LMAX system are ACID. The application is modelled such that each input event represents one transaction. Where business transactions span multiple nodes, rather than handle the complexity of retaining ACID properties across multiple messages, our reliable messaging system ensures delivery to a remote system, and queuing a message is included in the same transaction context as the business logic. This is true of services that use a high-speed journal (event-sourced) and those that perform more traditional database operations. Long-running transactions become steps in a state model.
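To illustrate what "queuing a message is included in the same transaction context as the business logic" means for the more traditional database-backed services, here is a simplified sketch (illustrative only, with made-up table names, not our actual implementation): the state change and the outgoing message commit or roll back together.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    final class TransactionalEnqueue {
        void debitAndNotify(Connection conn, long accountId, long amount) throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement enqueue = conn.prepareStatement(
                     "INSERT INTO outbound_messages (payload) VALUES (?)")) {

                debit.setLong(1, amount);
                debit.setLong(2, accountId);
                debit.executeUpdate();

                enqueue.setString(1, "debited:" + accountId + ":" + amount);
                enqueue.executeUpdate();

                conn.commit();     // state change and outgoing message become visible together
            } catch (SQLException e) {
                conn.rollback();   // neither the debit nor the message is visible
                throw e;
            }
        }
    }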



