Hacker News

This article talks about the benefit of a message bus being pub/sub. This definitely helps to decouple the internals of apps, which in turn makes things easier to maintain.

There are many other benefits to using a message bus, and in general it's a better fit for distributed systems. The hard part is understanding how to structure apps to make them compatible with messaging.

Imagine sending out a command to the bus and not knowing when it'll get processed. Perhaps under a second, perhaps next week. You can't expect a reply at that point because the service that sent it may no longer be running.

If you're on Node, try https://node-ts.github.io/bus/

It's a service bus that manages the complexity of rabbit/sqs/whatever, so you can build simple message handlers and complex orchestration workflows.



> Imagine sending out a command to the bus and not knowing when it'll get processed

I would love to hear how others are correlating output with commands in such architectures - especially if they can be displayed to users as a direct result of a command. Always felt like I'm missing a thing or two.

It seems the choices are:

* Manage work across domains (sagas, two phase commit, rpc)

* Loosen requirements (At some point in the future, stuff might happen. It may not be related to your command. Deal with it.)

* Correlation and SLAs (correlate outcomes with commands, have clients wait a fixed period while collecting correlating outcomes)

Is that a fair summary of where we can go? Any recommended reading?


My personal answer would be that commands (as defined as "things the issuer cares about the response to") don't belong on message busses, and there's probably an architectural mismatch somewhere. Message busses are at their best when everything is unidirectional and has no loops. If you need loops, and you often do, you're better off with something that is designed around that paradigm. To the extent that it scales poorly, well, yeah. You ask for more, it costs more. But probably less than trying to bang it onto an architecture where it doesn't belong.

You want something more like a service registry, such as ZooKeeper, where you can obtain a destination for a service and speak to it directly. You'll need to wrap other error handling around it, of course, but that almost goes without saying.


I don't know about correlating output with commands, but if you're looking to correlate output with input, one option is to stick an ID on every message, and, for messages that are created in response to other messages, also list which one(s) it's responding to.
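As a minimal sketch of what that might look like (the envelope shape and field names here are illustrative, not from any particular framework), each message gets a unique ID and carries the IDs of the message(s) it responds to:

```typescript
import { randomUUID } from "crypto";

// Illustrative envelope: a unique ID per message, plus the IDs of the
// message(s) this one was created in response to.
interface Envelope<T> {
  id: string;
  causationIds: string[];
  body: T;
}

function envelope<T>(body: T, inResponseTo: Envelope<unknown>[] = []): Envelope<T> {
  return { id: randomUUID(), causationIds: inResponseTo.map(m => m.id), body };
}

// Usage: trace an output message back to the input that caused it.
const cmd = envelope({ type: "placeOrder", orderId: 42 });
const evt = envelope({ type: "orderPlaced", orderId: 42 }, [cmd]);
console.log(evt.causationIds.includes(cmd.id)); // true
```

With that metadata on every message, any consumer (or a debugging tool) can walk the causation chain back to the original input.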

I would say that loosening requirements is also a reasonable option. You can't assume that anything downstream will be up, or healthy, or whatever. On a system that's large enough to benefit from a message bus, failures are the norm and not the exception, and trying to build a system that pretends otherwise is likely to be more expensive than it's worth. For a decent blog post that touches on the subject, see "Starbucks Does Not Use Two-Phase Commit"[1].

[1]: https://www.enterpriseintegrationpatterns.com/ramblings/18_s...


Nice blog post! Certainly puts things into perspective in terms of how one should deal with errors, including sometimes just not caring about them much.


My whole take on this:

Commands can go through message busses and be managed easily, or they can just be a sequence of async requests, but regardless of what drives commands and events, at that point you should have a very solid CQRS architecture in mind. All that should be acknowledged to the client is that the command was published. The problem is of course eventual consistency, but that's the trade-off for being able to handle a huge amount of load by separately scaling both COMMAND handlers, which perform data modification, and EVENT handlers, which run the side effects that must occur.

In a typical web app setup I would define a request ID at the time of the client request. The request creates a COMMAND which carries with it the request ID as well as a command ID. This results in an action, and then the EVENT is published with the request ID, command ID, and event ID.
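That ID chain can be sketched roughly like this (the shapes and helper names are assumptions for illustration, not a specific framework):

```typescript
import { randomUUID } from "crypto";

// Illustrative shapes: each event carries the full chain of IDs back to
// the original client request.
interface Command { requestId: string; commandId: string; type: string; payload: unknown; }
interface Evt     { requestId: string; commandId: string; eventId: string; type: string; payload: unknown; }

function commandFromRequest(requestId: string, type: string, payload: unknown): Command {
  return { requestId, commandId: randomUUID(), type, payload };
}

function eventFromCommand(cmd: Command, type: string, payload: unknown): Evt {
  return { requestId: cmd.requestId, commandId: cmd.commandId, eventId: randomUUID(), type, payload };
}

const requestId = randomUUID(); // assigned at the edge, when the request arrives
const cmd = commandFromRequest(requestId, "createOrder", { sku: "abc" });
const evt = eventFromCommand(cmd, "orderCreated", { sku: "abc" });
// The event is now traceable to the command and the original request.
console.log(evt.requestId === requestId && evt.commandId === cmd.commandId); // true
```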

To monitor, you collect this data and look at the timestamp differences to measure lag and detect dropped messages. With the events, you get all the data necessary to audit what request, and subsequent command, created a system change. To audit the full data change however, and not just which request caused what change, you need a very well-structured event model designed for what you want to audit.

You can't guarantee when a command or subsequent event will be processed, but that's fine; that's the whole point of eventual consistency. It's a bit uncomfortable at first, but use the lag monitoring and traceability as a debug tool when needed and it's really no problem. Also, shift the client over to reading from your read-specific projections on refresh or periodically, and the data will eventually appear for the user. It's the reason a new order sometimes doesn't appear right away in your order history on Amazon, for instance, and in reality it's fine 99% of the time.

Never have your client wait on a result. Instead ask: how can I design my application so it doesn't need to block on anything? It's doable, though it is quite hard, and if you've only designed synchronous systems it will feel very uncomfortable.

And remember some things should not have CQRS design, backed by a message bus or not. These will be bottlenecks but they might be necessary. The whole system you design doesn't have to stick to a single paradigm. Need transaction IDs to be strictly consistent for a checkout flow to be secure and safe? Use good old synchronous methods to do it.

Core in all of this is data design. If you design your entities, commands, or events poorly, you will suffer the consequences. You will hear the word "idempotency" a lot in CQRS design. That's because idempotent commands are super important in preventing unintended side effects. Same with idempotent events, if possible. If you get duplicate events from a message bus, idempotency will save your arse if it's a critical part of the system. If it's something mild like an extra "addToCart" command or something, no big deal really, but imagine a duplicated "payOrder" command ;).
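One common way to get that idempotency is to dedupe on the command ID. A minimal sketch (in production the processed-ID set would live in durable storage, not memory, and the names here are illustrative):

```typescript
interface PayOrder { commandId: string; orderId: number; amountCents: number; }

// Idempotent handler: a duplicated "payOrder" delivered twice by the bus
// only charges once, because redeliveries are recognized by command ID.
class PaymentHandler {
  private processed = new Set<string>();
  charged = 0;

  handle(cmd: PayOrder): void {
    if (this.processed.has(cmd.commandId)) return; // duplicate delivery: no-op
    this.processed.add(cmd.commandId);
    this.charged += cmd.amountCents;
  }
}

const handler = new PaymentHandler();
const cmd: PayOrder = { commandId: "c-1", orderId: 7, amountCents: 500 };
handler.handle(cmd);
handler.handle(cmd); // redelivered by the bus
console.log(handler.charged); // 500, not 1000
```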

To summarize, I correlate output to commands and requests by ensuring there are no unknown side effects, keeping critical synchronous components synchronous, designing architecture that complements the data (not the other way around), and ensuring that the client is designed in such a way that eventual consistency doesn't matter from the user's perspective when it comes into play.


Trying to limit this to the OP's question about

> Imagine sending out a command to the bus and not knowing when it'll get processed.

In my systems I separate commands from event handlers based on asynchronicity.

Commands are real-time processors and can respond with anything up to and including the full event stream they created.

Commands execute business logic in the front end and emit events.

Commands execute against the currently modeled state via whatever query model you have in place, chosen depending on your consistency needs.

What I suspect @CorvusCrypto is talking about is event handlers, which are in essence commands but are usually asynchronous.

They are triggered when another event is seen but could theoretically happen whenever you like. It could be real time as events are stored or it could be a week later in some subscriber process that batches and runs on an interval.
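The split described above can be sketched like this (the function names and event shapes are illustrative, not from any framework): a synchronous command that runs business logic and returns the events it emitted, and an asynchronous event handler that reacts to those events whenever they happen to arrive.

```typescript
interface DomainEvent { type: string; payload: unknown; }

// Command: runs in the request path, enforces business rules, and responds
// with the events it created.
function renameUser(current: { name: string }, newName: string): DomainEvent[] {
  if (newName.length === 0) throw new Error("name required");
  return [{ type: "userRenamed", payload: { from: current.name, to: newName } }];
}

// Event handler: triggered when events are seen -- immediately, or in a
// batch subscriber a week later.
const auditLog: string[] = [];
function onUserRenamed(evt: DomainEvent): void {
  auditLog.push(`renamed: ${JSON.stringify(evt.payload)}`);
}

const events = renameUser({ name: "alice" }, "bob");
events.filter(e => e.type === "userRenamed").forEach(onUserRenamed);
console.log(auditLog.length); // 1
```

The command can be unit-tested on its returned events alone, with no replay; the handler's timing is a deployment choice.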

I separate commands from event handlers like this because commands tend to be very easy to modify and change in the future; they're extremely decoupled in that they just emit events, which can easily be tested without a lot of replay or worrying about inter-event coupling.

Event handlers, on the other hand, depending on their type, tend to be very particular/temperamental about how and in what order they get triggered.

I also find that systems with a lot of fat event-handler logic have a lot more unknown/hidden complexity; keeping as much of the complexity and business logic as possible in the front end (RPC included) results in a much simpler distributed system.

All this hinges on the fact I'm sticking to strict event sourcing where events are after the fact and simply represent state changes which are then reduced and normalized per system needs.

I would also like to point out that I was careful here not to mention any kind of message bus or publishing, because CQRS and event sourcing are standalone architecture choices.

CQRS/ES does not require a message bus; in fact it specifically sucks with a message bus at the core of your design, because that forces eventual consistency and puts the source of truth on the bus itself.

CQRS/ES systems should have multiple message buses and employ them to increase availability and throughput as a trade-off with consistency. CQRS/ES should not force you to make this trade.

A message bus is a tool to distribute processing across computers. It is not, and should not be, the central philosophy of your architecture. You should be able to continuously pipe your event store through a persistent RabbitMQ for one system that is bottlenecked by some third-party API with frequent downtime problems. And you should be able to continuously pipe your event store through some ZeroMQ setup for fast realtime responsiveness in another system. Whether or not you choose to introduce system-wide inconsistency (or "eventual consistency") in order to pipe your events into your event store is up to you to figure out, based on whether the increased availability is worth the trade-off.


The downside of message busses is that they can be really hard to debug for third parties. I run into this with systemd, where a process will hang at startup waiting for a particular signal, and I have to hunt down all of the locations that signal could have been generated from to see which one failed. Much, much harder than the old system where you would find the line you stalled on and then read up a few lines to see what it was trying to do.


It also becomes difficult to reason about what code will actually run when a message goes out. Top level handlers are ok, but they can spawn messages which spawn messages which... lead to me running out of mental capacity.


>Imagine sending out a command to the bus and not knowing when it'll get processed. Perhaps under a second, perhaps next week. You can't expect a reply at that point because the service that sent it may no longer be running.

That's because this is missing the point of a message bus. Namely, communication with a message bus is meant to be one way; there is not supposed to be a response. If you want a timely response, you should set up a load-balanced autoscaling microservice that maybe hooks into some large backend store.


There is nothing that explicitly forbids request/reply semantics on a message bus. Data on a message bus absolutely does not have to be one way.


When you move toward an asynchronous function that both requests and listens for a reply, what you really have is two functions, or for that matter, one that listens and then sends. So yeah, you can go both ways on a message bus; just don't leave functions in limbo because of it.


As far as I know, a message bus's reply semantics are typically only an acknowledgement that it has received the data. That's the way that Kafka, AWS Kinesis, and Azure Event Hubs work, though I'm not familiar with other message buses. You'll note that almost all architectural diagrams for these message buses, along with the article linked in this thread, show a one-way data flow in/out of a message bus.

Do you know of any message buses that have more complicated reply semantics? I'm thinking maybe the JavaScript world is calling things message buses that the big data / distributed systems world wouldn't think are message buses.


Oftentimes these "Service Bus" type frameworks implement request/reply by creating a queue per host, and adding envelope information to every message that includes the queue the reply should be sent to. This does allow a request/reply pattern. I've never actually done this, but I imagine in environments where reliability is more important than latency, it probably works pretty well.
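A toy in-memory sketch of the idea (the bus, queue names, and envelope fields here are stand-ins for illustration, not any real framework's API): each message carries a `replyTo` queue name, and handlers publish their responses there.

```typescript
type Handler = (msg: BusMessage) => void;
interface BusMessage { replyTo?: string; body: string; }

// Minimal in-memory "bus": named queues, each with one subscriber.
class Bus {
  private queues = new Map<string, Handler>();
  subscribe(queue: string, handler: Handler) { this.queues.set(queue, handler); }
  publish(queue: string, msg: BusMessage) { this.queues.get(queue)?.(msg); }
}

const bus = new Bus();
// The worker replies to whatever queue the envelope names.
bus.subscribe("work", msg => {
  if (msg.replyTo) bus.publish(msg.replyTo, { body: `done: ${msg.body}` });
});

// Each host subscribes to its own private reply queue, then requests.
let reply = "";
bus.subscribe("replies.host-1", msg => { reply = msg.body; });
bus.publish("work", { replyTo: "replies.host-1", body: "job-42" });
console.log(reply); // "done: job-42"
```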

One minor nitpick. The patterns you're talking about aren't typically implemented on Kinesis, since there's no concept of topics there. On AWS, you could do something like that with SNS + SQS.


I know the least about Kinesis, but my assumption is that it still has an OK-like response to any batch of events you send to it, does it not?

I think you might be misunderstanding me. Obviously I know about topics. In Kafka and EventHubs (where really you have a namespace containing multiple event hubs corresponding to topics) at least, there is no reply pattern whatsoever beyond an OK. I still don't know which specific distributed message queue platforms you and others are referring to that implement this pattern.


Sorry if I wasn't clear. You aren't relying on just the ack semantics, you're building bidirectional communication on top of software, in a way that's potentially a little unnatural.

Here's an example of how you'd accomplish this with Kafka. Whenever a new instance launches, it could generate a UUID, and create a Kafka topic with a matching name. Any messages it sends that it expects a response to could include a "reply-to" field, with this UUID. When something processes the request, they publish a message to that Kafka topic.

Essentially, when people are talking about "Service Busses", they're talking about frameworks that implement this and other similar patterns on top of generic queues like RabbitMQ, MSMQ, and SQS. One such framework I've personally used is MassTransit.


It's also trivially easy to extend this to Kinesis with a lambda.


Sort of like how any Turing machine can solve any problem solvable by any other Turing machine, almost any communication system can be used to implement other patterns of communication. You combine the primitives of the base to provide the structure you want above it.


NATS is a solution I have used in the past that supports Request/Reply semantics (I actually really enjoyed NATS). I am not sure Kinesis fits the traditional mold of a "message bus", but it certainly can be used as such, and I have used it as a message broker extensively. Kinesis with its tight integration with lambda can be trivially modified to add Request/Reply semantics with a lambda, but I will concede your point Kinesis is designed to be many ways in and one way out.


I built a framework in Ruby for both req/resp and pub/sub between two datacenters over RabbitMQ that massively outperformed the HTTPS calls it replaced. Unfortunately the only part I was able to open source was the log formatter.


AMQP 1.0 has request/reply built into the protocol. But it is really simple to roll your own by specifying a reply-to topic in the request message.


Azure Service Bus definitely has this, there's an explicit Reply To field and complicated correlation semantics.


> Imagine sending out a command to the bus and not knowing when it'll get processed. Perhaps under a second, perhaps next week.

Isn’t that the situation on any network?


> Imagine sending out a command to the bus and not knowing when it'll get processed

That's one of the things about tuple spaces[0,1,2] that was described in the book Mirror Worlds[3]. The author gives a lengthy description of a tuple's life. In practice, it really depended on what pattern you used to process the tuples.

0) https://software-carpentry.org/blog/2011/03/tuple-spaces-or-...

1) https://en.wikipedia.org/wiki/Tuple_space

2) http://wiki.c2.com/?TupleSpace

3) https://www.amazon.com/Mirror-Worlds-Software-Universe-Shoeb...


How does Bus compare to Celery?


Celery is a job execution system, nothing more. It can be used synchronously or asynchronously, in a central-bus type broker layout or a different one. It is orthogonal to the architectural patterns being discussed here, but can be used to facilitate many of them when deployed purposefully.


I meant specifically compared to https://node-ts.github.io/bus/, not Message Bus architecture.



