Beware of marketing blog posts like this (looking at you, Confluent, Databricks and Camunda!).
Another "forget this style, only use our new framework".
What he forgets to mention, and what is probably not in the scope of "our new framework that does everything better", is that it is totally valid to use both orchestration and choreography.
Just as Confluent pushes the case for choreography, this post pushes for using only orchestration.
That's wrong. From my point of view:
1. use choreography for small to medium services (for example one domain)
2. optionally orchestrate them on a higher level (for example between domains)
Don't orchestrate all your business logic for all services down to the smallest event/command!
Also: You don't *need* to store all your events forever. This pattern, often pushed, comes with downsides as well. Maybe you have a good case for doing so, but don't fall into the trap of using this pattern just because you've read about it in some blog post!
Alternatively, you can also just go the route of those vendor blog posts and pay later for the lock-in.
So one thing you might want to do is assess whether the patterns described in the article actually fit your use case, and then adopt the one that fits better.
But putting this sentiment to the extreme, you could also say: For very "small" use cases, that is, systems where scalability might not be an issue (right now or in the future), you might want to dispense with any kind of distributed system altogether and just program a monolith. Because if you have tightly-coupled parts of a more complex system that cannot really function if even one of them is down, you might as well put them in the same program.
Again, the main drawback will be horizontal scalability. But in a small enough company your "process engine" can probably run on a small VM anyway.
> it is totally valid to use both orchestration and choreography. Just as Confluent pushes the case for choreography, this post pushes for using only orchestration.
What's the metaphor here? As far as I can see, orchestration is about coordinating the timing of several simultaneous and interdependent musical performances, while choreography is about coordinating the timing of several simultaneous and interdependent dance performances. But I strongly suspect that the difference between auditory and visual display isn't what you're supposed to get out of this terminology. How am I supposed to remember which approach is which?
Neither orchestration nor choreography involves agents reacting to other agents; there's an objective clock and everyone takes their cues from that.
People need to dream up new jargon to get other people to feel like their design patterns are “official”, I guess. I was wondering the same thing. It seems pretty arbitrary.
I think the analogy is clear when I think of it as:
- Orchestration brings the idea of a conductor, who is the main reference for "what should we do" among orchestra players.
- Choreography brings the idea of dancers in sync without the need of a central conductor.
> Choreography brings the idea of dancers in sync without the need of a central conductor.
Try that and let me know how it works.
In reality, the dancers are of course synced by the music, dependent on the same conductor as the orchestra.
And on top of that, orchestration isn't the conductor's job. It's the composer's (or arranger's) job, putting it exactly parallel to choreography. The orchestrator, like the choreographer, determines who does what and when they do it, and he does it in advance. Most typically, years or centuries in advance. The conductor determines how fast the clock runs.
Interesting. A counterexample: WWE (wrestling) is choreographed. The wrestlers react to the cues of the other wrestlers. It's not necessarily based on time or music, but instead on a pre-agreed sequence.
I think the catch is that not all cues need to be time-based, and that is the distinction. In orchestration, there is one source for cues - the orchestrator.
The difference, I think, is that in orchestration the players get their cues from one source, while in choreography there are different source(s) for cues (time/tempo perhaps being one of them).
> In orchestration, there is one source for cues - the orchestrator.
Just to be clear, everyone's talking about the conductor, who keeps time for the orchestra, but conductors don't do orchestration. The orchestrator is the person who wrote the score.
Good point, for the analogy, I should have written conductor instead of orchestrator. The point remains though that the difference is in the source of synchronization.
A good few years ago when I worked at a security consultancy that catered to a bunch of startups (only some of whom survived long enough to pay the invoices…) we referred to certain development patterns as “HackerNews Driven Development” lol.
I guess Medium driven development is the newer edition of this?
I like this "choreography inside a domain, orchestration across domains" rule of thumb.
I think it maps to what we did a few years ago at a previous employer (without actually formalising it like this).
Some cross-domain things are inherently orchestration problems (like GDPR deletion).
It is quite tedious when discussing Kafka that people just spout the contents of Confluent blog posts - you must use X serialisation, you must use X registry - when in a lot of cases you can just use protobuf.
A registry seems pretty pointless to me, and personally I can't understand how anyone thought it was a good idea. What was the thought process? 'Oh yeah, let's write a service that our service has to query to work out how to understand the message it just received, and which will crash everything if that schema service isn't there' - it's microservices gone mad!
Why worry about which services talk which version of the schema when the service is guaranteed to be able to understand the message?
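For what it's worth, here's a minimal sketch of the no-registry approach: a hypothetical OrderPlaced message generated from a .proto that both producer and consumer compile in, published to Kafka as plain bytes. The message class and topic name are invented for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: publishing a protobuf-encoded event without a schema registry.
// OrderPlaced is a hypothetical class generated from a .proto that both sides
// compile in, so each side already knows the schema at build time.
public class PlainProtobufProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        byte[] payload = OrderPlaced.newBuilder()      // hypothetical generated builder
                .setOrderId("o-42")
                .setAmountCents(1999)
                .build()
                .toByteArray();                        // schema compiled in, not looked up

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "o-42", payload));
        }
        // Consumer side: OrderPlaced.parseFrom(record.value()) - fields added by newer
        // producers don't break parsing, which is protobuf's compatibility story.
    }
}
```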
As with yesterday's "big data" article, the usual question is whether you have enough transaction volume to need all this stuff. If you can organize your web site so that most traffic is read-only files, and the heavy machinery only turns on when somebody does a "buy" or something important, the whole problem probably fits into a CRUD app.
There's a useful insight in there - fan-in and fan-out are hard. I hit this in a completely different context - asset handling for a game-type client. Events come in ordering the creation of object X using assets A, B, and C. It may take several asset fetches from the network to draw something. An asset may be used for multiple drawable objects. There's caching and concurrent asset fetching. A request to create X has to start fetches for A, B, and C (they might be in cache, though) and wait until those fetches complete. A request for Y might want B, C, and D. Items should be fetched from the network only once, of course. This is a messy coordination problem.
Problems like this, with fan-in, fan-out, and concurrency, map badly to the standard paradigms. Neither threads with blocking nor "async" help much. The problem can be visualized as set operations, but can't easily be implemented that way.
A single-threaded, event-driven coordination loop works, but it's kind of clunky.
So I started reading the article hoping for a theoretical breakthrough on fan-in, fan-out problems. No such luck.
The set-theory approach is hard to do, but promising. Each object that wants something has a small set of the things it wants. There's a big pool of such sets. There's also a big pool of the items you have, which changes constantly. It's easy to express what you need to fetch and which objects are now ready to go as set intersection and difference operations.
But you need representations for big sparse sets which can do set operations fast. Probably B-trees, or something in that space.
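A rough sketch of that bookkeeping, with plain HashSets standing in for whatever fast sparse-set representation you'd actually want (B-trees, bitmaps, ...); all the names are invented for illustration.

```java
import java.util.*;

// Each pending drawable carries the set of assets it still needs; as fetches complete,
// set difference/intersection operations tell us what to request and what became ready.
class AssetCoordinator {
    private final Set<String> loaded = new HashSet<>();               // assets we have
    private final Set<String> inFlight = new HashSet<>();             // fetches already started
    private final Map<String, Set<String>> pending = new HashMap<>(); // drawable -> missing assets

    // Request creation of a drawable needing the given assets; returns assets to fetch now.
    Set<String> request(String drawableId, Set<String> needed) {
        Set<String> missing = new HashSet<>(needed);
        missing.removeAll(loaded);                                     // set difference: still missing
        if (missing.isEmpty()) { /* drawable is immediately ready */ return Set.of(); }
        pending.put(drawableId, missing);

        Set<String> toFetch = new HashSet<>(missing);
        toFetch.removeAll(inFlight);                                   // never fetch the same asset twice
        inFlight.addAll(toFetch);
        return toFetch;
    }

    // An asset fetch completed; returns the drawables that just became ready.
    List<String> assetLoaded(String asset) {
        loaded.add(asset);
        inFlight.remove(asset);
        List<String> ready = new ArrayList<>();
        pending.forEach((id, missing) -> {
            missing.remove(asset);
            if (missing.isEmpty()) ready.add(id);
        });
        ready.forEach(pending::remove);
        return ready;
    }
}
```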
Microsoft Research fooled around with this concept years ago in a different context. The idea was to have a database which supported pending SQL queries. The query would return new results when the database changed such that the results of the query changed. The goal was to support that for millions of pending queries. Financial traders would love to have that. It's a very hard scaling problem. Don't know how that came out.
> The set-theory approach is hard to do, but promising. Each object that wants something has a small set of the things it wants. There's a big pool of such sets. There's also a big pool of the items you have, which changes constantly. It's easy to express what you need to fetch and which objects are now ready to go as set intersection and difference operations. But you need representations for big sparse sets which can do set operations fast. Probably B-trees, or something in that space.
Incremental updates to dynamic dependency graphs is a familiar problem for build tooling. I personally have used the taskflow C++ library (https://github.com/taskflow/taskflow) to great effect.
> Microsoft Research fooled around with this concept years ago in a different context. The idea was to have a database which supported pending SQL queries. The query would return new results when the database changed such that the results of the query changed. The goal was to support that for millions of pending queries. Financial traders would love to have that. It's a very hard scaling problem. Don't know how that came out.
Incremental view maintenance is an active area of research. The likes of Noria and Materialize have done this with SQL, and the pg_ivm Postgres extension looks promising. Not sure if there is an equivalent implementation geared towards entity-component systems, though.
I strongly agree with the premise that an orchestrator-centric approach is preferable to event spaghetti for a lot of business process use cases.
Another (maybe more mature) alternative to Infinitic is Camunda, who have been pushing this rhetoric for over a decade.
My problem with both is that neither feel very modern, in the sense that they are tied to (in Infinitic's case) or strongly favor (in Camunda's case) Java-centric development, don't have a great developer experience story, and don't feel very cohesive with the language.
What I'd love to see is someone tackling this for smaller orgs where the above tradeoff in developer experience isn't worth the gains, i.e. orgs with <100 engineers. Something where you have a single source of truth for your processes and schemas, with version control, that directly yields strongly typed integrations & hooks into any language, with a great local developer experience and deployment story.
Camunda requires too many magic strings and schemas to be kept consistent across services for that, and Infinitic forces me to use Java or Kotlin which I don't want.
A "simple" solution might be to have declarative process and schema definitions in some DSL for version control, which auto-generates schemas, types etc in whatever language, giving me both strong types, intuitive local development, and clear deployment story through my existing CI/CD.
Thank you for your comment. Highlighting the need for a descriptive format to orchestrate services implemented in multiple languages is a valid point. While Infinitic currently uses Java's interfaces, adopting a more generic and language-agnostic solution like Protobuf is a sensible approach that could promote better interoperability across different tech stacks. Then we need a DSL. Again, Infinitic is using Java/Kotlin, but I'm thinking of the ideal solution as a new, simple language in which Protobuf objects are first-class citizens and that is resumable (i.e. whose internal state can be safely stored to be resumed later). It's not as easy as it sounds, as a lot of legitimate issues arise regarding versioning.
I worked with somebody whose brain just didn't brain with mine. anything she wrote was unreadable to me, and anything I wrote was not "simple" or "understandable" enough.
Look at the biased way you write about the situation. This is the fundamental attribution error. If you can't read her code, it's her fault (you implied that). And if she can't read your code, it's also her fault (you implied that).
I said it was unreadable to me. I meant what I wrote. I did not imply it was her fault. it was presumably readable to others in the team, just not to me. the only way I could figure out what was going on was by adding a lot of printf and running small parts of the code to work it out. the documentation really didn't tell me anything that I needed to know for that purpose.
the quotes represent words that she actually said. in the end I did re-write my code in a way that she would finally approve the review and now I can't read the code either.
she's diagnosed with autism. I'm diagnosed with autism. not only are we not neurotypical, but we're also so different that our code was mutually unintelligible.
Hmmm. I think this is interesting, but I’m not sure this is an “architecture”.
A workflow or “orchestrator” as defined in this article looks like another service. Unless I’m missing something, it’s a state-machine where inputs/variables are the events of other services, and where transitions trigger “commands”.
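To make that reading concrete, a minimal sketch of an orchestrator-as-state-machine: events in, commands out, state persisted between steps. All the event, command and service names here are invented; this isn't any particular framework's API.

```java
import java.util.List;

// Events from other services are the inputs; transitions emit commands to other services.
enum OrderPhase { AWAITING_PAYMENT, AWAITING_SHIPMENT, DONE, FAILED }

record Command(String target, String name, String orderId) {}

class OrderOrchestrator {
    private OrderPhase phase = OrderPhase.AWAITING_PAYMENT;

    // Feed in an event, get back the commands to dispatch; phase would be persisted between calls.
    List<Command> onEvent(String event, String orderId) {
        switch (phase) {
            case AWAITING_PAYMENT -> {
                if (event.equals("PaymentCaptured")) {
                    phase = OrderPhase.AWAITING_SHIPMENT;
                    return List.of(new Command("shipping-service", "CreateShipment", orderId));
                }
                if (event.equals("PaymentFailed")) {
                    phase = OrderPhase.FAILED;
                    return List.of(new Command("notification-service", "NotifyFailure", orderId));
                }
            }
            case AWAITING_SHIPMENT -> {
                if (event.equals("ShipmentDispatched")) {
                    phase = OrderPhase.DONE;
                    return List.of(new Command("notification-service", "NotifyShipped", orderId));
                }
            }
            default -> { }
        }
        return List.of(); // event irrelevant in this phase
    }
}
```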
Don’t get me wrong, I see the value in pulling this logic into a central place/process in order to perform “orchestration”, I just don’t see the value in yet another framework. This is a pattern that pops up in every distributed system I’ve worked on, but it is just one of many.
In many cases, a synchronous request response end-point is all that is necessary/desired for a service tied to a customer interaction. In others, a service processing a queue backlog (with or without automatic fan-out). In others publish-subscribe seems the best fit. In others, a more complex state-machine managing several different events, transitions, and outputs.
Can you model everything as a state-machine? Yes. Should you? I would argue that you want a range of tools in the toolbox.
This 'orchestrator' pattern merely re-introduces tight-coupling via a one-way async interface. The producer writes commands destined for the executor of the command. It's basically async RPC with transparent host lookup and no return values.
I'm not saying it doesn't have its uses, but be wary of tight coupling. If you see development patterns like first updating the consumer to enable a change to a command, then releasing the upstream change to use the new capability, you'll know that you're not working with a decoupled system. Another likely red-flag is never documenting payload structures, so that behaviour is fully implementation-defined async-spaghetti.
My current situation is that I'm having to work with an event driven architecture for a strictly synchronous process.
Buzzword architecture is awful, as is Kafka offering fake design patterns, such as request/reply with small response timeframes, to reassure naive architects.
Attempting to replicate REST with overcomplicated overhead.
These are the biggest issues I encounter with event-driven applications.
Pedantic note: nobody talking about REST really does REST. Where is the representational state? REST is when you ship the whole application in the message to the far side to let them figure it out.
Now that I've got that stupid pedantry off my chest: yeah, I know what you mean when you say REST, we all know what is meant when REST is brought up. But I find it funny how Roy Fielding said "wow, it is really cool how web pages ship the whole application to the user on each request, let's talk about that" and everyone else basically went "ok, HTTP == REST, got it".
> Ah ha, but we don't use REST, that's my complaint!
But you are correct; although I have used representational state through the request, in this case I'm just talking about using a message bus vs. using a blocking HTTP call.
Additional annoyance: using inappropriate tech means that we can't even scale to handle spikes. Having a Kafka rebalance cause an outage, because traffic is a little higher and I need to increase the number of listeners, right when I actually need more throughput, makes me sad.
Hear, hear. A few years ago I worked for an esteemed client whose director said «I want Kafka» for [mostly] synchronous interaction patterns. I said «no», the director said «I want Kafka». That repeated a few more times, and I had to begrudgingly comply since the client was paying for it.
The architecture for the solution drew its inspiration pretty heavily from the TCP protocol design, with incoming events yielding either an ACK or a NACK event, also dispatched asynchronously. The key was the stringent adherence to supplying the original event ID in the Reply-To event envelope (hello, SMTP) and having a separate process correlating the inbound and outbound ACK/NACK events. It has wrought success in times bygone (and does persist unto this present day), yet I must, with all due deference, decline to partake therein henceforth.
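For illustration, the correlation step could look something like the sketch below - plain Java with a CompletableFuture parked per pending event ID, completed when the reply topic delivers an ACK/NACK carrying the original event ID. Field names are illustrative, not the actual system.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Correlates asynchronous ACK/NACK replies back to the request events that caused them.
class ReplyCorrelator {
    record Reply(String inReplyTo, boolean ack, String detail) {}

    private final Map<String, CompletableFuture<Reply>> pending = new ConcurrentHashMap<>();

    // Called when we publish a request event with this ID.
    CompletableFuture<Reply> expectReplyFor(String eventId) {
        CompletableFuture<Reply> future = new CompletableFuture<>();
        pending.put(eventId, future);
        // A timeout is essential: replies are asynchronous and may never arrive.
        return future.orTimeout(5, TimeUnit.SECONDS)
                     .whenComplete((reply, error) -> pending.remove(eventId));
    }

    // Called by the consumer of the ACK/NACK topic.
    void onReply(Reply reply) {
        CompletableFuture<Reply> future = pending.remove(reply.inReplyTo());
        if (future != null) future.complete(reply);
        // else: late or duplicate reply - just log it
    }
}
```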
Indeed. Using a streaming platform makes sense only if you need message durability (and sometimes reusability). When implementing a request/reply pattern on top of such a platform (Infinitic does that btw), it necessarily adds some latency due to multiple persistence operations.
Maybe I missed the party trick, but under the hood a Step Functions definition will be orchestrated between Lambdas (the services).
The only difference from this pattern is that you define event inputs and outputs on the service, as part of orchestration, which I feel consuming services absolutely need to know (why wouldn't you, unless you're hoping to avoid object versioning / change clashes between services by doing that, which is better served by, you know, actual version tags...).
This completely leaves out the business model, which should always be the driving force behind any architectural decisions.
I don't know what type of cross-domain communication I need until I model something. Any given domain could have many published events. Any given domain could have many commands. Any given domain could have many subscriptions.
The business models of those domains come first. That dictates everything including APIs, data store, commands, and events.
This article is similar to saying all applications should use a relational database or all applications should use AWS Lambdas.
The most important step in any architecture is "It depends."
This is key and it’s something engineers constantly miss. My best guess is that it involves a lot of the stuff that engineers don’t find interesting, or that requires outside help from the business side of the company.
But I’d say almost all of the interesting and challenging parts of engineering happen here. The rest is just manufacturing and maintenance. (Though I find those fun too!)
The choreography pattern has always felt like a top-down solution to what is fundamentally a bottom-up problem.
Notably the article calls out tracing, which is the bottom-up solution to this. Choreography functions by dictating a flowchart that events move through, so visibility comes from that dictation. Tracing functions by watching which events move from which producers to which consumers, and surfacing that organic data.
I'm not sure I see the fundamental advantage of dictating these flows rather than observing and reporting on them with some kind of internally-standardized conventions on message metadata with some standardized tooling to consume that. They seem to end up in a fairly similar situation: producers can see their consumers and vice versa, both sides need to coordinate on changes.
I really don't think it's that much boilerplate, at least compared to the pain of dealing with a choreographer that's failing to scale or just generally bad (and which you now can't replace, because everything is hand-tailored to it).
Sounds very similar to "sagas" in the Domain Driven Design/Event Sourcing space.
The general advice I have here is to not use event-sourcing/event-driven architectures for everything.
Use it where the domain maps well to a process that happens over time. An order system is the canonical example. That doesn't mean your products, customers, accounts, etc all need to be event sourced/event driven as well. It will be fine leaving those managed by a CRUD architecture.
Keep things as simple as possible (but no simpler).
There are plenty of areas where an event-driven/event-sourced architecture will make things easier and others where it will make things way harder. I find if you need a lot of sagas/orchestration... you're delving into the latter.
(I make the distinction between event sourced and event driven as this: the former is about persisting and deriving state; the latter is about communicating changes to other systems).
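As a toy illustration of the "persisting and deriving state" half: the order's current state is not stored directly, it is whatever you get by replaying the events recorded for it. All the event and field names below are invented.

```java
import java.util.List;

// Event-sourced order: state is derived by folding over the event history.
sealed interface OrderEvent permits OrderCreated, ItemAdded, OrderShipped {}
record OrderCreated(String orderId) implements OrderEvent {}
record ItemAdded(String sku, int quantity) implements OrderEvent {}
record OrderShipped(String trackingId) implements OrderEvent {}

record OrderState(String orderId, int itemCount, boolean shipped) {
    static OrderState replay(List<OrderEvent> history) {
        OrderState state = new OrderState(null, 0, false);
        for (OrderEvent e : history) {
            state = switch (e) {
                case OrderCreated c  -> new OrderState(c.orderId(), state.itemCount(), false);
                case ItemAdded a     -> new OrderState(state.orderId(), state.itemCount() + a.quantity(), state.shipped());
                case OrderShipped sh -> new OrderState(state.orderId(), state.itemCount(), true);
            };
        }
        return state;
    }
}
```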
The author never describes any of the patterns he discusses. He just says "this is what it looks like" followed by a diagram that can't be enlarged on a mobile browser and is too small to read. If this is the level of communication I could expect from this company I'll give them a very wide berth.
My job a while ago was doing business process automation with webMethods[0].
It had some flaws, of course, but the general idea was you drew business processes in a UI tool (or wrote the corresponding code that generated the process, but drawing was easier) and the process was then implemented into messages queues/topics and different types of integrations as part of the deployment process.
Because it was standardised, you could plug in a product that visualised processes, showed where bottlenecks were, let you restart failed ones, etc. In retrospect it was pretty advanced.
These just seem like sagas, but with the drawback that you now have an additional service for every saga (so now you have N+M problems?) Also if you centrally own all business processes, why did you land upon a microservices architecture in the first place?
The idea is that each business process can be implemented through its own service. So a microservice architecture is still very relevant from an organisational point of view. The team in charge of this business process must know the interfaces of the services it uses. But they are the only ones. There is actually less coupling in this organisation, not more.
I think extensive orchestration leads to the orchestrator becoming a single point of failure. If the orchestrator starts converting events to commands then soon it starts storing state and becomes a big ball of mud.
My preferred form of orchestrator is a gatekeeper. A gatekeeper does three things only:
- maintains a table of which service consumes which events and why
- copies events from the outgoing queues to the dedicated incoming queues according to the routing table (or controls access to the incoming queues via RBAC)
- logs the metadata about the messages it has seen
It doesn't inspect the messages in any way, store them or maintain any runtime state at all.
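A rough sketch of what such a gatekeeper's copy loop might look like; the broker interface and queue names are stand-ins, not any particular broker's API.

```java
import java.util.List;
import java.util.Map;

// The gatekeeper only consults a static routing table, copies opaque payloads from outgoing
// to incoming queues, and logs metadata. It never parses payloads and keeps no runtime state.
class Gatekeeper {
    interface Broker {                                    // stand-in for your actual broker client
        Message poll(String outgoingQueue);               // null if nothing available
        void publish(String incomingQueue, Message m);
    }
    record Message(String type, String producer, byte[] payload) {}

    // Routing table: which incoming queues receive which event types (and, in a real system, why).
    private final Map<String, List<String>> routes = Map.of(
            "OrderPlaced",  List.of("billing.incoming", "shipping.incoming"),
            "OrderShipped", List.of("notifications.incoming"));

    private final Broker broker;
    Gatekeeper(Broker broker) { this.broker = broker; }

    void copyOnce(String outgoingQueue) {
        Message m = broker.poll(outgoingQueue);
        if (m == null) return;
        List<String> targets = routes.getOrDefault(m.type(), List.of());
        for (String target : targets) {
            broker.publish(target, m);                    // payload copied verbatim, never inspected
        }
        System.out.printf("routed type=%s from=%s to=%s%n", m.type(), m.producer(), targets);
    }
}
```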
There is no easy way to understand how a specific business process is actually implemented. The whole process is defined by how each service reacts to others. There is no central repository, and the implementation details of business processes are scattered everywhere.
Yes, indeed. There is no easy way because the total system architecture is implicitly defined by the interactions of the components.
There is currently no way to actually express the architecture of such a system as runnable/testable and running code. You have the options of either (a) documenting the architecture or (b) leaving it to be defined implicitly.
Indeed. And you are raising another good point: the testability of such architecture is quite poor. Without a way to represent the architecture as runnable and testable code, it is problematic to ensure the correctness and reliability of the system. While documentation helps, it does not provide the same level of assurance as testable code.
More complicated sure, but adding the ability to mock irrelevant parts and get the whole env up and running locally for dev or test is usually perfectly doable from my experience.
It seems mostly a new rug to sweep complexity under, not necessarily in better ways.
For example, "built-in observability of workflows states" seems to imply very invasive constraints on workflow/orchestrator implementations: maybe a net positive, but contrary to the promise of an "easy way to implement your business logic in full Java".
In this article, I argue that the prevalent approach to building event-driven applications using the choreography pattern is misguided and can lead to significant technical debt. As an alternative, I introduce the Infinitic framework as a way to enable teams to implement event-driven processes without the pain and complexity of building and managing an event-driven system themselves.
Interesting article Gilles (design patterns are fun!) but in the future maybe worth giving some form of disclosure in the comments.
> Creator of Infinitic orchestration engine (docs.infinitic.io). Building in public at infinitic.substack.com. Previously founder at Zenaton and director at The Family.
hmm, that's strange - it's a special "friend" url and I can access it myself from an incognito window using this url (I have a Medium banner to subscribe but I can close it).
Are you talking about the ad that asks you to consider paying for the content but has a little X to dismiss the dialog? Does Medium disable that X in different regions or is there another blocker I'm not seeing?
It's a queue, it's a stream, it's a journal, it's an event bus, it's a database.
It's also apparently a workflow engine now?
It annoys me so much how appealing this is to a median developer. "Wow I just get to learn this one tool and then I can use it for EVERYTHING!" is not something competent craftsmen/professionals do in any other field, yet in software it's all the rage.
"The way we are building event-driven applications is misguided"
Maybe the way you are! Don't overcomplicate/overabstract things so much!
What have your predecessors done with Kafka? What did Kafka do to your predecessors? Aren't you thrilled to find out? It's enterprise enough to be an endless source of surprises!
Another "forget this style, only use our new framework". What he forgets to mention, and what is probably not in the scope of "our new framework that does everything better", is that it is totally valid to use orchestration and choreography. Same as Confluent pushes the case to use choreography, this post pushes to just use orchestration.
That's wrong. From my point of view:
Don't orchestrate all your business logic for all services down to the smallest event/command!Also: You don't *need* to store all your events forever. This pattern, often pushed, comes with downsides as well. Maybe you have a good case to do so, but don't fall into the trap to just use this pattern because you've read that in some blog post!
Alternatively, you can also just go the route of those vendor blog posts and pay later for the lock in.