Beware of marketing blog posts like this (looking at you, Confluent, Databricks and Camunda!).
Another "forget this style, only use our new framework".
What he forgets to mention, and what is probably not in the scope of "our new framework that does everything better", is that it is totally valid to use both orchestration and choreography.
Just as Confluent pushes the case for choreography, this post pushes for using only orchestration.
That's wrong. From my point of view:
1. use choreography for small to medium services (for example one domain)
2. optionally orchestrate them on a higher level (for example between domains)
Don't orchestrate all your business logic for all services down to the smallest event/command!
Also: You don't *need* to store all your events forever. This pattern, often pushed, comes with downsides as well. Maybe you have a good case for doing so, but don't fall into the trap of using this pattern just because you've read about it in some blog post!
Alternatively, you can also just go the route of those vendor blog posts and pay later for the lock-in.
So one thing you might want to do is assess whether the patterns described in the article actually fit your use case, and then adopt the one that fits better.
But putting this sentiment to the extreme, you could also say: For very "small" use cases, that is, systems where scalability might not be an issue (right now or in the future), you might want to dispense with any kind of distributed system altogether and just program a monolith. Because if you have tightly-coupled parts of a more complex system that cannot really function if even one of them is down, you might as well put them in the same program.
Again, the main drawback will be horizontal scalability. But in a small enough company your "process engine" can probably run on a small VM anyway.
> it is totally valid to use both orchestration and choreography. Just as Confluent pushes the case for choreography, this post pushes for using only orchestration.
What's the metaphor here? As far as I can see, orchestration is about coordinating the timing of several simultaneous and interdependent musical performances, while choreography is about coordinating the timing of several simultaneous and interdependent dance performances. But I strongly suspect that the difference between auditory and visual display isn't what you're supposed to get out of this terminology. How am I supposed to remember which approach is which?
Neither orchestration nor choreography involves agents reacting to other agents; there's an objective clock and everyone takes their cues from that.
People need to dream up new jargon to get other people to feel like their design patterns are “official”, I guess. I was wondering the same thing. It seems pretty arbitrary.
I think the analogy is clear when I think of it as:
- Orchestration brings the idea of a conductor, who is the main reference for "what should we do" among orchestra players.
- Choreography brings the idea of dancers in sync without the need of a central conductor.
> Choreography brings the idea of dancers in sync without the need of a central conductor.
Try that and let me know how it works.
In reality, the dancers are of course synced by the music, dependent on the same conductor as the orchestra.
And on top of that, orchestration isn't the conductor's job. It's the composer's (or arranger's) job, putting it exactly parallel to choreography. The orchestrator, like the choreographer, determines who does what and when they do it, and he does it in advance. Most typically, years or centuries in advance. The conductor determines how fast the clock runs.
Interesting. A counterexample: WWE (wrestling) is choreographed. The wrestlers react to the cues of the other wrestlers. It's not necessarily based on time or music, but instead on a pre-agreed sequence.
I think the catch is that not all cues need to be time-based, and that is the distinction. In orchestration, there is one source for cues - the orchestrator.
The difference, I think, is that in orchestration the players get their cues from one source, while in choreography there are different source(s) for cues (time/tempo perhaps being one of them).
> In orchestration, there is one source for cues - the orchestrator.
Just to be clear, everyone's talking about the conductor, who keeps time for the orchestra, but conductors don't do orchestration. The orchestrator is the person who wrote the score.
Good point, for the analogy, I should have written conductor instead of orchestrator. The point remains though that the difference is in the source of synchronization.
A good few years ago when I worked at a security consultancy that catered to a bunch of startups (only some of whom survived long enough to pay the invoices…) we referred to certain development patterns as “HackerNews Driven Development” lol.
I guess Medium driven development is the newer edition of this?
I like this "choreography inside a domain, orchestration across domains" rule of thumb.
I think it maps to what we did a few years ago at a previous employer (without actually formalising it like this).
Some cross-domain things are inherently orchestration problems (like GDPR deletion).
It is quite tedious when discussing Kafka that people just spout the contents of Confluent blog posts - you must use X serialisation, you must use X registry - when in a lot of cases you can just use protobuf.
A registry seems pretty pointless to me, and personally I can't understand how anyone thought it was a good idea. What was the thought process? 'Oh yeah, let's write a service that our service has to query to work out how to understand the message it just received, and which will crash everything if that schema service isn't there' - it's microservices gone mad!
Why worry about which services talk which version of the schema when the service is guaranteed to be able to understand the message?
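For what it's worth, here's a minimal sketch of the no-registry approach: a hypothetical OrderPlaced message generated from a .proto that both producer and consumer compile in, published to Kafka as plain bytes. The message class and topic name are invented for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: publishing a protobuf-encoded event without a schema registry.
// OrderPlaced is a hypothetical class generated from a .proto that both sides
// compile in, so each side already knows the schema at build time.
public class PlainProtobufProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        byte[] payload = OrderPlaced.newBuilder()      // hypothetical generated builder
                .setOrderId("o-42")
                .setAmountCents(1999)
                .build()
                .toByteArray();                        // schema compiled in, not looked up

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "o-42", payload));
        }
        // Consumer side: OrderPlaced.parseFrom(record.value()) - fields added by newer
        // producers don't break parsing, which is protobuf's compatibility story.
    }
}
```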
As with yesterday's "big data" article, the usual question is whether you have enough transaction volume to need all this stuff. If you can organize your web site so that most traffic is read-only files, and the heavy machinery only turns on when somebody does a "buy" or something important, the whole problem probably fits into a CRUD app.
There's a useful insight in there - fan-in and fan-out are hard. I hit this in a completely different context - asset handling for a game-type client. Events come in ordering the creation of object X using assets A, B, and C. It may take several asset fetches from the network to draw something. An asset may be used for multiple drawable objects. There's caching and concurrent asset fetching. A request to create X has to start fetches for A, B, and C (they might be in cache, though) and wait until those fetches complete. A request for Y might want B, C, and D. Items should be fetched from the network only once, of course. This is a messy coordination problem.
Problems like this, with fan-in, fan-out, and concurrency, map badly to the standard paradigms. Neither threads with blocking nor "async" help much. The problem can be visualized as set operations, but can't easily be implemented that way.
A single-threaded, event-driven coordination loop works, but it's kind of clunky.
So I started reading the article hoping for a theoretical breakthrough on fan-in, fan-out problems. No such luck.
The set-theory approach is hard to do, but promising. Each object that wants something has a small set of the things it wants. There's a big pool of such sets. There's also a big pool of the items you have, which changes constantly. It's easy to express what you need to fetch and which objects are now ready to go as set intersection and difference operations.
But you need representations for big sparse sets which can do set operations fast. Probably B-trees, or something in that space.
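A rough sketch of that bookkeeping, with plain HashSets standing in for whatever fast sparse-set representation you'd actually want (B-trees, bitmaps, ...); all the names are invented for illustration.

```java
import java.util.*;

// Each pending drawable carries the set of assets it still needs; as fetches complete,
// set difference/intersection operations tell us what to request and what became ready.
class AssetCoordinator {
    private final Set<String> loaded = new HashSet<>();               // assets we have
    private final Set<String> inFlight = new HashSet<>();             // fetches already started
    private final Map<String, Set<String>> pending = new HashMap<>(); // drawable -> missing assets

    // Request creation of a drawable needing the given assets; returns assets to fetch now.
    Set<String> request(String drawableId, Set<String> needed) {
        Set<String> missing = new HashSet<>(needed);
        missing.removeAll(loaded);                                     // set difference: still missing
        if (missing.isEmpty()) { /* drawable is immediately ready */ return Set.of(); }
        pending.put(drawableId, missing);

        Set<String> toFetch = new HashSet<>(missing);
        toFetch.removeAll(inFlight);                                   // never fetch the same asset twice
        inFlight.addAll(toFetch);
        return toFetch;
    }

    // An asset fetch completed; returns the drawables that just became ready.
    List<String> assetLoaded(String asset) {
        loaded.add(asset);
        inFlight.remove(asset);
        List<String> ready = new ArrayList<>();
        pending.forEach((id, missing) -> {
            missing.remove(asset);
            if (missing.isEmpty()) ready.add(id);
        });
        ready.forEach(pending::remove);
        return ready;
    }
}
```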
Microsoft Research fooled around with this concept years ago in a different context. The idea was to have a database which supported pending SQL queries. The query would return new results when the database changed such that the results of the query changed. The goal was to support that for millions of pending queries. Financial traders would love to have that. It's a very hard scaling problem. Don't know how that came out.
> The set-theory approach is hard to do, but promising. Each object that wants something has a small set of the things it wants. There's a big pool of such sets. There's also a big pool of the items you have, which changes constantly. It's easy to express what you need to fetch and which objects are now ready to go as set intersection and difference operations. But you need representations for big sparse sets which can do set operations fast. Probably B-trees, or something in that space.
Incremental updates to dynamic dependency graphs is a familiar problem for build tooling. I personally have used the taskflow C++ library (https://github.com/taskflow/taskflow) to great effect.
> Microsoft Research fooled around with this concept years ago in a different context. The idea was to have a database which supported pending SQL queries. The query would return new results when the database changed such that the results of the query changed. The goal was to support that for millions of pending queries. Financial traders would love to have that. It's a very hard scaling problem. Don't know how that came out.
Incremental view maintenance is an active area of research. The likes of Noria and Materialize have done this with SQL, and the pg_ivm Postgres extension looks promising. Not sure if there is an equivalent implementation geared towards entity-component systems, though.
I strongly agree with the premise that an orchestrator-centric approach is preferable to event spaghetti for a lot of business process use cases.
Another (maybe more mature) alternative to Infinitic is Camunda, who have been pushing this rhetoric for over a decade.
My problem with both is that neither feel very modern, in the sense that they are tied to (in Infinitic's case) or strongly favor (in Camunda's case) Java-centric development, don't have a great developer experience story, and don't feel very cohesive with the language.
What I'd love to see is someone tackling this for smaller orgs where the above tradeoff in developer experience isn't worth the gains, i.e. orgs with <100 engineers. Something where you have a single source of truth for your processes and schemas, with version control, that directly yields strongly typed integrations & hooks into any language, with a great local developer experience and deployment story.
Camunda requires too many magic strings and schemas to be kept consistent across services for that, and Infinitic forces me to use Java or Kotlin which I don't want.
A "simple" solution might be to have declarative process and schema definitions in some DSL for version control, which auto-generates schemas, types etc in whatever language, giving me both strong types, intuitive local development, and clear deployment story through my existing CI/CD.
Thank you for your comment. Highlighting the need for a descriptive format to orchestrate services implemented in multiple languages is a valid point. While Infinitic currently uses Java's interfaces, adopting a more generic and language-agnostic solution like Protobuf is a sensible approach that could promote better interoperability across different tech stacks. Then we need a DSL. Again, Infinitic is using Java/Kotlin, but I'm thinking of the ideal solution as a new, simple language in which Protobuf objects are first-class citizens and that is resumable (i.e. whose internal state can be safely stored to be resumed later). It's not as easy as it sounds, as a lot of legitimate issues arise regarding versioning.
I worked with somebody whose brain just didn't brain with mine. anything she wrote was unreadable to me, and anything I wrote was not "simple" or "understandable" enough.
Look at the biased way you write about the situation. This is the fundamental attribution error. If you can't read her code, it's her fault (you implied that). And if she can't read your code, it's also her fault (you implied that).
I said it was unreadable to me. I meant what I wrote. I did not imply it was her fault. it was presumably readable to others in the team, just not to me. the only way I could figure out what was going on was by adding a lot of printf and running small parts of the code to work it out. the documentation really didn't tell me anything that I needed to know for that purpose.
the quotes represent words that she actually said. in the end I did re-write my code in a way that she would finally approve the review and now I can't read the code either.
she's diagnosed with autism. I'm diagnosed with autism. not only are we not neurotypical, but we're also so different that our code was mutually unintelligible.
Hmmm. I think this is interesting, but I’m not sure this is an “architecture”.
A workflow or “orchestrator” as defined in this article looks like another service. Unless I’m missing something, it’s a state-machine where inputs/variables are the events of other services, and where transitions trigger “commands”.
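To make that reading concrete, a minimal sketch of an orchestrator-as-state-machine: events in, commands out, state persisted between steps. All the event, command and service names here are invented; this isn't any particular framework's API.

```java
import java.util.List;

// Events from other services are the inputs; transitions emit commands to other services.
enum OrderPhase { AWAITING_PAYMENT, AWAITING_SHIPMENT, DONE, FAILED }

record Command(String target, String name, String orderId) {}

class OrderOrchestrator {
    private OrderPhase phase = OrderPhase.AWAITING_PAYMENT;

    // Feed in an event, get back the commands to dispatch; phase would be persisted between calls.
    List<Command> onEvent(String event, String orderId) {
        switch (phase) {
            case AWAITING_PAYMENT -> {
                if (event.equals("PaymentCaptured")) {
                    phase = OrderPhase.AWAITING_SHIPMENT;
                    return List.of(new Command("shipping-service", "CreateShipment", orderId));
                }
                if (event.equals("PaymentFailed")) {
                    phase = OrderPhase.FAILED;
                    return List.of(new Command("notification-service", "NotifyFailure", orderId));
                }
            }
            case AWAITING_SHIPMENT -> {
                if (event.equals("ShipmentDispatched")) {
                    phase = OrderPhase.DONE;
                    return List.of(new Command("notification-service", "NotifyShipped", orderId));
                }
            }
            default -> { }
        }
        return List.of(); // event irrelevant in this phase
    }
}
```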
Don’t get me wrong, I see the value in pulling this logic into a central place/process in order to perform “orchestration”, I just don’t see the value in yet another framework. This is a pattern that pops up in every distributed system I’ve worked on, but it is just one of many.
In many cases, a synchronous request response end-point is all that is necessary/desired for a service tied to a customer interaction. In others, a service processing a queue backlog (with or without automatic fan-out). In others publish-subscribe seems the best fit. In others, a more complex state-machine managing several different events, transitions, and outputs.
Can you model everything as a state-machine? Yes. Should you? I would argue that you want a range of tools in the toolbox.
This 'orchestrator' pattern merely re-introduces tight-coupling via a one-way async interface. The producer writes commands destined for the executor of the command. It's basically async RPC with transparent host lookup and no return values.
I'm not saying it doesn't have its uses, but be wary of tight coupling. If you see development patterns like first updating the consumer to enable a change to a command, then releasing the upstream change to use the new capability, you'll know that you're not working with a decoupled system. Another likely red-flag is never documenting payload structures, so that behaviour is fully implementation-defined async-spaghetti.
My current situation is that I'm having to work with an event driven architecture for a strictly synchronous process.
Buzzword architecture is awful, as is Kafka offering fake design patterns, such as request/reply with small response timeframes, to reassure naive architects.
Attempting to replicate REST with overcomplicated overhead.
These are the biggest issues I encounter with event-driven applications.
Pedantic note: nobody talking about REST really does REST. Where is the representational state? REST is when you ship the whole application in the message to the far side to let them figure it out.
Now that I've got that stupid pedantry off my chest: yeah, I know what you mean when you say REST, we all know what is meant when REST is brought up. But I find it funny how Roy Fielding said "wow, it is really cool how web pages ship the whole application to the user on each request, let's talk about that" and everyone else basically went "ok, HTTP == REST, got it".
> Ah ha, but we don't use REST, that's my complaint!
But you are correct; although I have used representational state through the request, in this case I'm just talking about using a message bus vs. using a blocking HTTP call.
Additional annoyance: using inappropriate tech means that we can't even scale to handle spikes. Having a Kafka rebalance cause an outage, because traffic is a little higher and I need to increase the number of listeners, right when I actually need more throughput, makes me sad.
Hear, hear. A few years ago I worked for an esteemed client whose director said «I want Kafka» for [mostly] synchronous interaction patterns. I said «no», the director said «I want Kafka». That repeated a few more times, and I had to begrudgingly comply since the client was paying for it.
The architecture for the solution drew its inspiration pretty heavily from the TCP protocol design, with incoming events yielding either an ACK or a NACK event, also dispatched asynchronously. The key was the stringent adherence to supplying the original event ID in the Reply-To event envelope (hello, SMTP) and having a separate process correlating the inbound and outbound ACK/NACK events. It has wrought success in times bygone (and does persist unto this present day), yet I must, with all due deference, decline to partake therein henceforth.
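For illustration, the correlation step could look something like the sketch below - plain Java with a CompletableFuture parked per pending event ID, completed when the reply topic delivers an ACK/NACK carrying the original event ID. Field names are illustrative, not the actual system.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Correlates asynchronous ACK/NACK replies back to the request events that caused them.
class ReplyCorrelator {
    record Reply(String inReplyTo, boolean ack, String detail) {}

    private final Map<String, CompletableFuture<Reply>> pending = new ConcurrentHashMap<>();

    // Called when we publish a request event with this ID.
    CompletableFuture<Reply> expectReplyFor(String eventId) {
        CompletableFuture<Reply> future = new CompletableFuture<>();
        pending.put(eventId, future);
        // A timeout is essential: replies are asynchronous and may never arrive.
        return future.orTimeout(5, TimeUnit.SECONDS)
                     .whenComplete((reply, error) -> pending.remove(eventId));
    }

    // Called by the consumer of the ACK/NACK topic.
    void onReply(Reply reply) {
        CompletableFuture<Reply> future = pending.remove(reply.inReplyTo());
        if (future != null) future.complete(reply);
        // else: late or duplicate reply - just log it
    }
}
```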
Indeed. Using a streaming platform makes sense only if you need message durability (and sometimes reusability). When implementing a request/reply pattern on top of such a platform (Infinitic does that btw), it necessarily adds some latency due to multiple persistence operations.
Maybe I missed the party trick, but under the hood a Step Functions definition will be orchestrated between Lambdas (the services).
The only difference from this pattern is that you define event inputs and outputs on the service, as part of orchestration, which I feel consuming services absolutely need to know (why wouldn't you, unless you're hoping to avoid object versioning / change clashes between services by doing that, which is better served by, you know, actual version tags...).
This completely leaves out the business model, which should always be the driving force behind any architectural decisions.
I don't know what type of cross-domain communication I need until I model something. Any given domain could have many published events. Any given domain could have many commands. Any given domain could have many subscriptions.
The business models of those domains come first. That dictates everything including APIs, data store, commands, and events.
This article is similar to saying all applications should use a relational database or all applications should use AWS Lambdas.
The most important step in any architecture is "It depends."
This is key and it’s something engineers constantly miss. My best guess is that it involves a lot of the stuff that engineers don’t find interesting, or that requires outside help from the business side of the company.
But I’d say almost all of the interesting and challenging parts of engineering happen here. The rest is just manufacturing and maintenance. (Though I find those fun too!)
The choreography pattern has always felt like a top-down solution to what is fundamentally a bottom-up problem.
Notably the article calls out tracing, which is the bottom-up solution to this. Choreography functions by dictating a flowchart that events move through, so visibility comes from that dictation. Tracing functions by watching which events move from which producers to which consumers, and surfacing that organic data.
I'm not sure I see the fundamental advantage of dictating these flows rather than observing and reporting on them with some kind of internally-standardized conventions on message metadata with some standardized tooling to consume that. They seem to end up in a fairly similar situation: producers can see their consumers and vice versa, both sides need to coordinate on changes.
I really don't think it's that much boilerplate, at least compared to the pain of dealing with a choreographer that's failing to scale or just generally bad (and which you now can't replace, because everything is hand-tailored to it).
Sounds very similar to "sagas" in the Domain Driven Design/Event Sourcing space.
The general advice I have here is to not use event-sourcing/event-driven architectures for everything.
Use it where the domain maps well to a process that happens over time. An order system is the canonical example. That doesn't mean your products, customers, accounts, etc all need to be event sourced/event driven as well. It will be fine leaving those managed by a CRUD architecture.
Keep things as simple as possible (but no simpler).
There are plenty of areas where an event-driven/event-sourced architecture will make things easier and others where it will make things way harder. I find if you need a lot of sagas/orchestration... you're delving into the latter.
(I make the distinction between event sourced and event driven as this: the former is about persisting and deriving state; the latter is about communicating changes to other systems).
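As a toy illustration of the "persisting and deriving state" half: the order's current state is not stored directly, it is whatever you get by replaying the events recorded for it. All the event and field names below are invented.

```java
import java.util.List;

// Event-sourced order: state is derived by folding over the event history.
sealed interface OrderEvent permits OrderCreated, ItemAdded, OrderShipped {}
record OrderCreated(String orderId) implements OrderEvent {}
record ItemAdded(String sku, int quantity) implements OrderEvent {}
record OrderShipped(String trackingId) implements OrderEvent {}

record OrderState(String orderId, int itemCount, boolean shipped) {
    static OrderState replay(List<OrderEvent> history) {
        OrderState state = new OrderState(null, 0, false);
        for (OrderEvent e : history) {
            state = switch (e) {
                case OrderCreated c  -> new OrderState(c.orderId(), state.itemCount(), false);
                case ItemAdded a     -> new OrderState(state.orderId(), state.itemCount() + a.quantity(), state.shipped());
                case OrderShipped sh -> new OrderState(state.orderId(), state.itemCount(), true);
            };
        }
        return state;
    }
}
```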
The author never describes any of the patterns he discusses. He just says "this is what it looks like" followed by a diagram that can't be enlarged on a mobile browser and is too small to read. If this is the level of communication I could expect from this company I'll give them a very wide berth.
My job a while ago was doing business process automation with webMethods[0].
It had some flaws, of course, but the general idea was you drew business processes in a UI tool (or wrote the corresponding code that generated the process, but drawing was easier) and the process was then implemented into messages queues/topics and different types of integrations as part of the deployment process.
Because it was standardised, you could plug in a product that visualised processes, showed where bottlenecks were, let you restart failed ones, etc. In retrospect it was pretty advanced.
These just seem like sagas, but with the drawback that you now have an additional service for every saga (so now you have N+M problems?) Also if you centrally own all business processes, why did you land upon a microservices architecture in the first place?
The idea is that each business process can be implemented through its own service. So a microservice architecture is still very relevant from an organisational point of view. The team in charge of this business process must know the interfaces of the services it uses. But they are the only ones. There is actually less coupling in this organisation, not more.
I think extensive orchestration leads to the orchestrator becoming a single point of failure. If the orchestrator starts converting events to commands then soon it starts storing state and becomes a big ball of mud.
My preferred form of orchestrator is a gatekeeper. A gatekeeper does three things only:
- maintains a table of which service consumes which events and why
- copies events from the outgoing queues to the dedicated incoming queues according to the routing table (or controls access to the incoming queues via RBAC)
- logs the metadata about the messages it has seen
It doesn't inspect the messages in any way, store them or maintain any runtime state at all.
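A rough sketch of what such a gatekeeper's copy loop might look like; the broker interface and queue names are stand-ins, not any particular broker's API.

```java
import java.util.List;
import java.util.Map;

// The gatekeeper only consults a static routing table, copies opaque payloads from outgoing
// to incoming queues, and logs metadata. It never parses payloads and keeps no runtime state.
class Gatekeeper {
    interface Broker {                                    // stand-in for your actual broker client
        Message poll(String outgoingQueue);               // null if nothing available
        void publish(String incomingQueue, Message m);
    }
    record Message(String type, String producer, byte[] payload) {}

    // Routing table: which incoming queues receive which event types (and, in a real system, why).
    private final Map<String, List<String>> routes = Map.of(
            "OrderPlaced",  List.of("billing.incoming", "shipping.incoming"),
            "OrderShipped", List.of("notifications.incoming"));

    private final Broker broker;
    Gatekeeper(Broker broker) { this.broker = broker; }

    void copyOnce(String outgoingQueue) {
        Message m = broker.poll(outgoingQueue);
        if (m == null) return;
        List<String> targets = routes.getOrDefault(m.type(), List.of());
        for (String target : targets) {
            broker.publish(target, m);                    // payload copied verbatim, never inspected
        }
        System.out.printf("routed type=%s from=%s to=%s%n", m.type(), m.producer(), targets);
    }
}
```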
There is no easy way to understand how a specific business process is actually implemented. The whole process is defined by how each service reacts to others. There is no central repository, and the implementation details of business processes are scattered everywhere.
Yes, indeed. There is no easy way because the total system architecture is implicitly defined by the interactions of the components.
There is currently no way to actually express the architecture of such a system as runnable/testable and running code. You have the options of either (a) documenting the architecture or (b) leaving it to be defined implicitly.
Indeed. And you are raising another good point: the testability of such architecture is quite poor. Without a way to represent the architecture as runnable and testable code, it is problematic to ensure the correctness and reliability of the system. While documentation helps, it does not provide the same level of assurance as testable code.
More complicated sure, but adding the ability to mock irrelevant parts and get the whole env up and running locally for dev or test is usually perfectly doable from my experience.
It seems mostly a new rug to sweep complexity under, not necessarily in better ways.
For example, "built-in observability of workflows states" seems to imply very invasive constraints on workflow/orchestrator implementations: maybe a net positive, but contrary to the promise of an "easy way to implement your business logic in full Java".
In this article, I argue that the prevalent approach to building event-driven applications using the choreography pattern is misguided and can lead to significant technical debt. As an alternative, I introduce the Infinitic framework as a way to enable teams to implement event-driven processes without the pain and complexity of building and managing an event-driven system themselves.
Interesting article Gilles (design patterns are fun!) but in the future maybe worth giving some form of disclosure in the comments.
> Creator of Infinitic orchestration engine (docs.infinitic.io). Building in public at infinitic.substack.com. Previously founder at Zenaton and director at The Family.
hmm, that's strange - it's a special "friend" url and I can access it myself from an incognito window using this url (I have a Medium banner to subscribe but I can close it).
Are you talking about the ad that asks you to consider paying for the content but has a little X to dismiss the dialog? Does Medium disable that X in different regions or is there another blocker I'm not seeing?
It's a queue, it's a stream, it's a journal, it's an event bus, it's a database.
It's also apparently a workflow engine now?
It annoys me so much how appealing this is to a median developer. "Wow I just get to learn this one tool and then I can use it for EVERYTHING!" is not something competent craftsmen/professionals do in any other field, yet in software it's all the rage.
"The way we are building event-driven applications is misguided"
Maybe the way you are! Don't overcomplicate/overabstract things so much!
What have your predecessors done with Kafka? What did Kafka do to your predecessors? Aren't you thrilled to find out? It's enterprise enough to be an endless source of surprises!
Another "forget this style, only use our new framework". What he forgets to mention, and what is probably not in the scope of "our new framework that does everything better", is that it is totally valid to use orchestration and choreography. Same as Confluent pushes the case to use choreography, this post pushes to just use orchestration.
That's wrong. From my point of view:
Don't orchestrate all your business logic for all services down to the smallest event/command!Also: You don't *need* to store all your events forever. This pattern, often pushed, comes with downsides as well. Maybe you have a good case to do so, but don't fall into the trap to just use this pattern because you've read that in some blog post!
Alternatively, you can also just go the route of those vendor blog posts and pay later for the lock in.