Temporal is an implementation of a paradigm I got interested in back in 2019. I wasn’t at one of those companies that had heard about Cadence, so when I was searching around to see if anyone had already built the idea I’d come up with, I stumbled upon Zenaton. It’s no longer around (it never found PMF), so I was happy when Temporal came out of stealth mode a few months later; it was nice to have my intuition in this area validated.
We’ve been using Temporal quite successfully in Go (and more recently Python) for a little while now. It could do with being a bit easier to get up and running with, but day-to-day usage is very nice. I don’t think I could go back to plain old message queues; this paradigm is a real time-saver.
The biggest challenge is deciding how many things are nails for the hammer that is Temporal. You tend to start out using it to replace an existing mess of task orchestration; but then you realise it’s actually a pretty good fit for any write operation that can’t neatly happen in a single database transaction (because it’s hitting multiple services, technologies, third parties, etc.).
You have to be careful to keep your workflows deterministic, but once you get used to the paradigm, it’s enjoyable.
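To give a flavour of what “deterministic” means in practice, here’s a minimal sketch using Temporal’s Go SDK (the workflow API calls are real SDK functions; the order-ID generation is made up for illustration):

```go
package example

import (
	"time"

	"github.com/google/uuid"
	"go.temporal.io/sdk/workflow"
)

// On replay after a crash, the workflow function re-executes from the top,
// so anything that could produce a different value the second time around
// must go through the workflow APIs rather than the standard library.
func OrderWorkflow(ctx workflow.Context) error {
	// Wrong: time.Now() would return a different value on replay.
	// deadline := time.Now().Add(24 * time.Hour)

	// Right: workflow.Now is recorded in history and replayed identically.
	deadline := workflow.Now(ctx).Add(24 * time.Hour)
	_ = deadline

	// Right: a random value is captured once via SideEffect; on replay the
	// recorded result is returned instead of re-running the function.
	var orderID string
	if err := workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
		return uuid.NewString() // made-up ID scheme, purely illustrative
	}).Get(&orderID); err != nil {
		return err
	}

	// Durable timer: a restarted worker resumes here with state intact.
	return workflow.Sleep(ctx, time.Hour)
}
```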
This post is about durable execution systems, a category that includes Azure Durable Functions, Amazon SWF, Uber Cadence, Infinitic, and Temporal.
Durable execution systems run our code in a way that persists each step the code takes. If the process or container running the code dies, the code automatically continues running in another process with all state intact, including call stack and local variables.
Durable execution makes it trivial (or unnecessary) to implement distributed systems patterns like event-driven architecture, task queues, sagas, circuit breakers, and transactional outboxes. It’s programming at a higher level of abstraction, where you don’t have to be concerned about transient failures like server crashes or network issues.
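To make that concrete, here’s a rough sketch with Temporal’s Go SDK (illustrative, not production code): the local counter below survives process death because it’s rebuilt from persisted history on replay.

```go
package example

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// A 30-day reminder flow written as ordinary sequential code. If the worker
// process dies mid-way, another worker replays the persisted history and
// resumes exactly here, with the local `sent` counter intact.
func ReminderWorkflow(ctx workflow.Context, userID string) error {
	sent := 0
	for sent < 3 {
		// Durable timer: the workflow is evicted from memory while waiting;
		// no process has to stay alive for these ten days.
		if err := workflow.Sleep(ctx, 10*24*time.Hour); err != nil {
			return err
		}
		sent++
	}
	return nil
}
```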
No, the sample app is 100% Temporal on the backend, but you can adopt it incrementally, writing durable functions for specific processes. Usually companies start out with things that are either long-running or where reliability is particularly important, like financial transactions. Then they learn it can be more generally useful and expand the use cases gradually.
The point is that you write code instead of the JSON/YAML used by traditional microservice orchestrators like AWS Step Functions. And it’s not a limited DSL: you have the full language at your disposal, with the one requirement that deterministic code (Workflows, the durable code) lives in separate functions from non-deterministic code (like making a network request, called “Activities”).
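Roughly what that separation looks like in the Go SDK (a sketch; the payments endpoint and function names are invented):

```go
package example

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"go.temporal.io/sdk/workflow"
)

// Activity: the non-deterministic part. Plain Go with real I/O; it can fail
// and be retried, so the workflow never performs the request itself.
func ChargeCard(ctx context.Context, orderID string) error {
	resp, err := http.Post("https://payments.example.com/charge/"+orderID, "application/json", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("charge failed: %s", resp.Status)
	}
	return nil
}

// Workflow: the deterministic part. No I/O here; it orchestrates Activities
// through the SDK, which persists each step's outcome for recovery.
func CheckoutWorkflow(ctx workflow.Context, orderID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})
	return workflow.ExecuteActivity(ctx, ChargeCard, orderID).Get(ctx, nil)
}
```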
Not trying to argue. Genuinely curious whether you think this falls within the realm of "not take anything special into account" (being forced to use a specific SDK and lay your logic out in exactly the way it supports), or whether you didn't know this was referring to Temporal?
I thought this comment chain was about durable execution in general. Temporal seems to be that plus some RPC stuff that is a lot more than "nothing special."
All the durable execution systems have to run your code in a certain way that persists steps like RPCs (and need to provide a mechanism for you to tell the system which functions make RPCs) so they can recover from process failures. They also all happen to provide common orchestrator features like retries and timeouts, because devs find them useful.
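For the retries/timeouts point: in Temporal’s Go SDK, for example, those are declared on the Activity call rather than hand-rolled (a sketch; the numbers are arbitrary and ShipOrder is a stand-in):

```go
package example

import (
	"context"
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// ShipOrder stands in for a flaky downstream call.
func ShipOrder(ctx context.Context, orderID string) error { return nil }

// Retries and timeouts are declared per Activity call; the server enforces
// them, so retry state survives worker crashes between attempts.
func FulfilWorkflow(ctx workflow.Context, orderID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 30 * time.Second, // per-attempt timeout
		RetryPolicy: &temporal.RetryPolicy{
			InitialInterval:    time.Second,
			BackoffCoefficient: 2.0,
			MaximumInterval:    time.Minute,
			MaximumAttempts:    5,
		},
	})
	return workflow.ExecuteActivity(ctx, ShipOrder, orderID).Get(ctx, nil)
}
```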
Never heard of durable execution until now, but I've wondered about it. When I write backend code, I have to keep asking myself "what happens if the server goes down during this line of code?" This often comes up in the middle of a customer order, like the example here. I end up relying on the database for all sorts of tiny things, like recording the fact that the user initiated an order before I start to process it.
But how fast is this? IIRC each little insert in my DB was taking like 5ms, which would add up quickly if I were to spam it everywhere; I assume durable execution layers are better optimized for that. Do they really only snapshot before and after async JS calls, treating all other lines as hermetic and thus able to be rerun?
Yeah, I’ve also written in this write-to-the-db-after-each-meaningful-line style, and this is a great improvement. See the first 20 minutes of this talk for an example: https://youtu.be/EFIF8gk9zy8
Starting a workflow is currently ~40ms, and I think we’ll be able to get that down to 10ms this year. How long a workflow takes to complete depends on how many persisted steps it has (and whether it has to wait on an external event). The only steps that are persisted are workflow API calls like sleep() and startChildWorkflow(), or calls to code that might fail (i.e. an “Activity”, like a network request).
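For anyone curious what that start step looks like from the caller’s side, a Go SDK sketch (server address, queue name, and IDs are made up; the workflow is referenced by the registered name from the earlier sketch). ExecuteWorkflow returns a handle once the start is persisted, not when the workflow completes:

```go
package main

import (
	"context"
	"log"

	"go.temporal.io/sdk/client"
)

func main() {
	// Assumes a Temporal server reachable at the default local address.
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Returns a handle once the start is durably recorded (the ~40ms above);
	// the handle can later be used to await the workflow's result.
	we, err := c.ExecuteWorkflow(context.Background(), client.StartWorkflowOptions{
		ID:        "checkout-order-123", // made-up business ID
		TaskQueue: "example-queue",      // made-up queue name
	}, "CheckoutWorkflow", "order-123")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("started run:", we.GetRunID())
}
```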
> The only steps that are persisted are workflow API calls like sleep() and startChildWorkflow(), or calls to code that might fail (i.e. an “Activity”, like a network request).
Ok, that's what I was wondering. Makes a lot more sense this way.
We have Go and Java SDKs that have better performance characteristics, if that’s what you’re optimizing for. I think for many businesses, optimizing for development speed is a higher priority (e.g. if the devs already know JS, use that). The Node runtime with v8 isolates is also better able to protect developers from writing non-deterministic code (durable code must be deterministic). More info on that: https://temporal.io/blog/intro-to-isolated-vm