KraftyOne's comments

DBOS also has a full-fledged workflow visualization and management UI: https://docs.dbos.dev/golang/tutorials/workflow-management


Not in the open source version? It requires the commercial Conductor thing.


Yes, there's a full workflow visualization/management interface (not embeddable though): https://docs.dbos.dev/golang/tutorials/workflow-management


We may be a small startup, but we're growing fast with no shortage of production users who love our tech: https://www.dbos.dev/customer-stories


The durability guarantees are similar--each workflow step is checkpointed, so if a workflow fails, it can recover from the last completed step.
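As a rough sketch of what that looks like with the DBOS Python library (the step and workflow names here are illustrative, not from a real app):

    from dbos import DBOS

    @DBOS.step()
    def reserve_inventory() -> str:
        # Once this step completes, its return value is checkpointed in Postgres.
        return "reserved"

    @DBOS.step()
    def charge_payment() -> str:
        return "charged"

    @DBOS.workflow()
    def checkout_workflow() -> str:
        reserve_inventory()
        # If the process crashes here, recovery re-invokes the workflow:
        # reserve_inventory() returns its checkpointed result instead of
        # re-executing, and execution resumes at charge_payment().
        return charge_payment()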

The big difference, like that blog post (https://www.dbos.dev/blog/durable-execution-coding-compariso...) describes, is the operational model. DBOS is a library you can install into your app, whereas Temporal et al. require you to rearchitect your app to run on their workers and external orchestrator.


This makes sense, but I wonder if there's a place, then, for a DBOS library in each language?

For example, a Rust library. Am I missing how a Go library is useful for non-Go applications?


There are DBOS libraries in multiple languages--Python, TS, and Go so far, with Java coming soon: https://github.com/dbos-inc

No Rust yet, but we'll see!


Yes, in any durability framework there's still the possibility that a process crashes mid-step, in which case you have no choice but to restart the step.

Where DBOS really shines (vs. Temporal and other workflow systems) is a radically simpler operational model--it's just a library you can install in your app instead of a big heavyweight cluster you have to rearchitect your app to work with. This blog post goes into more detail: https://www.dbos.dev/blog/durable-execution-coding-compariso...


Oh I see. Seems like Nextflow is a strong contender in the serverless orchestrator market ("serverless" sounds better than "embedded").

From what I can tell, though, NF just runs a single workflow at a time, with no queue or database. It relies on filesystem caching for "durability". That's starting to change with some optional add-ons.


> Yes, in any durability framework there's still the possibility that a process crashes mid-step, in which case you have no choice but to restart the step.

Golem [1] is an interesting counterexample to this. They run your code in a WASM runtime and essentially checkpoint execution state at every interaction with the outside world.

But it seems they are having trouble selling into the workflow orchestration market. Perhaps due to the preconception above? Or are there other drawbacks with this model that I’m not aware of?

1. https://www.golem.cloud/post/durable-execution-is-not-just-f...


That still fundamentally suffers from the same idempotency problem as any other system. When interacting with the outside world, you, the developer, need to make each interaction idempotent and enforce that yourself.

For example, if you call an API (the outside world) to charge the user's credit card, and the WASM host fails and the process is restarted, you'll need to be careful not to charge again. The crash can happen after the request is issued but before the response is received/processed.

This is no different than any other workflow library or service.
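The standard fix is an idempotency key. A sketch with the DBOS Python SDK (the payment endpoint and header are hypothetical, in the style of Stripe's idempotency keys):

    import requests
    from dbos import DBOS

    @DBOS.step()
    def charge_card(amount_cents: int) -> dict:
        # Derive a deterministic key from the durable workflow ID, so a
        # retried step re-sends the same key and the provider deduplicates.
        key = f"{DBOS.workflow_id}-charge"
        resp = requests.post(
            "https://payments.example.com/v1/charges",  # hypothetical API
            json={"amount": amount_cents},
            headers={"Idempotency-Key": key},
        )
        resp.raise_for_status()
        return resp.json()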

The WASM idea is interesting, and maybe lets you be more granular in how you checkpoint (e.g., for complex business logic that is self-contained but expensive to repeat). The biggest win is probably for general preemption or resource management, but those are generally wins for the provider, not the user. Also, this requires compiling your application to WASM, which restricts which languages/libraries/etc. you can use.


The challenges around idempotency remain to some extent, yes. But you have that problem even in non-workflow code, so the usual patterns will just work with no extra mental effort from the developer.


I think one potential concern with "checkpoint execution state at every interaction with the outside world" is the size of the checkpoints. Allowing users to control the granularity by explicitly specifying the scope of each step seems like a more flexible model. For example, you can group multiple external interactions into a single step and only checkpoint the final result, avoiding the overhead of saving intermediate data. If you want finer granularity, you can instead declare each external interaction as its own step.
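A sketch of the two granularities in the DBOS Python SDK (the URLs are illustrative):

    import requests
    from dbos import DBOS

    # Coarse-grained: two external interactions in one step; only the
    # combined result is checkpointed, so intermediate data is never saved.
    @DBOS.step()
    def fetch_both(url_a: str, url_b: str) -> str:
        a = requests.get(url_a).text
        b = requests.get(url_b).text
        return a + b

    # Fine-grained: each external interaction is its own step, so each
    # result is checkpointed and won't be re-fetched after a crash.
    @DBOS.step()
    def fetch_one(url: str) -> str:
        return requests.get(url).text

    @DBOS.workflow()
    def fine_grained_workflow() -> str:
        a = fetch_one("https://a.example")
        b = fetch_one("https://b.example")
        return a + b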

Plus, if the crash happens in the outside world (where you have no control), then checkpointing at finer granularity won't help.


Sure, you get more control with explicit state management. But it's also more work, and more difficult work. You can do a lot of writes to NVMe for one developer salary.


It's not really more work to be explicit about the steps and workflows. You already have to break your code into steps to make your program run. Adding a single decorator isn't much extra work at all.


The biggest downsides to their methodology are that the snapshots can get really big really quickly, and that they are hard to introspect since they are binary blobs of memory dumps.


Yeah, the whole methodology depends on forgetting about state and treating it as a long-running program. If you need to look at the state, then you connect a debugger, etc.


Yeah, queue priority is natively supported: https://docs.dbos.dev/golang/tutorials/queue-tutorial#priori...
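In the Python SDK it looks roughly like this (a sketch; see the linked docs for the exact parameter names and semantics):

    from dbos import DBOS, Queue, SetEnqueueOptions

    queue = Queue("tasks", priority_enabled=True)

    @DBOS.workflow()
    def process(item: str) -> str:
        return item.upper()  # stand-in for real work

    # Lower numbers are dequeued first (assumption; verify in the docs).
    with SetEnqueueOptions(priority=1):
        urgent = queue.enqueue(process, "urgent-item")
    with SetEnqueueOptions(priority=10):
        backlog = queue.enqueue(process, "backlog-item")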


1. We also have support for Python and TypeScript with Java coming soon: https://github.com/dbos-inc

2. There are built-in APIs for managing workflow recovery (see the sketch after this list), documented here: https://docs.dbos.dev/production/self-hosting/workflow-recov...

3. We'll see! :)
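For (2), a minimal sketch of the flavor of those management APIs in the Python SDK (names approximate; the linked docs are authoritative):

    from dbos import DBOS

    # Find workflows that were interrupted mid-execution and resume them.
    for wf in DBOS.list_workflows(status="PENDING"):
        DBOS.resume_workflow(wf.workflow_id)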


Elixir? Or does Oban hew close enough that it's not worth it?


The specific claim is that workflows are started exactly-once in response to an event. This is possible because starting a workflow is a database transaction, so we can guarantee that exactly one workflow is started per (for example) Kafka message.

For step processing, what you say is true--steps are restarted if they crash mid-execution, so they should be idempotent.
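Concretely, a sketch of that dedup in the Python SDK (the Kafka handler wiring here is assumed, not DBOS's built-in consumer):

    from dbos import DBOS, SetWorkflowID

    @DBOS.workflow()
    def process_message(payload: str) -> None:
        DBOS.logger.info(f"processing {payload}")

    # Derive the workflow ID from the message's position: if the same
    # message is delivered twice, the second start is a no-op because
    # starting a workflow with an existing ID is deduplicated in Postgres.
    def on_kafka_message(topic: str, partition: int, offset: int, payload: str) -> None:
        with SetWorkflowID(f"{topic}-{partition}-{offset}"):
            DBOS.start_workflow(process_message, payload)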


"Exactly-Once Event Processing" is the headline claim - I actually missed the workflow starting bit. So what happens if the workflow fails? Does it get restarted (and so we have twice-started) or does the entire workflow just fail ? Which is probably better described as "at-most once event processing"


I think a clearer way to think about this is: "at-least-once" message delivery plus idempotent workflow execution is effectively exactly-once event processing.

The DBOS workflow execution itself is idempotent (assuming each step is idempotent). When DBOS starts a workflow, the "start" (the workflow's inputs) is durably logged first. If the app crashes, then on restart DBOS reloads from Postgres and resumes from the last completed step. Steps are checkpointed so they don't re-run once recorded.


Why would you need exactly-once semantics if the workflow is idempotent?

You specifically need exactly-once when the action you are performing is not idempotent.


"Exactly-Once Event Processing" is possible if (all!) the processing results go into a transactional database along with the stream position marker in a single transaction. That’s probably the mechanism they are relying on.


Kafka is great for streaming use cases, but the big advantage of Postgres-backed queues is that they can integrate with durable workflows, providing durability guarantees for larger programs. For example, a workflow can enqueue many tasks, then wait for them to complete, with fault-tolerance guarantees both for the individual tasks and the larger workflow.
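For example, a fan-out/fan-in sketch with the DBOS Python SDK (the task logic is a stand-in):

    from dbos import DBOS, Queue

    queue = Queue("tasks")

    @DBOS.workflow()
    def process_task(task_id: int) -> int:
        return task_id * 2  # stand-in for real work

    @DBOS.workflow()
    def batch_workflow(n: int) -> list[int]:
        # Each enqueue is checkpointed, so a crash mid-loop never
        # enqueues the same task twice on recovery.
        handles = [queue.enqueue(process_task, i) for i in range(n)]
        # Durably wait for every task; already-completed results are
        # read back from Postgres rather than recomputed.
        return [h.get_result() for h in handles]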


I guess if you use different topics (queues) in Kafka, you can do all this with the help of a processor like Storm, Spark, etc., routing messages between topics to form a workflow.


Huh? Kafka messages are durable just like Postgres commits are durable. That’s why it’s used for things like Debezium that need a durable queue of CDC messages like those from the Postgres WAL.

There’s nothing inherently different about the durability of Postgres that makes it better than Kafka for implementing durable workflows. There are many reasons it’s a better choice for building a system like DBOS to implement durable workflows – ranging from ergonomics to ecosystem compatibility. But in theory you could build the same solution on Kafka, and if the company were co-founded by the Kafka creators rather than Michael Stonebraker, maybe they would have chosen that.


Observability is a big advantage; another (in the context of DBOS specifically) is integration with durable workflows, so you can write a large workflow that enqueues and manages many smaller tasks.

