In the current benchmarks I only have Kafka and the RocksDB WAL; I'll surely try to add Redpanda as well. Curious how Walrus would hold up against Seastar-based systems.
I don't see any mention of p99 latency in the benchmark results. Pushing gigabytes per second is not that difficult on modern hardware; doing so with reasonable latency is what's challenging. Also, instead of custom benchmarks it's better to just use OMB (the OpenMessaging Benchmark).
Hi, creator here. I think it's a good idea to have an S3-backed storage mode. It's kinda tricky to do for the 'active' block we're currently writing to, but totally doable for historical data.
As for the Kafka API: I tried to implement that earlier with a sort of `translation` layer, but it gets pretty complicated to maintain because Kafka is offset-based while Walrus is message-based.
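To make it concrete, such a layer roughly has to maintain a per-partition map from Kafka's monotonically increasing offsets to wherever Walrus actually placed each message. A hypothetical sketch (the `MessagePos` / `OffsetIndex` types are made up for illustration, not walrus-rust APIs):

```rust
use std::collections::BTreeMap;

// Hypothetical position of a message inside Walrus -- not a real walrus-rust type.
#[derive(Clone, Copy, Debug)]
struct MessagePos {
    segment: u64,
    index: u32,
}

/// One Kafka partition's view: consecutive offsets have to map onto
/// message positions that Walrus assigned independently.
#[derive(Default)]
struct OffsetIndex {
    by_offset: BTreeMap<u64, MessagePos>,
    next_offset: u64,
}

impl OffsetIndex {
    /// Called on every append: hand out the next Kafka offset for a
    /// message Walrus has already placed somewhere.
    fn record(&mut self, pos: MessagePos) -> u64 {
        let offset = self.next_offset;
        self.by_offset.insert(offset, pos);
        self.next_offset += 1;
        offset
    }

    /// Fetch-by-offset has to survive segment rollover, retention
    /// deleting old segments, leadership changes, etc. -- which is
    /// where the maintenance burden comes from.
    fn lookup(&self, offset: u64) -> Option<MessagePos> {
        self.by_offset.get(&offset).copied()
    }
}
```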
TBH I don't think anyone can use S3 for the active segment. I didn't dig into WarpStream too much, but I vaguely recall they only offloaded to S3 once the segment was rolled.
The Developer Voices interview where Kris Jenkins talks to Ryan Worl is one of the best, and goes into a surprising amount of detail: https://www.youtube.com/watch?v=xgzmxe6cj6A
tl;dr they write to S3 once every 250ms to save costs. IIRC, they contend that when you keep things organized by writing a separate file per topic, it's really the Linux disk cache being clever that turns the tangle of disk blocks into a clean per-file view. They wrote their own version of that, so they can cheaply checkpoint heavily interleaved chunks of data while their in-memory cache provides a clean per-topic view. I think maybe they clean up later asynchronously, but my memory fails me.
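A toy sketch of the idea as I understood it (not their actual design): buffer records from many topics, flush one interleaved blob every ~250ms, and keep an in-memory index that gives readers a per-topic view:

```rust
use std::collections::HashMap;

/// Toy model of "one interleaved object per flush interval": records
/// from many topics land in a single byte blob, and an in-memory index
/// remembers where each topic's records ended up.
#[derive(Default)]
struct FlushBatch {
    blob: Vec<u8>,
    // topic -> list of (offset, len) slices inside `blob`
    index: HashMap<String, Vec<(usize, usize)>>,
}

impl FlushBatch {
    fn append(&mut self, topic: &str, record: &[u8]) {
        let start = self.blob.len();
        self.blob.extend_from_slice(record);
        self.index
            .entry(topic.to_string())
            .or_default()
            .push((start, record.len()));
    }

    /// Every ~250ms: one PUT of the interleaved blob plus the index
    /// (stored wherever metadata lives), so readers get a clean
    /// per-topic view without one object per topic.
    fn flush(self) -> (Vec<u8>, HashMap<String, Vec<(usize, usize)>>) {
        (self.blob, self.index)
    }
}
```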
I don't know how BufStream works.
The thing that really stuck with me from that interview is the 10x cost reduction you can get if you're willing and able to tolerate higher latency and increased complexity and use S3. Apparently they implemented that inside Datadog ("Labrador" I think?), and then did it again with WarpStream.
I highly recommend the whole episode (and the whole podcast, really).
S3 charges per 1,000 PUT requests; not sure how writing every 250ms is sustainable tbh, especially in multi-tenant mode where you can have thousands of 'active' blocks being written to.
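Back-of-the-envelope, assuming the standard ~$0.005 per 1,000 PUT requests (the 250ms interval is from the interview above; the writer count is just an example):

```rust
fn main() {
    // Assumed S3 pricing: ~$0.005 per 1,000 PUT requests.
    let put_price_per_1000 = 0.005_f64;

    let puts_per_sec = 4.0;                      // one flush every 250ms
    let puts_per_day = puts_per_sec * 86_400.0;  // ~345,600 PUTs/day per writer
    let cost_per_writer_per_day = puts_per_day / 1_000.0 * put_price_per_1000;

    let writers = 1_000.0;                       // "thousands of active blocks"
    println!(
        "~${:.2}/day per writer, ~${:.0}/day for {} writers",
        cost_per_writer_per_day,
        cost_per_writer_per_day * writers,
        writers
    );
    // Roughly $1.73/day per writer, so the bill scales directly with how
    // many separate objects you PUT per flush interval.
}
```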
Hey HN! Recently I've been working on Walrus, a distributed message streaming engine that combines high-performance log storage with Raft-based coordination.
What makes it different:
- Segment-based sharding with automatic load balancing – Topics split into ~1M-entry segments; leadership rotates round-robin on rollover. No manual partition management.
- Lease-based write fencing – Only the segment leader can write. Leases sync from Raft metadata every 100ms, preventing split-brain without coordination on the data path.
- Sealed segment reads – Old segments stay on the original leader after rollover. No data movement; reads scale with replicas.
- Simple TCP protocol – Connect to any node, with auto-forwarding to the right leader. Commands are just PUT topic payload and GET topic (rough client sketch below).
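A rough client-side sketch of what talking the protocol looks like; the newline-delimited text framing and the port here are illustrative, not the exact wire format:

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Connect to any node; the cluster forwards to the segment leader.
    let mut stream = TcpStream::connect("127.0.0.1:9000")?; // address/port assumed

    // PUT <topic> <payload> -- framing assumed to be newline-delimited text.
    stream.write_all(b"PUT orders {\"id\":42}\n")?;

    // GET <topic> -- read back the next message for this consumer.
    stream.write_all(b"GET orders\n")?;

    let mut reader = BufReader::new(&stream);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    println!("response: {line}");
    Ok(())
}
```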
Performance: The underlying storage engine hits ~1.2M writes/sec (unsynced) and ~5K writes/sec (fsynced), competitive with Kafka and faster than RocksDB in benchmarks.
Correctness: Includes a TLA+ spec verified with TLC covering write fencing, rollover mechanics, and cursor advancement.
The repo has both the distributed system (distributed-walrus/) and the standalone storage engine library (walrus-rust on crates.io).
Would love feedback on the architecture, especially around the lease synchronization approach and sealed segment design!
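For the lease part specifically, the write-path check is roughly this shape (names and the staleness bound are illustrative, not the actual code): a node only accepts an append if it believes it holds the lease for the active segment and that belief was refreshed from Raft metadata recently enough.

```rust
use std::time::{Duration, Instant};

// Hypothetical types -- illustrating the fencing check, not Walrus internals.
struct Lease {
    holder: u64,        // node id that Raft metadata says leads the active segment
    epoch: u64,         // bumped on every rollover / leadership rotation
    synced_at: Instant, // when this node last refreshed the lease from Raft
}

// Leases sync every 100ms; allow a few missed syncs before refusing writes.
// The 300ms bound is an assumed value for the sketch.
const MAX_STALENESS: Duration = Duration::from_millis(300);

fn may_write(lease: &Lease, my_node_id: u64, write_epoch: u64) -> bool {
    lease.holder == my_node_id
        && lease.epoch == write_epoch
        && lease.synced_at.elapsed() <= MAX_STALENESS
}
```

The point of the staleness bound is that a partitioned ex-leader fences itself off on its own, without touching Raft on the data path.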
They mostly play nicely together because they operate at different layers. Quinn's flow control handles transport-level backpressure (the receiver can't consume bytes fast enough), which naturally surfaces to OpenRaft as slower RPC responses. OpenRaft then handles consensus-level backpressure by tracking replication progress and adjusting accordingly (e.g., switching to snapshots for lagging peers); it just works. The main benefit is that QUIC's built-in flow control means I didn't need the manual buffering logic you'd typically implement over raw TCP.
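Concretely, the coupling falls out of the stream API. A minimal sketch of one RPC over a QUIC bi-stream (exact error types and the `finish()` signature vary by quinn version, so treat this as a sketch rather than drop-in code):

```rust
// Sketch only: assumes a recent quinn where finish() is synchronous.
async fn send_append_entries(
    conn: &quinn::Connection,
    frame: &[u8],
) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let (mut send, mut recv) = conn.open_bi().await?;

    // write_all() only makes progress while QUIC flow control grants credit,
    // so a slow follower shows up here as a slow RPC -- which the consensus
    // layer already accounts for when tracking replication progress.
    send.write_all(frame).await?;
    send.finish()?; // `.await` on older quinn versions

    // Bounded read of the response; 1 MiB is an arbitrary cap for the sketch.
    Ok(recv.read_to_end(1024 * 1024).await?)
}
```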
I recently open-sourced Octopii, a batteries-included framework for building distributed systems. It bundles everything you need without hunting for individual components.
Redis is single-threaded for dataset-modifying commands (SET, DEL, etc.), but multi-threaded for networking and I/O. nubmq, on the other hand, is fully multi-threaded—distributing reads and writes across all cores. So yes, this is fundamentally a comparison between Go’s concurrency model and Redis’s single-threaded event loop.
For benchmarking, I used the same test script (included in the repo) for both Redis and nubmq—100 concurrent clients hammering the server with requests. The core execution models remain unchanged; only the endpoints differ. So if you're asking whether the comparison reflects actual engine throughput rather than just connection handling—yes, it does.
The real ‘secret sauce’ is how nubmq prevents bottlenecks under load. When per-shard contention spikes, it spins up a larger store in the background, shifts new writes to it, and lazily migrates keys over. No pauses, no blocking. Meanwhile, Redis has no way to redistribute the load—it just starts choking when contention builds.
Also, Redis carries Lua scripting overhead in some cases. nubmq skips all that—pure Golang, no dependencies, no interpreter overhead.