Hey HN! Recently I've been working on Walrus, a distributed message streaming engine that combines high-performance log storage with Raft-based coordination.
What makes it different:
- Segment-based sharding with automatic load balancing – Topics split into ~1M entry segments, leadership rotates
round-robin on rollover. No manual partition management.
- Lease-based write fencing – Only the segment leader can write. Leases sync from Raft metadata every 100ms, preventing
split-brain without coordination on the data path.
- Sealed segment reads – Old segments stay on the original leader after rollover. No data movement, reads scale with
replicas.
- Simple TCP protocol – Connect to any node, auto-forwarding to the right leader. Commands are just PUT topic payload
and GET topic.
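To make the first two bullets concrete, here's a rough sketch in Rust of how round-robin rollover and lease fencing could fit together. This is a toy in-memory model, not the actual Walrus code: `TopicMeta`, `LeaseCache`, and all their methods are made-up names for illustration, and the real system keeps this metadata in Raft rather than in a plain struct.

```rust
// Toy model of segment rollover and lease fencing.
// All names here are hypothetical and not Walrus's real API.
use std::collections::HashSet;

/// ~1M entries per segment, as described in the post.
pub const SEGMENT_CAPACITY: u64 = 1_000_000;

/// Metadata for one topic, standing in for what Raft would replicate.
#[derive(Debug, Clone)]
pub struct TopicMeta {
    pub nodes: Vec<u32>,     // node ids in rotation order (must be non-empty)
    pub active_segment: u64, // index of the currently writable segment
    pub entries: u64,        // entries written to the active segment so far
}

impl TopicMeta {
    /// Leadership rotates round-robin: segment i belongs to node i mod N.
    pub fn leader_of(&self, segment: u64) -> u32 {
        self.nodes[(segment % self.nodes.len() as u64) as usize]
    }

    /// Count one appended entry; when the segment is full, seal it and
    /// open the next one, which moves leadership to the next node.
    pub fn record_append(&mut self) {
        self.entries += 1;
        if self.entries >= SEGMENT_CAPACITY {
            self.active_segment += 1;
            self.entries = 0;
        }
    }
}

/// Per-node view of which segments this node may write to.
/// In the post this is refreshed from Raft metadata every 100ms.
#[derive(Debug, Default)]
pub struct LeaseCache {
    held: HashSet<u64>,
}

impl LeaseCache {
    /// Rebuild the lease set for `node` from the latest metadata.
    pub fn sync(&mut self, node: u32, meta: &TopicMeta) {
        self.held.clear();
        if meta.leader_of(meta.active_segment) == node {
            self.held.insert(meta.active_segment);
        }
    }

    /// Local check on the write path: no round trip to Raft needed.
    pub fn can_write(&self, segment: u64) -> bool {
        self.held.contains(&segment)
    }
}
```

In this model the write path only consults the local `LeaseCache`, so the cost of coordination is paid in the periodic `sync` call rather than on every append.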
Performance: The underlying storage engine hits ~1.2M writes/sec (unsynced) and ~5K writes/sec (fsynced), competitive
with Kafka and faster than RocksDB in benchmarks.
Correctness: Includes a TLA+ spec verified with TLC covering write fencing, rollover mechanics, and cursor advancement.
The repo has both the distributed system (distributed-walrus/) and the standalone storage engine library (walrus-rust on crates.io).
Would love feedback on the architecture, especially around the lease synchronization approach and sealed segment design!