Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The thing that can make real world text CRDT implementations complex is that the optimisations kinda bleed into all the rest of your code. The 2 big optimisations you want for most text CRDTs - including egwalker - are:

- Using a b-tree instead of an array to store data

- Use internal run-length encoding. Humans usually type in runs of characters. So store runs of operations instead of individual operations. (Eg {insert "abc", pos 0} instead of [{insert "a", pos 0}, {insert "b" pos 1}, {insert "c" pos 2}]).

But these two ideas also affect one another. Its not enough to just use a b-tree. You need a b-tree which also stores runs. And you also need to be able to insert in the middle of a run. And so on. You need some custom collections.

If you do run-length encoding properly, all iteration throughout your code needs to make use of the compressed runs. If any part of the code works character-by-character, it'll become a bottleneck. Oh and did I mention that it works even better if you use columnar encoding, and break the data up into a bunch of small arrays? Yeahhhh.

So thats why diamond types - my optimized egwalker implementation - is tens of thousands of lines of code instead of a few hundred. (Though in my defence, it also includes custom binary serialization, testing, wasm bindings, and so on.)

Rust makes the implementation way easier to implement thanks to traits. I have simple traits for data that can be losslessly compressed into runs[1]. A whole bunch of code takes advantage of that, by providing tooling that can work with a wide variety of actual data. For example, I have a custom vec wrapper that automatically compresses items when you call push(). I have a "zip" iterator which glues together other iterators over run-length encoded data. And so on. Its great.

Though now that I think about it, maybe all that trait foo is what makes it headache inducing. I swear its worth it.

[1] Eg MergableSpan: https://github.com/josephg/diamond-types/blob/00f722d6ebdc9f...



Jeezs. Thanks for the breakdown. I suppose the layering of different, complicated patterns make it too thick to parse. And some of the CRDT APIs leak quite a bit of complexity once you want to do something a little more complicated eg wrap rich text editor content.


I put at least some of the blame on text editors themselves. Many text editors - particularly on the web - don't expose a clean event based API telling you what changed. Its very annoying.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: