You might not need a CRDT (driftingin.space)
372 points by paulgb on Dec 5, 2022 | hide | past | favorite | 133 comments


This is the most underrated problem with CRDTs:

> Both paths will allow us to ensure that each replica converges on the same state, but that alone does not guarantee that this state is the “correct” state from the point of view of user intent.

In context of richtext editors, it's easy to converge a rich-text JSON into a single state for all collaborators. What's difficult is to ensure that the converged state is renderable as richtext. For example, is there a table cell that was inserted where a column was deleted?

I wrote a summary of this issue with CRDTs here - https://writer.zohopublic.com/writer/published/grcwy5c699d67... for those interested.

While I'm optimistic it's not impossible to solve, it's surprising that most of the CRDT literature doesn't go into these details on correctness. I appreciate the author for putting this out.

PS: We currently use OT for collaboration in Zoho Writer (https://writer.zoho.com/). I'm looking out for practical CRDT ideas that work well with richtext.


Exactly, the "C" in CRDT is a little misleading. They are "conflict free" in as much as they are able to merge and resolve in the basic sense. That does not mean that the resulting state is valid or meets your internal schema.

A good example of this is using Yjs with ProseMirror: it's possible for two concurrent edits to resolve to a structure that doesn't meet your ProseMirror schema. An example I have used before is a <figure> element that can contain a single optional <caption> element. It's possible for two users to add that caption to the figure concurrently; Yjs will happily merge its XMLFragment structures and place two captions in the figure. However, when this is loaded into ProseMirror, it will see that the document does not match its schema and dump the second caption. Fortunately the captions are ordered, and so the same state will be resolved by all parties.

This is all ok if you are doing the merging on the front end, but if you try to merge the documents on the server, not involving ProseMirror, the structure you save, and potentially index or save as html, will be incorrect.

Point is, it's still essential with CRDTs to have a schema and a validation/resolution process. That, or you use completely custom CRDTs that encode the validation of the schema into their resolution process.
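To make that concrete, here's a minimal sketch (TypeScript, hypothetical shapes - not Yjs's or ProseMirror's actual API) of a deterministic post-merge normalization pass that enforces the "one caption per figure" rule the same way on every replica:

```typescript
// Sketch: after a CRDT merge, run a deterministic normalization pass
// that enforces the application schema (here: at most one caption per
// figure). Because every replica applies the same rule, they all repair
// the document identically. Shapes are hypothetical, not a real API.

type DocNode = { type: string; children: DocNode[] };

function normalizeFigure(figure: DocNode): DocNode {
  let seenCaption = false;
  const children = figure.children.filter((child) => {
    if (child.type !== "caption") return true;
    if (seenCaption) return false; // drop any caption after the first
    seenCaption = true;
    return true;
  });
  return { ...figure, children };
}

// Two users concurrently added a caption and the merge kept both:
const merged: DocNode = {
  type: "figure",
  children: [
    { type: "image", children: [] },
    { type: "caption", children: [] },
    { type: "caption", children: [] },
  ],
};

const repaired = normalizeFigure(merged);
```

Running this pass on the server (not just in the editor) is what keeps the saved/indexed structure correct.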


> Point is, it's still essential with CRDTs to have a schema and a validation/resolution process. That, or you use completely custom CRDTs that encode the validation of the schema into their resolution process.

I'm surprised that MS's concurrent revisions [1] haven't taken off because this is what they do by default: you specify a custom merge function for any versioned data type so you can resolve conflicts using any computable strategy.

[1] https://www.microsoft.com/en-us/research/project/concurrent-...


Yea, this is kinda why i don't understand a lot of these off the shelf CRDT solutions. Like CRDTs for JSON.

I'm still trying to learn CRDTs; but early on in my learning process i had the thought that CRDTs don't have mere data types - as you say, they have Data Type + Conflict Resolution bundled into one. It's not enough to make a String type, because different types need to be resolved differently. Plus some types of resolution have a lot more metadata, so you want to choose the best fitting one.

I found that my goal to make SQL + CRDT meant i had to expose this combo of Type + Resolution to the end user. It seems essential to how the app works.

But maybe i don't know anything, /shrug. I'm still learning them.


Yes! We call this the "Merge Type" of data in the braid.org group.

Each datum has both a data type and a merge type. The programmer just needs to specify these two types, and then the programming runtime can choose which synchronization algorithm to use.
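As a rough sketch of what that pairing could look like (made-up names for illustration, not braid.org's actual API):

```typescript
// Rough sketch: a field pairs its value with a merge function, and the
// sync runtime just invokes whichever merge the field declares. The
// names here are made up for illustration.

type Merge<T> = (mine: T, theirs: T) => T;

const maxWins: Merge<number> = (a, b) => Math.max(a, b);
const unionSet: Merge<string[]> = (a, b) => [...new Set([...a, ...b])].sort();

interface Field<T> { value: T; merge: Merge<T> }

function applyRemote<T>(local: Field<T>, remoteValue: T): Field<T> {
  return { ...local, value: local.merge(local.value, remoteValue) };
}

// Same runtime, different merge semantics per field:
const viewCount: Field<number> = { value: 10, merge: maxWins };
const tags: Field<string[]> = { value: ["a"], merge: unionSet };
```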


Er, how can you express the merge operation as a type?



> A Merge Type is a function

Er, okay, if you redefine "type" to mean "function" then sure.


Your example is spot on :)

I believe we shouldn't lock our richtext state to ProseMirror's specs. The state is better modelled as a CRDT, with correctness enforced as part of the CRDT operations. This way we unlock other possibilities, like server-side merging, and the editing library can be swapped (e.g. Lexical).

All this sounds right, but it's a lot of work.


Right, but the problem is that most programmers don't want to reason about this stuff. The danger is that things will feel like they "just work" locally (when there's no concurrent edits) and then go potentially off the rails when actual users start editing together.

We can solve this by educating developers and giving them different merge strategies to use, but I worry that most devs won't bother most of the time. So long as the merge resolution is good enough, they'll just shrug and accept it.

That said, most collaborative editing happens online. People don't really go offline while editing a Google Doc with other people. It's fine if two people put their cursor in the same location - they won't edit at the same time, because that would be weird.

And with deep JSON structures, so long as the structure itself can stay intact and parsable, users will usually work on different things to avoid weird editing conflicts. Imagine a giant 3D world (like an MMO) full of lots of objects and scenery. And a team working on building out that world. Artists will probably work on different things, in different areas or different parts of the terrain. Or if they're working on the same thing, they'll do it live & online.

The one big area of CRDT research I'd love to see more of is offline conflict resolution - which should be different than online conflict resolution. Git has the right idea - I want merge conflicts when things are merged together. CRDTs have a superset of the information that git has. We have all the information we need in Yjs / automerge / etc to be able to notice merge conflicts and do whatever we want with them. But as far as I know, nobody has built something like Git on top of a CRDT yet which uses this information to create merge conflicts in a sensible way.

With the CRDT I'm building, for registers I've decided to use MVRegisters (so if multiple writes happen, we keep all conflicting values and can expose that conflict state through the API). But also have a "simple API" which picks an arbitrary, consistent winner so when developers don't care (or when they're making an MVP of their app) everything will still work fine. The (discarded) conflicting values will be in the history anyway if they want to recover it later.
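A toy version of that MVRegister idea (illustrative only - real implementations track causality with version vectors or dots, not explicit observed-ID lists):

```typescript
// Toy multi-value register: each write records the IDs of earlier writes
// it observed. A write is discarded once some other write has observed
// it, so truly concurrent writes coexist. Illustrative only.

interface Write<T> { id: string; value: T; observed: string[] }

class MVRegister<T> {
  private writes: Write<T>[] = [];

  apply(w: Write<T>) {
    // Remove everything the incoming write has already observed...
    this.writes = this.writes.filter((x) => !w.observed.includes(x.id));
    // ...and keep it only if nothing here has observed *it*.
    if (!this.writes.some((x) => x.observed.includes(w.id))) {
      this.writes.push(w);
    }
  }

  // Conflict-aware API: expose every concurrent value.
  values(): T[] {
    return this.writes.map((w) => w.value);
  }

  // "Simple" API: arbitrary but consistent winner (lowest write ID).
  get(): T | undefined {
    return [...this.writes].sort((a, b) => a.id.localeCompare(b.id))[0]?.value;
  }
}

// Two replicas write concurrently (neither observed the other):
const reg = new MVRegister<string>();
reg.apply({ id: "a1", value: "from Alice", observed: [] });
reg.apply({ id: "b1", value: "from Bob", observed: [] });
```

A later write that has observed both conflicting values replaces them, which is how the conflict gets resolved once someone acts on it.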


Yes, offline edits tend to generate larger units of delta. The larger the deltas, the more difficult or even impossible it is to merge with a sensible outcome for all parties. Falling back to manual merging will be a useful feature.

But wouldn't this mean the CRDT becomes an ever-growing set, like git? Is this even practical when, say, working with a thousand-page book? To me it looks like it breaks the possibility of garbage-collecting portions of the CRDT, which will make it harder to adopt in real-world apps.


> Is this even practical when, say, working with a thousand-page book?

Remarkably, yeah, it's fine in practice. I've done a lot of work on this recently, doing crazy optimized bit-packing for text CRDT data. The numbers are wild - all the metadata (insert positions, versions, etc.) ends up only accounting for a few % of the final file size. Almost all of the final file size is the inserted characters themselves. And this is still true for files with complex editing histories (e.g. reconstructed from git repositories).

Diamond types does LZ4 compression of the inserted text data. In most of my examples, I save more bytes on disk by LZ4 compressing the text than I spend storing all the CRDT metadata. So the file on disk (with full history and change tracking) often ends up smaller than the final document.


I'm confused here - please could you explain. Here's what I see would happen:

1. User 1 adds a caption

2. User 2 adds a caption.

3. Both transmit to the server.

4. Server receives the first edit, applies it to the doc, broadcasts the change.

5. Server receives the second edit, sees it is applied to the previous version of the document, rejects the edit.

6. Client that receives the rejection re-bases their edit, attempts to apply it, and it is rejected as it does not conform with the schema.

So the outcome is whichever client hits the server first wins, which is fine. Where's the problem?


The operation you are suggesting would not be based on CRDTs. One of the main points is that they are conflict-free, as the name suggests, so it's not allowed to reject a mutation. The merge operation "magically" has to ensure it will always conform to the schema. That's precisely the hard part: defining such a merge operation.


Ok, thanks very much.

So what if the server just rebases the edit, and that results in the caption being set a second time? i.e. most recent wins. Don’t see how this would break a schema?

Encoding “adding a caption” as “add an additional child called caption” would be silly, as we know there can only be one caption. So the op would be “set the caption”.
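Sketched out, "set the caption" becomes a last-writer-wins register (hypothetical shapes):

```typescript
// Sketch: "set the caption" as a last-writer-wins register keyed by a
// (Lamport time, replica ID) stamp, so concurrent sets converge to a
// single caption on every replica instead of two child nodes. Names
// are hypothetical.

type Stamp = { time: number; replica: string };

// Total order: later time wins; replica ID breaks ties deterministically.
function wins(a: Stamp, b: Stamp): boolean {
  return a.time !== b.time ? a.time > b.time : a.replica > b.replica;
}

interface Caption { text: string; stamp: Stamp }

function mergeCaption(a: Caption, b: Caption): Caption {
  return wins(a.stamp, b.stamp) ? a : b;
}

// Two users set the caption concurrently at the same logical time:
const fromUser1: Caption = { text: "A lemur", stamp: { time: 5, replica: "u1" } };
const fromUser2: Caption = { text: "An otter", stamp: { time: 5, replica: "u2" } };
```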


That is possible, but probably not what users want. This can quickly give the impression that person X overwrote the caption of person Y.


> Exactly, the "C" in CRDT is a little misleading. They are "conflict free" in as much as they are able to merge and resolve in the basic sense. That does not mean that the resulting state is valid or meets your internal schema.

I disagree, the "conflict" in "conflict-free" is not misleading at all. As I see it, your issue with the "conflict" lies in a misunderstanding of what a conflict actually is in the context of CRDTs and what eliminating them entails. Concurrent updates can be free from conflicts but still end up with a semantically inconsistent document, and that is perfectly fine.

The whole point of CRDTs is to allow collaborative editing without resorting to mutex locks or forcing users to retry cancelled updates. That's why Yjs merges two captions added to a figure, and also why other CRDTs don't stop concurrent updates that make no sense semantically, such as having users concurrently writing incoherent sentences. Document-wise, there is no conflict or inconsistency.


I don't think I've seen any implementations of this, but the table example you listed is solvable. The insight is to stop thinking of the table as a bundle of markup and think about it as itself a primitive CRDT type that contains text in the table cells.

The columns are a CRDT set of column IDs. The rows are a CRDT map of row IDs to CRDT maps of column IDs to cell values, where the cell values are again the same type as your document text. Column headings are a specially formatted row. Rows are ordered using a sequence type.

Removing a row is a normal CRDT map deletion. Removing a column is a normal deletion from the CRDT set of columns. Rendering the table will ignore map entries that don't correspond to real columns.
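A minimal sketch of that model (illustrative shapes only, leaving out the ordering and nested-text parts):

```typescript
// Sketch of the table model above: columns are a set of IDs, rows map
// row IDs to (column ID -> cell) maps, and rendering skips any cell
// whose column no longer exists. The orphaned data stays in the CRDT.

type Table = {
  columns: Set<string>;                    // surviving column IDs
  rows: Map<string, Map<string, string>>;  // rowId -> (colId -> cell)
};

function render(t: Table): string[][] {
  const cols = [...t.columns].sort();
  return [...t.rows.values()].map((row) =>
    cols.map((c) => row.get(c) ?? "")      // orphaned entries ignored
  );
}

// One user deleted column "c2" while another edited a cell in it:
const table: Table = {
  columns: new Set(["c1"]),
  rows: new Map([
    ["r1", new Map([["c1", "keep"], ["c2", "orphaned edit"]])],
  ]),
};
```

Note the orphaned cell isn't destroyed - it just isn't rendered, so a UI could still surface it for recovery.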


So with your solution, the inserted table cell would "disappear" from what is rendered?

That's not necessarily (depends on the product) great or even acceptable user experience. As a user I would certainly want a way to know there is an "orphaned" "edited after delete" cell to salvage the data inputted or understand the confusion between editors.


The best UX resolution to this kind of issue I have seen is to display “This table cell was added by Bob concurrently to Alice removing its row. How would you like to proceed: recover the row or ignore the change?”


Who gets the pop-up? If both Alice and Bob, and they simultaneously give conflicting replies, you're right back to where you started.


Usually that doesn't happen because their replies don't conflict within the time interval of a round trip between the two of them, so one can see the other's action first, and decide to agree.

But if it does happen, either they get a new popup, eventually, saying that their conflict-resolution decisions also conflicted and what would they like to do, or the system decays to a CRDT-style pre-determined resolution rule for the second-order conflict -- with similar problems to the original CRDT resolution rule we tried to avoid. Things like "give priority to the person who chose to keep information instead of deleting", "give priority to the person with the biggest random number", "merge them by keeping information along with a note in the document about the attempt to delete the column", or "delete the column and put the edited cell in a note". But this time, only when the users conflict after the first popup, so it doesn't occur as often.

I think you're describing a metastability effect which occurs in all distributed systems that don't use pre-determined resolution rules like pure CRDTs. It happens in network protocols, distributed transactions, and consensus protocols in general. If the process you use to resolve depends on input from multiple parties who don't have a way to communicate and could choose conflicting values, there's a non-zero probability of needing another round or to escalate upwards to a higher-level system to resolve.

In many technical systems, provided there's a way to communicate it's not difficult to make the probability of repeated rounds or escalations tend arbitrarily close to zero (but not actually zero). If there's a possibility of network partition, though, the conflict can come back later when communication returns; there's no way to completely make it go away.


Yep. And, to point it out, good CRDTs have all the information they need to implement all of this.

Git marks conflicts when merging. CRDTs could do the same, but I don't know any which have implemented this stuff yet.


Stupid question - are there any good resources on User Experience patterns/recipes for implementing OT/CRDTs?

Context - a lot of the literature is about the backend algorithm. But that still leaves individual teams tripping over the frontend interaction design and usability.

Example of what I'm looking for - hex.tech has a video tutorial of their "Cell locking" feature, where the cell gets disabled and a "Mac Lockard has taken control of a cell you were editing" toast notification appears. [1]. This is fully baked and a recipe which if I put in a jira ticket could actually get implemented. :)

I would love to read a book which is a "Don't make me think" and has an example with "This table cell was added by Bob concurrently to Alice removing its row. How would you like to proceed: recover the row or ignore the change?". All the UI's in my head would look like "Clippy" from MS Word - or "Pipey" from Silicon Valley [2], haha.

1: https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...

2: https://www.youtube.com/watch?v=E2YcOV5C2x4


This is close: https://decentpatterns.com/library/

...but my general belief is that UI patterns need to evolve as (1) specific UIs before they become (2) general patterns, and we simply don't have enough good solutions yet to make a pattern library out of them.


Very cool - thanks!


You're not back to where you started. You've gone from a technical conflict to a direct disagreement about what is supposed to happen. There's no technical solution to an edit war. So maybe ask anew about the conflicting answers, or pick one arbitrarily at that point. Or make them play rock paper scissors. But it still helps to ask before it becomes a direct disagreement.


Yes, this is the correct behavior. If Alice and Bob disagree about whether or not that row should be in the table you aren't going to fix it with math. More often than not, the expected outcome will be obvious (e.g. a spelling correction in a deleted cell can be ignored) and if it isn't obvious, there will need to be human interaction to determine the best path forward.


This isn’t really a problem. If Alice and Bob keep disagreeing, then it is appropriate that a conflict remains unresolved.


What you would ‘certainly want’ depends very much on the application this is implemented in. The data isn’t gone from the CRDT table unless the program decides it is, so many ui choices are possible.


This comment really needs a “yo dawg” meme.


I mean, JSON is objects all the way down, CRDTs are conflict-free maps all the way down.


We also had the OT vs. CRDT discussion with regards to our editor at Kitemaker (kitemaker.co), and also ended up with OT. For us there were a number of factors, but (in addition to the case you mentioned) it came down to the fact that (at that time) most of the CRDT libs like automerge generate huge diffs that we'd need to send over the wire and also that the open source editor we use (SlateJS) had operations that just slotted very nicely into OT.

Your editor feels really nice BTW. Well done!


> What's difficult is to ensure that the converged state is renderable as richtext. For example, is there a table cell that was inserted where a column was deleted?

Yes. This is one of the fundamental limitations of working at a textual level, which is sort of the local optimum that *nix ended up in. JSON in particular gets suuuuper fucked up if you don't merge/rebase carefully. There's no real syntax for it to grab onto and diff doesn't understand the concept of indentation or commas, so it just turns into an ocean of line-swapping and incorrect block-swapping. Diff also does an excruciatingly poor job in the very common case when everyone is appending to the same area (let's say, the end of the file).

This is pretty much just an inherent weakness of textual matching, what you need is to work on trees of lexical token nodes, or some type of object structure stream like powershell. Not that it's perfect either, I'm sure there are combinations of options that can leave a node in an inconsistent state, but it at least gives you a chance at correctness.

In some cases patience diff can help: it tries to generate big blocks of changed ranges, hopefully with some of the hunks being relatively syntactically well-formed, but there is still no guarantee. There is also JSON-diff, which implements such a lexical-tree diff model for JSON files, similar to the "jq" util.

https://github.com/zgrossbart/jdd

I think that's also viable for other lexable languages too - it would be cool to sed or find/replace on a tokenized representation of the code, have jquery or jpath style selectors etc. A lot of the time we kinda end up doing this with find/replace anyway.
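As a toy illustration of structural-vs-textual diffing (hypothetical code, handles objects only and treats arrays as leaves):

```typescript
// Toy structural JSON diff: walk both trees and report changed paths,
// so reordered keys and reformatted whitespace never show up as
// spurious changes the way they do with line-based diff.

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isRecord(v: Json): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function diff(a: Json, b: Json, path = "$"): string[] {
  if (JSON.stringify(a) === JSON.stringify(b)) return [];
  if (isRecord(a) && isRecord(b)) {
    const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
    return [...keys].flatMap((k) =>
      diff(a[k] ?? null, b[k] ?? null, `${path}.${k}`)
    );
  }
  return [`${path}: ${JSON.stringify(a)} -> ${JSON.stringify(b)}`];
}
```

Reordering keys produces an empty diff here, whereas a line-based diff would report the whole object as changed.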


The intuitively obvious approach here is to have some kind of schema-guided or constraint-guided CRDT. That is, you get the guarantee that not only do you end up with valid JSON, you get the guarantee that f(that_json) == true. I imagine this is considerably easier said than done, but I'm curious if there's research (or results!) in the area.


Parts of the tree with “don’t merge children” might be ok, then you show the user a conflict dialog when this happens.

Or, inspired by your comment: run f for the tree, if false, search for the smallest subset where f is false and show the conflict dialog for that. The user chooses “mine” or “theirs”.


> I'm looking out for practical CRDT ideas that works well with richtext.

Have you seen Peritext from Ink & Switch? https://www.inkandswitch.com/peritext/ It's relatively new, but is a CRDT aimed at rich text!


Their linked summary mentions Peritext and the current limitations.

Agree it’s exciting though and you can implement it fairly easily yourself by following the blog post as it’s so well written and explained.


To what extent is this an issue with CRDTs and to what extent is it an issue with the data structure(s) used to represent the state?


I think the short answer is: if you want your CRDTs to enforce application constraints you need to bake them into the data structure.


That makes sense. Thank you.


For the table situation we solve this by having a generic concept of a "card" that is rendered as a table cell when it's the grandchild of a table (table > tr > td), but otherwise it's just a standalone item, much like a markdown callout.


Quite a big takeaway here:

> A general theme of successful multiplayer approaches we’ve seen is not overcomplicating things. We’ve heard a number of companies confess that their multiplayer approach feels naive — especially compared to the academic literature on the topic — and yet it works just fine in practice.

The KISS and YAGNI principles apply.

And on another note, it seems like one big industry is often overlooked in these types of discussions. I wonder what web development could learn from online game development when it comes to "multiplayer" apps - I mean it's right there in the name.

One very pragmatic approach that strikes me as reasonable is having fixed delay changes. The game Warcraft 3 (a real time strategy game) did something like this. What it accomplishes is that changes are basically forced to be in sync by delaying them.

Note that you will get used to the delay very quickly, you won't notice it too much after a short while. It is _much_ more noticeable if the delays are variable, but a consistent delay gets basically filtered out after a short while.


There is definitely a lot we can learn from multiplayer games, but the non-game use cases often have different constraints that lead to different solutions.

With a game, it's a given that everyone is playing in real-time, right now, with as little lag as they can get away with. The synchronization happens constantly and ideally the game's internal state is completely in sync across all players. Players want the game to be kept in sync as quickly as possible so that they can react to what other players are doing. That means the amount of synchronization to perform at any one point in time is generally fairly small.

Also, the structure of the game itself often implicitly partitions the game's mutable state among the players. Players can obviously do things that affect each other and there are mutable parts of the world that can be affected by multiple players. But each player often controls their own avatar in the game and has an implicit lock on that data.

This means that there is rarely any "merging" happening. It's usually just look at the events, determine an order to apply them, and do them one at a time. If the result is a little weird in terms of state (maybe a wall is built right when another player is crossing it and the game lets them teleport through), well, it's a game. The programmers can just declare by fiat that it's part of the game mechanics.

Outside of games, a common use case is "let me work on this document offline for the duration of a flight and then resync everything when I get to the airport". There may be hundreds of changes and other users may also have huge batches of changes. There can be a very complex reconciliation and merge process because the changes are many and the data is precious. You can't just say it's "part of the game" if your multi-user spreadsheet goofs someone's tax statement.


> I wonder what web development could learn from online game development when it comes to "multiplayer" apps - I mean it's right there in the name.

100%. The architecture we used for https://plane.dev takes a lot of inspiration from how game servers work, but using HTTP/WebSocket instead of UDP.

A lot of state sharing techniques that people are now using on the web (like cursor position smoothing and optimistic updates) have decades of precedent in multiplayer games.

We’ve also seen more apps start to use an approach to pixel streaming which more closely resembles Stadia than traditional VNC. https://digest.browsertech.com/archive/browsertech-digest-wo...


The intersection of state replication and pixel streaming is indeed interesting. Cloud games for example are usually using game state sync techniques, and then add in the client<>gpu latency on top of that.

A true cloud native game (sadly part of Stadia's original ambitions) would hopefully have a more e2e state management approach inclusive of the pixel latency.

This is very different from "local multiplayer" pixel streaming, which may be what companies like Switchboard[1] are doing: multiple clients control the same I/O channel (e.g. you and I share a mouse on a single remote desktop)

[1] https://www.switchboard.app/


> but using HTTP/WebSocket instead of UDP

I think WebSockets run over UDP when hosted on HTTP/3.


They do, but with HTTP/3 there’s also WebTransport, which is even better because it means we will be able to avoid head-of-line blocking.


This is an interesting idea but I do wonder if it can be applied to generic collaborative editing "as is". The approach for synchronized ticks (I think most RTS games do something similar to the Blizzard approach) does require each participant to "ack" the tick. You can see this when one participant is lagging in that the simulation freezes for all players until they acknowledge. The online gaming approach is to let other players vote to kick someone who is lagging the game but that probably wouldn't be considered suitable in something like enterprise content management.

Can some variation of the approach be taken without the strict requirements that:

- There is a known set of participants at any time

- The list of participants has a reasonably well-bounded size

- Peers can communicate with each other directly (WebRTC is available but there are sharp corners)

- It's acceptable to freeze all players based on one lagging player

- There is an unambiguous way to handle simultaneous player actions against shared state

Most likely the "fix" for the friction is that you just slow the ticks down a lot compared to what you would do in a multi-player game.


There are probably good techniques worth stealing from FPS or MMORPGs. Those also usually run on a fixed tick rate of about 50ms, but can't afford to have clients vote on it (FPS because lag is unacceptable, MMORPGs because of user count). There are various approaches in use, depending on whether you want to just degrade the experience of the lagging participant, or give preference based on action (e.g. FPS resolve conflicts in favor of the person who fired).


Honestly, I just restarted playing World of Warcraft, and even though there have been amazing tech improvements to Blizzard's engine and servers, there is so much simple stuff lacking.

I would honestly say it's likely better for game server developers to be taking ideas from web development.

When it comes to CRDTs everyone keeps talking about text editing, honestly one of the hardest problems for a CRDT to solve. Text is entirely unstructured and requires extremely high precision on the merge of any conflict.

With a video game instead, players are already used to "rewind and replay" merges; they are already used to having a "miss" when it should have been a "hit", and they roll with it because, while that is friction, it's a game and precision isn't that important. I'll say it differently: precision is already not what game servers provide.

At the end of the day, game servers are "just" low-latency chat applications with "bots" that process the messages. I'd love to see the game server written by the maintainers of WhatsApp / Messenger / Slack / etc. That would be a truly scalable game!

P.S. If you think "chat is so simple, can't compare it to a game server". Please just think through the design for FB's / Discord's display widget that shows "currently active friends".


I think the general term should be multi-actor. I understand why people use the term multiplayer, I think: familiarity. But players imply play, which implies games, so it's not a great fit outside of them.

The type of thing you're talking about with Warcraft is a deterministic simulation. As long as you have deterministic simulation, and you synchronize your inputs by buffering sufficiently for the input latency of all the remote participants, that ensures everything stays 1:1 between each local simulation.
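A toy sketch of that lockstep idea (illustrative only - real engines also schedule local inputs a few ticks ahead to hide latency, and add checksums, rollback, etc.):

```typescript
// Toy deterministic lockstep: the simulation only advances a tick once
// every player's input for that tick has arrived, and the step function
// applies inputs in a fixed order, so all replicas compute identical
// state. All names here are illustrative.

type Command = "build" | "wait";

class Lockstep {
  tick = 0;
  state = 0; // stand-in for real game state (e.g. a building count)
  private pending = new Map<number, Map<string, Command>>();

  constructor(private players: string[]) {}

  submit(tick: number, player: string, cmd: Command) {
    const slot = this.pending.get(tick) ?? new Map<string, Command>();
    slot.set(player, cmd);
    this.pending.set(tick, slot);
  }

  // Returns false (everyone freezes) while any player's input is missing.
  tryAdvance(): boolean {
    const slot = this.pending.get(this.tick);
    if (!slot || slot.size < this.players.length) return false;
    for (const p of [...this.players].sort()) {
      if (slot.get(p) === "build") this.state += 1; // deterministic step
    }
    this.pending.delete(this.tick);
    this.tick += 1;
    return true;
  }
}
```

The "freeze until all inputs arrive" behavior is exactly the lag-stall you see in RTS games when one participant falls behind.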


This is a cool approach. It reminds me of statebox by mochimedia: https://github.com/mochi/statebox.

If I'm understanding correctly, it requires the mutations to be deterministic in order for the nodes to converge.

Replicache (replicache.dev - my thing) takes a similar approach except it does not require the mutations to be deterministic, which is very useful because it enables the mutations to reach out to other services, be authenticated, etc.

Both the idea here and Replicache's approach are closely related to game networking. If you are interested in these ideas, a really excellent set of content is: https://www.gabrielgambetta.com/client-server-game-architect....

If you are interested in how Replicache's sync protocol works and achieves transactional changes with server authority, it is described here: https://doc.replicache.dev/concepts/how-it-works.


We have used Automerge a bunch, but found that there is a threshold where beyond a given document size, performance gets exponentially worse, until even trivial updates take many seconds' worth of CPU. That is often how it works when the document end state is exclusively the sum of all the edits that have ever happened.

Our answer was to reimplement the Automerge API with different mechanics underneath that allow for a "snapshots + recent changes" paradigm, instead of "the doc is the sum of all changes". That way performance doesn't have to degrade over time as changes accumulate.

Project is here: https://github.com/frameable/pigeon, with some benchmarks: https://github.com/frameable/pigeon/wiki/Benchmarks in the wiki...


This is an implementation problem with automerge. I wrote a blog post last year about CRDT performance. I re-ran the benchmarks a couple months ago. Automerge has improved a lot since then, but a simple benchmark test (automerge-perf[1]) still takes 200 MB of RAM using automerge-rs. Yjs and Diamond types can run the same benchmark in 4 MB / 2 MB of RAM respectively.

I've had a chat with some of the automerge people about it. They're working on it, and I've shared the techniques I'm using in diamond types (and all the code). Its just an implementation bottleneck.

[1] https://github.com/automerge/automerge-perf/


Remember, "conflict free" is the opposite of "transactional" (ACID). When you choose conflict-free, you're choosing to give up the ability to reject a change due to conflict and send it back to the user for resolution. This is the crux of the issue in my opinion.


I don't think this is true on either account:

* There is no such thing as "conflict free". When users edit data disconnected from each other, they can conflict, because the user intent of the edits can be different and impossible to reconcile. Nothing can fix that, not even CRDTs. (As a thought experiment: two users are working on a report about ferrets. While offline, one decides the report should be about lemurs, while the other decides it should be about otters. The resulting sets of changes will not be mergeable without conflicts in any reasonable sense.)

* What CRDTs actually give you is convergence, which is very important. But there are other ways to get convergence, and it can be done transactionally, as long as you are willing to weaken the D (durability). Replicache (my thing - replicache.dev) is one way; the way described here is another.

In general game networking is also transactional in this same sense. Operations are transactions, that can affect multiple data items and enforce arbitrary invariants. But the order they apply in drifts, and so durability is sacrificed.

Stepping back even further, CRDTs are also often transactional in the same way. Each operation in an op-based CRDT can be thought of as a transaction that applies completely or not at all. It's just that the set of available operations in a CRDT is typically fixed.


Replicache is cool

1. So by "weaken durability" you mean basically allow committed changes to be "lost" in the case of a common conflict, aka give it up entirely?

2. CRDTs are meant to converge even in the presence of network partitions (offline first), so you're saying your committed transactions can be uncommitted? Or you're saying writers have no idea if their transaction is committed because it might converge the other way after arbitrary delay?

"Durability" means "transactions that have committed will survive permanently" – wikipedia


Thanks :)

> So by "weaken durability" you mean basically allow committed changes to be "lost" in the case of a common conflict, aka give it up entirely?

I should have said "weaken ordering".

I mean that the transaction will run, but its order with respect to other concurrently running transactions may be different in the end than the submitting client initially thought. As an extreme case, you submitted the transaction and your local client thought you got the concert tickets, but by the time it hits the server, the last ticket was already sold. The transaction still runs: it just runs first optimistically on the client, and then later authoritatively on the server with a potentially different result. Clients converge to the server's view of the world, and application invariants are maintained regardless (the number of sold tickets doesn't go negative).

Or just read Jepsen's take: https://doc.replicache.dev/concepts/consistency

In the case of concert tickets, this design doesn't make a lot of sense, but in lots of other cases -- in particular document editing applications -- it makes tons of sense. The chance of concurrent edits is low, but it is critical to maintain application-defined invariants when conflicts do occur.

> CRDTs are meant to converge even in the presence of network partitions (offline first)

Sure. Replicache converges in the presence of network partitions, as does OP's design, as do most games, distributed databases, etc. What CRDTs uniquely provide is the ability to converge *without a leader* -- in a p2p environment. As OP observes, this is not a property that most applications today need. Because they have an obvious leader: the server.

> so you're saying your committed transactions can be uncommitted? Or you're saying writers have no idea if their transaction is committed because it might converge the other way after arbitrary delay?

They run optimistically (speculatively) on the client, and then later authoritatively on the server. Application can render the optimistic result, and then later the authoritative result when it is known.

This is exactly what CRDTs do. The only difference is that CRDTs can do it without a leader, but the Replicache (and OP's) design offers additional flexibility -- ability to define your own operation/transactions.

===

We need new terminology for distributed systems. Terms invented in the 70s aren't going to make sense. I believe that "transactions" in the common parlance mean "a set of code that runs together or not at all, isolated from other concurrently running transactions". This is entirely possible to provide with convergence if we are willing to weaken ordering.


i accept we need new terminology 100%

i accept we can weaken ordering, 100%

under the model you propose, how does a writer know if their transaction has been committed (by the authority) and what is the upper bound on how long they will need to wait to know the final result? and is there an operation for the writer to wait for it?


Each client sends a sequence of mutations to the server, each identified by a mutationID. During downstream sync, server says up to which mutation it has processed.

In Replicache, you can know when a mutation is finalized with the `onSync` API, or more ergonomically, with the upcoming `pendingMutation` API.

https://doc.replicache.dev/api/classes/Replicache#onsync https://doc.replicache.dev/api/classes/Replicache#experiment...

PS- there actually is terminology for this already. Replicache is technically a Causal+ Consistent transactional system: https://jepsen.io/consistency/models/causal


Agreed - conflict resolution and authority handling are key to advanced game networking techniques and underpin the architecture of many big multiplayer games.


There's actually an approach that sits in between this and formal CRDTs.

I had the same insight that having a totally ordered log of changes solves a lot of the issues with concurrent updates. We solved this by creating one sequence CRDT that ensures all events from clients eventually end up in the same order, and then our data structures are just reductions over those events. It's a little bit like a distributed, synchronized Redux store. This let us avoid having to think in terms of CRDT types (LWW registers, grow-only sets, sequences, etc.) and just think in terms of the domain-specific business logic (e.g. "what should happen when someone marks a todo as done, after it's deleted?").

There's a little bit more to it, so if this is interesting to anyone, I can write a more detailed explanation.


Have you considered backing your application's state-synchronization with a centralized CQRS/ES event store (e.g. https://www.eventstore.com/)? You get the same semantics, without paying the overhead of building them on top of a CRDT abstraction; and the event store itself also "knows" (in the sense that it supplies the framework and you supply the logic) how to reduce over states in order to build snapshot states ("aggregates") for clients to download for fast initial synchronization.


We basically have that as well! We use CRDTs on the client to allow for optimistic updates and fast replication of events directly to other clients but we also do exactly what you describe on the server so that a user loading the data for the first time just gets the "snapshot" and then plays events on top of it.


Isn't that essentially doing a CRDT, but the replica is the whole datastore?

Ties in nicely with event sourcing - the ordered sequence of events is the CRDT, and sending events between nodes are the delta states.


How do you solve the ordering? Timestamps can be wildly out of whack. I'd be interested to hear more.


We use logical timestamps instead of datetime.

So each "event" has three properties that allow us to resolve to a sensible order across clients: session (integer), device_id (uuid), order (integer). The `session` number is set by the highest `order` that your device has seen from the server and the `order` gets incremented on each new event.

So, for example, you might have a sequence of events like this:

    [session] [order]
    0         1
    0         2
    0         3
    --- sync ---
    3         4
    3         5
    3         6

We can then sort all events by (session, device_id, order) and we'll get any events that happen on a device to be sorted in a row even if some other device created a bunch of concurrent events at the same time.
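A minimal sketch of that (session, device_id, order) comparator in TypeScript (the event shape and field names here are assumptions based on the description above, not the actual implementation):

```typescript
// Illustrative event shape, per the description above.
interface Ev {
  session: number;  // highest server-acknowledged order seen when the event was created
  deviceId: string; // per-device UUID, used to keep one device's run of events together
  order: number;    // per-device monotonic counter
  payload: string;
}

// Sort by (session, deviceId, order): all clients converge on the same total order.
function compareEvents(a: Ev, b: Ev): number {
  if (a.session !== b.session) return a.session - b.session;
  if (a.deviceId !== b.deviceId) return a.deviceId < b.deviceId ? -1 : 1;
  return a.order - b.order;
}

// Two devices created events concurrently during session 0; sorting groups each
// device's events contiguously, in an order every client agrees on.
const events: Ev[] = [
  { session: 0, deviceId: "b", order: 1, payload: "B1" },
  { session: 0, deviceId: "a", order: 2, payload: "A2" },
  { session: 0, deviceId: "a", order: 1, payload: "A1" },
  { session: 3, deviceId: "b", order: 4, payload: "B4" },
];
const ordered = events.slice().sort(compareEvents).map(e => e.payload);
```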


What about situations where two different clients both begin to publish new events from the same starting point? Those events can't be absolutely ordered right? If you're thinking of Lamport-style happens-before relations, then you can't enforce total ordering of those events. Do you just arbitrarily mark one of those client event streams as failed and force the client to absorb new state and try again?


is there any reading around this - i just am not sure compsci papers cover this?


They do, you're looking for reading around distributed systems topics. Lamport timestamps, a type of logical timestamp, are explained on Wikipedia [1]. You can use these to implement Vector Clocks [2]. I personally learned this by reading papers in undergrad and grad school as I researched in distributed systems, but I hear good things about Steen and Tanenbaum's Distributed Systems [3].

[1]: https://en.wikipedia.org/wiki/Lamport_timestamp

[2]: https://en.wikipedia.org/wiki/Vector_clock

[3]: https://www.distributed-systems.net/index.php/books/ds3/
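For a concrete feel for how Lamport timestamps work, here is a minimal sketch in TypeScript (illustrative only, not taken from the references above):

```typescript
// Minimal Lamport clock: a counter that advances on every local event
// and jumps past any remote timestamp it observes.
class LamportClock {
  private time = 0;

  // Local event or message send: advance and stamp.
  tick(): number {
    return ++this.time;
  }

  // On message receive: jump past the sender's stamp, so the receive
  // is ordered after the send it causally depends on.
  receive(remoteTime: number): number {
    this.time = Math.max(this.time, remoteTime) + 1;
    return this.time;
  }
}

const a = new LamportClock();
const b = new LamportClock();
const sent = a.tick();            // A stamps an outgoing message
const received = b.receive(sent); // B's clock jumps past the sender's stamp
```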


I actually talked to the drifting in space team about our experience implementing multiplayer. Really cool to see their findings!

Our team ended up having a similar take away, we wrote a blog post detailing our approach and experience here: https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...

For the most part it's been surprisingly solid! I was very much expecting to need to completely rewrite it by now, but it's met most of our present needs.

The only real shortcoming is not being able to support line specific commenting. While we _could_ support it, any minor update to the text would result in the comment being marked 'stale' with a high false positive rate. I've considered adopting CRDT/OT just for the text editing portion to solve this.


I really loved the Signals and Thread episode[0] on state machine replication. It basically said "you could do all of these fancy distributed systems techniques that assume a non total ordering of events, ooooorr you could have a main server that determines a total ordering for events and make your life infinitely easier"

[0]: https://signalsandthreads.com/state-machine-replication-and-...


It breaks down as soon as you have an offline scenario. You can only assume the server is authoritative if there will not be network partitions, or if writes are not accepted on the other side of the network partition (meaning on the client). As soon as the client accepts writes while offline, those writes become authoritative and will need to be merged with whatever changes the server saw.


It also breaks down when you have a passive/non-transactional server, like an app syncing its state via files on Dropbox.


Or even just have a simple central presence server hand out randomized-per-epoch node-IDs; have all nodes in the system use node-IDs as the prefix of the event temporal ordering key per "tick"; and trust your nodes to self-report their node-ID correctly in all events they send. Then your data protocol can still be p2p rather than all events having to flow through the server (if that benefits you in your problem domain) while getting all the same advantages of total ordering.

And this works great for anything enterprise-y, where all the nodes are operated by one party or a trusted consortium of parties, and no nodes are intentionally trying to cheat by misreporting their assigned node-IDs. You only need central servers (or virtual central servers, a.k.a. blockchains) when you get to things like MMO games, or HFT trading bots talking to an exchange, where every client has a reason to want to pretend it's the one that "should" go earliest in the per-tick transaction ordering.


But if I have a conflict (ie one node changes a column datatype, another adds data to column) how do i know which one went first inside the same tick?

If I understand you, there will be a tick between 12.01 and 12.02, and node A gets token 1234, and node B gets 5678. Events are published p2p, and the ordering is 5678_+9seconds and 1234_+12seconds - node A (1234) admits it has done the action 3 seconds after B. But this still depends on synchronised clocks I think?

Also you still need to resolve each type of conflict (hence having CRDTs really helps) - we know there is a conflict between A and B changes, but how do we resolve them?

Maybe I don't get it :-)


Yeah using an event stream like this requires synchronising the clock of participants. It’s similar to rollback networking in games. Peers can recover from receiving late updates but if all the updates are late from one peer the situation still degrades. To avoid that participants run time per tick slightly shorter or longer to keep within a desired window.


In most systems, you can loosen your definition of "tick" pretty far—often to the point that NTP is enough. A few milliseconds here and there aren't going to matter.

Note that I'm not talking about resolving things like individual writes in a collaborative text document here (the classic CRDT use-case) where it really matters that things resolve in the order they "really happened"; but more like resolving things like the ordering of tweets in a Twitter timeline, where any assigned ordering is good, as long as it's stable (and it doesn't even really matter per se if one node's clock is running so far ahead that it's participating in the next tick rather than the current tick; those events are still valid, and will sit around until they're dealt with on the next tick!)

When you assign nodes IDs, nodes can then independently generate cluster-unique identifiers (https://en.wikipedia.org/wiki/Snowflake_ID) that can be collated deterministically + collisionlessly against identifiers generated by other nodes; and so, whether nodes are gossiping events tagged with snowflake-IDs to one another, or are all stuffing events tagged with snowflake-IDs into a distributed data store / message queue / etc, you'll get a stable sort, and so a linear event stream.
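As a rough illustration of why snowflake-style IDs collate deterministically, here is a sketch that packs (tick, node-ID, sequence) into one integer (the bit widths are assumptions for illustration, not Twitter's exact layout):

```typescript
// Hypothetical snowflake-style layout: high bits are the tick, then a 10-bit
// node-ID, then a 12-bit per-tick sequence. Because the tick occupies the most
// significant bits, sorting the IDs numerically gives a stable total order
// across nodes, with the per-epoch node-ID breaking ties within a tick.
function snowflake(tick: bigint, nodeId: bigint, seq: bigint): bigint {
  return (tick << 22n) | (nodeId << 12n) | seq; // assumes nodeId < 1024, seq < 4096
}
```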

You might not think this model has much of a use case in critical systems where the events actually get folded over to pump a state machine... but have you ever thought about how ACH central reconciliation works? ;)

> Also you still need to resolve each type of conflict (hence having CRDTs really helps) - we know there is a conflict between A and B changes, but how do we resolve them?

Mostly, this approach is used with systems that don't have conflicts. Tweets don't collide, etc.

In systems where the events might represent state-transitions that are only valid if they occur in a certain order (create X, update X, delete X), you might have your aggregation logic buffer events that seem to occur in an order that gives invalid preconditions, for up to N ticks, to see if you find an out-of-order event that would satisfy its preconditions. (I.e. buffer "delete UUID X" until you find "create UUID X", then treat it as if the event stream had the create before the delete.)

But for true conflicts between events of the same type from different writers (like figuring out in ACH which merchant-bank gets money from an account and which merchant-bank gets an insufficient-funds result), it's "first writer (per tick) wins." That's why such systems only work under mutual trust: the participants are agreeing that some nodes have permanent (or at least epochal) override priority over other nodes. If the nodes are all cooperating, that doesn't matter. (If you think about it, leader-election consensus is just this same type of system, but where the node with assigned node-ID/priority 0 for the epoch does most of the talking!)


I thought James Long's two-or-three-hundred-lines-of-code real-world CRDT solution (Actual) was pretty cool. It seemed to solve a lot of issues revolving around heavy-duty servers.


Works great if you can afford the latency RTT to that main server. That's ~150ms in general.


First off, well done on examining your problem domain, realising that your operations are not in fact commutative, and realising you need to discard your model. A lot of people would have just decided it was 'agile' to 'iterate' on their fundamentally broken solution until they had coded themselves into a corner, but I digress.

I would note that your path 2 is essentially another kind of CRDT, though. Instead of your replica being an individual document, your replica is now the entire DB, and your changes are a grow-only sequence. (Wish I could remember who pointed this out to me on HN; my mind was blown.)


CRDTs have been on HN a lot recently. I'm working on a database that deals in events rather than raw data. Application developers specify event handlers in JavaScript. The database layer then takes the event handlers and distributes them to the clients. Clients receive the raw stream of events and reconcile the final data state independently. The key aspect is all the event handlers are reversible. This allows clients to insert locally generated events immediately. If any remote events are received out-of-order, the client can undo events, insert the new events, and reapply everything on top of the new state.

I'm curious how many people have a need for solving the real-time multi-player use case. The database I'm working on was inspired by the networking code used in fighting games (rollback netcode), but I'm always curious to learn about additional use cases.


I want most software on my computer to be transparently collaboratively editable. Like, I want to open a draft blog post on my desktop, go out, open my phone, keep editing the post, share it with susan, then get home and see her proposed changes and merge them like I would in a code editor.

And I want to be able to do that same workflow when editing video. Or when drawing. Or recording music. Or when configuring a server on AWS.


I agree. Everything should be collaborative. Feel free to reach out if you're interested in collaborating (email in my HN profile). I'm trying to get a v0 of this database out now, and building desirable use-cases is super important.


I'm working on a CRDT to solve this problem too[1]. How do you plan on implementing collaborative text editing on top of your event-reordering system? Off the top of my head I can't think of a way to implement text on your proposed system which would be performant and simple.

[1] https://github.com/josephg/diamond-types


The basic prototype I have right now works like this:

The data model has this type in TypeScript:

    {
        text: string;
        cursors: {
            [userId: string]: {
                offset: number;
            };
        };
    }
Basically, the data model is a string of text and a set of cursors, one for each user. The cursors contain an offset into the text where each user is editing.

When a user connects, the `user-connected` event is handled by setting the cursor position for that user to offset 0. When a user presses a key, an `insert-text` event is sent. The event handler for `insert-text` looks at the user's cursor position and inserts the text at that position and increases the user's cursor. Then, the event handler looks at all the cursors. Each cursor that has an offset greater than the location of the inserted text gets incremented by the length of the inserted text. Removal works the same way, but it removes text and decrements cursor offsets instead. When a user disconnects, the `user-disconnected` event handler removes the cursor to keep the total data size bounded.

As an example, let's say the text was "cat boy" and User A has cursor offset 0 and User B has cursor offset 4. Imagine User A inserts "dog" just a moment before User B inserts "girl", e.g. User B sends `insert-text: "girl"`. The first time this event gets applied, "girl" gets inserted at offset 4. When User A's insertion of "dog" arrives, the insertion of "girl" gets undone and then "dog" is inserted. After "dog" is inserted, User B's cursor is now at offset 7. So when `insert-text: "girl"` is re-applied, "girl" gets inserted at offset 7 and the final text is the same.
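A minimal sketch of the insert-text handler described above (the names and exact cursor-shift rule are inferred from this description, not the poster's actual code):

```typescript
// Hypothetical data model and one event handler, per the description above.
interface DocState {
  text: string;
  cursors: { [userId: string]: { offset: number } };
}

function insertText(doc: DocState, userId: string, s: string): DocState {
  const at = doc.cursors[userId].offset;
  const cursors: DocState["cursors"] = {};
  for (const [id, c] of Object.entries(doc.cursors)) {
    // Any cursor past the insertion point shifts right by the inserted length.
    cursors[id] = { offset: c.offset > at ? c.offset + s.length : c.offset };
  }
  // The inserting user's own cursor advances past the new text.
  cursors[userId] = { offset: at + s.length };
  return { text: doc.text.slice(0, at) + s + doc.text.slice(at), cursors };
}

// "cat boy": User A inserts "dog" at offset 0; User B's cursor shifts from 4 to 7.
const before: DocState = { text: "cat boy", cursors: { A: { offset: 0 }, B: { offset: 4 } } };
const after = insertText(before, "A", "dog");
```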

One thing I like about this system is that it's very simple. My prototype is like a collaborative notepad. The logic for state updates is only 100 lines long, including a lot of boilerplate for the event handlers (each is an interface implementation).


What happens if User B moves his cursor to position 0 at the same time that User A types at position 0?

Depending on the order that those events arrive, and how ordering is disambiguated, it's possible that different peers could see User B's cursor at different positions.

I believe that the system you're describing is akin to an OT system, but where the "operations" to transform are the "cursor move" actions that occur simultaneously with insertions and deletions. Most OT systems, on the other hand, are trying to transform inserts and deletes against one another.

Edit: Yes, in fact this is equivalent to OT if you replace every OT insert or delete with a tuple [move cursor, ins/del] in your system.


The case where User B navigates to cursor offset 0 as User A inserts at position 0 is actually easy because User B's cursor can just stay at 0. A harder case would be where User B moves to cursor offset X while User A removes text between (Y, Z) where 0 < Y < Z < X. How that situation gets handled is up to the application developer.

I'm not familiar with OT specifically, but the text editor I'm working on is just a proof-of-concept for the database. The database I'm building is designed to make it very easy to program very complicated multi-player scenarios, although it requires a little work to think of conflict resolution.


I think the most common outside of gaming is collaboration software. Think whiteboards, docs (e.g. Google Docs) and other collaborative tools.


I came to a similar conclusion about CRDTs after reading Figma's article.

I ended up implementing state machine replication using Cloudflare's Websockets and Redux and found the result to work quite well.

Redux actions seemed ideal for this problem because you can simply share the actions between peers. I couldn't find many redux-based solutions out there but it worked well for my use case.

https://emojis.cadell.dev/


If I want an app that eventually persists user state to a server but still works well when the user is offline (e.g. a personal vocabulary app that syncs across multiple devices but also works offline, so the user can define new words manually on both devices, and when the phone is back online the state still makes sense on both), should I use a CRDT?


A definitive no is the answer. This complexity is necessary only if others care about your shared state (aka vocabulary), and multiple write ops are being done on the same records at the same relative time on different devices. For example, a document editor could use CRDTs without looking silly. A CRDT is an eventually consistent distributed object, but all its complexity is there to handle concurrent modifications.

In your application, the consistency requirements are pretty weak and easy to meet with a basic client/server architecture with well-defined sync points in its life cycle. For example, sync on boot, sync on net-reconnect, etc. Or you could use a simple P2P protocol, so on boot you discover your other connected devices and your apps can shake hands and exchange notes to sync up as they modify their local state: “Hey gang, + ‘KISS’: ‘Keep it simple…’”. Use JSON if you must.

A further simplifying characteristic of this app is that the state ops on your dictionary are commutative (like a CRDT!). On one device you add the word “Foo” and on the other you add “Bar”. You do not care (do you?) that you added them in a certain order, so you don't even have to bother with what is rather difficult to do in a performant manner: maintaining a total ordering over the state of the system. (Think sharded ordered sets.)


There's a few edge cases that you'd probably need to handle (like adding the same word with two different definitions), but probably few enough that you could do them ad-hoc and punt to the user. ("Hey, merging these two sets failed, here's the conflict, which do you want to keep?")

Something fancy like Operational Transforms or CRDT might make your life easier if you find a library that's ergonomic for your use case, but it's definitely not necessary.


The article talks about this, in the case where your clients are syncing with a centralised server you don't necessarily need to use CRDTs, and it might be simpler not to.


Basic CRDTs are still useful (idempotency, associativity, and commutativity are powerful mental tools for all distributed systems, even OT) but you can massively simplify the CRDTs if you assume a super-peer (aka centralized server all updates go through).

Example: last-write-wins register, which is a (v, t) pair, normally has a merge function of "return the pair with the greatest t (with deterministic tie breaking for same t)", and for a super peer just have v's and "the latest value to hit the server wins".
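A sketch of that last-write-wins register in TypeScript (illustrative; the tiebreak here compares values, but any deterministic rule works):

```typescript
// A last-write-wins register is a (value, timestamp) pair.
interface LWW<T> { value: T; t: number }

// Merge keeps the pair with the greater timestamp; ties break deterministically
// on the value, so every replica picks the same winner no matter the merge order.
function mergeLWW<T extends string | number>(a: LWW<T>, b: LWW<T>): LWW<T> {
  if (a.t !== b.t) return a.t > b.t ? a : b;
  return a.value >= b.value ? a : b;
}

// With a super-peer, this collapses to "the latest value to hit the server wins":
// the server just overwrites its copy in arrival order, no timestamps needed.
```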


If you have a centralized server there is no reason to use CRDTs, is there?


It's still very useful mental tool, since a centralized server + clients is still a distributed system.

But yeah, you shouldn't use the really complex CRDTs in that case. (And in most cases to be honest.)


This doesn’t address how changes that were committed offline are merged on the central server once the clients are online again.


No but since they’re all getting merged in one place you don’t really need a CRDT for that although you could. That was my point.


> CRDTs are designed for decentralized systems where there is no single central authority to decide what the final state should be. There is some unavoidable performance and memory overhead with doing this. Since Figma is centralized (our server is the central authority), we can simplify our system by removing this extra overhead and benefit from a faster and leaner implementation

I think this is on point. It's also super refreshing to see the work on Aper [1] [2] (a Rust library implementing state machine synchronization across a trusted network). Looking forward to the next series of articles here!

[1]: https://aper.dev/

[2]: https://github.com/drifting-in-space/aper


Just to clarify, Aper also uses a client/server model, correct? It appears as though it uses a central server to totally order incoming change requests.


Yes, it does.


Thanks, Syrus!


Does anyone have any resources on making an offline-first app from scratch (I don't want to use stuff like Firebase and more importantly want to learn how it's done)? Like a task manager for example that can sync with an online account but still works completely offline.


I swear I've looked into firebase multiple times and I have no idea how they actually detect and handle conflicts. They just hand-wave that it works offline... not that comforting.

Regardless, there's no one size fits all for offline. Approaches that work for append only data don't work for data where multiple nodes are doing destructive updates. And even if you're doing that, some data is amenable to deterministic merging (ie, CRDTs) and some you need a user to step in to decide (revision based multi-master systems, prior art being Pouch/Couch or indeed git).

Basically, if you can tell me what exactly a task manager is and how it might be updated, I might be able to say something more helpful!


> I swear I've looked into firebase multiple times and I have no idea how they actually detect and handle conflicts. They just hand-wave that it works offline... not that comforting.

My understanding is (and please correct me if I'm wrong), firebase just does ACID by versioning everything. A write says "Update this value (at server version X) to value Y". The server can detect if version X is outdated, and use that to reject the conflicting write.

So if you go offline and change something, and someone else changes the same value, when you come online and share your change it'll be rejected by the server. The client can optionally be intelligent in this case, and merge the conflicting values locally then re-submit if it wants to. That's enough to implement OT on top of firebase, but the architecture means clients can't sync directly with each other. And it's a bit janky if users are offline for too long, or if the client isn't written to handle retry / conflicts in a sensible way.
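A sketch of that versioned-write scheme (purely illustrative; this is not Firebase's actual API):

```typescript
interface Stored { value: string; version: number }

// Hypothetical server: each write carries the version it was based on, and
// writes based on a stale version are rejected so the client can merge and retry.
class SyncServer {
  private doc: Stored = { value: "", version: 0 };

  write(baseVersion: number, value: string): { ok: boolean; current: Stored } {
    if (baseVersion !== this.doc.version) {
      return { ok: false, current: this.doc }; // conflict: client sees the current state
    }
    this.doc = { value, version: this.doc.version + 1 };
    return { ok: true, current: this.doc };
  }
}
```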


This won't help much right now, but I'm working on making a little company to solve this at the moment, with an open source core.

We'll have more to show in a few months.


For certain data structures the "pure" CRDT approach can get tricky. In our case, we're building a collaborative notes/task IDE [1] which is a hybrid of a text editor and an outliner at the same time. So you can perform all the usual text editor operations on a document as if it was text, but the underlying data structure is a tree (or graph really).

So just like the example in article we use a hybrid approach and global order for certain (tree) operations to enforce invariants on the data structure when syncing. That still works with offline-mode and end-to-end encryption, but as we don't have the requirement to get rid of a central server we can get away with this model.

[1] https://thymer.com


> CRDTs differ from simple leader/follower replication in that they do not designate an authoritative source-of-truth that all writes must pass through. Instead, all replicas are treated as equal peers. Each replica may be modified locally, and changes to any replica are propagated to the others.

This isn't wrong but it is a little misleading. CRDTs don't have any concept of peers at all. CRDTs are data structures with operations defined in terms of equivalent data structures. Peers in networks are incidental.

> In contrast, browsers are inherently not peer-to-peer. To run an application from the web, you connect to a server. Since the server is centralized anyway, we can have it enforce a global ordering over the events. That is, every replica receives events in the same order. With this, we can sidestep the need to pay the CRDT complexity tax.

A server is not necessarily centralized. For example, a browser that loads e.g. www.bbc.co.uk will receive content from a POP relatively local to the client's physical location. The website delivered to a browser IP in the UK is conceptually the same thing as the website delivered to a browser IP in Indonesia, but the server serving the bytes is different. Does this distinction matter? Sometimes yes, sometimes no.


Another reason to not need a CRDT is privacy. A CRDT assumes that every participant has equal rights to all data.

I'm biased to simple WebSockets with OT using JSON deltas, and I'm building a platform that greatly simplifies how to efficiently build collaborative apps: https://www.adama-platform.com/


That sounds like a weak counterargument to me:

1) CRDTs focus on the conflict-resolution part; access control is not in scope, so you need to implement it on top.

2) If you are already implementing access control, then filtering out what people don't have access to doesn't seem like much more complexity. If you go the full client-side way, you can also add a cryptographic layer to whatever needs hiding.

3) One thing I don't see in your solution is the offline aspect. With a central authority in the middle and online connectivity, conflicts become unlikely and much smaller, but with offline support, documents etc. can evolve in drastic ways, where having a formalized strategy like what CRDTs offer makes sense.


These are orthogonal to one another. For read-protection, encryption is a viable solution: specific paths into your data structure (e.g. certain keys in a map) can be encrypted with a key that isn't provided to all participants. For write-protection, the solution is the same as with any syncing service: when receiving an update, if you don't authorize the sender of the update, you don't accept it.


Is this really the case? While CRDTs are designed to work peer-to-peer, they don't need to be fully connected to all clients. Forcing the synchronization through controlled nodes (a server or cluster) allows adding read/write permissions. Depending on the use case, it may require additional logic for reversing operations before propagating to other clients, or in some cases forcing a client to revert to a snapshot (this can be a bit complex). That's an approach I've used in the past.

Have I overlooked something (highly likely)?


P2P generally means all clients can read all the data. Even if some data can be encrypted, it can then be deleted via a peer. Admittedly, I am conflating privacy and access control, and core to my point is that CRDTs are limited in many domains.


> A CRDT assumes that every participant has equal rights to all data.

This is not true. CRDTs assert properties related to consistency and availability, they don't assert anything related to authentication or authorization.


> A CRDT assumes that every participant has equal rights to all data.

Within a single data structure, yes, so within a "room" or "document". If you need to partition rights to the data (read/update), then you should use separate structures for each area.

Yjs implements this as "subdocuments".
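One way to picture the separate-structures approach: each access scope gets its own independent CRDT document, and the sync layer only ships a document to users allowed to see it. The room and access tables below are invented for illustration; Yjs subdocuments provide the actual mechanism.

```python
# doc id -> opaque CRDT blob (one independent structure per access scope)
DOCS = {"room-public": b"...", "room-hr": b"..."}
# user -> set of doc ids they may sync
ACCESS = {"alice": {"room-public", "room-hr"},
          "bob": {"room-public"}}

def syncable_docs(user):
    """Documents this user's replica may receive and merge."""
    return {d for d in DOCS if d in ACCESS.get(user, set())}
```

Partitioning at the document boundary keeps the CRDT itself permission-unaware: within any one document, every participant still has full rights.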


CRDTs can be encrypted end-to-end. Privacy seems to be one of the selling points.


Privacy and encryption are entirely orthogonal to CRDTs vs. alternatives.


Small question: How do databases fit into using CRDTs? Are CRDTs normalized, storable inside the database somehow, allowing queries on them? Or do database completely fall on the wayside? This confuses me.


A CRDT's opaque blobs have to be treated as the source of truth for your data, because that's the only way they can resolve edits from multiple sources.

You could store an extracted version of the same data in your DB, alongside the blob, but it would have to be a read-only copy, with all edits routed through your CRDT library.
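A rough sketch of that pattern, assuming SQLite and a hypothetical `save()` routine: the opaque blob is authoritative, and a derived column exists purely so the database can query it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, blob BLOB, title TEXT)")

def save(doc_id, crdt_blob, extracted):
    # Persist the authoritative blob and its queryable projection together,
    # in one transaction, so they can never drift apart.
    with db:
        db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
                   (doc_id, crdt_blob, extracted["title"]))

save("d1", b"\x00opaque-crdt-bytes", {"title": "Groceries"})
# The title column supports SQL queries but is never written directly;
# every edit goes through the CRDT library and re-runs save().
title = db.execute("SELECT title FROM docs WHERE id = 'd1'").fetchone()[0]
```

Writing the projection only inside `save()` is what keeps it a read-only copy in practice.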

> Are CRDTs normalized

There are many kinds of CRDTs, supporting everything from single strings all the way up through arbitrary data structures.

You could model data inside your CRDT as either normalized or denormalized. For some use cases, keeping a separate CRDT per record could work fine. But if you had, e.g., a reorderable todo list, you'd want each todo to be part of a larger CRDT so that if two devices reorder records they both arrive at a single convergent ordering.
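To make the reordering point concrete, here's a toy last-writer-wins scheme: each item carries a `(timestamp, replica_id, position)` register, and ties break deterministically, so both devices derive the same order after merging. This is a simplification for illustration, not any particular library's CRDT.

```python
def merge(a, b):
    """Pointwise merge of position registers; higher (ts, replica, pos) wins."""
    out = dict(a)
    for item, reg in b.items():
        if item not in out or reg > out[item]:
            out[item] = reg
    return out

def order(positions):
    """Derive list order from the winning position values (ties by item name)."""
    return sorted(positions, key=lambda item: (positions[item][2], item))

# Two devices reorder the same list concurrently...
device_a = {"milk": (1, "a", 0.5), "eggs": (1, "a", 1.0)}
device_b = {"milk": (2, "b", 1.5), "eggs": (1, "a", 1.0)}
# ...and merging in either direction yields the same state and order.
merged = merge(device_a, device_b)
```

If each todo lived in its own separate CRDT, there would be no shared structure for the positions to merge through, and the two devices could end up with different orderings.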

> do database completely fall on the wayside?

Not at all - DBs solve a lot of problems that CRDTs don't really touch. Robust data persistence, ACID, enforcing data rules, issuing arbitrary queries, etc.

CRDTs get multiple nodes to agree on a single view of the world, and many applications suitable for CRDTs would want to use them in conjunction with a database.


A database asserts a single, canonical, authoritative definition of state in a given domain. CRDTs assert multiple, concurrent, equivalent definitions of state in a given domain. In short, the semantics you get from a normal DB are stronger than the semantics you get from a normal CRDT, and the mapping between them is not one-to-one.


Great breakdown of CRDTs vs replicated state machines (like Raft or VSR) and the difference between eventually convergent and globally orderable workloads, thanks for this Paul!


Good shoutout to Automerge in this.


Am I missing something? I'm all for SMR when possible, but isn't the whole point of moving to something with weaker consistency that you don't want to rely on the availability of a central server (or a quorum of servers, in the case of SMR)?


Unless I am wrong, if your server uses end-to-end encryption you are still better off with a CRDT, since the server cannot merge the state. So there is a caveat to the claim that you only need CRDTs for peer-to-peer scenarios.


That's a good point, but the server can still enforce an ordering of encrypted messages it can't read. You just don't get the benefit of the server being able to detect invariant violations in this case; all change operations would be sent to all clients and the client state machine would throw away the ones that created conflicts.
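A toy model of that client-side state machine, assuming the server hands every client the same total order of opaque operations. The op shape here (`key`, expected prior value, new value) is invented; real end-to-end-encrypted ops would be decrypted by the client before this check.

```python
def replay(ops):
    """Apply ops in the server's order; drop ones whose precondition no longer holds."""
    state = {}
    for key, expected, value in ops:   # ops already totally ordered by the server
        if state.get(key) == expected:  # does this op still apply cleanly?
            state[key] = value
        # else: the conflicting op is thrown away, identically on every client
    return state

ops = [("title", None, "Draft"),
       ("title", None, "Notes"),     # concurrent with the first write; loses
       ("title", "Draft", "Final")]
```

Because every client sees the same order and applies the same deterministic rule, they all discard the same conflicting ops and converge, even though the server never reads the payloads.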


> Conflict-free replicated datasets (CRDTs)

Conflict-free Replicated Data Types, surely?


Oof, in the first sentence, too, that's embarrassing. Fixed, thanks!


No problem :)


Whatever Clickup uses for their Clickup docs, it reliably gets into a confused state and data is lost, during which not all editors show the same state locally.


Is OT no longer loved for multiplayer?


"You might not (probably need) need $FANCY_TRENDY_THING"



