I recently finished a second reading of DDIA, and while listening to a podcast interview with Martin Kleppmann, I was thrilled when he mentioned that he plans to release another book in the coming years.
DDIA is so good that I feel the same kind of anticipation as when waiting for the next book in a fantasy series to take me back to a parallel world with characters I've come to love and miss, the way one misses a good friend they haven't seen in a while.
Does this book give ways to deal with the following situation: a mobile app that works offline (a local replica) and then, when it's online, syncs (through a server) with other devices' local replicas? I don't even know what to call this situation. I'm sure CRDTs could help with this.
I guess the book is more of a well-structured fast path (compared to learning from blogs) to a fundamental understanding of storing and transferring data.
You won't find the exact answer to your question, but reading this book equips you to read further and to better understand which solution to use in which case.
I agree that it's more of a taxonomy and reasoning guide for distributed systems, but the situation in question, long replication lag for offline devices, is introduced as an example (through a calendar app) when discussing multi-leader replication.
"And finally, if you're interested in supporting this kind of thing financially, I did set up a Patreon account with the goal of trying to turn this into a potential career of an independent researcher, not tied to any institution necessarily, but just being able to continue doing the research and the teaching the work that I do, perhaps writing books, and the second edition of my current book is potentially in the works."
Really nice to see BFT starting to be taken seriously in CRDT research. I had done some research in this area last year and came to a lot of the same solutions (i.e. BRB-protected CRDTs when dealing with VClock-based CRDTs):
We ended up moving away from VClock CRDTs entirely for our work and going with grow-only hash-graph CRDTs, as they don't need the BRB overhead (as Martin has found in his research as well).
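For the curious, a minimal sketch of the idea in Python (a hypothetical shape, not the actual go-ds-crdt implementation): each update is content-addressed, so its ID is the hash of its payload plus its parent links, and merging two replicas is just set union.

    import hashlib
    import json

    def make_update(payload, parent_ids):
        # The ID is the hash of the canonical encoding, covering both the
        # payload and the parent links, so IDs are unique and unforgeable.
        update = {"payload": payload, "parents": sorted(parent_ids)}
        uid = hashlib.sha256(json.dumps(update, sort_keys=True).encode()).hexdigest()
        return uid, update

    class HashGraph:
        def __init__(self):
            self.nodes = {}  # id -> update; the graph only ever grows

        def add(self, uid, update):
            self.nodes[uid] = update

        def merge(self, other):
            # Set union is commutative, associative and idempotent, so
            # replicas converge without vector clocks (and without BRB).
            self.nodes.update(other.nodes)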
Unfortunately, the verification that the author proposes (not accepting new updates until the DAG below them is verified) will need a lot of caveats for real-world usage.
Currently we use these CRDTs for a key-value database of 40M+ keys in a deployment of ipfs-cluster, which uses https://github.com/ipfs/go-ds-crdt .
Martin here. I cited your paper in the related work section. It's a good start, but it does not cover everything that's required to achieve BFT — in particular, the issue that an operation may be valid in some contexts and invalid in others, and all correct nodes need to agree on whether an operation is valid or not.
If you have more details on the caveats you have discovered, it would be great if you could write them up so that others can learn from your experience!
I've now had time to read more closely. Thank you for citing us!
In our paper and original approach/algo (iirc), CRDT updates are applied from the bottom of the DAG to the heads (older to newer). This is also what you propose, although we don't discuss how this helps wrt BFT. I think it is a very good point to highlight that hash-graph-based CRDTs (or Merkle-CRDTs, as we call them) can provide this property when processed "in order".
The caveat is that, in practice, we found this quite impractical: when receiving an update with unknown descendants, you'd have to choose whether to hold on to it and apply it at a later point when the missing sub-DAG is received (which may never happen), or to drop it and rely on re-transmission at a later point. The former opens the door to abuse by a bad actor, because you spend memory holding on to un-applied updates. The latter causes additional work, as updates could pile up over a missing one, thus incurring re-transmission costs for the missing element and all the later ones.
This also means that peers' pubsub systems should probably decide whether to re-broadcast an update based on whether the CRDT operation it contains is valid, which can have bad side effects: a peer missing an update due to a network issue may block the broadcasting of any later updates even if they are correct, thus worsening delivery for everyone and forcing the network to issue more re-transmissions.
And as a result, a peer should also decide whether to issue/broadcast updates based on whether the previous update has at least been received by a non-Byzantine replica, as otherwise updates built on top will not be accepted by the network until re-transmission. That makes another potential bottleneck.
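To make the hold-or-drop choice concrete, here is a rough Python sketch (hypothetical names; `max_pending` is an assumed knob): in-order application forces you to either buffer updates with missing parents, with the buffer bound being exactly the memory a bad actor can target, or drop them and pay for re-transmission.

    class InOrderApplier:
        def __init__(self, max_pending=10_000):
            self.applied = set()   # IDs of updates already applied
            self.pending = {}      # id -> update waiting on missing parents
            self.max_pending = max_pending  # cap to limit memory abuse

        def on_update(self, uid, update):
            if uid in self.applied:
                return
            missing = [p for p in update["parents"] if p not in self.applied]
            if not missing:
                self.applied.add(uid)
                self._retry_pending()
            elif len(self.pending) < self.max_pending:
                self.pending[uid] = update  # hold: memory a bad actor can fill
            # else: drop, and rely on re-transmission at a later point

        def _retry_pending(self):
            progress = True
            while progress:
                progress = False
                for uid, upd in list(self.pending.items()):
                    if all(p in self.applied for p in upd["parents"]):
                        del self.pending[uid]
                        self.applied.add(uid)
                        progress = True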
In our implementation, we found it more practical to process updates immediately as they are received, and then process descendants until we reach the "known" part of the DAG. This means every update can be broadcast, gossiped around, and potentially applied without any waiting for parents, and the occasional loss of an update does not block the processing of new updates before re-transmission. If an update has descendants that do not exist yet, it can be considered a DAG with a different "root", without many consequences in terms of convergence. Note that here we are talking about DAGs with a depth of 100k+ items, where sometimes there are 200 heads and processing every update may take a few seconds. We need to avoid blocking a replica as much as possible and get as much work done as possible asap.
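A sketch of this apply-immediately alternative, under the same assumed shapes as above: because the set is add-only and keyed by hash, application is idempotent and never blocks on parents; an unknown parent just leaves a temporarily detached root that can be fetched in the background.

    class ImmediateApplier:
        def __init__(self):
            self.applied = {}  # id -> update; hash IDs make this idempotent
            self.want = set()  # parents heard of but not yet received

        def on_update(self, uid, update):
            if uid in self.applied:
                return               # duplicate delivery is free to ignore
            self.applied[uid] = update   # apply now, never wait for parents
            self.want.discard(uid)
            for p in update["parents"]:
                if p not in self.applied:
                    # A missing parent means the DAG temporarily has an
                    # extra "root"; convergence is unaffected, and the
                    # missing sub-DAG can be fetched in the background.
                    self.want.add(p)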
I think some CRDTs can get away with this (in our case, add-only-set CRDTs using the hashes as IDs) and be Byzantine-failure resistant (things would converge). In the paper you mention some examples where CRDT rules can be abused more easily, so I'm guessing it is more difficult for other CRDT types. Ensuring that IDs are unique is one of the main advantages of using hash trees.
In general, I think an attacker can usually find ways to mess with a non-permissioned CRDT system without breaking convergence (e.g. by submitting many valid updates). Your approach does, however, make a very good point wrt misbehaving nodes that are not necessarily malicious.
gritzko (always on the vanguard!) has also been doing this kind of stuff in RON for a while, though I haven't looked into it deeply enough to determine how much it overlaps with Kleppmann's approach: https://replicated.cc
DAG size gets out of hand when many operations happen (akin to how a blockchain grows indefinitely).
Nodes could, however, rely on some form of "consensus" to compact/reset the graph at intervals. But then you have invited consensus to the party, which is exactly what CRDTs are meant to avoid.
Woah. This paper looks freaking awesome - and surprising! “The proposed scheme can tolerate any number of Byzantine nodes (making it immune to Sybil attacks)” - this is really intriguing because of the strong impossibility results in consensus algorithms around how many faulty nodes can be tolerated. I’m looking forward to reading more than just the abstract tho (confession).
Consensus is only necessary when one needs to decide between conflicting choices, and CRDTs are by definition conflict-free. Nodes only need to verify that any update to the CRDT is in fact a legal operation on the data structure. Essentially, the paper details how to ensure two properties: (1) All nodes will eventually receive all legal updates, and (2) no node will accept a malformed update, (and crafting undetectable malformed updates is infeasible).
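A minimal sketch of what such a legality check could look like for a hash-graph add-only set (an assumed structure; `valid_signature` is a stand-in for a real signature verification):

    import hashlib
    import json

    def verify_update(uid, update, known_ids, valid_signature):
        # (1) The ID must be the hash of the contents, so an update cannot
        #     be referenced under one ID and later mutated.
        body = json.dumps(update, sort_keys=True).encode()
        if uid != hashlib.sha256(body).hexdigest():
            return False
        # (2) The operation must be structurally legal for the data type:
        #     here, an "add" whose parents are all already-known updates.
        if update.get("op") != "add" or "value" not in update:
            return False
        if not all(p in known_ids for p in update.get("parents", [])):
            return False
        # (3) It must carry a valid signature from its author.
        return valid_signature(update)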
Money requires consensus. I think it's the FLP Impossibility result that ultimately prevents CRDT-type (eventually consistent) schemes from implementing consensus.
Researchers in eventual consistency pretty much just assume this.
e.g. in the original CRDT papers -- bottom of page 10 in https://arxiv.org/pdf/1805.06358.pdf; top of page 11 in https://hal.inria.fr/inria-00609399v1/document
Spends don’t commute: if two transactions both consume $1, and the sender’s account only contains $1, there’s no way to apply both transactions and end up in a valid state.
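A toy illustration (hypothetical names): either order of application leaves the second spend illegal, so replicas cannot simply merge; something has to pick a winner, and that something is consensus.

    def apply_spend(balance, amount):
        if balance < amount:
            raise ValueError("insufficient funds")  # not a legal update here
        return balance - amount

    balance = 1
    balance = apply_spend(balance, 1)      # first $1 spend: balance -> 0
    try:
        balance = apply_spend(balance, 1)  # second $1 spend fails either way
    except ValueError:
        print("conflict: someone must decide which spend wins")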
Hyper Hyper Space [1] is a library for modeling distributed data structures that uses operational CRDTs represented over a Merkle-DAG (using the partial order defined by causal relationships, like the paper describes).
It is also designed to work in a Byzantine environment (it can run inside a browser, using WebCrypto, WebRTC, etc. to connect to untrusted peers over the open internet).
> The 3f + 1 assumption means these protocols cannot be deployed in open peer-to-peer systems, since they would be vulnerable to Sybil attacks. In contrast, my approach makes no assumption about the number of Byzantine nodes.
I'm confused - probably because I haven't finished reading the paper.
1. Sybil-proofness requires a CA [1]. It's orthogonal to whether or not a protocol is BFT. Specifically, the classical BFT protocols "assume" there exists a CA, and then prove their protocols to be BFT.
2. I won't comment on whether 3f+1 protocols can be deployed in open P2P systems (they can), but "makes no assumption about the number of Byzantine nodes" is weird. Such a result is only valid in a particular system/timing model: e.g. with PKI, in a synchronous setting, for any `f` one can achieve consensus in `f+1` rounds using Dolev-Strong. This means you make no assumption about `n`, but the protocol is impractical for a variety of reasons, e.g. large `n`, strong synchrony, etc.
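For reference, a Python sketch of the Dolev-Strong acceptance rule (heavily simplified: signatures are modeled as plain node names, and the synchronous round loop is assumed to be driven externally):

    class DolevStrongNode:
        def __init__(self, name, f, sender="S"):
            self.name = name
            self.f = f              # assumed bound on faulty nodes
            self.sender = sender    # the designated broadcaster
            self.extracted = set()  # values accepted so far
            self.outbox = []        # messages to relay next round

        def receive(self, value, sig_chain, round_no):
            # Accept in round r only if the message carries r signatures
            # from r distinct nodes, the first being the sender's.
            ok = (
                len(sig_chain) == round_no
                and len(set(sig_chain)) == round_no
                and sig_chain[0] == self.sender
                and value not in self.extracted
            )
            if ok:
                self.extracted.add(value)
                if round_no <= self.f:  # relay with our signature appended
                    self.outbox.append((value, sig_chain + [self.name]))

        def decide(self):
            # After f+1 rounds: exactly one extracted value is the output;
            # zero or several means the sender equivocated -> default value.
            return next(iter(self.extracted)) if len(self.extracted) == 1 else None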
The big novel feature in Nakamoto consensus (the Bitcoin paper one) is precisely that it gives you Sybil-proof BFT without a CA. That's the whole reason for the massive energy consumption.
I'm fine with a Sybil resistance mechanism (although it's not Sybil-proof, and some particular forms of Sybil resistance are based on faulty models [1])
The problem is that this paper doesn't employ such mechanisms:
> This assumption is problematic for P2P systems...they must either exercise centralised control over which nodes are allowed to join the network, or employ expensive Sybil countermeasures such as proof-of-work [30]. This paper shows... is possible to guarantee the standard CRDT consistency properties even in systems in which arbitrarily many nodes are Byzantine, e.g. where the Byzantine nodes outnumber the correct nodes. This makes the algorithms immune to Sybil attacks...
This argument is incorrect. The counter-example is Dolev-Strong [2]: the number of faulty nodes is not fixed, but it still needs a CA.
Sybil proofness does not require a CA.
The abstract of your link only states that CAs can be a solution.
An alternative approach would be to have the system function as long as one member remains non-Sybil.
The paper is simple and readable enough that I don't need to comment. Anyway, the abstract itself says:
> This paper shows that, without a logically centralized authority, Sybil attacks are always possible except under extreme and unrealistic assumptions of resource parity and coordination among entities.
And the proof is a few pages later, in a few lemmas.
Note that this paper is where the term "Sybil attack" actually comes from, and you can see that almost all DHT security papers (e.g. Castro's secure routing paper [1]) assume a CA.
Martin's writing and explanation style is truly awesome! I read his book DDIA (Designing data intensive applications) and listened to his distributed systems class lectures [1]. It was a joy learning. I wish I had him or someone like him in graduate school. I would've probably taken all his courses :-).
> Further work is required to demonstrate whether the techniques presented here are indeed effective in the context of particular CRDT algorithms, to prove their correctness in the face of Byzantine nodes, and to measure the performance impact of Byzantine fault tolerance.
This paper is super interesting, but it'll be even more interesting once this future work is completed.
This is pretty interesting! Skiff (https://www.skiff.org/) also uses private, encrypted CRDTs for their collaborative docs, but not in a BFT manner (the updates are encrypted but still sent to a central server AFAIK).
Yes, we use E2EE CRDTs! They don't have to be centralized because we still get composable updates using YJS (https://github.com/yjs/yjs). This would still help a ton given the p2p nature of updates (malicious or broken clients can cause havoc for documents).