I recently finished a second reading of DDIA, and while listening to a podcast interview with Martin Kleppmann, I was thrilled when he mentioned that he plans to release another book in the coming years.
DDIA is so good that I feel the same kind of anticipation as when waiting for the next book in a fantasy series to take me back to a parallel world with characters I've come to love and miss, the way one misses a good friend they haven't seen in a while.
Does this book give ways to deal with the following situation: a mobile app that works offline (a local replica) and then, when it's online, syncs (through a server) with other devices' local replicas? I don't even know what to call this situation. I'm sure CRDTs could help with this.
I guess the book is more of a well-structured fast path (compared to learning from blogs) to a fundamental understanding of storing and transferring data.
You won't find the exact answer to your question, but reading this book equips you to read further and to better understand which solution to use in which case.
I agree that it's more of a taxonomy and reasoning guide for distributed systems, but the situation in question, long replication lag for offline devices, is introduced as an example (through a calendar app) when discussing multi-leader replication.
"And finally, if you're interested in supporting this kind of thing financially, I did set up a Patreon account with the goal of trying to turn this into a potential career of an independent researcher, not tied to any institution necessarily, but just being able to continue doing the research and the teaching the work that I do, perhaps writing books, and the second edition of my current book is potentially in the works."
Really nice to see BFT starting to be taken seriously in CRDT research. I had done some research in this area last year and came to a lot of the same solutions (i.e. BRB-protected CRDTs when dealing with VClock-based CRDTs):
We ended up moving away from VClock CRDTs entirely for our work and going with grow-only hash-graph CRDTs, as they don't need the BRB overhead (as Martin has found in his research as well).
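For the curious, a minimal sketch of the idea in Python (a hypothetical shape, not the actual go-ds-crdt implementation): each update is content-addressed, so its ID is the hash of its payload plus its parent links, and merging two replicas is just set union.

    import hashlib
    import json

    def make_update(payload, parent_ids):
        # The ID is the hash of the canonical encoding, covering both the
        # payload and the parent links, so IDs are unique and unforgeable.
        update = {"payload": payload, "parents": sorted(parent_ids)}
        uid = hashlib.sha256(json.dumps(update, sort_keys=True).encode()).hexdigest()
        return uid, update

    class HashGraph:
        def __init__(self):
            self.nodes = {}  # id -> update; the graph only ever grows

        def add(self, uid, update):
            self.nodes[uid] = update

        def merge(self, other):
            # Set union is commutative, associative and idempotent, so
            # replicas converge without vector clocks (and without BRB).
            self.nodes.update(other.nodes)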
Unfortunately, the verification that the author proposes (not accepting new updates until the DAG below them is verified) will need a lot of caveats for real-world usage.
Currently we use these CRDTs for a key-value database of 40M+ keys in a deployment of ipfs-cluster, which uses https://github.com/ipfs/go-ds-crdt .
Martin here. I cited your paper in the related work section. It's a good start, but it does not cover everything that's required to achieve BFT — in particular, the issue that an operation may be valid in some contexts and invalid in others, and all correct nodes need to agree on whether an operation is valid or not.
If you have more details on the caveats you have discovered, it would be great if you could write them up so that others can learn from your experience!
I've now had time to read more closely. Thank you for citing us!
In our paper and original approach/algo (iirc), CRDT updates are applied from the bottom of the DAG to the heads (older to newer). This is also what you propose, although we don't discuss how this helps wrt BFT. I think it is a very good point to highlight that hash-graph-based CRDTs (or Merkle-CRDTs, as we call them) can provide this property when processed "in order".
The caveat is that, in practice, we found this quite impractical: when receiving an update with unknown descendants, you'd have to choose whether to hold on to it and apply it at a later point when the missing sub-DAG is received (which may never happen), or to drop it and rely on re-transmission at a later point. The former opens the door to abuse by a bad actor, because you spend memory holding on to un-applied updates. The latter causes additional work, as updates could pile up over a missing one, thus incurring re-transmission costs for the missing element and all the later ones.
This also means that peers' pubsub systems should probably decide whether to re-broadcast an update based on whether the CRDT operation it contains is valid, which can have bad side effects: a peer missing an update due to a network issue may block the broadcasting of any later updates even if they are correct, thus worsening delivery for everyone and forcing the network to issue more re-transmissions.
And as a result, a peer should also decide whether to issue/broadcast updates based on whether the previous update has at least been received by a non-Byzantine replica, as otherwise updates built on top will not be accepted by the network until re-transmission. That makes another potential bottleneck.
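To make the hold-or-drop choice concrete, here is a rough Python sketch (hypothetical names; `max_pending` is an assumed knob): in-order application forces you to either buffer updates with missing parents, with the buffer bound being exactly the memory a bad actor can target, or drop them and pay for re-transmission.

    class InOrderApplier:
        def __init__(self, max_pending=10_000):
            self.applied = set()   # IDs of updates already applied
            self.pending = {}      # id -> update waiting on missing parents
            self.max_pending = max_pending  # cap to limit memory abuse

        def on_update(self, uid, update):
            if uid in self.applied:
                return
            missing = [p for p in update["parents"] if p not in self.applied]
            if not missing:
                self.applied.add(uid)
                self._retry_pending()
            elif len(self.pending) < self.max_pending:
                self.pending[uid] = update  # hold: memory a bad actor can fill
            # else: drop, and rely on re-transmission at a later point

        def _retry_pending(self):
            progress = True
            while progress:
                progress = False
                for uid, upd in list(self.pending.items()):
                    if all(p in self.applied for p in upd["parents"]):
                        del self.pending[uid]
                        self.applied.add(uid)
                        progress = True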
In our implementation, we found it more practical to process updates immediately as they are received, and then process descendants until we reach the "known" part of the DAG. This means every update can be broadcast, gossiped around, and potentially applied without any waiting for parents, and the occasional loss of an update does not block the processing of new updates before re-transmission. If an update has descendants that do not exist yet, it can be considered a DAG with a different "root", without many consequences in terms of convergence. Note that here we are talking about DAGs with a depth of 100k+ items, where sometimes there are 200 heads and processing every update may take a few seconds. We need to avoid blocking a replica as much as possible and get as much work done as possible asap.
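A sketch of this apply-immediately alternative, under the same assumed shapes as above: because the set is add-only and keyed by hash, application is idempotent and never blocks on parents; an unknown parent just leaves a temporarily detached root that can be fetched in the background.

    class ImmediateApplier:
        def __init__(self):
            self.applied = {}  # id -> update; hash IDs make this idempotent
            self.want = set()  # parents heard of but not yet received

        def on_update(self, uid, update):
            if uid in self.applied:
                return               # duplicate delivery is free to ignore
            self.applied[uid] = update   # apply now, never wait for parents
            self.want.discard(uid)
            for p in update["parents"]:
                if p not in self.applied:
                    # A missing parent means the DAG temporarily has an
                    # extra "root"; convergence is unaffected, and the
                    # missing sub-DAG can be fetched in the background.
                    self.want.add(p)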
I think some CRDTs can get away with this (in our case, add-only-set CRDTs using the hashes as IDs) and be Byzantine-failure resistant (things would converge). In the paper you mention some examples where CRDT rules can be abused more easily, so I'm guessing it is more difficult for other CRDT types. Ensuring that IDs are unique is one of the main advantages of using hash trees.
In general, I think an attacker can usually find ways to mess with a non-permissioned CRDT system without breaking convergence (e.g. by submitting many valid updates). Your approach does, however, make a very good point wrt misbehaving nodes that are not necessarily malicious.
gritzko (always on the vanguard!) has also been doing this kind of stuff in RON for a while, though I haven't looked into it deeply enough to determine how much it overlaps with Kleppmann's approach: https://replicated.cc
DAG size gets out of hand when many operations happen (akin to how a blockchain grows indefinitely).
Nodes could, however, rely on some form of "consensus" to compact/reset the graph at intervals. But then you have invited consensus to the party, which is exactly what CRDTs are meant to avoid.
Woah. This paper looks freaking awesome - and surprising! “The proposed scheme can tolerate any number of Byzantine nodes (making it immune to Sybil attacks)” - this is really intriguing because of the strong impossibility results in consensus algorithms around how many faulty nodes can be tolerated. I’m looking forward to reading more than just the abstract tho (confession).
Consensus is only necessary when one needs to decide between conflicting choices, and CRDTs are by definition conflict-free. Nodes only need to verify that any update to the CRDT is in fact a legal operation on the data structure. Essentially, the paper details how to ensure two properties: (1) All nodes will eventually receive all legal updates, and (2) no node will accept a malformed update, (and crafting undetectable malformed updates is infeasible).
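A minimal sketch of what such a legality check could look like for a hash-graph add-only set (an assumed structure; `valid_signature` is a stand-in for a real signature verification):

    import hashlib
    import json

    def verify_update(uid, update, known_ids, valid_signature):
        # (1) The ID must be the hash of the contents, so an update cannot
        #     be referenced under one ID and later mutated.
        body = json.dumps(update, sort_keys=True).encode()
        if uid != hashlib.sha256(body).hexdigest():
            return False
        # (2) The operation must be structurally legal for the data type:
        #     here, an "add" whose parents are all already-known updates.
        if update.get("op") != "add" or "value" not in update:
            return False
        if not all(p in known_ids for p in update.get("parents", [])):
            return False
        # (3) It must carry a valid signature from its author.
        return valid_signature(update)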
Money requires consensus. I think it's the FLP Impossibility result that ultimately prevents CRDT-type (eventually consistent) schemes from implementing consensus.
Researchers in eventual consistency pretty much just assume this.
e.g. in the original CRDT papers -- bottom of page 10 in https://arxiv.org/pdf/1805.06358.pdf; top of page 11 in https://hal.inria.fr/inria-00609399v1/document
Spends don’t commute: if two transactions both consume $1, and the sender’s account only contains $1, there’s no way to apply both transactions and end up in a valid state.
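A toy illustration (hypothetical names): either order of application leaves the second spend illegal, so replicas cannot simply merge; something has to pick a winner, and that something is consensus.

    def apply_spend(balance, amount):
        if balance < amount:
            raise ValueError("insufficient funds")  # not a legal update here
        return balance - amount

    balance = 1
    balance = apply_spend(balance, 1)      # first $1 spend: balance -> 0
    try:
        balance = apply_spend(balance, 1)  # second $1 spend fails either way
    except ValueError:
        print("conflict: someone must decide which spend wins")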
Hyper Hyper Space [1] is a library for modeling distributed data structures that uses operational CRDTs represented over a Merkle-DAG (using the partial order defined by causal relationships, like the paper describes).
It is also designed to work in a Byzantine environment (it can run inside a browser, using WebCrypto, WebRTC, etc. to connect to untrusted peers over the open internet).
> The 3f + 1 assumption means these protocols cannot be deployed in open peer-to-peer systems, since they would be vulnerable to Sybil attacks. In contrast, my approach makes no assumption about the number of Byzantine nodes.
I'm confused - probably because I haven't finished reading the paper.
1. Sybil-proofness requires a CA [1]. It's orthogonal to whether or not a protocol is BFT. Specifically, the classical BFT protocols "assume" there exists a CA, and then prove their protocols to be BFT.
2. I won't comment on whether 3f+1 protocols can be deployed in open P2P systems (they can), but "makes no assumption about the number of Byzantine nodes" is weird. Such a result is only valid in a particular system/timing model: e.g. with PKI, in a synchronous setting, for any `f` one can achieve consensus in `f+1` rounds using Dolev-Strong. This means you make no assumption about `n`, but the protocol is impractical for a variety of reasons, e.g. large `n`, strong synchrony, etc.
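For reference, a Python sketch of the Dolev-Strong acceptance rule (heavily simplified: signatures are modeled as plain node names, and the synchronous round loop is assumed to be driven externally):

    class DolevStrongNode:
        def __init__(self, name, f, sender="S"):
            self.name = name
            self.f = f              # assumed bound on faulty nodes
            self.sender = sender    # the designated broadcaster
            self.extracted = set()  # values accepted so far
            self.outbox = []        # messages to relay next round

        def receive(self, value, sig_chain, round_no):
            # Accept in round r only if the message carries r signatures
            # from r distinct nodes, the first being the sender's.
            ok = (
                len(sig_chain) == round_no
                and len(set(sig_chain)) == round_no
                and sig_chain[0] == self.sender
                and value not in self.extracted
            )
            if ok:
                self.extracted.add(value)
                if round_no <= self.f:  # relay with our signature appended
                    self.outbox.append((value, sig_chain + [self.name]))

        def decide(self):
            # After f+1 rounds: exactly one extracted value is the output;
            # zero or several means the sender equivocated -> default value.
            return next(iter(self.extracted)) if len(self.extracted) == 1 else None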
The big novel feature in Nakamoto consensus (the Bitcoin paper one) is precisely that it gives you Sybil-proof BFT without a CA. That's the whole reason for the massive energy consumption.
I'm fine with a Sybil resistance mechanism (although it's not Sybil-proof, and some particular forms of Sybil resistance are based on faulty models [1])
The problem is that this paper doesn't employ such mechanisms:
> This assumption is problematic for P2P systems...they must either exercise centralised control over which nodes are allowed to join the network, or employ expensive Sybil countermeasures such as proof-of-work [30]. This paper shows... is possible to guarantee the standard CRDT consistency properties even in systems in which arbitrarily many nodes are Byzantine, e.g. where the Byzantine nodes outnumber the correct nodes. This makes the algorithms immune to Sybil attacks...
This argument is incorrect. The counter-example is Dolev-Strong [2]: the number of faulty nodes is not fixed, but it still needs a CA.
Sybil proofness does not require a CA.
The abstract of your link only states that CAs can be a solution.
An alternative approach would be to have the system function as long as one member remains non-Sybil.
The paper is simple and readable enough that I don't need to comment. Anyway, the abstract itself says:
> This paper shows that, without a logically centralized authority, Sybil attacks are always possible except under extreme and unrealistic assumptions of resource parity and coordination among entities.
And the proof is a few pages later, in a few lemmas.
Note that this paper is where the term "Sybil attack" actually comes from, and you can see that almost all DHT security papers (e.g. Castro's secure routing paper [1]) assume a CA.
Martin's writing and explanation style is truly awesome! I read his book DDIA (Designing data intensive applications) and listened to his distributed systems class lectures [1]. It was a joy learning. I wish I had him or someone like him in graduate school. I would've probably taken all his courses :-).
> Further work is required to demonstrate whether the techniques presented here are indeed effective in the context of particular CRDT algorithms, to prove their correctness in the face of Byzantine nodes, and to measure the performance impact of Byzantine fault tolerance.
This paper is super interesting, but it'll be even more interesting once this future work is completed.
This is pretty interesting! Skiff (https://www.skiff.org/) also uses private, encrypted CRDTs for their collaborative docs, but not in a BFT manner (the updates are encrypted but still sent to a central server AFAIK).
Yes, we use E2EE CRDTs! They don't have to be centralized because we still get composable updates using YJS (https://github.com/yjs/yjs). This would still help a ton given the p2p nature of updates (malicious or broken clients can cause havoc for documents).