This is great stuff, will need time to read it and digest. I feel that CRDTs are something that should have been invented much earlier than 2011. Formalizing the conversation around what the base level assumptions are that are necessary to build these systems is really exciting. I have a conceptual understanding of the implications of distributing state and coordination of changes on that state, but it’s so much easier for us all to get things right when there’s best practices and understanding around these concepts. It’s kind of like how Raft made an easy to understand and include consensus library, or Yjs the same for CRDTs, or libsodium makes it easy(er) to do security correctly. It helps us develop the native units of computing for distributed systems, in the “geographically distinct edge computing sense” and not in the “a bunch of nodes with fast interconnects” sense, where offline is common and coordination has major performance implications to UIs.
I find the similarities & differences between this and CRDTs interesting.
This seems to be saying that any algorithm with a monotonic output with respect to input information "has a consistent, coordination-free distributed implementation".
As I understand it, for data to be monotonic, it must at least be partially orderable.
CRDTs require partial ordering, as well as a merge() function so as to create a lattice.
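To make the "partial order plus merge() forms a lattice" point concrete, here is a minimal sketch of a grow-only set (G-Set), one of the simplest CRDTs. The class and replica names are illustrative; merge() is set union, which is the least upper bound under the subset partial order:

```python
# Minimal G-Set CRDT sketch: states are partially ordered by subset,
# and merge() (set union) is their least upper bound, so replica
# states form a join-semilattice.

class GSet:
    def __init__(self):
        self.items = set()

    def add(self, x):
        self.items.add(x)

    def merge(self, other):
        # Union is commutative, associative, and idempotent,
        # so replicas converge regardless of message order or duplication.
        self.items |= other.items

    def leq(self, other):
        # The partial order: one state precedes another if it is a subset.
        return self.items <= other.items

a, b = GSet(), GSet()
a.add(1); a.add(2)
b.add(2); b.add(3)
a.merge(b)
b.merge(a)
assert a.items == b.items == {1, 2, 3}
```

Because merge is a lattice join, applying it in any order, any number of times, always yields the same converged state.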
It seems, then, that CRDTs have stronger requirements. That makes sense, since CRDTs are about sharing data, whereas this CALM theory is only talking about making a local decision.
> CRDTs are about sharing data, whereas this CALM theory is only talking about making a local decision.
Both CRDT and CALM are about sharing data to make a local decision which is globally consistent.
Both use an order relation over datasets to model the concept of "adding more info to some partial input or output".
I would say the difference is in their focus. CALM determines the frontier where no coordination is required and provides general criteria (no need to retract former output, nor to hear everything there is to hear, nor to know all the participants). CRDTs provide a means of meeting those criteria: by taking the least upper bound of former partial results, a CRDT ensures that the outcome only grows.
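The "least upper bound, so the outcome only grows" property can be sketched with a grow-only counter (G-Counter), where each replica's count is a vector entry and merge takes the per-replica maximum. The replica names are made up for illustration:

```python
# G-Counter sketch: merge is the least upper bound (per-replica max),
# so the observed value() can only move upward as information arrives.

def merge(c1, c2):
    # Least upper bound of two counter states: elementwise maximum.
    keys = set(c1) | set(c2)
    return {k: max(c1.get(k, 0), c2.get(k, 0)) for k in keys}

def value(c):
    # The counter's value is the sum over all replicas' contributions.
    return sum(c.values())

r1 = {"r1": 3}           # replica r1 has recorded 3 increments
r2 = {"r2": 2}           # replica r2 has recorded 2 increments
merged = merge(r1, r2)
assert value(merged) == 5
# Merging never shrinks the result: the LUB dominates both inputs.
assert value(merged) >= value(r1) and value(merged) >= value(r2)
```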
I think you're conflating the deadlock-detection/garbage collector examples with the general gist of the paper, or maybe I just misunderstood?
My point is that 'deletion' in every DBMS - be it SQL or NoSQL - is equivalent to logical negation of a statement, which in turn is equivalent to the non-monotonic query example they use throughout the article, `¬∃ ("not exists")`.
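The non-monotonicity of `¬∃` is easy to demonstrate: a monotonic query's answer can only grow as facts arrive, while a negated-existence query can be forced to retract an earlier answer. A toy sketch, with hypothetical fact names:

```python
# Sketch of why ¬∃ ("not exists") is non-monotonic: adding facts can
# flip a previously emitted answer, which is exactly what CALM says
# forces coordination. Fact names here are illustrative.

def exists_order(facts, item):
    # Monotonic (∃): once true, stays true as more facts arrive.
    return any(f == ("order", item) for f in facts)

def not_exists_order(facts, item):
    # Non-monotonic (¬∃): can go from true to false with more input.
    return not exists_order(facts, item)

facts = [("order", "apple")]
assert not_exists_order(facts, "pear") is True    # answered "no order for pear"
facts.append(("order", "pear"))                   # more information arrives...
assert not_exists_order(facts, "pear") is False   # ...the earlier answer is retracted
```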
There is a major logical difference between an explicit delete (negation in a closed world interpretation) and a forget (temporary inconsistency in an open world interpretation), in the sense that the latter can be repaired by subsequent messaging.
This means that you can't have easy distributed consistency with our current databases and data models.
This is partially addressed by temporal databases like Datomic and juxt; however, they too treat delete/retract as logical negation, which limits their monotonicity to the set of 'change operations' rather than the data itself, and that in turn makes it hard to work with. If their monotonicity were at the data-model layer, the entire database would essentially become a CRDT.
Sure, negation and deletion are non-monotonic. And sure, they require coordination and waiting if you really want serializability. But the modern default is snapshot isolation, and experience has shown that there are many, many applications that are quite willing to live with this weaker level of isolation (eventual consistency). You can get monotonic reads/writes without having to wait for all replicas to respond, which is the point of this paper. Multi-version databases are lattices.
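A toy way to see the "multi-version databases are lattices" point: model each key's history as a set of (version, value) pairs, so merging two replicas is just union per key. States then form a lattice and merges never lose a version. This is an illustration under that simplified model, not any real database's scheme:

```python
# Hypothetical multi-version store sketch: each key maps to a set of
# (version, value) pairs; merging replicas is per-key set union, which
# makes replica states a lattice (deletes would be modeled as new
# versions, not removals).

def merge(db1, db2):
    keys = set(db1) | set(db2)
    return {k: db1.get(k, set()) | db2.get(k, set()) for k in keys}

replica_a = {"x": {(1, "old"), (2, "new")}}
replica_b = {"x": {(1, "old")}, "y": {(1, "hi")}}
merged = merge(replica_a, replica_b)
assert merged["x"] == {(1, "old"), (2, "new")}   # no version is lost
assert merged["y"] == {(1, "hi")}
```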
Tbh I think you got it backwards.
CALM is actually very much about eventual consistency.
The gist is basically that distributed systems have a consistent, coordination-free implementation iff they are logically monotonic.
Coordination as used in this article basically means "needs to be decided by an authority". But because of CALM we know that in order to elect an authority/leader, you need a consensus algorithm that actually has monotonicity at its core, like Raft.
I actually don't think that database isolation levels are a good way to talk about this stuff, because they are practical guarantees, not theoretical ones. Snapshot isolation for example also requires a leader.
From a theoretical point of view there is no in between (at least none that I can see), either you are eventually consistent or you're not, either you are monotonic or you're not.
> experience has shown that there are many many applications that are quite willing to live with this weaker level of isolation (eventual consistency)
My experience has also shown that the effort to get this right is vastly underestimated initially, and that eventually most of these many, many applications end up with a set of bugs triggered by edge cases nobody thought about.
thedailywtf used to be a fun place for seeing people answer the question, "what could go wrong?" but it got too dark or cringey for me (I can find dark without any help, thank you.)
There's a whole ecosystem of minor catastrophes that don't quite warrant a blog post. Do we need a name for the Chesterton house on the corner of Dunning Way and Kruger Lane?
At the same time, it's really hard to learn why that fence is there until you at least see someone else get injured by ignoring the signs, so in that respect TDWTF probably taught a lot of people about quite a few fences.
Typically we use “conflate” as a synonym for confuse, but it can also mean removing a distinction.
In which case yes, I am conflating the two. I'm saying the paper is making a distinction that doesn't exist. Concurrent reachability is much the same problem everywhere, even between mark-and-sweep and reference counting. It's all thread-safe graph theory.
They talk about a more general concept of consistency that is, per se, unrelated to graph algorithms.
The examples they use to illustrate the idea are deadlock detection and garbage collection in a distributed setting, which are two very distinct problems: the former will eventually converge, while the latter does not necessarily, due to the Two Generals problem (you could try to move all the information to a single node, and you can do so in practice, but because of the Two Generals problem you can't once your connections have a chance of being lossy).