> A quorum insert fails. is the data there or not? Maybe! How is this point fund...

Luker88 · on Oct 9, 2022

In CQL, queries can fail even if your connection is still up and running.

In this case it means that it could not write to 3 nodes, but it does not tell you whether nothing was written or whether at least one write succeeded. A write to even one node means that the data will be replicated.

So if only 1 of the 3 nodes is up, it will still write there and then return an error, even if the coordinator knows that the other nodes are not up.

Queries have a maximum running time (only configurable at the database level, if I remember correctly), and if your write exceeds this, it returns an error. The replication still goes on and the data is still there.
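To make that concrete, here is a minimal toy model in Python, not Cassandra's actual code. The names `Replica`, `ToyWriteTimeout` and `quorum_write` are made up for illustration, and "unresponsive" simply stands in for a replica that does not acknowledge before the deadline. It only shows how a caller can get an error while the write is nevertheless present on a replica.

```python
from dataclasses import dataclass, field


class ToyWriteTimeout(Exception):
    """Hypothetical error: fewer acks arrived than the consistency level needs."""

    def __init__(self, received, required):
        super().__init__(f"received {received} of {required} required acks")
        self.received = received
        self.required = required


@dataclass
class Replica:
    responsive: bool = True
    data: dict = field(default_factory=dict)


def quorum_write(replicas, key, value):
    """Toy coordinator: send the write to every replica, then count acks."""
    required = len(replicas) // 2 + 1
    received = 0
    for replica in replicas:
        if replica.responsive:
            # The write is applied here and is NOT rolled back on failure.
            replica.data[key] = value
            received += 1
    if received < required:
        raise ToyWriteTimeout(received, required)
    return received


# Three replicas, only one of which acknowledges in time.
cluster = [Replica(responsive=True), Replica(responsive=False), Replica(responsive=False)]
try:
    quorum_write(cluster, "k", "v")
    outcome = "ok"
except ToyWriteTimeout as exc:
    outcome = f"timeout: {exc}"

print(outcome)          # timeout: received 1 of 2 required acks
print(cluster[0].data)  # {'k': 'v'}
```

The caller sees an error, yet the first replica already holds the value, so from the client's side the outcome is neither "applied" nor "not applied".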

Network errors are a different category, since you might not even know if your command was sent or not. Cassandra kind of sidesteps the network errors though, since the client connects to multiple coordinators in the cluster.

_benedict · on Oct 9, 2022

They are not a different category. This is a distributed database; network errors and node failures are a fundamental part of its function.

Fundamentally, every distributed protocol has a moment where it may have to tell the client it has abandoned the operation, but already has in-flight messages to remote replicas that would result in a decision to complete the operation.

Before that operation is answered by the replicas, the operation is in an unknown state. It is fundamental. Some slower approaches to reaching decisions may mask this problem more often, but it is there whether you realise it or not.

jfray2k22 · on Oct 10, 2022

I think the concern there is that these different failure modes are all given the same error/exception messaging? The caller could make different decisions based on these different potential outcomes, but only if the caller receives specific errors. It's been a while since I've used Cassandra, so if the error handling has improved along those lines then I apologize.

_benedict · on Oct 10, 2022

A timeout is pretty much always an unknown outcome. Cassandra does have dedicated exceptions for failed writes (which should have a certain outcome of not applied) but this is much rarer in practice.

Cassandra does today inform you on timeout how many replicas have been successfully written to, if it was insufficient to reach the requested level of durability, which many years ago it did not. But this is no guarantee the write is durable, as this will represent a minority of nodes - and they may fail, or be partitioned from the remainder of the cluster. So the write remains in an unknown state until successfully read at QUORUM, much like a timeout during COMMIT for any database.
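A rough sketch of why the state stays unknown until a QUORUM read succeeds, again as a toy Python model rather than anything from Cassandra itself. `Replica`, `quorum_read`, `fresh_cluster` and the `responders` parameter are hypothetical names, and the write-back inside `quorum_read` is only a simplified stand-in for read repair.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)


def quorum_read(replicas, key, responders):
    """Toy QUORUM read over the replica indexes listed in `responders`.

    Returns the newest value any responder holds and writes it back to all
    responders, so a successful read leaves the value on a majority.
    """
    required = len(replicas) // 2 + 1
    if len(responders) < required:
        raise RuntimeError("not enough replicas for QUORUM")
    versions = [replicas[i].data[key] for i in responders if key in replicas[i].data]
    if not versions:
        return None
    newest = max(versions)  # tuples compare by timestamp first
    for i in responders:
        replicas[i].data[key] = newest
    return newest[1]


def fresh_cluster():
    # Assumed state after a timed-out QUORUM write: only replica 0 applied it.
    cluster = [Replica(), Replica(), Replica()]
    cluster[0].data["k"] = (1, "v")
    return cluster


cluster = fresh_cluster()
# A quorum that happens to miss replica 0 does not see the write.
seen_without = quorum_read(cluster, "k", responders=[1, 2])
# A quorum that includes replica 0 sees it and copies it to a majority.
seen_with = quorum_read(cluster, "k", responders=[0, 1])
holders = sum("k" in r.data for r in cluster)

print(seen_without, seen_with, holders)  # None v 2
```

Before the second read, whether the write "happened" depends on which replicas answer. Once a QUORUM read has returned the value, a majority holds it in this model.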