> Different nodes could have and return different data.
Only if they miss some writes, and eventually they will converge. But if you do quorum writes and quorum reads (or local quorum W + local quorum R), this guarantees you'll read from at least one node that received all the writes issued before the read, so you get the converged value immediately, regardless of which node you ask. All nodes will eventually agree on the value, because timestamps are assigned by the coordinator or by the application, not at the replica. A single write will get the same timestamp across all replicas.
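To make the quorum argument concrete, here is a small brute-force check that any write set of size W and any read set of size R share at least one replica whenever R + W > N. It is only a sketch: the function names are made up for illustration, and it models replica sets abstractly rather than anything Cassandra-specific.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Replication factor 3: quorum writes + quorum reads always share a replica.
overlap_rf3_quorum = quorums_always_overlap(3, quorum(3), quorum(3))
# Replication factor 3: ONE + ONE can hit disjoint replicas.
overlap_rf3_one = quorums_always_overlap(3, 1, 1)
```

With a replication factor of 3, quorum is 2 and 2 + 2 > 3, so the read always touches a replica that saw the write. With W = R = 1 the sets can be disjoint, which is the case where different nodes return different data.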
Incorrect timestamps can cause a different problem: a write that happened at physical time T2 > T1 might be considered by the cluster to be older than the write at T1, if it was accepted by a coordinator whose clock was set in the past. Such a write might simply not take effect, because the older update would be considered newer. However, again, the resolution would be consistent on all replicas once they get all updates.
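A toy model of that behaviour, assuming plain last-write-wins on an integer timestamp. The `Write` type and `merge` function are hypothetical, and breaking ties by value is a simplification added here to keep the merge deterministic, not a description of Cassandra's exact rules.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Write:
    timestamp: int  # assigned once by the coordinator or client, same on every replica
    value: str


def merge(current: Optional[Write], incoming: Write) -> Write:
    """Last-write-wins: keep the write with the higher (timestamp, value) pair."""
    if current is None:
        return incoming
    if (incoming.timestamp, incoming.value) > (current.timestamp, current.value):
        return incoming
    return current


def apply_all(writes: Iterable[Write]) -> Optional[Write]:
    """Apply writes to one replica in the order it happens to receive them."""
    state = None
    for w in writes:
        state = merge(state, w)
    return state


w1 = Write(timestamp=1000, value="first")   # issued at physical time T1
w2 = Write(timestamp=900, value="second")   # issued later (T2 > T1), but stamped by a slow clock

replica_a = apply_all([w1, w2])  # receives the writes in issue order
replica_b = apply_all([w2, w1])  # receives them in the opposite order
```

Both replicas end up holding `w1`: the later write `w2` is silently discarded because its timestamp is lower, but the replicas still agree with each other regardless of delivery order.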
timestamp is coming from the operating system where the client library is running. so unless all your write requests are issued by the same machine then you are guaranteed to face the problem where the client machine clocks are not in sync.
just comparing timestamps is obviously a design from someone that didn’t review the academic literature on distributed transactions and consensus
>> timestamp is coming from the operating system where the client library is running.
The client can set the timestamp to any value of its choice. It does not have to correspond to clock time.
>> so unless all your write requests are issued by the same machine then you are guaranteed to face the problem where the client machine clocks are not in sync.
It's not about all writes. It's about ensuring that all writes to a single partition (specifically, a single column within a single partition) are done using a source of monotonic integers to ensure ordering.
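For illustration, a minimal sketch of such a source inside a single client process: it follows the wall clock when the clock moves forward and otherwise just increments, so it never hands out the same or a smaller number twice. The class name and the injectable `clock` parameter are assumptions made for this example. It only orders writes issued through this one generator; writers on different machines would still need a shared source or some other coordination.

```python
import threading
import time


def wall_clock_micros() -> int:
    """Current wall-clock time in microseconds since the epoch."""
    return time.time_ns() // 1000


class MonotonicTimestamps:
    """Hands out strictly increasing integers, even if the clock steps backwards."""

    def __init__(self, clock=wall_clock_micros):
        self._clock = clock
        self._last = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        with self._lock:
            # Use the clock reading if it is ahead, otherwise bump the last value by one.
            self._last = max(self._clock(), self._last + 1)
            return self._last


# Simulated clock that jumps backwards after the first reading.
readings = iter([100, 90, 90, 200])
gen = MonotonicTimestamps(clock=lambda: next(readings))
issued = [gen.next() for _ in range(4)]
```

Here `issued` is `[100, 101, 102, 200]`: the two readings of 90 are overridden so the order is preserved. The resulting value is what the client would attach to each write as its explicit timestamp (in CQL, via `USING TIMESTAMP`).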
You are correct that if you have a singleton server returning monotonic time to clients, this would work.
But this monotonic time server is not part of Cassandra itself, and for this reason the majority of people using Cassandra will use the OS time without knowing that this is silently corrupting the database.