*I called out the notion that anyone ought to be depending on the defaults in th...

pbailis · on Jan 22, 2013

Good points on both sides. One thing I'll point out is that many clustering, HA, and multi-master replication solutions either rely on a single master (what I think Michael means when he says they're not distributed) or don't provide serializability.

Things get harder when you're distributed. For example, if you cluster with a sharded master-slave configuration, then, to serialize transactions that span shards/partitions, you'll need to do 2 phase commit or similar between masters for writes and, for most read/write transactions, make sure you don't read from slaves. If you cluster via master-master/active-active, then, for serializability, you'll need locking or some other concurrency control across masters for each shard/partition. Both setups are definitely do-able (if not highly available) but require non-trivial engineering.

jpitz · on Jan 22, 2013

"Regardless of whether you should depend on them, many many people do. Heck, many people don't even understand that there's a choice to be made: after all, everyone knows that Oracle is ACID compliant, right?"

Ignorance is not an excuse. It just isn't.

"Just because a DB has a replication package available (like MySQL) does not mean that it is a distributed system. And the file backed DBs (like Berkeley) are definitely not distributed. Sure, there are some extremely expensive massively parallel DBs in use (like Volt), but the number of deployments for those systems is a drop in the bucket compared with single-node MySQL/Postgres/Oracle/SQLServer/DB2 instances."

I do not understand what point you are trying to make here. Are these systems de-facto non-distributed systems merely because of deployment counts? Is there an objective criteria here I should be aware of?

MichaelSalib · on Jan 22, 2013

Ignorance is not an excuse. It just isn't.

I'm not interested in excusing anything, I'm interested in understanding the real world. And what I see is that a lot of people who insist on the absolute need for ACID don't really understand it because they're using non-ACID technology right now.

I do not understand what point you are trying to make here.

Consider MySQL with asynchronous replication to a slave. In a weak sense, that is a distributed system because the remote slave is on a different machine (and probably very distant) from the master. But the distributed bit here doesn't interfere with correctly implementing ACID: MySQL with async replication operates identically to single node MySQL: it just transmits the binary logs to a slave server. The database system itself is a single-node service whose state gets replicated at transaction boundaries.

In contrast, distributed shared-nothing databases like Volt have to work really hard to maintain consistency: there is no single node in those systems that does all the work and gets replicated: multiple nodes have to cooperate in order to get anything done.