Huh? You ought not be depending on the default isolation settings. If your workl...

MichaelSalib · on Jan 22, 2013

Did you read the post? First, it covers both default isolation and the maximum available. Second, note that many databases (like Oracle 11g) don't actually give you serializable semantics at ANY setting. Third, most of these products are not distributed systems. Fourth, ACID is not a spectrum for distributed systems; it is basically impossible to apply without sacrificing availability. But that's perhaps not a big deal if our non-distributed DBs don't provide real ACID anyway.

The real issue here is that the database world is a cargo cult where ignorant people scream ACID to denigrate new technologies without noticing that most production databases aren't running with anything close to ACID and that major database vendors can't even support ACID.

jpitz · on Jan 22, 2013

I did read the post.

It does cover both default and maximum. I didn't dispute that. I called out the notion that anyone ought to be depending on the defaults in the first place, or that SERIALIZABLE as a default was a good choice.

Yes, many don't support SERIALIZABLE. Didn't contradict that either.

As to whether many of these are or aren't distributed systems:

Ingres - has replication.

Aerospike - distributed/fault tolerant/blah blah

Persistit - nope. Appears to be a library.

Clustrix - clustered.

Greenplum - this is shared-nothing clustered Postgres.

DB2 for z/OS - I have no idea. Let's call this one not distributed, for giggles.

Informix - same.

MySQL - lots of replication and HA options.

MemSQL - replicated.

MSSQL - replication and federated query modes.

NuoDB - cloud database management? Looks distributed to me.

Oracle - don't they have RAC?

Berkeley (x2) - don't know. Probably not.

PostgreSQL - a few replication options.

HANA - no idea. Let's call it in your favor.

ScaleDB - clustered.

Volt - shared-nothing clustering.

That's a little over half, by my count. Certainly close to most.

"The real issue here is that the database world is a cargo cult where ignorant people scream ACID to denigrate new technologies without noticing that most production databases aren't running with anything close to ACID and that major database vendors can't even support ACID."

Some can't. Some do. I'm not screaming. My main message is this: Don't depend on defaults. They differ from vendor to vendor. Understand your workload and use the APPROPRIATE isolation for it.
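To make "don't depend on defaults" concrete, here is a minimal sketch of pinning the isolation level explicitly. The helper `set_isolation_sql` and the `VENDOR_DEFAULTS` table are hypothetical illustrations, not part of any driver; the defaults listed are the shipped settings for those engines, which a DBA can change. Where the statement must be issued (inside or before the transaction) differs by engine, some engines reject some levels (Oracle accepts only a subset), and, as noted in this thread, the name of a level does not guarantee its semantics.

```python
# Hypothetical helper (not from any library): pin the isolation level
# explicitly instead of inheriting whatever the server default is.
ANSI_LEVELS = ("READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE")

# Shipped defaults for a few engines; a DBA can change these in server
# configuration, which is one more reason not to rely on them.
VENDOR_DEFAULTS = {
    "postgresql": "READ COMMITTED",
    "mysql_innodb": "REPEATABLE READ",
    "oracle": "READ COMMITTED",
    "sqlserver": "READ COMMITTED",
}


def set_isolation_sql(level: str) -> str:
    """Build the standard SQL statement that sets a transaction's isolation level."""
    normalized = " ".join(level.upper().split())  # collapse stray whitespace
    if normalized not in ANSI_LEVELS:
        raise ValueError(f"unknown isolation level: {level!r}")
    return f"SET TRANSACTION ISOLATION LEVEL {normalized}"


print(set_isolation_sql("serializable"))  # SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
```

Validating against a fixed list means a typo fails loudly in application code instead of silently leaving the vendor default in place.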

(edited for formatting and clarity)

MichaelSalib · on Jan 22, 2013

"I called out the notion that anyone ought to be depending on the defaults in the first place"

Regardless of whether you should depend on them, many, many people do. Heck, many people don't even understand that there's a choice to be made: after all, everyone knows that Oracle is ACID compliant, right?

"Yes, many don't support SERIALIZABLE. Didn't contradict that either."

Sorry, I was confused by the bit about "If your workload needs serializability, set it" since that's physically impossible on Oracle 11g.

Just because a DB has a replication package available (like MySQL) does not mean that it is a distributed system. And the file backed DBs (like Berkeley) are definitely not distributed. Sure, there are some extremely expensive massively parallel DBs in use (like Volt), but the number of deployments for those systems is a drop in the bucket compared with single-node MySQL/Postgres/Oracle/SQLServer/DB2 instances.

pbailis · on Jan 22, 2013

Good points on both sides. One thing I'll point out is that many clustering, HA, and multi-master replication solutions either rely on a single master (what I think Michael means when he says they're not distributed) or don't provide serializability.

Things get harder when you're distributed. For example, if you cluster with a sharded master-slave configuration, then, to serialize transactions that span shards/partitions, you'll need two-phase commit (2PC) or something similar between masters for writes and, for most read/write transactions, you'll need to avoid reading from slaves. If you cluster via master-master/active-active, then, for serializability, you'll need locking or some other concurrency control across masters for each shard/partition. Both setups are definitely doable (if not highly available), but they require non-trivial engineering.
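The cross-shard step above can be sketched as a toy two-phase commit. Everything here (`Shard`, `two_phase_commit`, the in-memory staging) is a hypothetical illustration; it leaves out locking, durable logging, and coordinator failure, which is exactly where real 2PC blocks and gives up availability.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory shard master that stages writes before applying them."""
    name: str
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)  # txid -> writes awaiting phase 2
    fail_prepare: bool = False                   # simulate a shard that votes "no"

    def prepare(self, txid, writes):
        # Phase 1: stage the writes (a real system would also lock and log them), then vote.
        if self.fail_prepare:
            return False
        self.staged[txid] = dict(writes)
        return True

    def commit(self, txid):
        # Phase 2, commit: make the staged writes visible.
        self.data.update(self.staged.pop(txid))

    def abort(self, txid):
        # Phase 2, abort: throw away anything staged for this transaction.
        self.staged.pop(txid, None)


def two_phase_commit(txid, participants):
    """Coordinator: participants is a list of (shard, writes) pairs.
    The transaction commits on every shard or on none."""
    prepared = []
    for shard, writes in participants:
        if not shard.prepare(txid, writes):
            for p in prepared:
                p.abort(txid)
            return False
        prepared.append(shard)
    for shard in prepared:
        shard.commit(txid)
    return True
```

The key property is all-or-nothing: one "no" vote in phase 1 aborts the staged writes on every shard that already said "yes".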

jpitz · on Jan 22, 2013

"Regardless of whether you should depend on them, many many people do. Heck, many people don't even understand that there's a choice to be made: after all, everyone knows that Oracle is ACID compliant, right?"

Ignorance is not an excuse. It just isn't.

"Just because a DB has a replication package available (like MySQL) does not mean that it is a distributed system. And the file backed DBs (like Berkeley) are definitely not distributed. Sure, there are some extremely expensive massively parallel DBs in use (like Volt), but the number of deployments for those systems is a drop in the bucket compared with single-node MySQL/Postgres/Oracle/SQLServer/DB2 instances."

I do not understand what point you are trying to make here. Are these systems de facto non-distributed merely because of deployment counts? Is there an objective criterion here I should be aware of?

MichaelSalib · on Jan 22, 2013

"Ignorance is not an excuse. It just isn't."

I'm not interested in excusing anything, I'm interested in understanding the real world. And what I see is that a lot of people who insist on the absolute need for ACID don't really understand it because they're using non-ACID technology right now.

"I do not understand what point you are trying to make here."

Consider MySQL with asynchronous replication to a slave. In a weak sense, that is a distributed system because the remote slave is on a different machine (and probably very distant) from the master. But the distributed bit here doesn't interfere with correctly implementing ACID: MySQL with async replication operates identically to single-node MySQL; it just transmits the binary logs to a slave server. The database system itself is a single-node service whose state gets replicated at transaction boundaries.
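A minimal sketch of that arrangement, assuming a toy model (`Primary` and `AsyncReplica` are hypothetical, with a plain list standing in for the binary log): all transactional work happens on the primary, and the replica only replays whole committed transactions in commit order. The replica therefore always holds a consistent prefix of history, but possibly a stale one, which is why serializable read/write transactions shouldn't read from slaves.

```python
class Primary:
    """Hypothetical single-node primary: applies each transaction locally and
    appends it to a log (a stand-in for the binary log)."""

    def __init__(self):
        self.data = {}
        self.log = []

    def commit(self, writes):
        self.data.update(writes)       # all transactional work happens here
        self.log.append(dict(writes))  # shipped later, at a transaction boundary


class AsyncReplica:
    """Replays the primary's log whenever it gets around to it."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log entries replayed so far

    def catch_up(self, max_entries=None):
        pending = self.primary.log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for writes in pending:
            self.data.update(writes)  # whole transactions, in commit order
            self.applied += 1
```

Because the replica applies whole log entries in order, a lagging replica can be out of date but never shows half of a transaction.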

In contrast, distributed shared-nothing databases like Volt have to work really hard to maintain consistency: there is no single node in those systems that does all the work and gets replicated: multiple nodes have to cooperate in order to get anything done.

ericfrenkiel · on Jan 22, 2013

I would amend that entry: MemSQL is shared-nothing clustered as well as replicated.

ww520 · on Jan 22, 2013

People scream ACID at NoSQL systems because they fail the easier A, C, or D in various forms, not the harder I. Having AC&D goes a long way toward ensuring data correctness.

pbailis · on Jan 22, 2013

Unlike "C" and "A" in "CAP," "AC&D" (specifically, "C") can't be easily separated from "I".

Serializability ("I") ensures that database consistency, or maintenance of integrity constraints ("C"), is not violated. While it's possible to get consistency ("C") without serializability (which would give up traditional "I" in favor of a weaker form of isolation), it's often difficult [see http://www.bailis.org/blog/when-is-acid-acid-rarely/#arbitra...].
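The classic way consistency breaks without serializability is write skew. Below is a sketch in a toy snapshot-isolation store (`SnapshotDB` and `go_off_call` are hypothetical, and so is the "at least one doctor on call" constraint). Each transaction checks the constraint against its own snapshot and writes a different key, so there is no write-write conflict, both commit, and the constraint is violated. Running the same transactions one after the other preserves it. Snapshot isolation is what Oracle provides under the name SERIALIZABLE.

```python
class SnapshotDB:
    """Toy multi-version store with snapshot isolation (first-committer-wins)."""

    def __init__(self, data):
        self.data = dict(data)
        self.commit_log = []  # key sets written by each committed transaction, in order

    def begin(self):
        return {"snapshot": dict(self.data), "start": len(self.commit_log), "writes": {}}

    def read(self, tx, key):
        # Own writes first, otherwise the snapshot taken at begin().
        if key in tx["writes"]:
            return tx["writes"][key]
        return tx["snapshot"][key]

    def write(self, tx, key, value):
        tx["writes"][key] = value

    def commit(self, tx):
        # Abort only on a write-write conflict with a transaction
        # that committed after this one began.
        for written in self.commit_log[tx["start"]:]:
            if not written.isdisjoint(tx["writes"]):
                return False
        self.data.update(tx["writes"])
        self.commit_log.append(set(tx["writes"]))
        return True


def go_off_call(db, tx, doctor):
    # Integrity constraint ("C") enforced by the application:
    # at least one of the two doctors must stay on call.
    on_call = sum(db.read(tx, d) for d in ("alice", "bob"))
    if on_call >= 2:
        db.write(tx, doctor, False)


# Concurrent run under snapshot isolation: both commit, constraint broken.
db = SnapshotDB({"alice": True, "bob": True})
t1, t2 = db.begin(), db.begin()
go_off_call(db, t1, "alice")
go_off_call(db, t2, "bob")
print(db.commit(t1), db.commit(t2), db.data)  # True True {'alice': False, 'bob': False}
```

Detecting this requires tracking read/write dependencies between transactions, not just write-write conflicts, which is part of why serializability is the harder property.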

lucian1900 · on Jan 22, 2013

Things like MongoDB fail at A and D, which is what the parent was probably thinking of.

willlll · on Jan 22, 2013

It also fails at C for concurrent writes.

lucian1900 · on Jan 23, 2013

Sure, but A and D are the particularly embarrassing ones.

zzzeek · on Jan 22, 2013

I think read committed and repeatable read are "pretty close" to ACID. Certainly compared to a straight-up NoSQL system that has close to none of A, C, I, or D.
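Both levels prevent dirty reads. One gap between them is the non-repeatable read, sketched below with a hypothetical toy model (`Store` and `Reader` are illustrations, not any engine's code). READ COMMITTED returns the latest committed value on every read. REPEATABLE READ is modeled here as a snapshot taken when the transaction starts, roughly what PostgreSQL does; a lock-based engine would instead make the concurrent writer wait.

```python
class Store:
    """Toy store that holds only committed values."""

    def __init__(self, data):
        self.committed = dict(data)


class Reader:
    """Toy read-only transaction. READ COMMITTED sees the latest committed value
    on every read; REPEATABLE READ is modeled as a snapshot taken at start."""

    def __init__(self, store, level):
        self.store = store
        self.snapshot = dict(store.committed) if level == "REPEATABLE READ" else None

    def read(self, key):
        if self.snapshot is not None:
            return self.snapshot[key]
        return self.store.committed[key]


store = Store({"x": 1})
rc = Reader(store, "READ COMMITTED")
rr = Reader(store, "REPEATABLE READ")
first = (rc.read("x"), rr.read("x"))
store.committed["x"] = 2  # another transaction commits an update
second = (rc.read("x"), rr.read("x"))
print(first, second)  # (1, 1) (2, 1)
```

Under READ COMMITTED the same transaction reads `x` twice and gets two different answers, so any decision made from the first read may already be stale.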