There are MVCC systems with serializable, strictly serializable, and even extern...

tristor · on July 25, 2020

I'd be very interested in you providing an example of an MVCC system which is fully serializable. Conventional wisdom is, that while possible, it is prohibitively expensive to ensure a snapshot isolation system like MVCC is fully serializable, and it is explicitly most expensive for analytics / OLAP workloads because you must keep track of the read set of every transaction. It is possible for such a system to exist, then, but it would cost such a performance penalty as to be a painfully bad choice for a workload where OLAP and OLTP would be mixed, bringing us back to the point I originally made.

In most cases, snapshot isolation is sufficient, and some database vendors even conflate snapshot isolation with serializable, but they're not the same thing. I'd be hesitant to believe any vendor claims that they implement serializable MVCC without validating it via testing. As we've been shown by Jepsen, database vendors make many claims, some of which are unsubstantiated. Spanner is very cool technology, however I have personally heard some very interesting claims from folks on the Spanner team that would violate the laws of physics, so again, without a demonstration of their claims, I'd take them with a grain of salt.

johncolanduoni · on July 25, 2020

Both the FoundationDB docs and Spanner whitepapers are very clear that their definitions of strict serializability match with the conventional academic one. FoundationDB does keep track of the read set of every transaction: there are nodes dedicated to doing that in their architecture. You can even add things to the read set of each transaction without actually reading them to get additional guarantees about ordering (e.g. to implement table-level locks). FoundationDB (and obviously Spanner as well) don’t have any massive penalties for this; FoundationDB clusters can handle millions of operations per second with low single digit read and commit latencies: https://apple.github.io/foundationdb/performance.html.

If your condition for believing these claims is approval from Jepsen (i.e. Kyle Kingsbury), he apparently didn’t both testing FoundationDB because their test suite is “waaaay more rigorous”: https://twitter.com/aphyr/status/405017101804396546. In particular their test suite is able to produce reproducible tests of any variation in messaging (dropped messages, reordered messages) across their single-threaded nodes, which is extremely useful in narrowing down places where serialization violations can hide. He also seems to believe Spanner’s claims: https://www.youtube.com/watch?v=w_zYYF3-iSo

I’m not sure where this “conventional wisdom” about serializability having an unavoidable large performance hit is coming from; the databases I mentioned are very well known in the distributed systems field and not some random vendors making outlandish claims.