> At the end of the day, you still have to go to disk on writes, and this must be serialized against reads for basic consistency reasons.
No, absolutely not.
That's why modern databases use a thing called multi-version concurrency control (MVCC). You can run (multiple) queries on the same table that is updated by multiple transactions at the same time without one blocking the others (assuming the write transactions don't block each other). Of course they are fighting for I/O, but there is no need to serialize anything.
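As a minimal sketch of readers not waiting on a writer, using Python's sqlite3 in WAL mode as a stand-in for MVCC-style snapshot reads (file name and schema are made up):

    import os, sqlite3, tempfile

    # Toy demo: a reader keeps its snapshot while a writer commits concurrently.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    writer = sqlite3.connect(path, isolation_level=None)   # autocommit
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    writer.execute("INSERT INTO t VALUES (1, 'old')")

    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("BEGIN")
    print(reader.execute("SELECT v FROM t").fetchall())    # [('old',)] -- snapshot established

    writer.execute("UPDATE t SET v = 'new' WHERE id = 1")  # commits without waiting for the reader

    print(reader.execute("SELECT v FROM t").fetchall())    # still [('old',)] -- not blocked, not torn
    reader.execute("COMMIT")
    print(reader.execute("SELECT v FROM t").fetchall())    # [('new',)] after the snapshot is released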
Mixing OLTP and OLAP becomes increasingly "normal" these days as the capabilities of the database products and the hardware improve. With modern high-end hardware (hundreds of CPUs, a lot of SSDs, large RAM) this actually scales quite nicely.
OLTP databases are optimized for mutable data. OLAP databases are optimized for immutable data. There's a big difference between appropriate data structures for each use case that has little to do with hardware capabilities.
OLAP databases tend to write columns in large blocks and apply sort orders to improve compression. This type of structure works well if you write the data once and read it many times. It's horrendously inefficient for concurrent updates to things like user session contexts. (Or even reading them for that matter.) You are better off using a row store with ACID transactions and relatively small pages.
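A toy way to picture the layout difference (plain Python lists, not any real engine):

    # Analytic scans touch one contiguous column; a point update has to touch
    # every column array, while a row store rewrites one small record.
    rows = [                      # row store: one tuple per record
        (1, "alice", 30),
        (2, "bob",   41),
        (3, "carol", 27),
    ]
    columns = {                   # column store: one array per attribute
        "id":   [1, 2, 3],
        "name": ["alice", "bob", "carol"],
        "age":  [30, 41, 27],
    }

    # Aggregate: a single dense array in the columnar layout (compresses well, scans fast).
    print(sum(columns["age"]) / len(columns["age"]))

    # Point update of record 2: one tuple in the row store...
    rows[1] = (2, "robert", 42)

    # ...but one entry in *every* column array in the columnar layout.
    columns["name"][1] = "robert"
    columns["age"][1] = 42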
The dichotomy has been visible for decades and shows no sign of disappearing, because the difference is mostly how you arrange and access data, not so much the hardware used.
"serialized" here doesn't really mean processed in serial, it means "serializable" in the context of database information theory. Databases have special concurrency control requirements in order to create hard guarantees on database consistency. You can process queries in parallel and still have a serializable result, because of transaction coordination. Doing this on one server is much easier than doing this across a cluster of servers.
So in your case, MVCC is what you're talking about, which is not the same level of consistency guarantee as serializable; rather, it is based on snapshot isolation. Some database vendors consider them effectively the same isolation level because the anomalies associated with other common non-serializable isolation levels aren't typically present in most MVCC implementations, but there's a lot more complexity here than you are acknowledging.
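To make that concrete, here's a toy simulation (no real database, names made up) of write skew, the textbook anomaly that snapshot isolation allows and serializability forbids:

    # Two doctors each check that someone else is on call before going off call.
    committed = {"alice": True, "bob": True}      # on-call flags in the "database"

    def go_off_call(me, snapshot):
        """Run against a private snapshot; return the write set if the check passes."""
        if any(on_call for name, on_call in snapshot.items() if name != me):
            return {me: False}
        return None

    # Both transactions take their snapshots before either one commits.
    snap_a, snap_b = dict(committed), dict(committed)
    writes_a = go_off_call("alice", snap_a)
    writes_b = go_off_call("bob", snap_b)

    # First-committer-wins only rejects overlapping *write* sets; these are
    # disjoint, so a snapshot-isolation engine accepts both commits.
    committed.update(writes_a)
    committed.update(writes_b)
    print(committed)   # {'alice': False, 'bob': False} -- a state no serial order allows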
Mixing OLTP and OLAP workloads on the same database is pretty much always a bad idea. This is why it's common practice to use ETL jobs to move data from an OLTP-optimized database like Postgres or MySQL to a separate database for OLAP (which could be another MySQL or PG instance, or could be something like ClickHouse or another columnar database optimized for OLAP). Just because you /can/ do something doesn't mean you /should/ do something...
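A minimal sketch of that pattern, with SQLite files standing in for both sides and a made-up orders(id, total, created_at) table on the OLTP end:

    import sqlite3

    # Copy rows past a high-water mark into a separate reporting database,
    # so analytic queries never touch the transactional instance.
    oltp = sqlite3.connect("oltp.db")          # assumed to contain orders(id, total, created_at)
    olap = sqlite3.connect("reporting.db")
    olap.execute(
        "CREATE TABLE IF NOT EXISTS orders_fact (id INTEGER PRIMARY KEY, total REAL, created_at TEXT)"
    )

    last_seen = olap.execute("SELECT COALESCE(MAX(id), 0) FROM orders_fact").fetchone()[0]
    new_rows = oltp.execute(
        "SELECT id, total, created_at FROM orders WHERE id > ? ORDER BY id", (last_seen,)
    ).fetchall()
    olap.executemany("INSERT INTO orders_fact VALUES (?, ?, ?)", new_rows)
    olap.commit()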
There are MVCC systems with serializable, strictly serializable, and even externally consistent transactions. FoundationDB and Spanner are both externally consistent (with geo-replication in Spanner’s case). CockroachDB is serializable, though not strictly serializable. Single-master Postgres can do serializable transactions as well.
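As a sketch of the single-node Postgres case (assuming psycopg2, a reachable server, and a toy accounts(id, balance) table; retrying on serialization failure is the documented contract):

    import psycopg2
    from psycopg2 import errors

    conn = psycopg2.connect("dbname=demo")
    conn.set_session(isolation_level="SERIALIZABLE")

    def transfer(src, dst, amount):
        while True:
            try:
                with conn:                      # commit on success, rollback on error
                    with conn.cursor() as cur:
                        cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, src))
                        cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, dst))
                return
            except errors.SerializationFailure:
                continue                        # SSI aborted us (SQLSTATE 40001): retry the whole transaction

    transfer(1, 2, 100)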
I'd be very interested in you providing an example of an MVCC system which is fully serializable. Conventional wisdom is that, while possible, it is prohibitively expensive to make a snapshot isolation system like MVCC fully serializable, and it is most expensive precisely for analytics / OLAP workloads, because you must keep track of the read set of every transaction. It is possible for such a system to exist, then, but it would carry such a performance penalty as to be a painfully bad choice for a workload where OLAP and OLTP are mixed, bringing us back to the point I originally made.
In most cases, snapshot isolation is sufficient, and some database vendors even conflate snapshot isolation with serializable, but they're not the same thing. I'd be hesitant to believe any vendor claims that they implement serializable MVCC without validating it via testing. As we've been shown by Jepsen, database vendors make many claims, some of which are unsubstantiated. Spanner is very cool technology; however, I have personally heard some very interesting claims from folks on the Spanner team that would violate the laws of physics, so again, without a demonstration of their claims, I'd take them with a grain of salt.
Both the FoundationDB docs and Spanner whitepapers are very clear that their definitions of strict serializability match with the conventional academic one. FoundationDB does keep track of the read set of every transaction: there are nodes dedicated to doing that in their architecture. You can even add things to the read set of each transaction without actually reading them to get additional guarantees about ordering (e.g. to implement table-level locks). FoundationDB (and obviously Spanner as well) don’t have any massive penalties for this; FoundationDB clusters can handle millions of operations per second with low single digit read and commit latencies: https://apple.github.io/foundationdb/performance.html.
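For example, with the Python bindings (a sketch only: it assumes a reachable cluster, made-up keys, and that I'm remembering the add_read_conflict_key helper correctly):

    import fdb

    fdb.api_version(630)
    db = fdb.open()

    @fdb.transactional
    def update_row(tr, row_key, value, lock_key):
        # Declare a read dependency on lock_key as if we had read it: any concurrent
        # transaction that writes lock_key will now conflict with this one, which is
        # enough to build a coarse, table-level lock on top of per-key conflicts.
        tr.add_read_conflict_key(lock_key)
        tr[row_key] = value

    update_row(db, b"table/rows/42", b"payload", b"table/lock")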
If your condition for believing these claims is approval from Jepsen (i.e. Kyle Kingsbury), he apparently didn't bother testing FoundationDB because their test suite is “waaaay more rigorous”: https://twitter.com/aphyr/status/405017101804396546. In particular, their test suite is able to produce reproducible tests of any variation in messaging (dropped messages, reordered messages) across their single-threaded nodes, which is extremely useful in narrowing down places where serialization violations can hide. He also seems to believe Spanner’s claims: https://www.youtube.com/watch?v=w_zYYF3-iSo
I’m not sure where this “conventional wisdom” about serializability having an unavoidable large performance hit is coming from; the databases I mentioned are very well known in the distributed systems field and not some random vendors making outlandish claims.
There's an enormous leap between something which is slow on SQLite and something which requires ETL into a data warehouse or similar tech (columnar store, etc.).
I mean, at least three orders of magnitude, minimum.
It's just a ludicrous argument. SQLite is fine for a file format, and in very specific dumb CRUD scenarios it's just about OK. But it's not worth sticking with if you need anything interesting over any volume of data, a threshold far, far below what would warrant moving from an RDBMS to a different DB technology.
Ironically, you're more representative of the "we'll need Hadoop" crowd.
Rejecting a defense of SQLite on unsuitability-to-OLAP grounds and diving straight to append-only data pipelines with ETL, when many aggregate and analytic functions can be performed perfectly acceptably on RDBMSes like Postgres (and even MySQL, with care).
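A quick sketch of the kind of analytics a plain RDBMS handles fine; this runs on SQLite's own window functions (3.25+) via Python's standard library, and the same SQL works on Postgres (table and data are made up):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sales (region TEXT, day TEXT, amount REAL);
        INSERT INTO sales VALUES
            ('eu', '2024-01-01', 100), ('eu', '2024-01-02', 150),
            ('us', '2024-01-01', 200), ('us', '2024-01-02', 120);
    """)
    query = """
        SELECT region, day, amount,
               SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total,
               AVG(amount) OVER (PARTITION BY region)              AS region_avg
        FROM sales
        ORDER BY region, day
    """
    for row in con.execute(query):
        print(row)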
> Mixing OLTP and OLAP becomes increasingly "normal"
Just because it's "normal" doesn't mean it's correct. Just because you can doesn't mean you should.
All hail bob1029:
> We aren't running any reports on our databases like this. I would argue it is a bad practice in general to mix OLTP and OLAP workloads on a single database instance, regardless of the specific technology involved.
I think this is specifically because random reads are scaling much better than writes, even though you won't see it on standard "that many MB/s" benchmarks where read is 'just' a few multiples of the write performance.
Persisting a transaction to the database is still (and especially in MVCC): "send data write", "wait for write to be flushed", "toggle metadata bit to mark write as completed", "wait for bit to be completed", which still serialises transaction commits while reads can complete in parallel as fast as the device can handle.
Especially now that the reads and writes don't have to share the disk head, it makes sense for random reads to keep on scaling better than writes.
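As a toy sketch of that commit sequence (just a log file and fsync calls, not any real engine):

    import os, tempfile

    # Each commit must wait for its data flush and then its commit-marker flush,
    # in order, which is what serialises the write path while reads run in parallel.
    log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
    log_fd = os.open(log_path, os.O_CREAT | os.O_WRONLY | os.O_APPEND)

    def commit(record: bytes) -> None:
        os.write(log_fd, record)        # "send data write"
        os.fsync(log_fd)                # "wait for write to be flushed"
        os.write(log_fd, b"COMMIT\n")   # "toggle metadata bit to mark write as completed"
        os.fsync(log_fd)                # "wait for bit to be completed"

    commit(b"UPDATE t SET v = 'new' WHERE id = 1;\n")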
This is actually the opposite of current limitations. Stick a capacitor and some DRAM on the NVMe and you can "instantly" flush to disk, but there's no way to anticipate where the next read will come from and therefore no way to accelerate it.
You'll see modern NVMe disks with sustained writes greatly outpacing reads until the write cache is saturated, at which point what you say is true and reads will greatly outpace writes. But you don't want your disks to ever be in that threshold.
I think we're seeing the difference between someone who is a programmer and someone who plays a programmer at work. :) Arguing that some modern, complex feature is "better than" a simpler system is crazy talk. Any time I can simplify a system vs. add complexity I will go simplicity. Adding in plug-ins and features goes a long way toward vendor lock-in that prevents your org from being agile when you have to swap systems because you run into limits or have catastrophic failures.