- Forwards (and usually also backwards) compatible disk format, meaning version updates don't require more than a few minutes of downtime. On large datasets, PostgreSQL can require days or weeks.
- Replication works across version differences, making zero-downtime upgrades easier.
- No need for a vacuum process that can run into trouble with a sustained high write load.
- Page-level compression, reducing storage needs for some types of data quite a bit.
- Clustered primary keys. Without these, some types of queries can become rather slow when your table doesn't fit in memory. (Example: a chat log containing many billions of messages, which you're querying by conversation ID; see the sketch below.)
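To make that last point concrete, here is a minimal sketch of what such a table could look like on MySQL/InnoDB, where the primary key is the clustered index. The table and column names are made up for illustration.

```sql
-- Hypothetical chat-log table on InnoDB. Rows are stored in primary-key
-- order, so all messages of one conversation end up on adjacent pages.
CREATE TABLE chat_message (
    conversation_id BIGINT   NOT NULL,
    message_id      BIGINT   NOT NULL,
    sent_at         DATETIME NOT NULL,
    body            TEXT     NOT NULL,
    PRIMARY KEY (conversation_id, message_id)
) ENGINE=InnoDB;

-- Fetching one conversation touches a handful of mostly sequential pages,
-- even when the table is far larger than memory.
SELECT message_id, sent_at, body
FROM chat_message
WHERE conversation_id = 12345
ORDER BY message_id;
```

On a Postgres heap table the same query goes through a secondary b-tree index and then fetches heap pages that can be scattered across the whole table.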
> - Forwards (and usually also backwards) compatible disk format, meaning version updates don't require more than a few minutes of downtime.
How so? Postgres' data formats are forwards (and mostly backwards) compatible back to release 8.4 in 2009; effectively only the catalogs need upgrading. Sure, that can be a lot of data, but no different from MySQL or any other RDBMS with transactional DDL.
> - Replication works across version differences
Logical replication has been available since at least 9.6.
Not the dump format. The actual data on disk of the database itself.
If you try upgrading even one major Postgres version and you're not aware of this, you lose all your data (you don't really, as you can roll back to the previous version, but it doesn't even tell you that!)
Yes, that's more 'keeping the option open' than 'we do this regularly', and I can't seem to find any documentation that MySQL gives any better guarantee.
Note that for all of 10, 11, 12, 13 and 14, no changes have been made to the table storage format that made it mandatory to rewrite any user-defined schema.
I will admit that some changes have been made in those versions that make the data not forward-compatible; but MySQL seems to do that at the minor release level instead of the major release level. MySQL 8 doesn't even support minor release downgrades; under PostgreSQL this works just fine.
pg_upgrade will take care of that. And you can still run an upgrade from 8.4 to 14 using the --link mode, which means no data will be copied; only the system catalogs need to be re-created.
>> Just the command line description and the manual of pg_upgrade makes you want to run away screaming. It's not ok.
Why is it not ok? This is pg_upgrade:
> Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files. If a future major release ever changes the data storage format in a way that makes the old data format unreadable, pg_upgrade will not be usable for such upgrades. (The community will attempt to avoid such situations.)
This is mysql_upgrade:
> Each time you upgrade MySQL, you should execute mysql_upgrade, which looks for incompatibilities with the upgraded MySQL server:
> * It upgrades the system tables in the mysql schema so that you can take advantage of new privileges or capabilities that might have been added.
> * It upgrades the Performance Schema, INFORMATION_SCHEMA, and sys schema.
> * It examines user schemas.
> If mysql_upgrade finds that a table has a possible incompatibility, it performs a table check and, if problems are found, attempts a table repair.
The description seems comparable, and if anything pg_upgrade looks saner by not attempting silly things like "table repair".
>> pg_upgrade should not be needed at all! It isn't for mysql. Or if it is needed it should be automatic.
It isn't needed for MySQL since 8.0.16, where it became an automatic process. Personally, I prefer having an explicit step to perform the upgrade, instead of running a new binary that starts the database process and also optionally performs the upgrade at the same time.
> Clustered primary keys. Without these, some types of queries can become rather slow when your table doesn't fit in memory. (Example: a chat log containing many billions of messages, which you're querying by conversation ID.)
Clustered indexes can be overused, but they are sorely missing from PG. Having had them forever in SQL Server, it was another surprise to see they don't exist in PG; there is even a confusing CLUSTER command that reorders the heap table once but does not actually create a clustered index.
Clustered indexes are great for storage space and IO with common write and retrieval patterns. If your table is accessed one way most of the time, and that way needs to be fast and therefore requires an index, a clustered index saves write IO (you only write the table, not a secondary index), disk space (no secondary index redundantly storing the indexed columns) and retrieval time (no indirection when querying the clustered index).
This is great for certain kinds of tables. For instance: log tables that need to be written quickly but also queried quickly (usually by time range), grow quite large and are append-only. GIS tables, which can also get quite large and are queried by points, can be packed really tight and rows found quickly. Entity tables that are mostly retrieved by primary-key ID during app usage: you trade a slight performance penalty on secondary-index lookups for maximum performance when retrieving by ID, which affects everything including foreign-key checks, and again saves space by not writing the ID to disk twice. And tables that are really an index for something else, kept up to date with a trigger etc., which exist for one specific access pattern and its performance.
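For anyone who hasn't used both systems, a rough sketch of the difference being described, with made-up table and index names:

```sql
-- SQL Server: the clustered index defines the physical order of the
-- table itself, and that order is maintained as new rows are inserted.
CREATE CLUSTERED INDEX ix_app_log ON app_log (logged_at, id);

-- PostgreSQL: the closest thing is a one-off rewrite. CLUSTER reorders
-- the heap to match an existing b-tree index, but later inserts are not
-- kept in that order, so the table drifts out of order again.
CREATE INDEX app_log_time_idx ON app_log (logged_at, id);
CLUSTER app_log USING app_log_time_idx;
```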
I have been working with Oracle for more than 20 years now, and I have had only a few situations where an "index-organized table" (= clustered index) was useful or actually provided a major performance benefit over a "heap table". So I never really miss them in Postgres.
It is such a common access pattern that many database engines always use a clustered index (MySQL's InnoDB, SQLite), whether you use it directly or not.
I like having a choice, as there is in SQL Server or Oracle, but for many use cases it's a waste, both in space and time, to write to a heap and to an index (which is just a hidden IOT), then look up in the index and dereference into the heap.
> Well, you can’t do that with an equals filter. But how often do you use non-equals filters like > or < on the primary key?
You can; he's wrong. He missed the vastly more common scenario: your primary key is a composite key and you're filtering on a prefix of that composite key. We do that all the time. No < or > required; you get a range of keys with an equijoin. It's additionally common for the first key in the composite key to be some kind of timestamp or monotonically increasing value, so new writes always go together at the end of the table. This sort of key is absolutely begging to be used with a clustered primary key index.
We have a table with 100+ billion rows that uses this pattern. It uses partition switching to append new data. At this size it is absolutely imperative to reduce the number of indexes to keep insert performance high, and we are always querying multiple consecutive rows based on a prefix equijoin. I'd be a fool to follow the author's advice here.
I suspect the author and I live in very different worlds, and he doesn't know my world exists.
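To make the pattern concrete, a minimal sketch with made-up table and column names (the real schema is obviously different):

```sql
-- Composite clustered primary key whose leading column is monotonically
-- increasing, so new rows always land at the end of the clustered index.
CREATE TABLE sensor_reading (
    batch_id  BIGINT NOT NULL,  -- monotonically increasing load batch
    sensor_id INT    NOT NULL,
    reading   FLOAT  NOT NULL,
    CONSTRAINT pk_sensor_reading
        PRIMARY KEY CLUSTERED (batch_id, sensor_id)
);

-- An equality filter on a prefix of the key resolves to one contiguous
-- range of the clustered index: no < or > needed, and no secondary
-- index to maintain on every insert.
SELECT sensor_id, reading
FROM sensor_reading
WHERE batch_id = 42;
```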
There is Vitess for MySQL, but no such cluster manager for Postgres. Citus is now owned by Microsoft and getting harder to use outside of Microsoft's cloud, e.g. no more Citus on AWS.