Hacker News: imbradn's comments

It was carried forward from the days before checksums existed for the main data files. The ability to enable them on an existing cluster wasn't added to core until PG 14 (there was an external utility before that which could do it, but not much earlier). Before that, there was no real way to enable them on an existing cluster short of using logical replication (which has a ton of caveats and issues).

Even today, enabling them as described in the blog is not officially supported as far as I know (though it does work - I've done it in the past across tens of thousands of DB clusters) and is far from simple.
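For reference, the offline path on a single cluster looks roughly like this, using the in-core pg_checksums utility. This is a sketch, not the procedure from the blog: paths and the shutdown/start commands are illustrative, and the cluster must be cleanly shut down first.

```shell
# Stop the cluster cleanly (pg_checksums refuses to run otherwise).
pg_ctl -D "$PGDATA" stop -m fast

# Rewrite every block to add checksums; runtime scales with data size.
pg_checksums --enable --progress -D "$PGDATA"

pg_ctl -D "$PGDATA" start

# Confirm the cluster now reports checksums as enabled.
psql -Atc "SHOW data_checksums;"
```

The hard part at scale is that the rewrite happens offline, which is why doing this across a fleet without downtime requires the replica-by-replica dance the blog describes.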


Logical replication row filtering is a much better alternative to the ship and drop approach.
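As a sketch of what that looks like (PG 15+ row filters; table, column, and shard names here are illustrative): each shard subscribes to a publication whose WHERE clause selects only its rows, so the initial copy and ongoing replication never ship data that would later be dropped.

```sql
-- On the source: publish only the rows belonging to shard 0.
CREATE PUBLICATION shard_0 FOR TABLE orders WHERE (customer_id % 4 = 0);

-- On the shard: a normal subscription; only matching rows are copied
-- during the initial sync and replicated afterwards.
CREATE SUBSCRIPTION shard_0_sub
    CONNECTION 'host=source-db dbname=app'
    PUBLICATION shard_0;
```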


Interesting. I've been thinking about it as well. I wrote some code to prototype it here: https://github.com/pgdogdev/pgdog/blob/main/pgdog/src/backen...


We used it to shard a database at Instacart after your time instead of the “replicate everything and drop the data on the shard approach”. That combined with dropping all the indexes (minus the primary key) worked well for the data copy/initial sharding.
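The index trick is worth spelling out, since the initial copy is dominated by index maintenance. A hedged sketch (index and table names are made up): drop everything but the primary key on the target before the copy, then rebuild without blocking writes once replication has caught up.

```sql
-- Before the initial copy, on the shard target:
DROP INDEX orders_created_at_idx;
DROP INDEX orders_customer_id_idx;

-- After the copy has caught up, rebuild without blocking writes:
CREATE INDEX CONCURRENTLY orders_created_at_idx ON orders (created_at);
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);
```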


That's good to know. Don't hesitate to reach out if you ever want to chat about this stuff. My email: lev@pgdog.dev


Often lost in these discussions is how much more difficult upgrading is at scale. The article talks about the challenges with upgrading a 4TB database. In my world - that’s a small database.

Trying to set up a logical replica of a much larger, high-write-volume database is an entirely different story with its own set of challenges. In some cases it's not possible at all, even with tricks like dropping and restoring indexes.

Logical replication still struggles to keep up with high write loads. When something like a vacuum freeze kicks off, it's not uncommon to see logical replication lag for significant periods.
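You can watch this happen directly on the source. A minimal monitoring query (built-in catalog and functions; no assumed schema): it reports how far behind the current WAL insert position each logical slot's subscriber has confirmed, in bytes.

```sql
-- WAL bytes each logical slot is behind the current insert position;
-- a large or steadily growing value means the subscriber can't keep up.
SELECT slot_name,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
       ) AS replication_lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
```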

Then there are gaps like the lack of DDL replication. While this can mostly be worked around, it adds complexity. And remember that DB user management is DDL - so if you change a DB password on the primary, it won't replicate to the logical replica.
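Concretely (role name illustrative): a password change is an ALTER ROLE, and logical replication only carries DML, so the same statement has to be applied on both sides or logins against the replica start failing at cutover.

```sql
-- On the primary:
ALTER ROLE app_user PASSWORD 'new-secret';

-- Not replicated. Must be repeated manually on the logical replica:
ALTER ROLE app_user PASSWORD 'new-secret';
```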

When you have CDC systems doing logical decoding from the database you're upgrading, you also have to deal with resyncs: the logical replication slot is not carried over to the new server, so downstream consumers lose their place in the replication stream.
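In practice that means recreating the slot on the upgraded primary and reconciling the gap. A hedged sketch (slot name illustrative; `pgoutput` is the in-core output plugin, your CDC tool may use another such as `wal2json`):

```sql
-- On the new primary after cutover: the old slot does not exist here,
-- so it must be created fresh for each CDC consumer.
SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');
```

Consumers then restart from the new slot's position; anything committed between their last confirmed LSN on the old primary and the new slot's creation point has to be reconciled with a resync or backfill.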

Most non-trivial installations have multiple streaming replicas for read offloading, which need to be coordinated at cutover. While not difficult, it adds to the complexity.

In addition, there are no guarantees of application compatibility. While this is rarely an issue in my experience, PG 14 (IIRC) changed the signatures of a bunch of array functions, which was particularly painful.

That said - Postgres is improving the story here. PG 17's ability to convert a streaming replica into a logical one and then upgrade it will be a game changer for standing up the logical replica. If DDL replication ever lands in logical replication, it will improve things even more.
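The PG 17 piece is the pg_createsubscriber utility, which converts a stopped physical standby into a logical subscriber in place, skipping the expensive initial table copy. A hedged invocation sketch (hostnames, paths, and publication/subscription names are illustrative):

```shell
# PG 17+: turn an existing (stopped) streaming standby into a logical
# subscriber without re-copying the data.
pg_createsubscriber \
    --pgdata /var/lib/postgresql/17/standby \
    --publisher-server 'host=primary.example.com port=5432 dbname=app' \
    --database app \
    --publication upgrade_pub \
    --subscription upgrade_sub
```

Once converted, the node can be pg_upgrade'd and then catch up logically from the primary.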

Even so, it's still way behind how modern distributed databases like CockroachDB and Yugabyte handle upgrades.


No, vacuum issues are not solved. This reduces the amount of scanning needed in many cases when vacuuming indexes, which means more efficient, quicker vacuums - and that will help in a lot of cases.


It's based on Patroni - which they wrote and maintain, and which is the most popular HA failover software for Postgres.


The acquisition of Oriole is very interesting. Getting that to a GA product would be a game changer for a lot of workloads.


I'm excited to see more OrioleDB patches land upstream.


Except for the unimplemented features, which they might need.

It also uses serializable isolation, and in their implementation reads are blocked by writes, unlike in Postgres. Both are significant changes that can have far-reaching application impacts.


IOPS isn't a linear thing here. The vacuums needed to prevent transaction ID wraparound (vacuum freeze) can't be throttled and are much more expensive than regular vacuums. By splitting the tables, they are likely reducing the need for those vacuums by a large margin and significantly reducing IOPS needs.
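You can see which tables are driving this with a catalog query (built-in columns, no assumed schema): as `age(relfrozenxid)` approaches `autovacuum_freeze_max_age` (200 million by default), Postgres forces an aggressive anti-wraparound vacuum on that table regardless of other settings.

```sql
-- The ten tables closest to forcing an anti-wraparound vacuum.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```

Smaller tables hit that threshold with far fewer dirty pages to rewrite per vacuum, which is where the IOPS savings come from.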


That article covers the basics of how we did upgrades at Instacart, but it's quite old. This is a more modern look at the process, which we've used to successfully upgrade a lot of very large, very active databases:

https://www.instacart.com/company/how-its-made/zero-downtime...


But in those days, on top of the license prices for closed-source DBs, you also had to pay for all the hardware to run them on, and hire an army of DBAs to support and maintain them for you.

Running HA Postgres is not easy and requires a lot of niche, specialized knowledge. But running one of the big commercial DBs back in the pre-cloud days required just as much specialized knowledge at the DB level - plus network engineers and sysadmins. It was far from cheap to run at scale.

