In SQL Server you just... do the upgrade. You install the upgrade on your nodes ...

calpaterson · on April 19, 2022

PG is far behind SQL Server on ease of upgrade but the method described in this post is not the best practice right now, which I think is:

- physical restore to new cluster

- pg_upgrade the new cluster

- catch up on logical wal logs from old cluster

- failover to new cluster

- STONITH

I think the above was not open to them because of the limitations of their managed PG instance. I haven't used Azure but GCP managed SQL has loads of limitations. It seems very common and I think is a major (and undiscussed) drawback of these managed instance.

But the truth is that very few of the people who use PG want to hear that things are better in the MS SQL community for reasons of prejudice and as a result you're being downvoted unfairly for pointing out PGs relative backwardness here.

electroly · on April 20, 2022

I'm curious how this works for on-prem. Our SQL Server cluster is on-prem; we can't just spin up another cluster. An important aspect of the SQL Server upgrade process is it doesn't require any extra nodes. What did people do for pgsql upgrades before everyone moved to the cloud?

Here's a nice thing about PostgreSQL over SQL Server to satiate the fans: SQL Server is absurdly expensive to run in the cloud. I can't believe anyone uses RDS for SQL Server. Even in EC2 it's horrifically expensive. That's the main reason we have an on-prem cluster.

Symbiote · on April 20, 2022

I would use the existing nodes (just like the SQL Server upgrade, you have reduced redundancy during the upgrade).

Under 1 minute of downtime was acceptable for us, so:

  systemctl stop postgresql-11 && \
    pg_upgrade --link ... && \
    systemctl start postgresql-12

was more-or-less the process last time I did this. We have only 500GB of data, and I think pg_upgrade ran in 15 seconds or so.

If a minute of downtime isn't acceptable, then it presumably isn't acceptable in case of unexpected hardware failure either, and you'd be using one of the commercial multi-master extensions.

Symbiote · on April 19, 2022

They did not dump and restore.

They created a replica database running the new version, then switched over to it. Not too dissimilar to what you describe, although more work since they started out with only a single instance without replication support.

electroly · on April 19, 2022

They _ultimately_ didn't dump and restore, but it was the first thing they tried. It didn't work; it actually failed catastrophically for them. They describe this under the "Implementing logical replication" section. Their ultimate solution is what they tried after the dump-and-restore based DMS method failed and they took an unplanned outage due to yet more PostgreSQL-specific issues (unvacuumed tuples).

All of this is exactly what I'm talking about. This blog post describes kind of a nightmare process for something that is trivial in SQL Server. They actually needed a third party product from Citus just to successfully upgrade PostgreSQL! Stunning.

Beltalowda · on April 20, 2022

I don't think they "needed" the Citus tool, per se, it was just the easiest option. I don't know much about MS-SQL, but no doubt PostgreSQL has areas that can be improved, or even that outright suck.

The main barrier against adopting MS-SQL is just the pricing and that it's not open source. Another thing that PostgreSQL seems to do a lot better than MS-SQL is in the extensibility department, hence we have things like TimescaleDB, Citus, EdgeDB, and a whole lot more. I can't really find anything like that for MS-SQL, but perhaps I missed it?

electroly · on April 20, 2022

You're absolutely right. A serious part of managing SQL Server is keeping your costs down. RDS for SQL Server is so unbelievably expensive that I can't believe anyone uses it. I'm not aware of any meaningful extensions to MSSQL in the sense of TimescaleDB and friends either. I'll make the claim that we don't need Citus because everything they offer is built into MSSQL, and we don't need TimescaleDB because column stores are built in too, but if you did find some kind of deep extension you wanted to build, you can't do it. Simply not an option with MSSQL. You either build it outside of SQL Server, or you don't build it.

VincentEvans · on April 19, 2022

Postgres dump and restore tooling is very poor performance-wise , easily 10x slower compared to SQL Server. I love Postgres dearly and prefer to use it despite that, but I wish Postgres devs renewed their interest in improving neglected dump/restore tooling.

aidos · on April 19, 2022

It’s been a long time since I used SQL Server so I don’t know that upgrade process well (I’m willing to believe it’s smoother though, especially wrt to replication / failover).

Keep in mind that they’re upgrading from a database version that’s almost 6 years old. Postgres has improved a lot in the last 5 major versions since then.

Another thing here is that I’m pretty sure they could have just done the in-place upgrade and it would have been fine. I’ve run pg_upgrade myself for a bunch of major versions now and it’s easy and doesn’t require dumping / restoring anything. Maybe there’s something else going on that I’m missing though.

What setup are you running with sql server to have it automatically failover? Is it a multi master configuration or are the additional nodes just read replicas?

These days Postgres actually allows logical replication so your servers can be running different versions at the same time, which allows for much smoother upgrades (haven’t tried that myself yet, don’t quote me on it!)

electroly · on April 19, 2022

I believe pg_upgrade isn't guaranteed to always work; it's possible they might change the table storage such that it's unreadable in a new version, and pg_upgrade is documented to fail if so. However, I don't think it's ever happened. That may just be an abundance of caution in the documentation. I wonder why the author of this article didn't mention this possibility.

SQL Server is designed to run in a Windows Server Failover Cluster; the SQL Server-side feature is called "Always On availability groups" in an HA configuration. It's a single-master arrangement, you can either have a fully passive secondary (that's what we do) or read-only replicas. The WSFC handles managing quorum, that's what causes the automatic failover as soon as >50% of the nodes are running the new version.

aidos · on April 20, 2022

For what it's worth, it's worked for upgrading me this far (9.6 -> 13); though I'll be looking to go the logical replication route for the next round.

I suspect the way I'll be setting it up is much the same as what you describe in your WSFC configuration (with a little more manual wrangling, no doubt).

Teletio · on April 19, 2022

Amy real facts you can show besides your sentiment on blog posts?

electroly · on April 19, 2022

What facts are you looking for? I just described the steps from this document: https://docs.microsoft.com/en-us/sql/sql-server/failover-clu... -- specifically, the "Perform a rolling upgrade or update" section. There's nothing else to my post other than contrasting the SQL Server upgrade process to the one described in the article, and musing about my growing appreciation for SQL Server; I apologize if it seemed like it was going to be deeper than that.

EDIT: I realized you're looking for the other PostgreSQL blog posts. Here's an example of two recent HN posts about PostgreSQL issues that I pulled out of my comment history. Both of these blog posts exist because PostgreSQL doesn't have query hints. SQL Server has them; I've dealt with issues like these blog posts describe but they have trivial fixes in the SQL Server world. Nothing to write a blog post about. I don't have a link handy regarding PostgreSQL's txn id wraparound problem, but SQL Server doesn't have that problem, either.

- https://news.ycombinator.com/item?id=30296490

- https://news.ycombinator.com/item?id=29981737