Hacker News

Hey folks—I wrote the post! This was my biggest Postgres project to date, and it proved quite tricky since I didn't rehearse with a test database of the same size. I learned a bunch about Postgres, not least the incredibly powerful NOT VALID option for safely and quickly adding constraints.
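For readers who haven't used it, the NOT VALID flow looks roughly like this (table and constraint names here are hypothetical, not from the post):

```sql
-- Adding a foreign key normally locks the table while every existing
-- row is checked. NOT VALID skips the full-table scan, so the ALTER
-- returns almost immediately:
ALTER TABLE orders
  ADD CONSTRAINT orders_user_fk
  FOREIGN KEY (user_id) REFERENCES users (id) NOT VALID;

-- New and updated rows are checked right away. Existing rows can be
-- validated later, under a much weaker lock that doesn't block writes:
ALTER TABLE orders VALIDATE CONSTRAINT orders_user_fk;
```

The same two-step pattern works for CHECK constraints, which is what makes it practical on very large tables.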

Happy to stick around and answer any questions you have.



Wondering why you couldn’t use in-place upgrade (pg_upgrade [0]). Not supported in Azure?

[0] https://www.postgresql.org/docs/current/pgupgrade.html


I'm guessing this would still take more than one hour since it needs to rewrite all (or most) of the 4TB?


Not in my experience. You can use the --link flag to get it to use hard links so it doesn't need to move the data at all. Have been through this process myself a few times and it only took seconds on a 50-100GB db. I'm always a little surprised by how well it works.


Exactly, it just has to touch the metadata, unless the data page format has changed, which the developers try hard to avoid (successfully so far).
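For anyone curious, a hard-link pg_upgrade run looks roughly like this; the paths, versions, and directory layout below are illustrative, not from the thread:

```shell
# Stop the old cluster first; pg_upgrade needs exclusive access.
pg_ctl -D /var/lib/postgresql/9.6/main stop

# Dry run with --check to surface incompatibilities without touching data:
/usr/lib/postgresql/13/bin/pg_upgrade \
  --old-datadir /var/lib/postgresql/9.6/main \
  --new-datadir /var/lib/postgresql/13/main \
  --old-bindir  /usr/lib/postgresql/9.6/bin \
  --new-bindir  /usr/lib/postgresql/13/bin \
  --link --check

# The same command without --check performs the upgrade. With --link,
# the new cluster hard-links the old data files instead of copying them,
# which is why it can finish in seconds regardless of database size.
```

The trade-off with --link is that once the new cluster starts, the old cluster can no longer be used, so a backup or replica is still worth having.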


I'm curious why a test run on a properly sized replica database wasn't in the testing plans. That is something I've been ensuring happens for a while now for similar projects.


I'm probably missing something, but it sounds like using Warp has a bunch of downsides vs "just" creating a read only replica using logical replication and then failing over. Did you choose Warp only because of Azure's limitations or were there other reasons?


Great post! Did you migrate your old database to Azure Flexible Server, Hyperscale (Citus), or a standalone VM? Postgres 13 does not seem to be available for Azure Postgres Single Server.


Good write-up. I am wondering what the process of performing a regular schema change (e.g. ALTER TABLE) on a 4TB instance looks like.


This instance must cost you a ton


More than I care to admit!

As mentioned downthread, we're doing some work soon to remove the audit table from the database, which will cut storage usage by over 50%.



This technique saved us from seriously increasing the cost of our Heroku Postgres instance. Thank goodness it exists and works so well. Multiple 80+ GB indexes shrank down to less than 10GB after just a couple of hours.
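The comment doesn't name the technique, but if it's online index rebuilds, the usual shape on Postgres 12+ is REINDEX ... CONCURRENTLY (index and table names below are made up):

```sql
-- Rebuilds the index without blocking reads or writes, discarding
-- accumulated bloat in the process (Postgres 12+):
REINDEX INDEX CONCURRENTLY audit_events_created_at_idx;

-- On older versions the same effect needs a manual swap:
CREATE INDEX CONCURRENTLY audit_events_created_at_idx_new
  ON audit_events (created_at);
DROP INDEX CONCURRENTLY audit_events_created_at_idx;
ALTER INDEX audit_events_created_at_idx_new
  RENAME TO audit_events_created_at_idx;
```

Heavily updated tables bloat their indexes over time, which is why a rebuild can shrink an index by an order of magnitude.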


Seems to be around $1,784/month according to the pricing calculator.


> we're doing some work soon to remove the audit table from the database

Out of pure curiosity, what are you replacing it with (if anything)? Just a simple rotating log file?


Exact strategy to be determined—we're looking at various data layers at the moment. I wish we could do something simple like a rotating log file, but we want to be able to query it in the app (for instance, to show recent logins).


Have you considered an OLAP database like ClickHouse or QuestDB? An OLAP database would be a much better fit for audit tables given the read-heavy, append-only access pattern, would compress better, and you might be able to slot it in without changing any app code by using a PostgreSQL foreign data wrapper.
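The foreign data wrapper idea can be sketched with the built-in postgres_fdw for concreteness; a ClickHouse wrapper (e.g. a third-party clickhouse_fdw) follows the same CREATE SERVER / CREATE FOREIGN TABLE pattern with its own options. All names, hosts, and credentials here are hypothetical:

```sql
CREATE EXTENSION postgres_fdw;

-- Point at the external store holding the audit data:
CREATE SERVER audit_store
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'audit-db.internal', dbname 'audit', port '5432');

CREATE USER MAPPING FOR app_user
  SERVER audit_store
  OPTIONS (user 'audit_reader', password 'secret');

-- The app keeps querying audit_events as if it were a local table:
CREATE FOREIGN TABLE audit_events (
  id         bigint,
  user_id    bigint,
  action     text,
  created_at timestamptz
) SERVER audit_store OPTIONS (table_name 'audit_events');
```

Queries like "show recent logins" then run unchanged, with the planner pushing work down to the remote server where it can.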


I was thinking TimescaleDB


Very interesting! Looking forward to the next blog post on that ;)


You can partition your audit/event table by time period and archive old events [1] or you can avoid the records hitting the database in the first place by generating the events elsewhere to begin with [2].

[1] https://github.com/icheishvili/audit-trigger

[2] https://github.com/eulerto/wal2json
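The partition-by-time-period approach in [1] maps onto declarative range partitioning (Postgres 10+); the table and partition names below are made up:

```sql
-- Parent table is partitioned on the event timestamp:
CREATE TABLE audit_events (
  user_id    bigint,
  action     text,
  created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per month:
CREATE TABLE audit_events_2024_01 PARTITION OF audit_events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Archiving a month is then a cheap metadata operation instead of a
-- huge DELETE; the detached table can be dumped and dropped:
ALTER TABLE audit_events DETACH PARTITION audit_events_2024_01;
```

This keeps the hot working set small while old events stay queryable until they're archived.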


What's your strategy?

Using compression? Second instance for old data? Creating in-between states?

Just curious :)


Great read! Thanks


How come you didn't consider a distributed database like CockroachDB instead of Postgres?


I'm not OP, but they were upgrading from Postgres 9.6, which at least implies that this initial db was from ~2017.

That's barely past the initial release of CockroachDB. It would have been kind of crazy for the Retool team to build their product on an experimental database with so little track record, given that the product didn't depend on any experimental new features.


Baidu migrated from an old MySQL cluster to CockroachDB for 40TB of data, which now runs on 20 clusters.

When does it stop being experimental to you?

https://www.cockroachlabs.com/customers/


I don't think they're calling Cockroach experimental right now - rather that it was experimental in 2017.


Yeah, but they didn't migrate in 2017; they upgraded just now, as the post's first sentence says. They had ample opportunity to migrate.



