Hacker News

Hey folks—I wrote the post! This was my biggest Postgres project to date, and it proved quite tricky since I didn't rehearse with a test database of the same size. I learned a bunch about Postgres, not least the incredibly powerful NOT VALID option for safely and quickly adding constraints.
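For readers who haven't used it, the NOT VALID flow looks roughly like this (table and constraint names here are hypothetical, not from the post):

```sql
-- Adding a foreign key normally locks the table while every existing
-- row is checked. NOT VALID skips the full-table scan, so the ALTER
-- returns almost immediately:
ALTER TABLE orders
  ADD CONSTRAINT orders_user_fk
  FOREIGN KEY (user_id) REFERENCES users (id) NOT VALID;

-- New and updated rows are checked right away. Existing rows can be
-- validated later, under a much weaker lock that doesn't block writes:
ALTER TABLE orders VALIDATE CONSTRAINT orders_user_fk;
```

The same two-step pattern works for CHECK constraints, which is what makes it practical on very large tables.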

Happy to stick around and answer any questions you have.



Wondering why you couldn’t use in-place upgrade (pg_upgrade [0]). Not supported in Azure?

[0] https://www.postgresql.org/docs/current/pgupgrade.html


I'm guessing this would still take more than one hour since it needs to rewrite all (or most) of the 4TB?


Not in my experience. You can use the --link flag to get it to use hard links so it doesn't need to move the data at all. Have been through this process myself a few times and it only took seconds on a 50-100GB db. I'm always a little surprised by how well it works.


Exactly, it just has to touch the metadata, unless the data page format has changed, which the developers try hard to avoid (successfully so far).
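For anyone curious, a hard-link pg_upgrade run looks roughly like this; the paths, versions, and directory layout below are illustrative, not from the thread:

```shell
# Stop the old cluster first; pg_upgrade needs exclusive access.
pg_ctl -D /var/lib/postgresql/9.6/main stop

# Dry run with --check to surface incompatibilities without touching data:
/usr/lib/postgresql/13/bin/pg_upgrade \
  --old-datadir /var/lib/postgresql/9.6/main \
  --new-datadir /var/lib/postgresql/13/main \
  --old-bindir  /usr/lib/postgresql/9.6/bin \
  --new-bindir  /usr/lib/postgresql/13/bin \
  --link --check

# The same command without --check performs the upgrade. With --link,
# the new cluster hard-links the old data files instead of copying them,
# which is why it can finish in seconds regardless of database size.
```

The trade-off with --link is that once the new cluster starts, the old cluster can no longer be used, so a backup or replica is still worth having.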


I'm curious why a test run on a properly sized replica database wasn't in the testing plans. That is something I've been ensuring happens for a while now for similar projects.


I'm probably missing something, but it sounds like using Warp has a bunch of downsides vs "just" creating a read only replica using logical replication and then failing over. Did you choose Warp only because of Azure's limitations or were there other reasons?


Great post! Did you migrate your old database to Azure Flexible Server, Hyperscale (Citus), or a standalone VM? Postgres 13 does not seem to be available for Azure Postgres Single Server.


Good write-up. I am wondering what the process of performing a regular schema change (e.g. ALTER TABLE) on a 4TB instance looks like.


This instance must cost you a ton


More than I care to admit!

As mentioned downthread, we're doing some work soon to remove the audit table from the database, which will cut storage usage by over 50%.



This technique saved us from seriously increasing the cost of our Heroku Postgres instance. Thank goodness it exists and works so well. Multiple 80+ GB indexes shrank down to less than 10GB after just a couple of hours.
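The comment doesn't name the technique, but if it's online index rebuilds, the usual shape on Postgres 12+ is REINDEX ... CONCURRENTLY (index and table names below are made up):

```sql
-- Rebuilds the index without blocking reads or writes, discarding
-- accumulated bloat in the process (Postgres 12+):
REINDEX INDEX CONCURRENTLY audit_events_created_at_idx;

-- On older versions the same effect needs a manual swap:
CREATE INDEX CONCURRENTLY audit_events_created_at_idx_new
  ON audit_events (created_at);
DROP INDEX CONCURRENTLY audit_events_created_at_idx;
ALTER INDEX audit_events_created_at_idx_new
  RENAME TO audit_events_created_at_idx;
```

Heavily updated tables bloat their indexes over time, which is why a rebuild can shrink an index by an order of magnitude.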


Seems to be around $1,784/month according to the pricing calculator.


> we're doing some work soon to remove the audit table from the database

Out of pure curiosity, what are you replacing it with (if anything)? Just a simple rotating log file?


Exact strategy to be determined—we're looking at various data layers at the moment. I wish we could do something simple like a rotating log file, but we want to be able to query it in the app (for instance, to show recent logins).


Have you considered an OLAP database like ClickHouse or QuestDB? An OLAP database would be a much better fit for audit tables given the read-heavy, append-only access pattern, would compress better, and you might be able to slot it in without changing any app code by using a PostgreSQL foreign data wrapper.
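The foreign data wrapper idea can be sketched with the built-in postgres_fdw for concreteness; a ClickHouse wrapper (e.g. a third-party clickhouse_fdw) follows the same CREATE SERVER / CREATE FOREIGN TABLE pattern with its own options. All names, hosts, and credentials here are hypothetical:

```sql
CREATE EXTENSION postgres_fdw;

-- Point at the external store holding the audit data:
CREATE SERVER audit_store
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'audit-db.internal', dbname 'audit', port '5432');

CREATE USER MAPPING FOR app_user
  SERVER audit_store
  OPTIONS (user 'audit_reader', password 'secret');

-- The app keeps querying audit_events as if it were a local table:
CREATE FOREIGN TABLE audit_events (
  id         bigint,
  user_id    bigint,
  action     text,
  created_at timestamptz
) SERVER audit_store OPTIONS (table_name 'audit_events');
```

Queries like "show recent logins" then run unchanged, with the planner pushing work down to the remote server where it can.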


I was thinking TimescaleDB


Very interesting! Looking forward to the next blog post on that ;)


You can partition your audit/event table by time period and archive old events [1] or you can avoid the records hitting the database in the first place by generating the events elsewhere to begin with [2].

[1] https://github.com/icheishvili/audit-trigger

[2] https://github.com/eulerto/wal2json
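The partition-by-time-period approach in [1] maps onto declarative range partitioning (Postgres 10+); the table and partition names below are made up:

```sql
-- Parent table is partitioned on the event timestamp:
CREATE TABLE audit_events (
  user_id    bigint,
  action     text,
  created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per month:
CREATE TABLE audit_events_2024_01 PARTITION OF audit_events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Archiving a month is then a cheap metadata operation instead of a
-- huge DELETE; the detached table can be dumped and dropped:
ALTER TABLE audit_events DETACH PARTITION audit_events_2024_01;
```

This keeps the hot working set small while old events stay queryable until they're archived.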


What's your strategy?

Using compression? Second instance for old data? Creating in-between states?

Just curious :)


Great read! Thanks


How come you didn't consider a distributed database like CockroachDB instead of Postgres?


I'm not OP, but they were upgrading from Postgres 9.6, which at least implies that this initial db was from ~2017.

That's barely past the initial release of CockroachDB. It would have been kind of crazy for the Retool team to build their product on an experimental database with so little track record, given that the product didn't depend on any experimental new features.


Baidu migrated from an old MySQL cluster to CockroachDB for 40TB of data, which now runs on 20 clusters.

When does it stop being experimental to you?

https://www.cockroachlabs.com/customers/


I don't think they're calling Cockroach experimental right now - rather that it was experimental in 2017.


Yeah, but they didn't migrate in 2017; they upgraded just now, as the post's first sentence says. They had ample opportunity to migrate.



