Sometimes you must `EXPLAIN ANALYZE` expensive queries in production, sadly. Postgres behavior (even on copies that aren't bit-for-bit identical) can differ under load.
The biggest way I have seen this be true is with fragmented tables/indexes - same data but organized more compactly can change planner behavior.
Article actually touches on another way that can be true - if your `work_mem` is drastically different the planner may prefer not to use an index, for example, and there are similar settings. Even with identical settings PG may choose different plans; it can be a bit temperamental in my experience, so sometimes you have to run the non-performant query off-peak in production to understand performance.
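As an illustration (the table and query here are hypothetical), you can watch a plan flip just by changing `work_mem` for the session:

```sql
-- With a tiny work_mem the planner tends to avoid hash joins and
-- in-memory sorts; with a large one it may switch strategies entirely.
SET work_mem = '64kB';
EXPLAIN SELECT * FROM orders ORDER BY created_at;  -- hypothetical table

SET work_mem = '256MB';
EXPLAIN SELECT * FROM orders ORDER BY created_at;

RESET work_mem;
```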
The article and the comments here don't make it clear why running it in production shouldn't be done. If slow_query is already running in production, why would running EXPLAIN ANALYZE slow_query be bad?
Is the overhead of running EXPLAIN ANALYZE so much worse than running slow_query itself?
No, it's really not, and that's why I say it must sometimes be done. Certainly if you're running tens or hundreds of copies of the query per minute, one more won't hurt (much).
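Worth noting: most of `EXPLAIN ANALYZE`'s overhead is per-node timing instrumentation, which can be expensive on some platforms. You can turn that part off and still get actual row counts (query shown is hypothetical):

```sql
-- TIMING OFF skips per-node timers, the most expensive instrumentation;
-- the query still executes and real row counts are still reported.
EXPLAIN (ANALYZE, TIMING OFF, BUFFERS)
SELECT count(*) FROM orders;  -- hypothetical query
```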
The real problem you run into is when the query in question is doing something pathologically bad - locking a bunch of tables and then grinding away for an hour, which effectively is a denial of service attack.
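One way to bound the blast radius (a sketch; the query and tables are hypothetical stand-ins for `slow_query`) is to wrap the run in a transaction with local timeouts, then roll back:

```sql
BEGIN;
-- SET LOCAL applies only inside this transaction.
SET LOCAL statement_timeout = '30s';  -- abort if it grinds away too long
SET LOCAL lock_timeout = '1s';        -- bail out rather than queue on locks
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders o JOIN order_items i ON i.order_id = o.id;
ROLLBACK;  -- discard any writes the statement may have made
```

The `lock_timeout` is the important part for the pathological case: instead of sitting in a lock queue and blocking everyone behind you, the statement fails fast.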
One thing to consider is that the person who needs to run `EXPLAIN ANALYZE` may not have any access whatsoever to the production database. Also, there may be no process in place to get someone to run it on prod on their behalf. Finally, if there is a DBA on production, they may just say no.
I don't _think_ the query planner takes "current load" into account.
If you have:
- Same resources (CPU/memory)
- Same settings (e.g. work_mem, amongst others)
- Same dataset
- Same/similar statistics (gathered with ANALYZE or the autovacuum)
you should get the same results. If I'm wrong, please somebody correct me!
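For checking those preconditions across two servers, the standard catalog views are enough (table name is hypothetical):

```sql
-- Compare non-default settings between the two servers.
SELECT name, setting
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;

-- Check when statistics were last gathered for a given table.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';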
You might be right. Certainly if you run, e.g., a fully cloned server on the same disk image and hardware, it will behave the same. I'm thinking of times when memory pressure from other queries caused caches to get evicted, things like that. It's not really the planner; it's the measured performance, from lock contention, what data happens to be in memory, and so on.
Generally indexes are cheap if built concurrently, so I often build all the plausible indexes (for relatively modestly sized data types, load, and tables) and look at the planner statistics in production to validate which indexes are useful for the query load. That only works if you have a read-heavy usage pattern. For write-heavy patterns it can sometimes be better not to have an index at all (one of the things he alludes to in the article about missing foreign key indexes: indexes come with a write cost, so on a write-only table, foreign key indexes are a bad idea), but as with everything, real timing data is the key.
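The "look at the planner statistics" step can be done with the standard statistics views. For example, to find indexes the planner never (or rarely) uses, which are paying the write cost for nothing:

```sql
-- Least-used indexes first, biggest first within each idx_scan count;
-- low idx_scan + large size = a candidate for dropping.
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
```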
For new table design in Serious Business Databases I do two or three steps: first an offline replica with the same data and settings (though never 100% identical), then usually a transaction on a logical replica or on main production (Postgres supports transactional DDL, which is a big help, but altering a table inside a transaction can block tons of other transactions).
It's important to carefully construct your alter table / create table / create index statements to ensure that everything is concurrent and won't block, of course, and there are plenty of pitfalls there, but definitely doable.
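A sketch of the non-blocking pattern (table, index, and constraint names are hypothetical):

```sql
-- Build the index without holding a long exclusive lock on the table.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS order_items_order_id_idx
    ON order_items (order_id);

-- Add a foreign key without a long validation lock: NOT VALID makes the
-- ALTER itself fast, and VALIDATE later takes only a weaker lock.
ALTER TABLE order_items
    ADD CONSTRAINT order_items_order_id_fkey
    FOREIGN KEY (order_id) REFERENCES orders (id) NOT VALID;

ALTER TABLE order_items VALIDATE CONSTRAINT order_items_order_id_fkey;
```

The `NOT VALID` / `VALIDATE` split is one of the classic ways to avoid the "alter blocks everything" pitfall mentioned above.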