Dead tuples are a real and significant problem, not just because scans have to skip over them, but because the statistics that drive the planner don't account for them.
I found this out the hard way when I had a simple query that suddenly got very, very slow on a table where the application would constantly do a `SELECT ... FOR UPDATE SKIP LOCKED` and then immediately delete the rows after a tiny bit of processing.
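Roughly the shape of that loop, with made-up table and column names:

```sql
BEGIN;

-- Claim a pending row; concurrent workers skip it instead of blocking on the lock
SELECT id, payload
FROM job_queue
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... the application does its small bit of processing here ...

-- Then delete the claimed row, which is what leaves the dead tuple behind
DELETE FROM job_queue
WHERE id = 42;  -- the id returned by the SELECT above

COMMIT;
```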
It turned out that with a nearly empty table holding about 10-20k dead tuples, the planner switched to a different index scan and would overfetch tons of pages just to discard them, since they contained only dead tuples. What I didn't realize is that the planner statistics don't account for dead tuples, and ANALYZE doesn't take them into account either. So the planner started to think the table was much bigger than it actually was.
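If you want to watch this happen, these two queries (table name made up) show the dead-tuple count from the stats collector next to the size estimates the planner actually works from:

```sql
-- Dead vs. live tuples as seen by the statistics collector
SELECT n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'job_queue';

-- The size estimates the planner works from (refreshed by VACUUM/ANALYZE)
SELECT reltuples, relpages
FROM pg_class
WHERE relname = 'job_queue';
```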
It's really important for these use cases to tweak the autovacuum settings (which can be set on a per-table basis) to be much more aggressive, so that under high load the vacuum runs pretty much continuously.
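For reference, the knobs I mean are per-table storage parameters, something along these lines (values are purely illustrative, not tuned recommendations, and the table name is made up):

```sql
ALTER TABLE job_queue SET (
  autovacuum_vacuum_scale_factor  = 0,     -- don't wait for a fraction of the table to be dead
  autovacuum_vacuum_threshold     = 1000,  -- vacuum after ~1000 dead tuples instead
  autovacuum_vacuum_cost_delay    = 0,     -- don't throttle the vacuum worker
  autovacuum_analyze_scale_factor = 0,
  autovacuum_analyze_threshold    = 1000   -- keep the planner stats fresh too
);
```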
Another option is to avoid deleting rows and instead use a column to mark rows as complete, which together with a partial index can avoid the dead-tuple problem. There are both pros and cons; it requires doing the cleanup (and VACUUM) as a separate job.
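Sketching the idea (again, names are made up):

```sql
-- Mark jobs complete instead of deleting them
ALTER TABLE job_queue ADD COLUMN done boolean NOT NULL DEFAULT false;

-- The partial index only covers pending jobs, so the worker's scan stays small
-- even if completed rows pile up
CREATE INDEX job_queue_pending_idx ON job_queue (id) WHERE NOT done;

-- The worker flips the flag instead of issuing a DELETE
UPDATE job_queue SET done = true WHERE id = 42;
```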
It does, but because of how indexes work, I believe it won't be skewed by the presence of dead tuples (though the bloat can cause the live data to be spread across a lot more blocks and therefore generate more I/O), as long as you run autoanalyze semi-regularly.
In this case, you might have enough dead tuples across your heap to expect a lot of HOT updates. But if you are processing in insertion order, you will probably also be processing in heap order, and you can actually get zero HOT updates, because the other tuples in the page aren't fully dead yet. You could try using a lower fillfactor to avoid this, but that's also bad for performance, so it might not help.
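If you did want to experiment with that, fillfactor is a per-table storage parameter; something like this (the value and table name are just examples):

```sql
-- Leave ~30% of each heap page free for future (possibly HOT) updates.
-- Only pages written after the change are affected; existing pages need a
-- rewrite (e.g. VACUUM FULL) to pick it up.
ALTER TABLE job_queue SET (fillfactor = 70);
```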
If you have a "done" column that you filter on using a partial index, then those updates would never be HOT anyway, since HOT requires that none of the modified columns are used by any index (including in a partial index's WHERE clause).
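You can check this directly from the update counters (table name made up):

```sql
-- If n_tup_hot_upd stays near zero while n_tup_upd climbs, the updates aren't HOT
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'job_queue';
```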
Besides, you probably don't want "done" jobs in the same table as pending or retriable jobs. As you scale up, you likely want to archive them; it provides various operational advantages at no cost.
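A sketch of what that archival job could look like, assuming a hypothetical job_queue_archive table with the same columns:

```sql
-- Move completed jobs out of the hot table in one statement
WITH moved AS (
    DELETE FROM job_queue
    WHERE done
    RETURNING *
)
INSERT INTO job_queue_archive
SELECT * FROM moved;
```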
Not false. Nobody would ever use BRIN for this. I'm talking about regular indexes, which do prevent HOT.
If you read my earlier comment properly, you'll notice the "done" column is there to avoid deleting rows on the hot path and to keep dead tuples from messing up the planner. I agree that a table should not contain done jobs, but then you risk running into the dead tuple problem. Both approaches are a compromise.