The Internals of PostgreSQL

brasetvik · 2025-03-03T14:43:44 1741013024

Also recommend this free book: https://postgrespro.com/community/books/internals

voctor · 2025-03-03T16:07:40 1741018060

Not related at all, but the elephant illustration reminds me of the Grand Elephant https://www.lesmachines-nantes.fr/en/discover/the-grand-elep...

fforflo · 2025-03-03T16:39:54 1741019994

Apart from the usual advice to "start working on a patch to learn the internals," which is not as easy to do because it's a mature product and most of the low-hanging fruit has already been picked, I'd also suggest checking the extensions shipped with the main source tree under $top/contrib. Each one is self-contained and usually works with a subset of the internals. Some good ones: pageinspect, pg_buffercache, pg_prewarm, pg_stat_statements, pg_walinspect

HackerThemAll · 2025-03-03T14:41:53 1741012913

This is an old, but great and regularly updated, source of information about this great RDBMS. A lot of knowledge is accumulated there.

codepathfinder · 2025-03-03T14:43:45 1741013025

Thanks for sharing. I have been digging deeper into database internals and this looks amazing.

topherjaynes · 2025-03-03T15:26:13 1741015573

Check out the CMU history of the DB class. Andy Pavlo is an amazing teacher and you really get a feel for DBs https://www.youtube.com/watch?v=LWS8LEQAUVc&ab_channel=CMUDa... The same class from 5 years ago is great because he's like, "well, I can't get into my hotel room, so I'll just record from the street" https://www.youtube.com/watch?v=SdW5RKUboKc&t=797s&ab_channe.... Tons of good videos to go through on their YouTube channel.

codepathfinder · 2025-03-04T02:29:25 1741055365

Thanks a ton!

nimish · 2025-03-03T15:38:07 1741016287

The worst part of Postgres (and linux) is the awful mailing list driven development process. I have patches to Postgres that I'd rather just keep porting up every version than to try to figure out how to make Outlook work with a 1980s process.

The internals themselves are quite nicely engineered, as you'd expect.

immibis · 2025-03-03T16:02:24 1741017744

You are unable to send a message with an attachment? Or what's the problem specifically?

nimish · 2025-03-03T16:18:33 1741018713

I don't want to waste my time with a process that has zero benefits to me

immibis · 2025-03-03T21:01:30 1741035690

So you'd have the same complaint about a Github pull request?

nimish · 2025-03-04T19:09:33 1741115373

No, a GitHub PR has benefits that far exceed a mailing list post, to me

felideon · 2025-03-03T15:54:43 1741017283

> how to make Outlook work

Well that’s your first problem. Unless you’re using corporate email?

nimish · 2025-03-03T16:19:02 1741018742

Correct. It's corporate email. Imagine the amount of contributions that are gated behind stupid email policies because some fossil mailing list client can't handle modern email in the year of our lord 2025

pritambaral · 2025-03-03T16:57:39 1741021059

> Outlook

> modern email

That can't be true. Now, I haven't used Outlook myself, but I've received enough garbage from that godforsaken relic of a time when MS wanted to EEE email to know that that can't be true.

FWIW, I've posted patches to the PG mailing list just fine with GMail. The web app, that is.

nimish · 2025-03-03T20:57:10 1741035430

OK. Every major corporation uses Outlook and there is no way I can change that, so by not accepting its foibles you are artificially stopping interested contributors who would do it on company time and hardware.

It's a very FOSS pathology.

majewsky · 2025-03-03T22:29:50 1741040990

If by "pathology" you mean "we optimize for the comfort of the people already actively engaged in the project instead of hypothetical future contributors", I don't think this argument is as strong as you think it is.

If anything, this kind of excessive legal and process red tape is much more common in enterprise-driven vs. community-driven open source projects.

nimish · 2025-03-05T14:46:55 1741186015

No that's not what I mean, don't put words in my mouth.

> excessive legal and process red tape

I just wanna hit "Send" in my standard email client, not set up pine and debug a mail spooler

fforflo · 2025-03-03T16:35:07 1741019707

Parent probably means that email clients don't send the attachments (patches) with the appropriate content type (i.e. text/x-patch or text/x-diff) but use application/octet-stream.
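To make the content-type point concrete, here is a minimal sketch using Python's standard `email` library of what a patch attachment labeled `text/x-patch` looks like, as opposed to a generic `application/octet-stream` one. The addresses, subject, and file name are made up for illustration, and nothing here is meant to describe what the PostgreSQL list software actually requires.

```python
from email.message import EmailMessage


def build_patch_email(patch_text: str, filename: str) -> EmailMessage:
    """Build a message whose attachment is labeled text/x-patch."""
    msg = EmailMessage()
    # Hypothetical headers, for illustration only.
    msg["Subject"] = "[PATCH] example change"
    msg["From"] = "dev@example.com"
    msg["To"] = "list@example.com"
    msg.set_content("Patch attached.")
    # A str payload is attached as text/<subtype>, so this part is
    # text/x-patch rather than application/octet-stream.
    msg.add_attachment(patch_text, subtype="x-patch", filename=filename)
    return msg


msg = build_patch_email("--- a/file.c\n+++ b/file.c\n", "0001-example.patch")
attachment = next(msg.iter_attachments())
print(attachment.get_content_type())
```

A mail client that attaches the same file as opaque binary data would produce a different content type on that part, which is the mismatch described above.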

tracker1 · 2025-03-03T19:49:25 1741031365

OT: Ugh, the text contrast is really hard to read here.

rrr_oh_man · 2025-03-03T15:25:55 1741015555

> Transaction logs are an essential part of databases because they ensure that no data is lost even when a system failure occurs. They are a history log of all changes and actions in a database system. This ensures that no data is lost due to failures, such as a power failure or a server crash. The log contains sufficient information about each transaction that has already been executed, so the database server can recover the database cluster by replaying the changes and actions in the transaction log in the event of a server crash.

The writing is horrendous

tasuki · 2025-03-03T15:31:34 1741015894

The author is Japanese, cut them some slack.

rrr_oh_man · 2025-03-06T09:56:16 1741254976

Thanks for pointing this out. I probably should have, being a non-native speaker myself.

Though, the core of my issue with this is not the language itself, but how the paragraphs are structured. Imho you might cut 50% of words and not lose any information.

See here: https://news.ycombinator.com/item?id=43254469

xnickb · 2025-03-03T15:33:22 1741016002

Or better yet, offer help

rrr_oh_man · 2025-03-06T09:59:05 1741255145

In what way?

First thought: If a sports team is not playing well, do you mail in coaching advice?

I elaborated a bit here, though: https://news.ycombinator.com/item?id=43254469

datadrivenangel · 2025-03-03T15:54:22 1741017262

It's a little clipped, but very straightforward and easy to follow. Not horrendous.

skotobaza · 2025-03-03T16:34:01 1741019641

How would you rephrase it? I'm also a non-native English speaker, and the bit you provided sounds very easy to understand because of how it's phrased. There are minor redundancies, but that's all I noticed...

rrr_oh_man · 2025-03-04T13:55:30 1741096530

> How would you rephrase it?

https://news.ycombinator.com/item?id=43254469

> I'm also a non-native English speaker, and the bit you provided sounds very easy to understand because of how it's phrased.

Thanks for your point. It felt to me like reading overly verbose code. Not wrong, but not effective.

skotobaza · 2025-03-05T18:41:41 1741200101

Thanks! Your edit is good, more concise in the first half, and even easier to understand in the second half, compared to the original.

paulddraper · 2025-03-03T15:39:04 1741016344

What makes the writing horrendous?

rrr_oh_man · 2025-03-04T13:44:14 1741095854

The needless repetition. Makes it really hard to read imho. It's not a language issue but one of concise thought. I'd argue you can reduce the amount of text by 50% while retaining almost all of the information.

e.g., instead of:

Transaction logs are an essential part of databases because they ensure that no data is lost even when a system failure occurs. They are a history log of all changes and actions in a database system. This ensures that no data is lost due to failures, such as a power failure or a server crash. The log contains sufficient information about each transaction that has already been executed, so the database server can recover the database cluster by replaying the changes and actions in the transaction log in the event of a server crash.

something like:

Transaction logs keep a record of all changes in a database. This allows data recovery in case of system failures like power outages or server crashes. They store enough information to replay transactions. This can restore the database to its last consistent state.
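For anyone who prefers code to prose, the "replay the log to recover" idea can be sketched as a toy in Python. This is only an illustration under loud assumptions: `TinyStore`, its record format, and the in-memory list standing in for a durable log file are all made up here, and real PostgreSQL WAL works very differently in its details.

```python
import json


class TinyStore:
    """Toy key-value store that rebuilds its state by replaying a log."""

    def __init__(self, log_lines=None):
        self.data = {}
        # The list stands in for a log file that survives a crash (assumption).
        self.log = list(log_lines or [])
        self._replay()

    def _replay(self):
        pending = {}  # changes seen so far, grouped by transaction id
        for line in self.log:
            rec = json.loads(line)
            if rec["op"] == "set":
                pending.setdefault(rec["txid"], []).append((rec["key"], rec["value"]))
            elif rec["op"] == "commit":
                # Only committed transactions are applied to the state.
                for key, value in pending.pop(rec["txid"], []):
                    self.data[key] = value
        # Anything left in `pending` never committed and is discarded.

    def commit(self, txid, changes):
        # Log the changes and the commit marker before touching the state.
        for key, value in changes.items():
            self.log.append(json.dumps({"op": "set", "txid": txid, "key": key, "value": value}))
        self.log.append(json.dumps({"op": "commit", "txid": txid}))
        self.data.update(changes)


store = TinyStore()
store.commit(1, {"balance": 100})
# Simulate a crash: transaction 2 logged a change but never logged a commit.
store.log.append(json.dumps({"op": "set", "txid": 2, "key": "balance", "value": 50}))

recovered = TinyStore(store.log)
print(recovered.data)  # {'balance': 100}
```

The recovered store ends up in its last committed state, which is the "last consistent state" in the shorter wording above.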

shxx · 2025-03-03T16:11:38 1741018298

If rrr_oh_man's profile is real (https://bit.ly/48d9o9P), his resume appears inconsistent and unprofessional. The emptier a person is, the more likely they are to criticize others.

rrr_oh_man · 2025-03-04T13:45:38 1741095938

Ouch. Don't take it personally. :)

jodrellblank · 2025-03-03T16:57:25 1741021045

I won't comment on the OP using 'horrendous' as an adjective, but that quote seems misleading and unclear to me. Transactions (not transaction logs) are about maintaining [relational] database consistency, not directly about ensuring no data loss. A database engine would rather roll back a partially applied transaction and lose the data in it, than update one table and not update another, breaking the relation between them, and becoming inconsistent/corrupt.
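As a rough illustration of that rollback behaviour, here is a small sketch using Python's built-in `sqlite3` module (SQLite, not PostgreSQL, purely because it runs without a server). The `accounts` table, the names, and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This violates the CHECK constraint when src lacks the funds.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # False {'alice': 100, 'bob': 0}
```

The credit to bob had already been applied inside the transaction when the debit failed, and it is thrown away rather than kept, so the two rows stay consistent with each other.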

Transaction logs then, are a separate record of these transactions into a log file. The quote says they are 'essential' but three paragraphs later in the link[0] is this: "The [log] mechanism was first implemented in version 7.1"; transaction logs are not essential for RDBMSs and are not essential for ensuring against data loss, they are one part of one larger design for reducing the chance of data loss. (Having a separate log file does not ensure that no data is lost in the event of a crash because the log file could also be lost by the same crash).

"so the database server can recover the database cluster" - the quote has switched from transaction logs as essential, to clustering-with-transaction-logs as essential, without clearly calling that out; there are clustering designs which don't need transaction logs so that isn't essential, and from the PostgreSQL documentation[1]: "It should be noted that log shipping is asynchronous, i.e., the [logs] are shipped after transaction commit. As a result, there is a window for data loss...". Neither transaction logs, nor log shipping clustering, are enough to ensure no data loss. It is possible to improve on that with synchronous_commit[2] (off by default in PostgreSQL) but that can still be designed badly to allow data loss to happen.

The quote is misleading about what transaction logs are for, what situations they can/cannot help with, and what other concerns are involved in guarding against data loss. Given that data loss is a particularly important concern in relational databases compared to some other software and systems, and that there are so many details which need to be considered to reduce risks of data loss, the quote being at all vague or misleading seems worse than it would be if writing about other systems, and worse than it would be if written in the context of 'a high level overview of databases'.

[0] http://www.interdb.jp/pg/pgsql09.html

[1] https://www.postgresql.org/docs/17/warm-standby.html

[2] https://www.postgresql.org/docs/17/warm-standby.html#SYNCHRO...

paulddraper · 2025-03-03T19:42:13 1741030933

> Transactions (not transaction logs) are about maintaining [relational] database consistency, not directly about ensuring no data loss.

That seems entirely consistent with that quote.

Also, tangential.

jodrellblank · 2025-03-05T02:23:42 1741141422

"Transaction logs ensure no data loss" and "transaction logs do not ensure no data loss" cannot be entirely consistent with each other, they are opposites. Either they do ensure, or they don't. The quote says they do. They actually don't. That's not tangential that's fundamental.

shxx · 2025-03-04T00:14:18 1741047258

Has your need for approval been met?

rrr_oh_man · 2025-03-04T13:50:45 1741096245

Responding to criticism of one's work with ad hominems is something I'd rather have expected on Twitter...

Why so angry?

jodrellblank · 2025-03-05T15:29:13 1741188553

No, nobody approved of my comment; since you're concerned about that, can you upvote it please?

gpvos · 2025-03-03T15:54:41 1741017281

A bit on the wordy side, but better than the average documentation.