I didn't know about it, but I don't find those slides compelling, not at all. Fo...

ahachete · on April 8, 2016

Sure, there's CASCADE, but as you pointed out it has several drawbacks.

That's why ideally natural keys should be immutable. That has many advantages over surrogate keys.

There's a follow up post (posts) to this presentation: http://www.databasesoup.com/2015/03/primary-keyvil-reprised.... where many problems caused by the abuse of surrogate keys are illustrated.

But the main point is simple: surrogate keys (abuse) is an easy way to data corruption. Not in the sense of durability as in ACID, but in the sense of humans (manually or by bugs in the software) duplicating information that is not enforced to be unique thanks to using non-sense surrogate keys as the uniqueness criteria.

Tenhundfeld · on April 8, 2016

If a combination of data should always be unique in a table, you should put a unique constraint/index on those columns to ensure that data integrity. I think that's mostly unrelated to the surrogate key vs natural key debate.

Your point about immutable keys is important, I think. A "natural key" can mean different things, as I understand the term. You could call a person's social security number a natural key, which for the record is a bad idea as they can change . But you could also call the combination of user_id and reservation_id a composite natural key, which is immutable and not so bad in my book. I'd still use a surrogate key though.

It might seem like I care about this more than I actually do. I've just found over the years of writing systems that I fairly often regretted not having a surrogate key on a table, but I've never once regretted adding a surrogate key that ended up not being essential.

What I've seen happen is that the data model expands/changes and I need to reference a join table. I might just start out referencing it by the composite natural key. Then the model changes again, and now I need to add another piece of data that changes the natural key. For example, maybe the natural key was (user_id, reservation_id), but now it's (user_id, reservation_id, membership_type_id). Now I need to think about anywhere I referenced that join table by the natural key.

ahachete · on April 8, 2016

Unfortunately, UNIQUE is not enough to prevent the kind of errors I am talking about. They are only easily prevented if the PRIMARY KEY is a natural key.

I have seen many cases where concepts that are UNIQUE NOT NULL for a given business case are ignored as natural keys and surrogate keys are used instead. This results in "duplicated" records (from a business logic perspective) that often end up being manually cleaned, resulting in costs in the orders of 10s of thousands. Not funny.

The idea of a primary key is that you are guaranteed 0 or 1 results, but in the absence of UNIQUE and NOT NULL you may get more than one. That's a powerful abstraction. If your business logic determine that a given attribute or combination of attributes satisfies that condition, that deserves to be the row key.

If you rather use a surrogate key:

a) either you also have to enforce UNIQUE and specially NOT NULL on those attributes (apart from the PK) which is definitely not free lunch (extra indexes, more expensive joins, not benefiting from index-only scans)

b) or often someone forgets either the UNIQUE or NOT NULL on the attribute(s) that have to be constrained as such and data integrity is compromised.

Both cases are bad enough. What's the justification to systematically use surrogate keys then?

koide · on April 10, 2016

a) Premature optimization b) Implementation mistake

Using natural keys may offer better performance in certain cases, but does not protect you from b, as you might select a wrong key, or even forget about putting a PK on the table, for example.

Surrogates help you manage the ever changing and interconnected world: the immutable name of something that has been shared between systems is no longer immutable? This type of scenario is also very common and can cost thousands as well.

Which type of problem is more likely in your application? That's up to you to decide.