How about this: if your service gets as big as github, then maybe consider doing...

spookthesunset · on Nov 8, 2019

FK’s don’t just maintain “stronger” data integrity, they are the only way to maintain relational data integrity. Application code cannot maintain that relational integrity, period.

Any developer who thinks application code can enforce relational integrity is naive and does not understand relational database systems. It is impossible for any layer above the database itself to keep things from getting corrupt.

Diggsey · on Nov 8, 2019

This is not remotely true. You can do this with pretty much any database that supports SERIALIZABLE isolation-level transactions correctly.

The point is that application code has bugs, and it's a lot easier to specify your constraints declaratively in a single place using a purpose-built DSL (SQL) than it is to enforce them procedurally every time you access the database.

zzzeek · on Nov 9, 2019

if you are avoiding the use of foreign keys for the high and lofty goal of "performance" and 100% uptime while you run your fancy online migration utilities, SERIALIZABLE isolation is the last thing you would be using as it does what it says, serialiazes transactions and locks things like crazy. lots of waiting on locks, lots of deadlock potential. Throw in InnoDB's quirky implementations of isolation levels and you'd be in for a world of fun.

lmm · on Nov 8, 2019

I'd argue that any decent application programming language can express these constraints better than SQL can.

spookthesunset · on Nov 8, 2019

How? FK constraints are trivially expressed in SQL. It is just a "REFERENCES table_blah ON DELETE DO BLAH ON UPDATE DO BLAH".

munk-a · on Nov 8, 2019

I agree with your general gist, but it's not impossible, you just, essentially, are writing the engine yourself.

I've seen a few applications that keep some data hot-loaded into a shared slice of memory and have tight controls over the altering and saving of that data - and that's sort of one of the easier ways to partially avoid a full reliance on FKs, if you want to use write-through caching then you're essentially accepting the cost of implementing data integrity checks in the application layer which can be done - if you're very very careful.

mdavidn · on Nov 8, 2019

Unless you're working with immutable primary keys (i.e. the values can neither be updated nor deleted), it is actually not possible to reliably enforce a foreign key constraint outside the database.

munk-a · on Nov 8, 2019

Speaking in terms of theory, it absolutely is. RDBMSs aren't magic boxes, they stage data for insertion, validate it, execute that insertion - all while littering the WAL (or equivalent) with the steps necessary for the guarantee of that operation. You absolutely can write this logic into your application, at that point your application may basically be Postgres but it's possible.

If you attempt this and don't exhaust a small rainforest's worth of notepad paper and a few truckloads of dry erase markers then you're probably doing it wrong - but the problem is solvable.

There are some unsolvable problems in this realm (like holding to ACID while also ensuring that all valid communications from a client are properly processed) but you can accomplish anything your RDBMS does, you just need to be really really absurdly careful and know what you're doing at a theoretical level.

jrochkind1 · on Nov 8, 2019

But, if you want to ensure the data is never visible in an inconsistent state, you either need to use db-level concurrency-related features like transactions or locks; or some kind of lock or other concurrency-control features at the app level while guaranteeing the db has no clients other than your app.

It seems difficult to wind up with better performance characteristics by doing this than using the higher-level abstractions the database is already supplying for this purpose, like constraints. I suppose it's possible with enough hours by enough experts, as you suggest. I doubt that's what github actually did though, they probably instead decided to write app logic that was resilient to visible data inconsistency. Which sounds difficult and dangerous to me, but github is a pretty reliable app, so I guess something worked.

recursive · on Nov 9, 2019

Transactions are a pretty widely used RDBMS feature AFAIK.

jrochkind1 · on Nov 9, 2019

Right, so are foreign key constraints.

Ensuring foreign key consistency without foreign key constraints is not just "use transactions", you have to be careful and intentional about how you are using them (and ensure all clients do). Why would you choose this over just using the foreign key constraint which takes care of it for you, using the same underlying technology?

Perhaps that's a clearer way to say what I was trying to say originally.

barrkel · on Nov 8, 2019

You can implement FKs with the same cost as FKs. Look up the referenced rows and add read locks. It's not cheap. FKs are by no means free.

SkyPuncher · on Nov 8, 2019

There are many cases where relational integrity does NOT need to be enforced and in fact can happen in the application level.

One of the big reasons NoSQL got popular was because it broke the norms around strict data relationships.

papln · on Nov 8, 2019

NoSQL got popular because Google needed it for their billion-user planet-scale system, and then a million startups pretended they were Google. As soon as Google built their planet-scale NoSQL system, they immediately started replacing it with a planet-scale SQL system (Spanner)

smt88 · on Nov 8, 2019

And one of the reasons NoSQL didn't revolutionize anything is because people realized it's worthwhile to put on your seatbelt. It's either arrogant or naive to think that automated safety measures aren't going to catch any of one's mistakes.

barrkel · on Nov 8, 2019

You don't need a service as big as GitHub; you just need billions of rows.

I think you should enable FKs for test and dev, and possibly QA, but disable them in production.

Depending on your database, you'll get surprising interference in concurrent operations via other means than FKs, but FKs don't help.

davidw · on Nov 9, 2019

"Just" need a billion rows.

Sure, some people get there, but it makes sense to do things the right way first, then figure out how to cut corners only when and if you need to.