Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

How about this: if your service gets as big as github, then maybe consider doing odd things to eke out more performance or shard or whatever.

Otherwise: use FK's to maintain stronger data integrity.

As hesk mentions below, in Postgres, you can do all kinds of table ALTERing if needs be.



FK’s don’t just maintain “stronger” data integrity, they are the only way to maintain relational data integrity. Application code cannot maintain that relational integrity, period.

Any developer who thinks application code can enforce relational integrity is naive and does not understand relational database systems. It is impossible for any layer above the database itself to keep things from getting corrupt.


This is not remotely true. You can do this with pretty much any database that supports SERIALIZABLE isolation-level transactions correctly.

The point is that application code has bugs, and it's a lot easier to specify your constraints declaratively in a single place using a purpose-built DSL (SQL) than it is to enforce them procedurally every time you access the database.


if you are avoiding the use of foreign keys for the high and lofty goal of "performance" and 100% uptime while you run your fancy online migration utilities, SERIALIZABLE isolation is the last thing you would be using as it does what it says, serialiazes transactions and locks things like crazy. lots of waiting on locks, lots of deadlock potential. Throw in InnoDB's quirky implementations of isolation levels and you'd be in for a world of fun.


I'd argue that any decent application programming language can express these constraints better than SQL can.


How? FK constraints are trivially expressed in SQL. It is just a "REFERENCES table_blah ON DELETE DO BLAH ON UPDATE DO BLAH".


I agree with your general gist, but it's not impossible, you just, essentially, are writing the engine yourself.

I've seen a few applications that keep some data hot-loaded into a shared slice of memory and have tight controls over the altering and saving of that data - and that's sort of one of the easier ways to partially avoid a full reliance on FKs, if you want to use write-through caching then you're essentially accepting the cost of implementing data integrity checks in the application layer which can be done - if you're very very careful.


Unless you're working with immutable primary keys (i.e. the values can neither be updated nor deleted), it is actually not possible to reliably enforce a foreign key constraint outside the database.


Speaking in terms of theory, it absolutely is. RDBMSs aren't magic boxes, they stage data for insertion, validate it, execute that insertion - all while littering the WAL (or equivalent) with the steps necessary for the guarantee of that operation. You absolutely can write this logic into your application, at that point your application may basically be Postgres but it's possible.

If you attempt this and don't exhaust a small rainforest's worth of notepad paper and a few truckloads of dry erase markers then you're probably doing it wrong - but the problem is solvable.

There are some unsolvable problems in this realm (like holding to ACID while also ensuring that all valid communications from a client are properly processed) but you can accomplish anything your RDBMS does, you just need to be really really absurdly careful and know what you're doing at a theoretical level.


But, if you want to ensure the data is never visible in an inconsistent state, you either need to use db-level concurrency-related features like transactions or locks; or some kind of lock or other concurrency-control features at the app level while guaranteeing the db has no clients other than your app.

It seems difficult to wind up with better performance characteristics by doing this than using the higher-level abstractions the database is already supplying for this purpose, like constraints. I suppose it's possible with enough hours by enough experts, as you suggest. I doubt that's what github actually did though, they probably instead decided to write app logic that was resilient to visible data inconsistency. Which sounds difficult and dangerous to me, but github is a pretty reliable app, so I guess something worked.


Transactions are a pretty widely used RDBMS feature AFAIK.


Right, so are foreign key constraints.

Ensuring foreign key consistency without foreign key constraints is not just "use transactions", you have to be careful and intentional about how you are using them (and ensure all clients do). Why would you choose this over just using the foreign key constraint which takes care of it for you, using the same underlying technology?

Perhaps that's a clearer way to say what I was trying to say originally.


You can implement FKs with the same cost as FKs. Look up the referenced rows and add read locks. It's not cheap. FKs are by no means free.


There are many cases where relational integrity does NOT need to be enforced and in fact can happen in the application level.

One of the big reasons NoSQL got popular was because it broke the norms around strict data relationships.


NoSQL got popular because Google needed it for their billion-user planet-scale system, and then a million startups pretended they were Google. As soon as Google built their planet-scale NoSQL system, they immediately started replacing it with a planet-scale SQL system (Spanner)


And one of the reasons NoSQL didn't revolutionize anything is because people realized it's worthwhile to put on your seatbelt. It's either arrogant or naive to think that automated safety measures aren't going to catch any of one's mistakes.


You don't need a service as big as GitHub; you just need billions of rows.

I think you should enable FKs for test and dev, and possibly QA, but disable them in production.

Depending on your database, you'll get surprising interference in concurrent operations via other means than FKs, but FKs don't help.


"Just" need a billion rows.

Sure, some people get there, but it makes sense to do things the right way first, then figure out how to cut corners only when and if you need to.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: