When posts like these come up, I'd like to remind people that context matters wh...

taurath · on Nov 8, 2019

In the same vein, I'd like to remind people that you are probably not a "temporarily low-scale big-data company", in the same vein as a temporarily embarrassed millionaire. In lots of cases going for the very long term scalable solution will be an impediment to your growth, and I'd suggest dealing with those issues when the chance that you need them is on the horizon, rather than across the globe.

CQRS is one of the biggest examples I've seen personally for this - your flexibility will go down, work to do "basic" things will go up, and you will really lose a lot of speed solving for 10 y/o company problems with technology rather than 1-5 y/o company problems in terms of product market fit.

whack · on Nov 8, 2019

> In lots of cases going for the very long term scalable solution will be an impediment to your growth, and I'd suggest dealing with those issues when the chance that you need them is on the horizon, rather than across the globe.

My favorite example is a bootstrapped startup that I co-founded. We had a couple thousand users and less than a gigabyte of data. The app was a run-of-the-mill CRUD app. I built the entire back-end API as a single service that could be built and deployed on heroku in minutes.

At some point, I left the project, and my co-founder hired an expensive consultant to review the system and provide feedback. One of his suggestions was to break up the entire service into a suite of microservices. And this was for a service that barely averaged a handful of active users at any one time. And a team of ~2 engineers working on the back end code. Microservices.

megablast · on Nov 9, 2019

> At some point, I left the project, and my co-founder hired an expensive consultant to review the system and provide feedback.

People get pretty unhappy if they hire a consultant and don't get some drastic change recommendations. "Everything is good" doesn't sit well when handing over cash.

Iv · on Nov 9, 2019

As a consultant I have never been hired by someone whose system was working. I am willing to bet the consultant was hired to add a feature, could not figure the installed code, and proposed to redo it the only way he or she was used to.

Xylakant · on Nov 9, 2019

I have been hired a few times to review working systems. Basically get an external set of eyes on specific things. And “yes, this all looks good, you might want to tweak a little here and keep an eye on that once you grow significantly.” is an entirely accepted outcome of such reviews.

adrianN · on Nov 9, 2019

I thought people hire consultants so that they can sell unpopular changes as recommendations from an external authority instead of letting the blame hit management directly.

blihp · on Nov 9, 2019

That's often more what you'd bring management consultants in for. A technology consultant is more likely just to tell you to dump whatever tech you are using in favor of whatever the flavor of the month is.

thaumasiotes · on Nov 9, 2019

> People get pretty unhappy if they hire a consultant and don't get some drastic change recommendations. "Everything is good" doesn't sit well when handing over cash.

I suggest that this is largely untrue when the consultant being hired is a security consultant.

There is demand out there for security consultants who will rubber-stamp your existing software.

corebit · on Nov 8, 2019

The biggest issue with CQRS I've seen is people thinking CQRS means you need multiple, duplicate data structures, mappers, a few Kafka topics and a PhD, when IN REALITY all it means is you put methods that return data without modifying it in one interface/class and methods that have side effects in another interface/class - which is really just a good application of interface segregation.

Moreover, you now have a great rule of thumb for your brand new junior developers: "If you're handling a GET request, use this class, and if not use this other class" making code reviews easier to do and helping to enforce better structure in your code. Plus tons of other benefits.

The rest of that stuff that isn't CQRS you might eventually need if you scale, but is easy to add once you have the bones in place.

Quekid5 · on Nov 8, 2019

Like you, I've found the general idea of CQRS/ES[0] incredibly valuable because it both enforces a functional approach to state (e.g. 'current state' is a fold over events), and it forces people to really think about the domain. (E.g. which bits are really atomic units, or what can be fixed up if it goes wrong in some way.)

It also forces people to think about something that's usually glossed over: Consistency. If you have an RDBMS backing you, you tend to not think about the fact that what the user on a web page sees is already out of date when they see it, so any responsible application should track when that data was read and reject updates if the backing data has changed since it was read... but very few applications even attempt to do this (it's really hard and a huge amount of boilerplate with most APIs). With CQRS/ES you are really forced to think about these things up front -- and you can actually avoid most of the issues. Whether that increases or decreases 'productivity' I don't know, but I do know that thinking about these things increases correctness.

For me, correctness is paramount. If you don't mind bugs, I can give you an infinitely fast solution.

[0] I do think ES is an integral component.

taurath · on Nov 9, 2019

I’ll tell you this - it decreases it, especially for small teams. You have to do a lot of work to make your query engine actually good and reliable, and reporting is also a huge bear to deal with. You have to cache somewhere to get your queries anywhere near real-time, and it takes a lot of reinvention.

Quekid5 · on Nov 9, 2019

I don't understand this response. Most of our use cases have always done fine with a simple JSON document store -- no reinvention or anything. Most of our Queries are just a simple PostgreSQL table with an Aggregate ID and a JSON column.

Remember: For smallish applications you can still have your application itself be a monolith, so no need for complicated setups with message queues (Kafka, whatever), etc. etc. In this case you can basically rely on near-instant communication from the Command side of things to the Query side of things. You can also have the frontend (Web) subscribe to updates.

(We have our own in-house library to do all of this so YMMV. I'm not sure what the commodity CQRS/ES libraries/frameworks are doing currently.)

trhway · on Nov 9, 2019

> any responsible application should track when that data was read and reject updates if the backing data has changed since it was read... but very few applications even attempt to do this (it's really hard and a huge amount of boilerplate with most APIs).

it is done very simple by versioning. Usually implemented transparently for the data layer API clients.

Quekid5 · on Nov 19, 2019

It's simple. I'm just saying that (for most frameworks) it's a lot of work and boilerplate. It's also easy to miss individual cases during code review, etc.

lensopra · on Nov 19, 2019

You kind of miss the part where he says "Usually implemented transparently for the data layer API clients." Transparently as in no work, no boilerplate.

I guess your mileage might vary, but Java has JPA/Hibernate, and .NET has Entity Framework, and they both make it easy, so I'm going to be surprised if any major framework or language doesn't make this easy.

The concept is also called optimistic locking, if that makes Googling it easier. Using that term, I easily found that Node.js's Sequelize supports it too https://sequelize.org/v5/manual/models-definition.html#optim...

Lord_Zero · on Nov 15, 2019

SQL Server has a built in ROWVERSION type which plays really nicely with EF Core. Made implementing concurrency checks easy.

madelyn · on Nov 8, 2019

Exactly, CQRS gets such a bad rep because of all the pieces that (typically) surround it. But really as just a way of organizing code, it's so handy.

The freedom and flexibility to create your read models and write models the way they are needed just completely sidesteps a lot of design issues that creep up. Plus, it makes the code so easy to follow when everything follows the pattern; picking out where state changes happen becomes trivial for example.

taurath · on Nov 8, 2019

To be clear, I'm referring to the (in my circles) usual parlance of not "pure" CQRS, which is as you describe and could entirely be in a OOP "Model" layer, but what usually goes with it - things like event sourcing for auditing, cached read services (eventual consistency), etc. Should've been more specific but was just pattering off the first big complexity thing that came into my mind.

lucisferre · on Nov 8, 2019

This seems to overstate the difficulty in implementing CQRS, and by that I assume you mean event sourcing. Kafka is definitely not needed, or even necessarily a good idea, nor is a PhD.

I'm not sure what you experience with CQRS has been but CQRS is not a solution for scale it is a solution for long-term maintenance costs of a growing codebase by enforcing the decoupling of services.

hinkley · on Nov 9, 2019

I like where you're going for that. Why don't we just call it 'temporarily embarrassed big data company'?

On one project where we seemed to handle engineering strategy well, one of the considerations we'd make is whether paying the tech debt would accompany an inflow of cash or not. If it didn't we'd better tackle it now so we don't bleed out.

It's probably another way to say "spend money to make money". Management is more willing to spend out of a problem when they've had a fresh fix.

The trick is, though, can you pay down that debt fast enough that the customers don't get upset while waiting. If someone's firing up a big ad push you'd better be prepared. And people, like me, who get caught flat-footed once (or have friends who were), don't want it to happen a second time. So they get opinions on 'last responsible moment' that might differ from the idealists.

taurath · on Nov 9, 2019

If your requests come in hard and fast then you’ll have a big problem with tech debt, but if you can schedule it properly you can do it as a part of adding functionality and people tend to be happy, because once the functionality is in it can be enhanced much more quickly.

duxup · on Nov 9, 2019

>In lots of cases going for the very long term scalable solution will be an impediment to your growth, and I'd suggest dealing with those issues when the chance that you need them is on the horizon, rather than across the globe.

My attitude is:

"Man if this takes off and I have to make some changes on how we do this....I should celebrate!"

opmac · on Nov 8, 2019

YAGNI - learn to embrace it.

Gibbon1 · on Nov 8, 2019

Thing I've tried and mostly failed to get across. There is usually plenty of money and resources to throw at products that are making money. Vs the amount for those that aren't.

As long as you're not painting yourself into a corner a lot of things can be fixed 'later'. Later when your not desperate to get the thing off the ground. Later when you can throw two engineers at it for six months. Later when the pain points are well understood.

resters · on Nov 8, 2019

> I'd like to remind people that you are probably not a "temporarily low-scale big-data company", in the same vein as a temporarily embarrassed millionaire.

This point is valid in the same way as telling a startup founder to give up the startup and invest in the S&P 500. It's trivially true, but ultimately useless advice.

Building technology is hard precisely because there are so many tradeoffs. Speed to market vs scalability is one of many. It's silly to pretend that there is some kind of rote obviousness to never caring about scale in the early stage, anymore than there is rote obviousness to just investing in the S&P.

Ex post, most startup founders/investors should have just invested in the S&P, and most early stage tech leads should have just used Wordpress or Rails.

taurath · on Nov 8, 2019

I didn't say never care about scale in the early stage. I've found it a more common problem that people in startups are building systems with too much complexity to solve problems they faced at previous employers, in order to "save themselves time" in the future that ends up never coming. There are certainly some technology companies for which scalability is the most important thing - Instagram being the biggest example I can think of. The point of me saying so was to remind people - despite what you may think, you are not an Instagram, and more than likely your business doesn't require you to be one.

Plenty of early stage tech leads DO use wordpress and rails because their company doesn't need something more. You just don't hear about them often because they won't be sexy enough for HN. YAGNI.

resters · on Nov 8, 2019

> YAGNI

OK so then does YAGNI apply to investment? Why not let existing firms handle it in the S&P?

> YAGNI (continued)

over-engineering is definitely a problem, but so is hindsight bias.

I think that most non-trivial tech builds involve some smart investments and some dumb ones. It's essentially a portfolio of decisions made in the face of uncertainty. YAGNI is cynicism and hindsight bias, and over-generalization.

I think this is less obvious than it should be because of the popularity of the highly cynical "agile" approaches that consider a project that does not plan beyond the next sprint as somehow not taking on code debt.

ticmasta · on Nov 8, 2019

>> OK so then does YAGNI apply to investment? Why not let existing firms handle it in the S&P?

Yes, don't rebuild your own balanced portfolio or pay someone to "beat the market" for you; just buy a COTS ETF, put it in the server closet and forget about it for a decade.

Udik · on Nov 8, 2019

> CQRS is one of the biggest examples I've seen personally for this: your flexibility will go down, work to do "basic" things will go up

The only time I've worked on a codebase that had implemented CQRS, it was when I was given to maintain a crud application to manage maybe 20 different records. Rarely used. The people who had developed it in .net mvc were of such technical prowess that, besides the CQRS pattern, the frontend still had a functional shopping cart (yes, you clicked on "save" and your "order" went to the shopping cart, relic of the mvc demo app).

Sometimes I wonder if I've just been unlucky or the world is all like this.

DaiPlusPlus · on Nov 8, 2019

As someone currently fighting a battle with an LoB application written without foreign-key constraints with hundreds of thousands of rows of corrupted data because of bugs in sprocs that assigned the wrong value to the wrong foreign key column because they were similarly named - THIS!

The reply in the GitHub thread we’re talking about makes it clear that they still perform FK validation - it’s just performed in the application code rather than the DBMS.

I note there is another alternative: deferred constraints - or just run a query to check for invalid rows at 3am every morning.

sturgill · on Nov 8, 2019

I’m pretty confident that GitHub doesn’t use foreign keys because it was built as a Rails app. And the “Rails Way” is to create these constraints in the model. Foreign key constraints weren’t a first-class member in Rails until v4 (if memory serves correctly).

I was once a full-time Rails dev and really loved the framework (I don’t write as many user facing applications these days). Most of the Omakase trade offs didn’t bother me. But I never understood the disdain for foreign keys. For 99.99% of web apps, you want them (dare I say, need them).

Even in Rails 3 I would add them by hand in the migration files. Very few applications will ever actually care about sharding. We were pulling in millions at the last Rails company I worked for and were just fine with a master and a single replica.

If you get to a point where sharding is best for your company you hopefully have enough revenue coming in to fund a data transition. Your goal should be to outgrow what MySQL (or Postgres) can do for you in master/replica mode. If you do, you’ll likely be independently wealthy...

If data integrity matters at all, model-based checks (or periodic queries for orphaned data) will not suffice. Just put a foreign key in the table where it belongs and let your DB do what it does best. ACID is an amazing thing...

evanelias · on Nov 8, 2019

> I’m pretty confident that GitHub doesn’t use foreign keys because it was built as a Rails app

Maybe originally, but lack of foreign key usage is certainly not Rails specific today. Large MySQL shops generally don't use foreign keys, full stop, for the exact reasons Shlomi described in the original comment.

Facebook does not use foreign keys either. In my experience, same thing is true at all the other large MySQL-based companies. And those companies make up a majority of the largest sites on the net btw -- including most of the consumer-facing social networking and user-generated content products/apps out there.

This does not mean that, for example, Facebook is full of data that would violate constraints. There are other asynchronous processes that detect and handle such problems.

> If you get to a point where sharding is best for your company you hopefully have enough revenue coming in to fund a data transition.

Do you mean, stay with your current RDBMS of choice and shard while also removing the FKs? Or do you mean transition to some other system? (and if so, what?)

The former path is bad: sharding alone is very painful even without introducing a ton of new application-level constraint logic at the same time.

The latter path is also bad: only very recent NewSQL DBs have sharding built-in, and you probably don't want to bet your existing rocket ship startup on your ability to transition to one under pressure. Especially given much higher latency profiles of those DBs, meaning a very thorough caching system is also required, and now you have all the fun of figuring out multi-region read-after-write cache consistency while also transitioning to an entirely new DB :)

sturgill · on Nov 9, 2019

“[S]harding alone is very painful even without introducing a ton of new application-level constraint logic at the same time.”

I typically espouse app logic to check state and use foreign keys to ensure enforcement (eg, race conditions that are very hard to ensure at the app level but are built into many RDBMS). Foreign key failures are just treated like any other failure mode.

But honestly, I haven’t been part of a company that have really hit those upper limits that require sharding. They exist, yes. But most companies will never need to worry about it. Which is my point.

evanelias · on Nov 9, 2019

I agree that most companies won't ever need to shard, and pre-sharding (or worrying about extreme scalability challenges in general) is usually unwise premature optimization. But it does depend on the product category.

Social networking and user-generated content products (including GitHub) need to be built with scale somewhat in mind: if the product reaches hockey-stick growth, a lot of scalability work will need to be completed very quickly or the viral opportunity is lost and the company will fail. I'm not saying they should pre-shard, but it does make sense to skip FKs with future sharding in mind.

This was nearly a decade ago, but in one case I helped shard my employer's DBs (and roll out the related application-level changes) with literally only a few hours to spare before the main unsharded DB's drives filled up. If we also had to deal with removing FKs at the same time, we definitely wouldn't have made it in time and the product literally would have gone read-only for days or weeks, probably killing the company. Granted, these situations are incredibly rare, but they really do happen!

hrktb · on Nov 8, 2019

It’s at its core a case by case decision, I think FK are also a net negative in data ingestion scenari where the data set is big enough.

Trying to make sure everything is where it needs to be at any given time, everything is inserted in the right order and the data is always consistent brings exponential amount of conplexity when it could all be checked at the end and pruned for invalid data. And usually DB integrity will not be enough, you’ll want business level validation that all is OK, so there will be app level checks anyway.

joevandyk · on Nov 9, 2019

PostgreSQL supports deferring constraints. https://begriffs.com/posts/2017-08-27-deferrable-sql-constra...

hrktb · on Nov 9, 2019

This looks nice, it’s still limit to a single commit though. That’s where it’s a PITA for anything that won’t (or we don’t want to) fit a single commit.

In particular splitting commits allows to ingest data in parralel (for instance if we import stores and store owners, both could be ingested separately without caring at first if each references a valid entity)

JoshuaDavid · on Nov 9, 2019

At least with mysql, and probably with postgres as well, you can temporarily turn off foreign key checks for a set of statements. So you can still get the benefits of foreign key constraints by default but when they do more harm than good you can turn them off. With the added benefit that turning off FK constraints screams "I am doing something unusual and dangerous - this requires extra caution."

sturgill · on Nov 9, 2019

App level data integrity is usually critical. Ingestion into a reporting database is an entirely different construct (where I agree FK constraints are burdensome).

matt4077 · on Nov 8, 2019

It's been a while, but IIRC at the time Rails got started MySQL actually did not even support foreign key constraints. Since that was the DBMS of choice, it wasn't much of choice.

evanelias · on Nov 8, 2019

InnoDB's foreign key support predates the existence of Rails by several years. However, InnoDB wasn't the default storage engine for MySQL at the time, so that may be a factor.

DaiPlusPlus · on Nov 9, 2019

I wonder how different the state of database application development would be today if all those cheap whitelabel webhosts powered by cPanel or Plesk (where most of us got started, I imagine) opted for PostgreSQL instead of MySQL - which would have influenced the major MySQL adopters like phpNuke, phpBB, WordPress, etc.

spookthesunset · on Nov 8, 2019

The application code isn’t enforcing FK constraints because that is impossible for the application to do. Their database is almost 100% guaranteed to be corrupt as a result.

Application code has bugs. Application code can fail in ways that result in corruption. A primary job of the database is to keep itself from getting corrupted. Enforcing foreign key violations is only something the database can do correctly. Punting the responsibility to a higher layer will result in corruption.

hrktb · on Nov 8, 2019

> A primary job of the database is to keep itself from getting corrupted

That would be where the road splits. If you handle the database as a very fast and structured storage application, a lot of these assumptions go away, with then a different set of tradeoffs.

Some data corruption could be fine if you can guarantee the critical cases, just as bugs in the code are fine as long as the useful cases are covered.

I remember a database with scheduling entries in it that could get duplicated depending on the sharding, but it didn’t matter because the app handled the case gracefully.

I think understanding the tradeoffs that match the best the use case is the most important point, always assuming that a DB has to guarantee integrity can be a burden preventing from looking at all the options.

GauntletWizard · on Nov 8, 2019

There are 100% foreign key violations in their database. That is not the same as their database being corrupt.

They have engineered for, and understand the implications of, foreign key violations. Typically, it's as simple as "This row can be deleted", and that can cascade - at a totally different rate than you'd find in a database and with totally different performance characteristics.

spookthesunset · on Nov 8, 2019

Foreign key violations are data corruption!!! It violates the rules of how the data relates and can and will screw up any number of things that depend on the rules being enforced.

Reporting and bi data might get hosed.

Account management might get hosed.

Who knows what happens when FK rules are violated because by definition they should never be violated. It puts all applications on top into a undefined state, leading to bugs and god knows what else.

Foreign key violations are 100% data corruption.

SkyPuncher · on Nov 8, 2019

> It violates the rules of how the data relates

You're missing an important point, it violates a certain set of rules of how the data relates. That certain set of rules typically makes it easier to write most general data applications.

If your rules define a foreign key as optional, you can easily engineer an application around it.

Great examples include NoSQL and Ruby duck-typing. In both cases, you infer actions based on the data structure, rather than making explicit assumptions.

Aeolun · on Nov 8, 2019

The whole fact that you made a foreign key implies that that is now a rule of your database. If anything is in that column that does not adhere to the foreign key, that means your database is corrupted.

lugg · on Nov 8, 2019

Yes, and what if you don't make a foreign key?

spookthesunset · on Nov 8, 2019

Then you have an implicit rule that is enforced by a hope and a prayer that some junior dev never makes a mistake, your senior engineers are clairvoyant and understand every single aspect of your systems 100% with zero off-days, your code review process catches every single possible edge case (especially the edge cases that you never knew existed), your QA process is 100% and never makes mistakes, your servers never crash in ways that leave things in an inconsistent state, etc.

Or you could, you know, simply add a foreign key constraint and never, ever, ever have corrupt table relations.

Why people fight their tools is beyond me. There is almost zero reason to defend not using foreign keys.

hootbootscoot · on Nov 9, 2019

or you never wanted it in the first place.

i suggest being open to the concept that other perfectly capable humans may, in fact, design systems with different underpinning assumptions than those you appear to presuppose always must apply to everyone and everything.

It's important to remove the DBA or Developer hat and realize that you are working together on a singular system (or maybe it's even the same person, etc.)

"corruption" implies that the desired results have not been achieved, that essential data has been lost or compromised, and this clearly is not the case with such a deliberate design decision. one should be prepared to accept this possibility in order to not be merely a fanatic and unreasonable.

jerzyt · on Nov 9, 2019

That's exactly my experience and viewpoint.

js4ever · on Nov 9, 2019

Mine is exactly the opposite, most use cases I'm exposed to do not require FK at all, so I simply never use them.

matt4077 · on Nov 8, 2019

> Foreign key violations are 100% data corruption

Sure, yes, of course, who cares?

These are self-imposed rules, so breaking them is wholly up to yourself. And there are trade-offs involved that may often make it acceptable relax them.

You seem to be rather invested in one set of arbitrary definitions. If people do fine even in situations where they "by definition" shouldn't, it's the definitions that have been found lacking, not the people.

dragonwriter · on Nov 9, 2019

> Sure, yes, of course, who cares?

The stakeholder from whom you got the requirement on which the logical FK constraint is based, for one.

hootbootscoot · on Nov 9, 2019

which may not exist.

dragonwriter · on Nov 10, 2019

Why would you identify an FK relationship other than requirements?

lugg · on Nov 8, 2019

This assumes things depend on the rule being enforced.

Nobody suggests taking an app / db with foreign keys and removing them that is a recipe for disaster.

GauntletWizard · on Nov 8, 2019

You can keep saying that, but it does not make it true. Properly modeled data does not need those constraints. Well written software handles these correctly. Mediocrely written software fails and complains loudly. Badly written software might get hosed. You're at just as much risk (or more, in my opinion) of that with bad schema changes as with not having foreign key constraints.

spookthesunset · on Nov 8, 2019

Do you not check input from your javascript front-end before you save it? Even if that front-end does its own validation? (No, you see, but there is only one web front-end.... we don't need the backend to validate the input!!!!)

In what way is letting the database ensure it isn't getting fed crap any different?

Why do developers constantly think it is okay to let unvalidated user input hit their database? Any client calling your database is a hostile client that will feed your database bad data. Arguing otherwise is complete ignorance.

Dylan16807 · on Nov 8, 2019

> In what way is letting the database ensure it isn't getting fed crap any different?

Because one canonical validation layer is generally enough. You can see how having two separate partial validation layers could cause problems, right? And you can't put all the validation in the database, for non-trivial apps.

And "clients" shouldn't be talking to the database, no matter if you have foreign keys or not. That's a totally separate issue.

LaGrange · on Nov 8, 2019

> Reporting and bi data might get hosed.

> Account management might get hosed.

LOL, if I read the data directly from the db instead of via the application's API then sure, I lose the application's guarantees. But, y'know, same might be said if I just go and read the DBs files from a sidecar shell script or something.

> It puts all applications on top into a undefined state

...that's just an assumption that's false.

spookthesunset · on Nov 8, 2019

If you want to operate assuming your data is always corrupt because your engineers don't understand how to use the tools provided by their database.... Seems like an awful lot of work to re-invent a wheel that your DB server can solve for you. I guess that is on you though....

lugg · on Nov 8, 2019

Will result in corruption. Yes.

Will result in more fault tolerant software, also yes.

It's a trade off and one I make willingly at every scale.

I stopped using foreign keys after university and have never wanted them since.

Non nullable database fields are far more useful than worrying about fks.

xref · on Nov 9, 2019

> Will result in more fault tolerant software

More fault tolerant because you have to waste time debugging the faults and monkeypatching them in code just to avoid FKs?

> Non nullable database fields are far more useful than worrying about fks

Can’t even count how many non-nullable fields I’ve seen packed with “” empty strings to get around that requirement

lugg · on Nov 10, 2019

1. Over my career almost none of it has been debugging problems due to a lack of foreign keys.

Further to that, instead of having lazy cascading deletes moving the goal posts on you. Data missing its parent for example is an indication something is wrong. It's a useful diagnostic.

2. Most string data is safe to represent as empty. Handling null on the other hand is a different situation and often induces warnings or other side effects depending on the language. Forcing the null checking to your boundaries is much like forcing your state/mutation to the boundaries in FP. Hell eveb though I think it's insane, hexagonal arch works on this idea as well.

michaelt · on Nov 8, 2019

> or just run a query to check for invalid rows at 3am every morning

One of the nice things you get from database FK constraints is when something messes up the data, you get an exception thrown in the code that's messing it up, complete with a backtrace and whatever inputs and timestamps you care to log.

eerikkivistik · on Nov 8, 2019

I'll jump in here as well in case someone is considering throwing out foreign keys due to this. I would argue, never make compromises like this unless you have a scaling problem that is so bad, that there is no other way around it. Don't get me wrong, there are places where this is the case, but 99% of software does not have this issue. Use foreign keys and save yourself the headache in the future. Also bear in mind, that if you ever hit that scaling problem, you are making so much money, that it's a nice problem to have.

fennecfoxen · on Nov 8, 2019

I would add to this if you have the scaling problem, and are planning to throw out foreign keys, and start sharding: you have essentially moved to a distributed data store, but you are using a query language not designed for it. You will face challenges learning what parts of the database system you can and cannot use safely, and enforcing these constraints will be fraught.

You are essentially moving to NoSQL. Maybe it makes more sense to just own up to being on NoSQL, and using a data store and data access patterns that were actually built for the task? It should be something to think about, anyway.

Certainly consider this option if you're planning for that scale from the start as a more meaningful alternative to simply saying "no foreign keys" from the start. I won't necessarily say it's overengineering; there are entire problem classes where tracking millions and billions of records are on the table, particularly in event monitoring.

swasheck · on Nov 8, 2019

Such an excellent post. Thank you so much.

blantonl · on Nov 8, 2019

When you get to a point where you need to start sharding your database, congratulate yourselves for the enormous success and toast the night to come.

The next morning you can embark upon the engineering effort to start the sharding re-architecture.

__jal · on Nov 8, 2019

I've worked on exactly one project where performance concerns lead to removing Fkeys.

The compromise we came to was to enforce them in dev and qa in order to catch bugs, and relax them in prod.

I still strongly believe that database constraints are a developer's best friend, in that you can trivially make your data structures fight back against misuse. This makes several classes of bugs obvious. But like anything, there are times they are not optimal.

I'll also say I think there are very few cases where a small performance advantage outweighs the costs, and would be hesitant to head down this path with a team less competent than the folks at Github.

WorldMaker · on Nov 8, 2019

I've worked on projects where adding foreign keys increased performance tremendously. A good relational database can take advantage of them in all sorts of ways in execution planning. Even scenarios such as some types of sharding, a relational database sometimes can use foreign keys for useful data proximity information, to better co-locate "families" of records together.

You need to know your database engine, how normalized you are or are not, how appropriate your data is for its relational model. It's not always your FK constraints that are the bottleneck you might think they are. Sometimes their "slowness" is hiding a problem elsewhere in your schema, a bad shard or an index that could be better or a key-value pool with soft version controlled enums pretending to relational data.

Sometimes "optimal" isn't a simple spectrum but a checklist of trade-offs and no "right" answer.

kyrra · on Nov 8, 2019

Two of Google's big internal databases don't use foreign keys (Spanner[0] and F1 [1]), rather they use hierarchies (parent/child tables). It definitely limits you a lot more than foreign keys, but seems to make sharding easier. It works well for some use cases, but takes some getting used to. Cockroachdb also has this functionality [2].

[0] https://cloud.google.com/spanner/docs/schema-and-data-model

[1] https://ai.google/research/pubs/pub41344

[2] https://www.cockroachlabs.com/docs/stable/interleave-in-pare...

munchbunny · on Nov 8, 2019

Someone two degrees of separation from me coined the term "medium data". I don't remember their name, but I absolutely love the term.

You are "big data" when black swan events (like hardware failure taking down a database node) become regular enough that you begin to statistically model it. Before that, you might just have a lot of data, but you aren't having "big data" problems.

nkozyra · on Nov 8, 2019

Or, as a wise man once said, "Premature optimization is the root of all evil."

It's a different part of the field of course, but so many people yield opportunity cost chasing some difficult architectural problem that will likely never impact you, and if it does you'll have resources to throw at it.

Many startups won't even have slave production databases, much less have to worry about network distributed sharding, complex caching strategies, database optimization beyond query analysis.

lemmsjid · on Nov 9, 2019

I will add to the pile of agreements on this.

Also, if you do end up sharding a relational database, it is often by identifying subgraphs of document-like structures that you can shard on. Need to ensure these subgraphs do not have foreign key relations with one another, but you can maintain the valuable foreign key relations WITHIN the subgraphs.

Practical example: database of user profiles where suddenly you have billions? You can skill keep foreign keys on user to email-addresses or user-to-comments while eliminating cross-user foreign keys.

I would also add a good rule of thumb: when you make the design decision to remove a database feature, you need to assume that you now need to handle that feature yourself, or your data will get corrupted. For example, when removing FK constraints, or transactional boundaries, or introduce irregular checkpointing, you now need to implement a data repair system, because your data will get broken. At which point you are probably going to end up using a change event log (your own transaction log) and a system that replays the logs in order to repair and rebuild the database.

pfranz · on Nov 8, 2019

Even at very large companies, hacks can work surprisingly well for a surprising amount of time. As much as we harp on technical debt (which includes myself), we rarely appreciate how much time can be saved by implementing the most straightforward solution so we can focus on hard things.

I'm not sure if Guido was just being amicable, but in the recent Dropbox retirement post he said, “There was a small number of really smart, really young coders who produced a lot of very clever code that only they could understand,” said van Rossum. “That is probably the right attitude to have when you're a really small startup.”

devy · on Nov 9, 2019

In addition to context + YMMV, I want to mention, sharding is NOT THE ONLY solution for scaling[1], albeit it's a popular technique. So that makes Shlomi Noach's first point weaker.

[1]: https://www.quora.com/What-are-alternatives-to-sharding-data...

swasheck · on Nov 8, 2019

Additionally, at least in Sql Server, trusted foreign keys can give the CBO more, good options because of the guaranteed referential integrity on reads. FKs are killer on larger writes, though.

kraig911 · on Nov 8, 2019

+1 million times this. Seems like everywhere I work with are concerned about google size data problems when the apps I'm working on are at best 1000 concurrent users...

michelinman · on Nov 9, 2019

I agree. No two systems are comparable. Must be very confusing for the people ramping up. A solution fits the problem, not the other way around. Keep it simple stupid. KISS

capoDanger · on Nov 8, 2019

Came here to say something like this, though you said it more eloquently. Also a couple replies to you below add great context.

patsplat · on Nov 9, 2019

Who needs foreign keys... or relations?

Documents do well for a wide variety of applications ;-)