Not knowing anything about web development or Django or whatever, why isn't this just in a real database? Where you can do a join instead of 2000 individual queries?
Object Relational Mappers like the one Django uses are a whole topic of debate unto themselves.
They strive to map SQL concepts into a more code-friendly domain. Rows are represented as objects, and references become attributes of those objects, which can be eager-loaded with a join when you fetch the parent object or lazy-loaded with their own query when they're actually accessed. The N+1 problem comes from this lazy-loading scenario.
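Here's the N+1 shape in miniature, using stdlib sqlite3 rather than an actual ORM (the board/report tables are made up to match the article's example):

```python
import sqlite3

# Toy schema standing in for two ORM models: a Board has many Reports.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE board (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE report (id INTEGER PRIMARY KEY, board_id INTEGER, title TEXT);
    INSERT INTO board VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO report VALUES (1, 1, 'r1'), (2, 1, 'r2'), (3, 2, 'r3');
""")

# Lazy loading: one query for the parents...
boards = conn.execute("SELECT id, name FROM board").fetchall()
queries = 1

# ...then one more query per parent, fired the moment the relation is touched.
reports_by_board = {}
for board_id, _name in boards:
    reports_by_board[board_id] = conn.execute(
        "SELECT title FROM report WHERE board_id = ?", (board_id,)
    ).fetchall()
    queries += 1

print(queries)  # 1 + N = 4 queries for 3 boards
```

With 2000 boards that loop is 2001 round trips, which is exactly the pattern the article is fixing.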
This programming model abstracts away the SQL which can be really cool and easy until you try to do something off the beaten path. Junior devs love it for the ease, and diving into the internals once things go wrong turns them into Senior devs!
Coming from an area of programming where dataframes/tables are the first-class citizen this is mind blowing to me. Is there a reason for doing things this way by default? I guess it puts off having to think about complex queries, but it also means performance is degraded multiplicatively the more tables you're pulling in, right? Is the tradeoff really just "it's easier to set up any old garbage and fix performance later"?
Lazy loading is usually the default ORM setting, so you wouldn't pull in any extra tables by default. You can set up eager loading on a per-query basis (i.e. "I know I'll need it here") or configure certain relationships to always eager-load, depending on your needs.
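Per-query eager loading for a to-many relation usually compiles down to two queries instead of N+1: one for the parents, one IN query for all the children, stitched together in application code. This is roughly what Django's prefetch_related() does; sketched here with stdlib sqlite3 and made-up tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE board (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE report (id INTEGER PRIMARY KEY, board_id INTEGER, title TEXT);
    INSERT INTO board VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO report VALUES (1, 1, 'r1'), (2, 1, 'r2'), (3, 2, 'r3');
""")

# Query 1: the parents.
boards = conn.execute("SELECT id, name FROM board").fetchall()

# Query 2: *all* their children at once, instead of one query per board.
ids = [b[0] for b in boards]
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT board_id, title FROM report "
    f"WHERE board_id IN ({placeholders}) ORDER BY id", ids
).fetchall()

# Stitch children onto parents in Python, the way the ORM does internally.
reports_by_board = {board_id: [] for board_id in ids}
for board_id, title in rows:
    reports_by_board[board_id].append(title)

query_count = 2  # constant, no matter how many boards there are
```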
You've hit the nail on the head with that tradeoff. ORM-based code is dead simple to write since everything just looks like an object. Anyone who knows the programming language can start querying the DB without really knowing anything about SQL. This lets you write the code quickly, in a domain that's really easy to reason about.
I've seen plenty of cases where an endpoint should really only require 1 SQL query but it turns into a handful because of the ORM. Very rarely does it matter though, because each query takes < 5 milliseconds and you barely notice the slowdown.
If you know your ORM library well then you can get the benefits of both worlds. Mature ORMs will give you a ton of control (including letting you write SQL directly) when you need it; you just have to be OK with the learning curve.
Columns, though, are different: you do have to restrict the column set yourself if you don't want or need them all.
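At the SQL level the column restriction looks like this (sqlite3 stand-in, made-up table; in Django the equivalents are .only() and .values()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO report VALUES (1, 'r1', 'a very large body ...')")

# Default ORM behaviour: SELECT every column, the wide 'body' included.
wide = conn.execute("SELECT * FROM report").fetchall()

# Restricted column set -- roughly Report.objects.only("id", "title")
# or .values("id", "title") in Django terms.
narrow = conn.execute("SELECT id, title FROM report").fetchall()

print(len(wide[0]), len(narrow[0]))  # 3 vs 2 columns per row
```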
It's dead simple to do dead simple, and then it quickly gets hard to impossible to do the complex things you need, is my experience with Django ORM.
My favourite approach personally is a very lightweight mapping into the querying language, like Diesel or something even thinner. Django unnecessarily changes established terminology and introduces abstractions that are ultimately limiting.
IME, it’s more because most (citation needed) web devs are familiar with JSON and its nested format, and so it’s much easier to keep everything in that mental model, and let something else do the conversion.
SQL is of course not a difficult language, but I will give ORMs that it can be somewhat difficult to context switch between the two mental models.
Django, IME, does an extremely good job at producing decent SQL, most of the time. There are times when you can be faster, and there are certainly times where it’s massively slower, but the average is higher than if you just had everyone writing queries.
My counter to that is “have RDBMS SMEs create VIEWs for anything you want, and let devs just call those,” but then I get told I’m old-fashioned, and that no one wants to store any kind of logic in a DB, version controlled or not.
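The view approach in a nutshell (sqlite3 sketch, invented names): the database expert owns the JOIN once, and application devs just query the view like a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE board (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE report (id INTEGER PRIMARY KEY, board_id INTEGER, title TEXT);
    INSERT INTO board VALUES (1, 'a');
    INSERT INTO report VALUES (1, 1, 'r1'), (2, 1, 'r2');

    -- The RDBMS SME writes the join once, version-controlled in a migration.
    CREATE VIEW board_reports AS
        SELECT b.name AS board_name, r.title AS report_title
        FROM board b JOIN report r ON r.board_id = b.id;
""")

# Application devs never write the JOIN themselves.
rows = conn.execute(
    "SELECT board_name, report_title FROM board_reports ORDER BY report_title"
).fetchall()
```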
The steelman of their argument (coming from things as a database person) is thinking about the problem in a different way for a different domain, which can beget a "simpler" approach.
Something like a document db approach (one big json entry to rule them all) gets significant benefits in setting certain levels of abstraction as even possible.
Hey - we just get the user details and then we update some json and that's all we can do, and it kinda sucks, but it means we're going to put most of our computation in this website and really make something that looks like a modern web app.
It has a bunch of downsides, but you don't have to teach anyone SQL; it's just a simple repository pattern + microservices + limiting the scope of any one thing.
Type checking is strong, tooling support is strong and you usually can stick in one language and go ham.
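The "one big JSON entry" pattern boils down to a get/mutate/put loop, with all the logic in the app. A minimal sketch using stdlib sqlite3 and json (schema and field names invented):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_doc (user_id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO user_doc VALUES (1, ?)",
    (json.dumps({"name": "sam", "boards": [{"name": "a", "reports": []}]}),),
)

# The whole "query API": get the document, mutate it in app code, put it back.
doc = json.loads(
    conn.execute("SELECT doc FROM user_doc WHERE user_id = 1").fetchone()[0]
)
doc["boards"][0]["reports"].append({"title": "r1"})
conn.execute("UPDATE user_doc SET doc = ? WHERE user_id = 1", (json.dumps(doc),))

# No joins, no schema to map -- but also no way to query inside the blob.
saved = json.loads(
    conn.execute("SELECT doc FROM user_doc WHERE user_id = 1").fetchone()[0]
)
```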
Django queries are an interface to a real relational database. It's not its own thing.
The tradeoff Django chose was lazy loading by default over automatically pulling the entire database by following every foreign key.
One of the standard fixes is to tell it to do JOINs for specific foreign keys instead of lazy loading (select_related(), mentioned but not shown in the article).
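For reference, this is the shape of the SQL that a select_related()-style JOIN produces (sqlite3 stand-in, made-up tables): one query returns each child row together with its parent's columns, so touching the parent attribute later costs no extra query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE board (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE report (id INTEGER PRIMARY KEY, board_id INTEGER, title TEXT);
    INSERT INTO board VALUES (1, 'a'), (2, 'b');
    INSERT INTO report VALUES (1, 1, 'r1'), (2, 2, 'r2');
""")

# Roughly the SQL behind Report.objects.select_related("board"):
# one JOINed query instead of a lazy query per report.board access.
rows = conn.execute("""
    SELECT report.id, report.title, board.id, board.name
    FROM report JOIN board ON board.id = report.board_id
    ORDER BY report.id
""").fetchall()

query_count = 1  # every report *and* its board, in a single round trip
```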
The ORMs provided by full stack frameworks like Rails, Django and DotNet are basically engines in and of themselves. They’re doing a lot of work at runtime to reduce pressure on the DB and ensure connections are returned to the pool as soon as possible.
They’re not ‘automagic’; you can reason with them quite easily.
Then what's up with the pattern they show of querying every board and then iterating through to query again for every report associated with the board?
I was expecting them to say after introducing this, that they started with an intentionally bad example so they could improve on it. But instead they went on to talk about a 2.5x speedup on something that would already be 100x faster in any other database. I don't get why they're doing this at all.
If you have 1 user asking for 2000 things that’s fine, but what if you have 2000 users asking for 2000 things?
You will need to write your API to batch database requests to avoid inefficiency anyway, so it makes sense to break up large joins into smaller queries that can batch well.
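A sketch of that batching idea (stdlib only; the batch size and table are arbitrary): instead of one giant query over 2000 ids, issue a few bounded IN queries that the DB and connection pool can interleave across users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?)", [(i, f"r{i}") for i in range(2000)]
)

wanted = list(range(2000))
BATCH = 500  # arbitrary cap, tuned to your DB / driver parameter limits

results, batches = [], 0
for start in range(0, len(wanted), BATCH):
    chunk = wanted[start:start + BATCH]
    placeholders = ",".join("?" * len(chunk))
    results += conn.execute(
        f"SELECT id, title FROM report WHERE id IN ({placeholders})", chunk
    ).fetchall()
    batches += 1
```

Each batch is small enough to share the database fairly, and the total stays at a handful of round trips rather than 2000.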