ORMs lure you in with a false sense of neat abstraction. They have nice intuitive examples on their home pages. But then you use them in the real world, doing gnarly queries, and you realize that doing anything powerful and fast in the ORM requires its own completely separate abstractions, which are often difficult for the uninitiated to follow. It's also often a big pain to debug the raw SQL that gets compiled after the ORM does its magic.
The argument I've made before when going down the path of ORMs has been: do we forsee needing to use this model code on a different database engine? Outside of simple toy applications, or needing to support different engines with the same code, I agree that ORMs are more trouble than they're worth.
You don't use ORMs for gnarly queries -- that's not what they are for! They are for making manipulating the entities easier -- reading the data out of the database in a way that makes easy to modify.
You can (and should) use them for simple queries. You have a list of entities you want to query and filter, that's going to be fine. Joins are fine. But if you're doing some complex analysis, an ORM is the wrong tool. That doesn't mean it's a poor abstraction, or difficult to follow, or something to be avoided. It's not the right tool for that job. For the job it's designed for, it's going to save a lot of effort.
SQL is great for analysis -- it's pretty much what it's designed for. But for bringing data into your app and modifying it, SQL is cumbersome and verbose. If you're loading data into objects then you're just creating your own personal ORM anyway.
In ActiveRecord, there's a method called find_by_sql. You can't call it directly; it's a class method on an ActiveRecord model. So you have to choose which of your ActiveRecord models should be used to instantiate the rows of your result set. (What if your result set doesn't really match any of your models? Pick one arbitrarily.) Your SQL has some extra columns. What happens to the data in those columns? They get monkey-patched onto the individual objects. (Which is stupidly expensive in Ruby.) Other than that, the individual objects are fine. They even have all your smart instance methods, which may or may not behave properly with all the ad-hoc monkey patching.
If you tried to short-circuit all of that nonsense, you tend to get arrays of hash tables. Which is, in my opinion, already a perfectly adequate interface!
> What if your result set doesn't really match any of your models? Pick one arbitrarily.
I don't want to sound like the ORM defender, but I'm not sure I understand.
This sounds like a deficiency of Ruby and the ActiveRecord record model. In Java, for example, you'd just write a new POJO for your query, which isn't exactly difficult. There are no "smart methods" or whatever.
It is a valid criticism that this can proliferate data classes, but that depends on the application.
I was writing in terms of what I actually understand and have used—which doesn’t include any Java ORM. In fact if there are Java ORMs that consist solely of POJOs which are populated by raw SQL queries, I would gladly use them!
The API docs don’t make it clear whether that’s still possible so I didn’t mention it explicitly, but I have done that before and that’s what I was alluding to with the “array of hash tables” comment.
In my personal experience, ORM’s seem to encourage queries to get scattered throughout your logic (they’re just normal code and function calls, afterall... at least, they look like it) and encourage mixing application-side logic with query logic. The former makes it incredibly hard to remove if you need to reach for raw SQL and the latter leads to bad performance due to many application-database roundtrips snd not filtering enough before sending data to the application.
Yes, both of these things can be solved through disciplined modularisation of ORM logic, but in my personal experience across multiple companies, most developers simply aren’t that disciplined and treat ORM code as any other application code, instead of treating it as the remotely executed database code that it actually is.
In my experience, writing raw SQL (through https://www.hugsql.org/ in my case), you are instead encouraged to think of them as separate and carefully consider the boundaries, which helps keep the queries and application logic modular and allows for more carefully crafted queries that minimise roundtrips and data shuffling.
Again, this has been my experience, across a number of companies. Perhaps your experience differs, in which case, I’m jealous.
Running raw user SQL isn't a prerequisite of an ORM needed to make it an "ORM", it's a useful feature that most ORMs try to include because the authors recognize the many shortcomings. Also, by writing raw engine-specific SQL, you automatically invalidate one of ORMs biggest selling points which is being SQL-database agnostic.
And by "drop into", this typically means writing custom stitching code that stitches the SQL cursor results back into the models again. It's rarely straightforward.
> you automatically invalidate one of ORMs biggest selling points which is being SQL-database agnostic.
The biggest selling point is a massive reduction in boilerplate code. Database agnostism is a feature that almost nobody ever uses, so who cares! Your proposed alternative is engine-specific SQL so you lose either way. At least with an ORM, you'd lose significantly less. You'd just have to deal with the places you used SQL. Which, in my experience, is pretty small and pretty specific.
> this typically means writing custom stitching code that stitches the SQL cursor results back into the models again.
I often feel like people who complain about ORMs have either actually never used one or used a poor one. As long as my query matches the structure of my object(s) I don't need any stitching code. And if they didn't match, I wouldn't write stitching code because that would be waste of time.
If I'm writing a custom query, I'm probably not looking to integrate into the model anyway -- if it's used for reporting I'd just take the results as is. If I'm writing a query specifically to get a matching model object (to manipulate) I'm going to get the whole model and no stitching would be required.
> you automatically invalidate one of ORMs biggest selling points which is being SQL-database agnostic.
I haven't heard anyone talk seriously about database-agnosticism since the very early 2000s. Maybe some commercial products still try (choose MS or Oracle!), but it's rare nowadays.
The primary selling point of an ORM is that it abstracts marshaling/un-marshaling rows to/from entities. Instantiating and persisting entities to relational storage.
> And by "drop into", this typically means writing custom stitching code that stitches the SQL cursor results back into the models again. It's rarely straightforward.
That's not typical in most uses I've seen. Far more typical are things like:
- Go straight to SQL for reporting, since that's what SQL does. Useful in reporting contexts, and also for list/filter UI screens.
- Use raw SQL to query a list of entity IDs for updating based on some complex criteria. Iterate over the identifiers and perform whatever logic you need to before letting the ORM handle all the persistence concerns.
> I haven't heard anyone talk seriously about database-agnosticism since the very early 2000s.
Do you use the same database engine for your unit and integration testing as you do production? I don't. I use sqlite for unit and local integration testing, and aurora-mysql for production.
As a side note, I quite literally can't use aurora-mysql for local unit and integration testing. It doesn't exist outside AWS.
Unit testing code that touches the database is not useful, and in fact it indicates there is likely a design flaw. Code that acquires from or changes data in the DB should be self contained.
An integration test that doesn't use the same DB as production is unsatisfactory.
Using a DB inside your unit tests is an antipattern and arguably a violation of the concept of a unit test in the first place.
Integration tests should run against a test environment, otherwise, what integration are you testing? I don't see the value in writing integration tests that test the integration between my code and a one-off integration test DB that exists solely for the purpose of integration testing.
There is no syntactical difference between Aurora/MySQL and regular MySQL until you start interacting with S3.
Also if you’re using only the subset of MySQL that is supported by SQLite, you’re missing out on some real optimizations for bulk data loads like “insert into ignore...” and “insert...on duplicate key update...”. Besides that, the behavior of certain queries/data types/constraints are inconsistent between MySQL and every other database in existence.
Finally, you can’t really do performance testing across databases.
> Do you use the same database engine for your unit and integration testing as you do production? I don't. I use sqlite for unit and local integration testing, and aurora-mysql for production.
The PHP ORM Doctrine supports a custom SQL like language called DQL that integrates your defined model/relationships as well as converting it to the correct SQL dialect for your backing store.
Queries look like:
SELECT u FROM ForumUser u WHERE (u.username = :name OR u.username = :name2) AND u.id = :id
It's certainly limiting, though you can write (or find implementations[0] of) custom types[1] that can be pretty powerful (void where prohibited, limitations apply).
I've always thought the "being SQL database agnostic" theory of ORMs was more about a development team being able to choose from some common choices than about apps being portable in practice.
Yeah, I’ve never seen any complex applications, using ORM’s, that were easy to port to another database. Hell, one company I was at switched from MySQL to MariaDB and even that took some work despite that they should be almost the dame thing. If you switch to a more substantially different database, eg, from MySQL to Postgres, then its even harder. I’m also a believer in using a databases features when it makes sense and not limiting myself to the standardised subset of SQL just in case I might want to change databases later.
Hibernate promises that you use the database "as if" it was not there. You can drop into SQL whenever you need, but you cannot remove all the stuff you had to add just to make it work (saving, reloading, handling caching....). So instead of being "out of the way" it poisons your entire code-base.
Now this is excellent point which is true in my experience. It gives little ramp up in starting and keep creating speed breakers as development proceeds to handle real complex business scenarios.
They make simple things simpler, which is a good thing.
I wouldn't mind writing queries in SQL so I don't have to learn a new ORM for every language I use. However I like that ORMs unmarshal results for me in objects / tuples / maps of the language sparing me the work.
For complicated things I write SQL and possibly decode the results manually.
If you know SQL and know the ORM, you can just do the complicated things in SQL.
If you know a problem is going to be complicated, you know to use SQL.
If you don't know if a problem is going to be complicated, you can default to SQL and decide to use the ORM later if it's a good fit once you've got more problem details.
>You can (and should) use them for simple queries.
This is not a very compelling argument to use ORMs. It is saying "it makes easy things easier". This doesn't really buy you much value. The simple things are already simple. Bringing in a very large, complicated external dependency to make simple things simpler, is not a good idea.
>If you're loading data into objects then you're just creating your own personal ORM anyway.
By this definition, the Postgres driver I use for node is an ORM as it takes a row of string/type identification codes and turns them into javascript objects and javascript types. It isn't an ORM though, that is not what an ORM does. An ORM converts one paradigm into a completely different paradigm, which is why it fails and is a terrible idea.
This kind of doctrinaire thinking, this sort of broad and bold declaration, is the stuff of high-traffic blog posts but not good advice for real world developers.
Django, just as an example, does a magnificent job with its built-in ORM. Millions of developers use it, and they are not all fools. A fool is someone who would set out to build a simple-to-intermediate CRUD web app by writing SQL.
> This kind of doctrinaire thinking, this sort of broad and bold declaration, is the stuff of high-traffic blog posts but not good advice for real world developers.
> A fool is someone who would set out to build a simple-to-intermediate CRUD web app by writing SQL.
\u{1f644}
SQL works great for simple-to-intermediate CRUD web apps too.
I love Django’s orm. I wish Django would publish its orm as a separate library. Every time I see a flask / sqlalchemy app I think to myself: Gee you could have saved at least 2x the keystrokes if you’d just used Django.
No it makes simple things easy. Simple things in raw SQL are bloody complicated. Even just getting data, manipulating it, and saving it is at least twice as difficult in maintainability and lines of code than using an ORM.
> An ORM converts one paradigm into a completely different paradigm, which is why it fails and is a terrible idea.
I'm not sure where people get the idea that ORMs fail at their job. They really don't. They do it very well and we're all quite happy.
All the anti-ORM arguments here about how ORMs fail at completely different jobs other than mapping objects to RDBMS operations. Well duh. Nobody complains about cars that can't fly and planes that can't fit on the highway but when ORM can't cook bacon it doesn't fit the paradigm.
> Even just getting data, manipulating it, and saving it is at least twice as difficult in maintainability and lines of code than using an ORM.
That has not been my own experience outside of the most trivial queries. Once any amount of complexity is introduced, I find that ORM-based queries often make it difficult to really see whats going on with indexes and locking, I can’t just paste a query (eg to use EXPLAIN) without finding it in a query log first and, unfortunately, in my experience its rare to find teams disciplined enough to not treat ORM code as if it were normal application code (ie don’t mix it into your application logic), so often end up with a few database/application roundtrips, doing filtering in the wrong place etc. Yes that last one isn’t technically the fault of the ORM, but when I see it again and again in real world code, I start to think that most developers don’t have the discipline to be careful with ORM code while when not using ORM’s and writing raw SQL outside of your applications code, you have no choice. Not the ORM’s fault, but still a symptom of using one that I’ve experienced in multiple teams.
Things don't usually start out bad, but they become bad only after you have a more complex codebase with many users (ie many queries). Its at this point that you need to know what a query is doing, yet its at this point that the query logs are full of queries, so finding the ones I want becomes hard. Putting a trace on the queries also becomes hard when you have a large codebase where the query logic is intermingled with application logic. I mentioned this in my comment.
You say it's my "refusal to want to change the way you work that is the issue", but I never chose to work that way. All of the codebases where I've had this issue were inherited: I was not the one to decide to work like this and I did not mix the query logic into the application logic. But that was my point: I've had the same experience across multiple teams in multiple companies, so blaming the developers seems like a cop out and not much of a solution. I'll happily adapt the way I work, but I can't force existing teams to change. Maybe discipline could fix it, but I have not experienced this discipline anywhere I've worked. But sure, its my fault somehow.
Eventually if you’re working with an object oriented language or if you have to convert the result to JSON, you’re going to have to convert one paradigm to another.
Unless you’re programming in assembly, everything you do is being converted into another paradigm that is what every compiler and interpreter does.
You can only use a fraction of the features of a SQL database when you use an ORM because they don't translate to the new paradigm. When I am using a high level programming language, it is merely helping me do things like manage memory. I am not constantly wishing I could drop down and work with pointers and so on. It is a foundational paradigm that builds on the top of the one before it. ORMs just present a different, incompatible paradigm.
There are plenty of scenarios in certain verticals where you need to control memory management like games and others where you need to program in assembly. Garbage collection is an incompatible paradigm where you need to control when memory is allocated and freed.
Back in the day when I was doing C, there were times when we just couldn’t get the speed we needed from the compiler. I wrote inline assembly. Does that mean C was unsuitable because it was incompatible with our performance requirements?
An ORM is not a high-level version of SQL though. A more accurate metaphor for an ORM would be like a converter from one high level programming language that is object oriented, to another high level programming language that is functional.
In the case of C#, LINQ is a declarative built in part of the language. At runtime, it is converted into an AST and if you use Entity Framework, it is translated into SQL - another declarative language.
You can theoretically express any standard SQL query in LINQ even though outer joins can be obtuse at first. The translation may not be as optimal as hand written sql, but no compiler can translate code into assembly that would be as optimized as someone who could hand roll their own. We decided decades ago that high level languages were worth the trade off most of the time.
This is where I disagree. I dislike having a value which is the entity. The basic lesson from relational databases and later data oriented design is that you don't have an entity. All you have are aspects that are related.
This is also tapping into one of the many optimizations you have to know about when using ORMs. Can you "select" 10M rows from a table? Is an object instantiated for each one? If they're lazily created, when are they destroyed, and where is the buffer of rows held, client side or server side? How do you efficiently update each one without incurring a sql statement for each update?
All of these questions require deep knowledge of the inner workings of the ORM, when in SQL, this is a lot more straightforward.
Not really. It isn't a SQL issue; it's a database driver problem.
The Django ORM takes take a whole of an hour to read, and after that you have 90% of the use cases covered. Considering that I've been using that framework for many years now, the overhead of that knowledge is irrelevant.
You're going to love this: about the time I left one place I worked one top engineer did a meeting on using Hibernate for... drum roll... analytic queries. My brain fried with questions of why, how is this better, and this is definitely the result of someone that hasn't touched real SQL in a while and just wanted to use the same hammer for everything.
I tend to use Active record for simple queries. And for anything complex, I write a SQL query to define a view and build a view backed model. That way I can use SQL when I want/need to, but I can keep it contained in one place, and I don't have constantly rewrite tedious queries.
That's not true. A simple job might be 3 lines of code in an ORM to update a record. In SQL that will be a lot more code especially if that's wiring up a foreign relationships. With SQL you will also have a lot of uncheckable strings containing code.
Simple tasks are done maybe thousands of times in any one application. It's the complex tasks are rare.
>A simple job might be 3 lines of code in an ORM to update a record. In SQL that will be a lot more code especially if that's wiring up a foreign relationships.
That's not a fair comparison unless you include the time and effort involved in creating your ORM models, before you can even write those 3 lines of ORM code. The effort to construct those models isn't even fully amortized over your project, since it must be maintained through schema migrations.
ORMs are like putting on an exoskeleton to go buy groceries because it will let you carry your bags more easily. In my opinion, the complexity and indirection they introduce for accomplishing simple tasks do not justify their existence in the vast majority of cases.
You would need to show me your DDL statements because that's the equivalent. I can generate the model from the database or the database from the model.
It's not much more effort to type "create table employees" with all the fixings as it is to type "class employees" with all the fixings.
>I can generate the model from the database or the database from the model.
Depends on the ORM. Just like "raw sql mode", not every ORM supports that. It's not an inherent feature of being an object-relational mapper, it's part of the bells and whistles of some ORM packages. And I'm going to guess you're coming from the Python/scripting universe, because generating models in other languages is definitely more complicated.
But let's pretend that all ORM software can generate model definitions from the database. Just do the update then. I want to validate your argument that a simple update in SQL is "a lot more code especially if wiring up foreign relationships." If you think an ORM is much less code, or simpler, I'd like to see how.
I think I can give a better example that isn't purely about performing updates, but is still about "simple things" and not particularly complex flows.
In my API, I have models from my ORM. These can then be used for database migrations, which allows changes in my database structure to be checked in to version control.
From there, these models are obviously used for queries within my app. Generally, you are correct that generating a simple query with the ORM is not meaningfully easier than writing the SQL myself, with the one exception being that the ORM sanitizes inputs to my queries without me having to think about it at all, which is nice.
From there though, I can use the models from my ORM to automatically validate incoming requests to the API endpoints, as well as automatically serialize the results of my queries when I want to send a response back to the client. If you're not building a REST API, obviously this might not be particularly useful, but for someone that is, it has saved me a lot of work.
Finally, generally I find code in my code is a bit easier to debug than strings.
Your example makes for good SQL but a simple ORM example:
employee = new Employee() { Name = "Bob" }
employee.Manager = new Employee() { Name = "Jill" }
context.Employees.Add(employee)
context.SaveChanges()
This would insert a new employee record for Jill, grab the primary key, then insert the record for Bob with his ManagerId field to Jill's id. If Bob wasn't a new employee, it would perform and update instead. If there were more related elements and different levels of depth the ORM would order the inserts/updates to get the necessary keys and wire everything up. After this block of code, the objects are all updated with their new primary key values. Everything is executed in a single transaction.
The ORM is performing operations on objects -- it's pretty much right there in the name -- it's going to be a poor choice for bulk updates because that's not what it's for. But again, nothing stops you from using the right tool for the right job -- whether that be an ORM or raw SQL.
Yeah, your "uncheckable strings" (which could be checked by spinning up a testing database) is replaced by 700k lines of code that, although tested, is still full of bugs.
The Hibernate repo is 700 thousand lines of java code!
There's a tendency amongst some (especially "enterprise" programmers) to forget that dependencies are also just code. If they break, you are on the hook too. You own that complexity.
Just because a job is simple doesn't mean it doesn't take time & effort. A tool can be helpful if it reduces the time or effort to accomplish something.
I don't think that is what parent wanted to say. It's still a lot of boilerplate code and config that can be avoided by using an ORM for the 'simple' parts of an application. And if you use a framework like sqlalchemy, you still have access to the lower levels if it turns out the ORM abstraction is unsuitable.
Whenever I get to the point where I'd do something gnarly with a query, I just fall back to SQL statements with results that the ORM can parse; or just dump directly into data structures and go from there, even if the originating tables still have ORMs.
If your framework/library/language doesn't let you skip the ORM - or populate the object with the results of a custom query - well, I'd say that's a failing of that ORM, not of ORMs in general.
I remember the very first time I had to do anything interactive on a web page. We had a long list of items in a table, and as a stopgap for adding search functionality we were going to sort the table by multiple columns.
Stack Overflow was still a twinkle it Atwood's eye. So I go googling about for stable sort implementations in Javascript and I find plenty. Except they all have the same problem. They were all slow as molasses. I should have taken it as a huge warning that none of these implementations used demos showed sorting of more than 10 items.
I needed to sort 100 items. 300 at the outside. Ultimately I had to go to source material and implement it myself.
Clearly there is some notion in the software community that market for ideas is inexhaustible. That the 'oxygen' in the room is infinite, and therefore I can interject whatever half-assed concept I had into it without costing anybody anything.
Virtually all of us, when we tackle a problem, try to do better than what is already on offer. But what if the thing on offer is truly, horrible? If my only goal is 'better' instead of 'good', then my alternative will be really bad. And in a field full of bad, who wants to be the person who introduces the 6th standard? The 8th?
Keep your dumb ideas to yourself, or put them in question form and ask people why
After someone posted Norvig's Sudoku solver, I was troubled by his statement about how many algorithms he'd have to implement so he just did brute force. So I took a whack at it, got fairly far down the deductive path before things got hard. But I'm not going to show it to everybody. The internet has been working on strategies for 10 years, and they've done way more algorithms than the ones I knew about. The best I've managed is to maybe simplify a couple rules, but I think one could argue that it's the bad definition of 'simple'. It can't handle as many cases, but I can explain it to anyone. Is that enough to add to the noise? Probably not.
As someone who’s been using mostly Clojure recently, wouldn’t this apply to programming language based object systems, too? Maybe the problem with the object relational impedance mismatch is the objects.
There's absolutely nothing in this world that has frustrated me more than knowing how to write a query in SQL but, for the life of me, not being able to express it using the ORM syntax.
Since then I keep using ORMs (because mapping the things you pull from the db to actual objects is undoubtedly good) but I write my queries by hand.
I also dislike ORMs that, by default, when you try to access an object you forgot to pull from the db with the query you wrote, automatically generate another query to pull the data instead of erroring out.
The argument I've made before when going down the path of ORMs has been: do we forsee needing to use this model code on a different database engine? Outside of simple toy applications, or needing to support different engines with the same code, I agree that ORMs are more trouble than they're worth.