Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Give me an O! Give me an R! Give me an M!

What does that spell? SLOW PERFORMANCE!

Todays programmers dont understand data. They understand frameworks. To find the nr of all cars that are out of insurance they write:

    10 Nr=0
    20 Hey framework, give me all cars!
    Framework: Ok, here are 8001093 business objects representing all the cars in our DB. Each has all the attributes the car has. Color, mileage etc.
    30 Thanks!
    40 Foreach Cars as Car
    50 If Car->insurance_end_date < FancyDateLib.now() Nr++
I see variants of this everywhere. And the performance impact is just untoppable. It's often several million times slower then a simple sql query.

But beefy hardware with lots of ram and memory takes care of it. 'Our software is enterprise grade, so of course it cannot run on commodity hardware'.



Not this again. ORMs are tools, basically dynamic code generators that run SQL and map the results to in-memory objects, and vice-versa. Some are simplistic and others are incredibly advanced, and the code itself is usually faster than your own sql->objects logic that you would write otherwise.

The issues with performance are almost always with the way the tool is used, like choosing a bad algorithms or the wrong data structure for a certain situation.

Lazy-loading a large child object in a tight loop over 100s of entries will always be slow because it's an improper approach, and nothing other than knowledge and competency will resolve that. It's rather amazing ORMs can be so controversial when the real issue is the developer, which speaks to the far bigger problem of expertise and quality in this industry.


It is the developer, but ORMs are so controversial in part because they often obscure that you're doing something crazily ineffective in ways that makes developers that don't understand the abstraction fail to see that they're doing something obviously wrong.

It's more stark that you're doing something crazy if you do a SELECT, instantiate objects from each returned row, then apply a filtering rule to that object, than if you're "just" operating on something that looks just like it's all local. Both to you, and to people reviewing your code.

I prefer working with an ORM, but it does take discipline to use an ORM properly; to make sure you know and use the facilities for building queries that return only the rows you want, and instantiate objects with only the columns you need.

I tend to agree with you that it will only be resolved by knowledge and competence, and throwing out ORMs won't solve that. But I also understand the frustration at how many ORM users basically seem to use it as a means to let them pretend there's no database server there, rather than as a tool to generate queries more easily.


Perhaps the anti-pattern to rule them all is that as things become easier to do, it gives more people the opportunity to do them badly. We have so many incredible tools, and simple but powerful high level languages that allow us to write apps without bothering with complicated stuff like assembler. However, these tools don’t mean that we get to ignore how CPUs work, and ORMs don’t mean that we get to ignore how data structures work, or how we’re supposed to operate on them.

I think you’re 100% right. ORMs are just tools, they can be used well or poorly. That’s up to the developer. As a former DBA, I also think they’re great tools. Aside from making the business logic easier to write, they also bring the business logic and the data closer together.


> as things become easier to do, it gives more people the opportunity to do them badly.

ala, this is why we have electron, php, and javascript. None of those things behave badly, but it's very easy to write something useable in those, but resource intensive, or insecure, or poorly maintainable.


>[...] ORMs are so controversial in part because they often obscure that you're doing something crazily ineffective ...

Then just turn on detailed SQL logging while you are coding and keep an eye on the queries generated by your ORM calls.

If you do this methodically for all your DAOs, then you should spot performance problems early when they can still be easily fixed.


With all this continuous effort required, wouldn't it be simply easier to ditch the whole 'use ORM because it makes life easier' nonsense and use just some SQL-mapping tool like iBatis? Where you really, truly have 100% immediate control over queries, no dancing around with logging and wasting time figuring out why the lib decided to fetch this and not that.

I honestly don't get the fear some people have from SQL. If you can't do SQL good enough for 98% of use cases out there, you are not senior dev, not even experienced one. It's just one of those basic dev requirements that won't go away in next 50 years anyway.


True, but on the other hand if the developer gets told they are loading nearly 20,000 objects they should be smart enough to realize something is going to need to be optimized here, and although I don't know how to do it I better find out.


That depends upon the context. I've seen developers argue over optimizing ETL processes that weren't in the slightest bit time sensitive, took 25 minutes and were going to be run once.

More often than not I've seen developers fret over performance when it's not actually bothering users. This fetishization of performance seems to be a cultural thing in tech.

Conversely, data integrity, normalization and transactionality are usually given far too little weight.


> This fetishization of performance seems to be a cultural thing in tech.

Tell me where you've seen that, because I'd love to move there. From my experience, there is a common fetishization of non-performance. The "premature optimization" adage taken to its extreme - "we can buy more hardware", "developer time > machine time", "who cares about wasting electricity of millions of our users", etc.


Pretty sure that is what tailing the log during development is all about. I don't see what is so obscure about it, in the Rails based examples in the post, one can just see the logs flying by with the exact queries being executed.


It is, but a developer that's using ORMs because they're scared of SQL tends to be the same type of developer that will decide not to look at those.

I'm not dismissing ORMs - I use Sequel (the Ruby ORM) for almost all my database access. But I've also seen enough people use ORMs as an excuse to pretend they don't need to understand SQL or understand the database to understand why some people look at ORMs with suspicion.

A lot of code that is obviously bad when the database queries are plain for everyone to see are not so obviously bad when it's less clear if that method call translates to a database query or just extracts data locally.

Note that this is a general issue with this type of abstraction: It is a common complaint against transparent RPC wrappers as well that if they're too good at hiding that an object is remote it's easy for someone to carelessly cause massive amounts of unnecessary roundtrips.


I don't know, I find that a lot of the people that write the most inefficient ORM code would also not spot the inefficiencies in SQL queries either.

On the flipside, I've seen a lot of people that can write efficient SQL queries, but then turn around and ruin their performance with a bespoke code model that inefficiently calls those SQL queries and poorly leaks the memory and database connections while doing so.

A good ORM makes it rather easy to fix an inefficient query of a fellow developer or previous self (almost all of the examples in this article are effectively one-line changes; many of the biggest performance gains I've made in ORM usage have been removing code and/or making it more readable), but fixing the mistakes of a bespoke model can be a huge challenge with a lot of surprises.


Not this again. ORMs are some of the most impenetrable code you'll ever see. Understanding the performance characteristics of a CRUD application using SQL is vastly simpler than trying to puzzle out the arcane ways in which ORMs decide to throw up garbage once you add one more thing to the mix.


It's just code, compiled into a library that you can use, and queries are either logged by the library or in your database. What exactly is so impenetrable? That's just strange. Also CRUD is the ideal scenario for ORMs which handle all the mapping, security, parameterization, and even conversions between data types and models seamlessly while letting you just work with your objects.

It seems you either didn't use ORMs correctly or used a very poor one.


>It's just code, compiled into a library that you can use.

That's also true for code that one shouldn't use.


Dont get me started over the abuse of active record.


Not that again. ORMs achieve one main thing: being able to map your app’s objects to a relational database and back.

But there are tons of other benefits:

1) Avoid all injection attacks by default by binding variables rather than interpolating their vakues

2) Write SQL code for you to automatically, so you always have balanced parentheses and no typos or errors mixing statements

3) Autogenerate classes and methods from the SQL model automatically so there is a place where you can add custom methods on objects

4) Model fields and relationships in a way the app can understand, so you can make meaningful error checks and manipulations in the app instead of relying on the database engine to give you a nice error message or manually copying the data type logic.

5) Play nice with version control, making sure to encapsulate the code in ONE PLACE instead of a million places when the schema or model changes.

6) Pissing people off on HN so we can better discuss and explain principles of architecting good softeare while talking anout the benefits of ORM

7) Let you write an adapter to move to eg a graph database which is far faster.

8) In fact, just making you avoid doing joins in the database by default is already a feature as you can make your app far more scalable w sharding and possibly think about making it byzantine fault tolerant and distributed!

See for example this:

https://qbix.com/platform/guide/database

https://qbix.com/platform/guide/models


I was with you some of the way but "making you avoid doing joins in the database" made me drop my monocle. You want joins in the database, they are designed for joins. Moving joins to the client will kill performance and scalability.

And any sane ORM will perform the joins in the database by default.


No, that's not true if you're building a huge site. Google has been avoiding joins as early as 2005.

Joins were good for the smaller websites, but they don't scale. By avoiding joins, you have shared-nothing models that can be partitioned horizontally aka sharding.

Now true, the latest and greatest databases such as CockroachDB go out of their way to try to do joins for you across partitions, even in an ACID manner, but then you have to use those. Better to avoid joins in the DB and do it in the app. You can then use a graph database instead of a relational database, going from O(log N) lookups to O(1) lookups for related data.

Oh and finally, the newest (and pretty cool) craze of BFT, Byzantine Fault Tolerance. You can't achieve that if you're doing joins across different publishers, because they're not supposed to be able to access each other's stuff "just like that".

Our ORM supports joins, even with multiple indexes, it even lets you define relationships and figures out the joins FOR YOU, but it is discouraged if you're building scalable sites.

  By the way thank you for proving point #6 hehe


> Joins were good for the smaller websites, but they don't scale

Most sites do not have to scale beyond this limitation (or can use database followers to throw a bit of money at the problem). Providing Google as an example is a bit exaggerated as almost nothing in the world has the scaling needs that google has.


So you have to use sharding because your databases have limited capacity. But you can just join in the app, because the app have unlimited memory?


The app doesn’t hold the entire database at a time. The app simply does the following:

1) Get the root record(s) from id(s)

2) See what related records it needs, combine them into a list of ids, partition list by shard

3) Ask each shard for the corresponding records

4) Repeat from 2 if necessary

5) Return this whole tree / graph to the user

Graph databases can do this in O(1) instead of O(log N) lookups.

Relational joins are just one way to achieve this, which can be made atomic in the ACID sense.

However, as you scale up your website, eg with 100,000,000 users, it would be silly to do massive joins. Google even says this in their docs now, for BigQuery.

Instead, design your systems from the beginnig to be as parallel as possible, if you think they will scale.

Look at the problems with Ethereum for example. Or Twitter fail whales of the past.


A graph database basically caches related entities with the root entity, so lookups are fast when the query follows the paths of the graph. The price is that any other query is extremely costly. But if it works for your use case, more power to you. But not really relevant in a discussion about ORMs - Object Relational Mappers.

The relational model was designed to address the limitations of the network/graph database, especially to allow arbitrary (ad-hoc) querying and to decouple the physical storage from the logical model. But if you don't need all that, a graph database may be fine.


Most applications don’t need that. But if they do, just have indexes!


Acolyer again:

https://blog.acolyer.org/2017/07/07/do-we-need-specialized-g...

For the vast majority of use cases regular SQL databases blow graph databases out of the water. Unless you go for graph-specific algorithms like Shortest Path, and even then...


We are not all google, im extremely happy with my joins and perfomance of Mysql.


>The issues with performance are almost always with the way the tool is used

That's overly generic.

The truth is that some tools encourage bad performance habits and a lax attitude about it whereas others don't. ORMs do.


Encouragement and attitude doesn't force you to do anything. You still have to choose to use it, either well or poorly.

This is no different than any other tool that makes things easy but with obvious limits. Proper decision making is still up to you. There really isn't much controversial here if you get past the whole "ORM" hype/hate cycle.


>Encouragement and attitude doesn't force you to do anything. You still have to choose to use it, either well or poorly.

I don't believe in choices, people are flimsy. I believe in creating an environment that encourages good behavior.


> I don't believe in choices

...ok, people still make choices though, you're not controlling their minds. Perhaps educate your workforce so they make the right decisions by themselves, it's more effective and takes less effort than trying to coerce them through generalizations.


>...ok, people still make choices though, you're not controlling their minds.

No, but as a PM you can dictate they don't use an ORM.


Try reading the 2nd part of my comment.


I think the benefit of using an ORM outweighs the cost of making sure you don't have bad programmers on staff. We should be utilizing code reviews and proper source management anyway, right? This should be as simple as a senior/lead developer seeing someone iterating over 8MM rows and saying "don't do that."


Just a UI issue.

Compilers will warn you if you do stupid common mistakes. Not all mistakes, but many of the stupid common ones. If you make stupid common mistakes with an ORM, why doesn't the ORM warn you?


They require rather sophisticated analysis to detect. A compiler will detect trivial errors, but will not flag if you are using an O(n) algorithm when an O(1) could be used. Not yet, anyway.


I read the paper. A third, maybe half of the stuff the ORMs didn't detect looks simple to detect.

(Greybeards may remember how perl's -W switch switch suddenly detected a frightful amount of performance problems using mostly simple tests. Almost 20 years ago now.)


Maybe you are on to something. All the major ORM's are open source (AFAIK) so if you made a write-up of the common issues and how you suggest they could be detected by the ORM, I think it would be very well received.


I'd think you could do warnings on this simply by processing a log of the queries, looking for query patterns that match common poor code patterns.

E.g.

    SELECT id FROM table [some conditions]
followed by a number of

   SELECT * FROM table WHERE id = ...
 
is one example of a anti-pattern that suggests that someone is doing an overly simplistic query followed by a loop. Similar with signs of triggering loading of related objects instead of a JOIN.

Doing it on the emitted SQL would also make it quite easy to make it reasonably ORM agnostic - it doesn't need to be perfect, after all, so doing relatively crude pattern matching ought to be able to find at least the more basic problems.


The authors of the paper, if I’m remembering from reading it earlier today, built a static analyzer to help find these types of problems! Some of the work has already been done!


I use an ORM. I don't perform queries like that because the ORM makes it easy to build complex queries, joins, limiting the data returned etc, then execute them in one query, it's a convenience, not a straightjacket.

The problem is not ORMs, the problem is just people not thinking about the resources their query over some data takes, and trying to do things in memory that are better done in the database (as in your example), because databases are hard, and sql is not a particularly friendly language, so they take the easy way out. They'd do things like iterating over all instances just as much if not using a framework/ORM, because they understand their programming language and don't care to understand SQL, and also because they get away with it when there are 1000 cars in the fleet, and didn't anticipate having 1 million.


One of the biggest problems with ActiveRecord (the ORM in the article) is that it uses extremely obtuse names for extremely common methods. There are three different methods for finding how many things meet a given criteria[0], and none of them do the same thing. There are three different methods for loading data from an associated table[1], and none of them do the same thing either. None of the method names explain how the function is implemented or whether it's going to hit the database.

This is a problem that goes beyond the usual ORM antipatterns and bad database designs and becomes its own special kind of hell, because it makes code far harder to reason about and thus far harder to review, in a way that I don't remember encountering in Hibernate, Gorm, or anything else.

[0] count, length, and size

[1] includes, preload, and eager_load


This is mostly due to its popularity and various people needing different use cases, the long-time maintainer is now working on a mach cleaner ORM for Rust, called Diesel[1].

1 - http://diesel.rs


Compile time SQL query checks... amazing! I've been trying to find a similar library for a while. Thanks for linking


jOOQ for Java does something similar. It's so good, I'm likely stuck with Java on the backend until I stop using RDBMs.


Hey Mat, thanks for spreading the love. Hope you never stop using RDBMS!

Fancy a couple of jOOQ stickers? :)


I wouldn't use the stickers, so they would be wasted. Just keep making jOOQ a great to use library. :)

And while I have you here...your blog posts are always great and very informative. It's interesting to see the differences between RDBMSs (I really only have experience with MSSQL/MYSQL some PG), and how they do one thing or another.


> Just keep making jOOQ a great to use library. :)

Will do! Thanks for the nice words about the blog.


> This is a problem that goes beyond the usual ORM antipatterns and bad database designs and becomes its own special kind of hell, because it makes code far harder to reason about and thus far harder to review, in a way that I don't remember encountering in Hibernate, Gorm, or anything else.

I feel like this brings up an annoying cultural problem in tech wgere developers hit intermediate skill levels and feel the need to proclaim their expertise by writing something in the you’re-doing-it-wrong genre but don’t have the breadth of experience to realize that they’re over-generalizing and so they go from “ActiveRecord has this problem” to “ORMs are bad” without asking whether anyone else has it better. I’ve read tons of similar posts where e.g. Java proved that exceptions, OOP, or static typing were bad or Twitter proved you shouldn’t use Rails.


Yes there are a few problems with ActiveRecord; it has been well used and has accreted lots of similar functions (a victim of its own success), and also it doesn't make it crystal clear when you are hitting the database.

IMO ORMs should be simple, unsurprising, and only have one or two methods which actually execute the sql and fetch data - they are there to make it simpler to build queries and map rows to objects, not to blur the line between fetching and manipulating data.


Few months back I was working together with a group of framework programmers from a large consulting company.

They seem to spend solid third of their time googling for how to write a code. Write code, hit a jam, google, find something in stackoverflow, ruminate between what is found, copy paste and try if it works. They don't know their framework well enough to just write a code.

I'm a more of a embedded software/hardware guy and manager, but I can sit down and write pivot pivot query in plain sql without consulting any references.

I'm not saying these guys are doing something wrong. It may be that in the typical consultant work where you have existing old frameworks, new frameworks and new stuff coming in constantly, there is less value to master the tools that you are using. You write some glue, then move on.


That's a good thing, I think. Programmers should have a good understanding of general algorithms but it's up to the frameworks themselves to reduce the need to be "mastered". I don't want to spend time mastering a particular framework because that knowledge is not portable across languages.


I suspect it's good thing also. It may also be the only way to scale programming into larger groups.

You can differentiate the workforce. Programming by copy pasting example code withing frameworks may allow skipping the requirement to understand general algorithms.

Less required expertise means less pay and cheaper products. You hire 10 low-paid easily replaceable code monkeys who slap together pieces of software from ready components. It's like like assembly work. Then you hire one guy who knows things to supervise them and solve the problems when they can't figure them out. It's like blue collar assembly workers and engineers.


I think part of this is the industry and part of it is the technology. Entire languages in the web are basically meant to be glue. Embedded usually has different requirements from Web, memory and performance are real problems you run into fast if you don't know what your doing. Where in web, memory and performance are a problem of economics, everyone is being served from a cluster that can be scaled if the software is shoddy.


That has nothing to do with ORMs. I've seen plenty of people write code like:

  SELECT * FROM DATA
  for each r in result
    if r.x > 12
      do_something(r.y)
Exactly the same thing without any ORM. If anything ORMs should improve performance for novices since it makes it a lot easier for people to writer better queries that run on the database.


My record was a 5 order of magnitude performance improvement on a site which a big consulting firm had been working on for most of a year, where page rendering times were somewhere north of 20 minutes (the server’s hard timeout).

None of their developers knew about WHERE constraints but they did know how to join tables so they were looping over hundreds of millions of rows in classic ASP using essentially the code you have above, with a bunch of unnecessary type conversions cargo culted into the inner loop (IIRC string to int to string).

Added business lesson: we billed double for a rush job but since it was only about 20 hours to fix the code and build out the remaining features (about half of the site), the original contractor still made considerably more. Our sales guy wasn’t canny enough to realize that he should have offered to do it for, say, a third of the other company's price.


Hey, I think those same consultants worked on the main website for a company I used to work for :)


Seeing how much business people like that got was definitely an educational moment about the efficiency of the IT market ;-)


I've also seen:

  SELECT * FROM DATA WHERE x > 12
where there's no corresponding index. There's also the infamous N+1 query pattern:

  // get list of ids
  for each id in ids
    r = SELECT * FROM DATA WHERE id=:id
    if r.x > 12
      do_something(r.y)
IMHO, both are signs of not fully understanding what the database does for you. In the same vein as the "learn JS before frameworks" argument, I'd argue devs should at least learn and understand SQL, indices, etc. before using ORMs.


> I'd argue devs should at least learn and understand SQL, indices, etc.

This was all dev 101 when I was coming up (yikes, almost 20 years ago now).


Yeah, I wish it were common knowledge...although, if you ask 10 different devs, you'll get 10 different answers on what "ought" to be included in a basic CS / developer education.

(You don't know the minutiae of pointer arithmetic!? You're not familiar with Docker, Vagrant, or AWS Lambda!? You can't construct a sed / awk one-liner with your eyes taped shut!? You don't know about concurrent skip lists!? You've never completed SICP or read Purely Functional Data Structures!? And so on.)


I agree that this has more to do with the developer than the ORM. I once watched someone write a foreach loop around a set of millions of records, to update their "BatchDate" property to the current date / time. To add insult to injury, to get the current time, they issued a separate query to the DB, in the form of "SELECT GETDATE()", then injected the result - as a string - into the update statement.

To be fair though, ORMs do move us one layer further away from the actual DB, so people tend to be even less likely to understand how to write ORM calls which result in performant SQL calls.


The ORM makes it more likely to do stuff in the application instead of the DB because many queries are hard or impossible to represent in an ORM fashion.

Additionally, the ORM makes it much harder to see what is going on under the hood. For example you don't know if a value of an object is stored in the main table or lazy loaded from a related table. So you have no clue if echo "$user.name lives in $user.city" results in 0 sql queries or one or two.


The ORM makes it more likely to do stuff in the application instead of the DB

Again this doesn't match my experience. A mediocre .NET developer who can access the database with LINQ is far more likely to run their queries on the database than a mediorce .NET developer who has to try to write SQL queries by hand.

And the truth is that modern ORMs are often pretty clever with their optimizations, and in many cases the ORM will often outperform an average developer doing the obvious thing is SQL.


Also ORMs have enough visibility into your code to tell you of for doing stupid things. There are libraries working with ActiveRecord (and likely others too) which will warn you that you could skip the specific query if you added .eager_load(). Or about 1+N you're doing needlessly.

This can be done on SQL itself, but it would be much harder.


> ORM makes it much harder to see what is going on under the hood

I've never in the last 20+ years seen an ORM that doesn't allow you to log the SQL queries with a single configuration option.


Heck, I'm disappointed in any ORM that doesn't log queries by default in development mode!


With an add on Django even displays them in the browser!


It's trivial to find optimizations in django debug toolbar..."oh yeah I should use select_related"

If you have no idea what you are doing with an orm you probably won't be able to write decent SQL anyways.


This is only true if you let the ORM design the schema for you. I would hazard a guess that most people only use ORMs for queries (including mutating ones). In which case you very much do know which table a field is in.


Even if you use the ORM to design the schema for you, an ORM isn't going to just willy nilly put fields in tables by some confusing whim. Even the most opaque ORM conventions for field naming tend to have relatively easy ways to spot that MyClass.propertyName binds to table my_class and field property_name, or whatever is the case.


ActiveRecord has a find_by_sql which I use for medium to complex queries. They are easier to write and understand in SQL than in Ruby. The problem is that there are many developers that don't know SQL. I found them in Ruby projects and Python and Node.js. They write whatever they manage to code with the ORM and don't understand the implications on database performance.


An ORM doesn’t have to be slow. Most can source their data from custom queries or stored procedures.

Not using an ORM requires writing lots of code to do what the ORM would otherwise do. Is that really appropriate today? Should we not take advantage of the available CPU/RAM? I say ORM is the correct choice in many cases. Not always.


ORM doesn’t have to be slow, but I have seen a lot of implementations like the parent comment. For example, doing a foreach over 2000 rows where an UPDATE would be much, much quicker. ORMs help a lot with developing software, but you need to profile and optimise - in some cases bypassing the ORM.


    I say ORM is the correct choice in many cases
I'm still undecided about this. Would be fun to compare some actual approaches. Which is your favorite ORM?


Currently I’m trying to use Dapper if possible. It allows me to write efficient queries manually and just maps the results to objects.

For simple queries though (single-table), Entity Framework is just as fast.

Because I’m mostly working on dashboards and stuff, writing to the database isn’t much of a concern.


What is the code to get the number of cars with expired insurence when using Dapper?


Something along the lines of

    db.Query<Car>("SELECT * FROM Cars WHERE InsuranceEndDate < @Date", new { Date = DateTime.Now });
/edit: Sorry, missed the "number of". Well, you get the idea. It'd use 'QuerySingle' instead.


That does not look very ORMish to me. It might return objects. But isn't one point of an ORM that you tell the ORM which data you want and not what SQL query to send to the DB?


Also Dapper fan. I like mapping into objects without code generation and complicated queries are easier to write compared to LINQ if familiar with actual SQL. Only thing I really don't like is bulk inserts as it will do one statement per object. You can still use both Dapper and LINQ at same time but it might confuse others.


> Which is your favorite ORM?

jOOQ. Not really an ORM, but handles the tedium of mapping results to an object while letting me write type checked SQL. IMO, it is the best solution for dealing with an RDBMS. I wish every language had a jOOQ equivalent.


“Todays programmers”: I'm sorry to disappoint you, but this isn't new. My father (DB engineer, now on the verge of retirement) has been complaining about that for almost two decades now!

Just as a reminder, Hibernate is 17 years old :).

Also, modern ORM frameworks give ways to iterate on the data from the programmers language, but translated directly to SQL. See what [Diesel](http://diesel.rs/) does, for instance.

The paper talks about other issues, which comes from negligence or a lack of understanding of the performance costs of the queries. But this performance impact could aslo happen in a SQL-only environment (not using a framework won't help you to create indexes, or to add pagination to your queries).


That’s how I have seen it done in Perl, PHP, Django, etc as well. It’s not an ORM thing it is a naive programmer thing.

The ORM makes it easier to do things like Class.filter(insurance_end_date < today_date)

But hey what do I know, I am not berating the youf of today so I don’t belong in this skit.


    Class.filter(insurance_end_date < today_date)
Which ORM is that? It would be cool to compare some approaches to common problems.


ASP.NET MVC4 also has Linq so you can do stuff like Class.All(x => x.attribute == value).ForEach(foo => foo.method())

Django has similar functionality through Q and F expressions.

Most ORMs have this functionality of specific language constructs to allow complex queries to be expressed without writing SQL, just with slightly different semantics.


The Django ORM for example.

You can ever traverse relations and still use the same syntax: https://docs.djangoproject.com/en/2.0/topics/db/queries/#loo...


Diesel has something similar (except that it can't overload < (you can overload < in Rust, but the type doesn't match what's needed), so you need to specify it as insurance_end_date.lt(today_date))


C++ has a similar thing with sqlpp11: https://github.com/rbock/sqlpp11


Looks similar to Laravel’s Eloquent, which is inspired by Rails.


All the ORMs I have used can easily do the equivalent of this

     select count(*) from cars where condition;
If programmers are doing what you say that they are then the programmers are the problem not the library.

Furthermore most ORMs (certainly any that I would consider using!) allow escaped SQL to be used - and if the query gets much more complicated than a couple of where clauses I consider using this feature.

A decent ORM used well allows programmers to program faster on the simple stuff but still write fast code for the complex stuff.


ORMs are terrible but you can make the same mistakes in the article without a ORM, or in a query builder

I think the article alludes to much more important problems like querying in a loop, querying the same information again just because it's not in function/object scope

A lot of the mistakes I see are because developers don't learn SQL, use it wrong, then assume it's slow, then use NoSQL, then reenforce each other into believing NoSQL is saving their performance

Let's not make this a conversation on ORM or not, but how do we make sure developers understand how to properly make use of their relational databases, which for the last decade or so has been painted as old & cruddy compared to sexy NoSQL


This thread demonstrates that ORM is still (one of several of?) our field's Vietnam war... http://blogs.tedneward.com/post/the-vietnam-of-computer-scie...

On top of the data/frameworks issue there's the one of data locality. Frameworks make the tradeoff decisions for you, but not always the right ones since the right tradeoffs are application-dependent. In one context using a cluster of commodity machines that do a mix of compute for your service and store and shuffle data around is a good idea. In another context a few beefy servers with the beef divided up between the app servers and database servers depending on where most of the compute most efficiently takes place is a better idea. (Though even with poor performance design, hardware is still good enough that a lot of things run just fine by having multiple services on one modest machine. That doesn't stop lots of programmers from reaching for frameworks that demand easily scaleable multi-machine systems even when there's no need and won't ever be need.)

In any case consistency matters: SQL strings are just one form of shipping code to the data to compute remotely instead of retrieving the data and computing locally. When you bring in an ORM, inconsistency is bound to follow with lots of computation that could have been in a query or stored procedure now living in the ORM framework and the application itself.

I can at least understand many people's annoyance with stored procedures though. Most DBs procedural extension languages suck badly enough that shipping JavaScript strings over to NoSQL stores can seem like a big improvement, and various levels of SQL standard conformance is all you typically get to use from the broader design pool of declarative languages.


I love to write SQL queries and even use them extensively in my code. However, almost everyone I talk to resists this, and warns me that "one day you will regret..." It gives me an uneasy feeling that I actually might, though it hasn't happened yet.


I think one of the problems with SQL queries is that as far as your application is concerned they are just strings. There's no typing information or even syntax checking. Also if you do something like 'select * from' then there the results returned aren't deterministic.

If you scatter these throughout your code and then the database schema changes, you have a hell of a refactoring job to make sure everything still works. One advantage of an Orm is that if you keep your objects in line with your database the generated sql will stay correct.

Personally, I am more than happy to take that hit. I think it is a price well worth paying in order to have optimised queries that do exactly what I what them to do. To make it easier for myself though I will always try to keep my queries in one place in the code. Then all my code needs to know is it is getting clients_older_than(32) or whatever..


You can get the best of both worlds by using e.g. jOOQ in Java (allows you to write e.g. db.select(MY_TABLE.MY_COL).from(MY_TABLE) where those values are generated from the database therefore they exist and are of the right type.

It maps 1:1 to the SQL statement that gets executed so there's no magic e.g. extra n+1 queries being introduced without you noticing.

But if you change your schema, re-generate, and immediate compile errors showing you where you're referencing something that's now been deleted, so it's not fragile like putting SQL in a String in your code (where if the schema changes, the compiler can't help you)


Came here to endorse jOOQ as well. It alleviates a lot of boring boilerplate like ORMs do, but you don't trade off control neither take the risk of wrong use silently causing problems.


I wasn't familiar with jOOQ before. It looks pretty comprehensive.


Honestly, jOOQ is one of the killer libraries for Java that keep me using it for backend code.


> There's no typing information or even syntax checking.

Intellij is pretty good at highlighting and pointing out typo errors. The code will still compile though.

> no typing information

Given the popularity of javascript, I thought this would now be considered a feature ;)


Refactoring scattered string-only queries isn't that bad if you're confident with tools like grep/ag/sed and so on and maybe have a bit of foresight in naming things not super generically. You can even add logging at the junction between app query callouts and the DB to log the full queries and the stacktrace of where they came from if you suspect you may have missed things in your text search due to people constructing query strings incrementally with other variables concatenated in. And as you mention having the convention of centralized named procedures (whether stored or in the app itself) mitigates this and other problems since there should just be one place to check.

To me the line of argument around lacking static type checks always felt like FUD, but maybe it's best countered with counter-FUD...

If you've opened up your DB layer to accept strings (that you are supposed to build with a SqlBuilder statically referencing column and table names) you still have no guarantees that everything still works because some code somewhere might have just used a handwritten String instead. In other words you still need tests.

The static type checking value only helps against schema changes that rename or remove things, which is generally pretty rare, and depending on the level of autogeneration and query mapping might not even help if the column type changes. It doesn't help when the semantics change. For example (much more commonly) the introduction of a new column that is expected to be filtered on and/or required to be set a non-null value in inserts. So the static references are mainly reduced to being a mechanism to make finding users of a table easier, and hope that whoever is making the schema changes is going to look around for those users and update them accordingly. (When you don't own the table, as is usually the case in large software with many teams, the table owner similarly doesn't own your code, so that's kind of a vain hope. The best solution is what is done in open code with unknown consumers -- versioning. Stop renaming/deleting/changing the semantics of things, just provide new things, under different versions or namespaces if they really need to share names.) But you can accomplish this to the same effectiveness by creating a static reference to the table near where you execute queries on it, and use easy to read handwritten strings for the rest. In my experience though many queries are trivial enough that a SqlBuilder-esque pattern isn't much overhead, it's not a hard hit most of the time, and more complicated ones may belong as stored procedures if you've already invested in that direction.

If you go the full ORM route, which it sounds like you aren't suggesting since you mention optimized queries that do exactly what you tell them to do, the table owner may graciously update the central object builder to set a non-null default value for everyone (or specify one in the table def), which would maybe stop things from breaking immediately, but maybe wouldn't actually stop breakages (especially on the select side where data is retrieved that under the new semantics was meant to be filtered), so end users are still on their own for whether this new column matters to them or not. And that's just one type of schema change that's not a simple rename or column removal, there are many others.


Don't let the dark side blind you. Managing data is complex. ORMs add more complexity and they make data look easy. IMHO the biggest deal with most ORMs is that the satisfy the programming platform and coders aesthetics.


Writing better code makes programs more robust and reduces maintenance cost. For a management point of view, the boss may think that your project was more easy than the project given to the other team that butchered their work.


So long as you're using prepared statements, you should be okay.


What difference does using prepared statements make?


You can avoid SQL injection. It makes it harder to shoot yourself in the foot, though still possible to inject if your prepared statement includes a string interpolated variable.


Yes, good point.


You have abstraction between your code and DB you can change underlying tables but interface will remain consistent requiring no changes to the application. There are security benefits too easier to prevent SQL Injection, you also only give the app user permission to execute sps so no arbitrary queries can be run.


The only way you'll regret it is if you type the query directly into an editor which doesn't support syntax highlighting or doesn't validate the query before you run it and it blows up in production. This is 'typo crashed my code' kind of stuff.

The solution becomes really obvious if it happens once for 95% of people good at computers.

You might regret it a bit more if you do something like a filesystem read to get the query.

There's your regret. It hurts so much, doesn't it? :)


That's one thing I love about Gemstone/Smalltalk.

In Smalltalk, you would write such a query as:

   (cars select: [ :car | car insurance endDate < now ]) size.
(There probably is a shortcut for just counting the elements, though admittedly you have to have each car for the query).

So what Gemstone does is that when cars is a DB collection, it interprets the block and turns it into a query. So the code for iterating in-memory and the code for doing an optimized query is the same.


In Ruby, using Sequel:

    DB[:cars].where { insurance_end_date < Time.now }.count
(replacing "Time.now" with Sequel.function("NOW") if you want server side, or whatever date/time formatting function you want that'll return time/date in your desired format client side; client side it'll be evaluated once)

I think most ORMs can do a reasonable job at this in languages where you have sufficient flexibility in overloading behaviour.


Anyone who does this is just bad. I’d be surprised if they could do SQL without an ORM either. I really have to question the value of a CS degree when there are all of these people who have them and still don’t seem to realize that running database queries in a loop is a bad idea. And they’ll still get hired because they’ve successfully memorized some data structure trivia that they’ll almost certainly never use.


This isn't what happens when you use an ORM, this is what happens when you have a JavaScript code monkey writing data access code and then slapping the word "Enterprise" on it.

The places I've worked at that have written true Enterprise software, whether consulting, publicly traded healthcare companies, banks etc, have had extremely knowledgeable DBAs writing stored procedures for data access. The ORM usage was reserved for modifying the results of stored procedures or small things like account management. Nothing truly critical to the business was handled through an ORM directly pulling data from tables.


The problem isn't "kids these days." The problem is relational databases. It's a profoundly non-intuitive and fundamentally broken model. Developers shudder away from SQL precisely because it is so alien from the problem they're trying to solve.

As for performance, note that even when you know exactly what you're doing it is hard to wring good performance out of relational databases. This is why companies spend hundreds of thousands of dollars a year to hire "Database Administrators" to design and "tune" their database. (What other piece of software requires a highly paid, full time expert?) The most complicated databases are so complicated that we get "Junior DBAs" and "Senior DBAs" and multiple levels of certification.

And let's not forget what happens at extreme scales. At the lowest latencies and the largest data sets relational dbs are simply impossible to use. This might never be a problem for most businesses who are processing a few hundred messages a second (if that) but it should be on the mind of any startup that hopes to one day have millions of customers.

In the long run memory will become cheaper, faster and persistent. (Let us pray.) When that happens most everybody will abandon the big mess that are relational databases and just manipulate objects in memory as the gods intended. Then the relational model just becomes something to scare grand kids with.


Interesting. I have had a couple of people in interviews (who know SQL/Databases on cv) writing code a lot like that pulling in two whole tables, joining them, and then filtering the joined results.

It is possible i was overly harsh in my judgement, as assumed meant no real familiarity at all.


People do this because it’s easier, takes less time, and is within the paradigm their head is already in all day.

We should fix our abstractions instead of telling people they don’t understand data. The latter is fine every so often, but doesn’t scale and isn’t even true most of the time.


Or maybe we should just nut up and tell people they don’t understand data?


Why not both? Truly, we should work to have better abstractions. We shouldn’t glorify making our own lives more difficult. And we shouldn’t glorify ignorance either! It’s not an either/or dichotomy here


What a stupid comment. Your point has absolutely nothing to do with ORMs and only shows your utter inexperience with them.

You could replace everything you said with using SQL directly and just doing a select * from cars


This comment breaks the site guideline against calling names in arguments. Could you please read https://news.ycombinator.com/newsguidelines.html and stick to the rules, regardless of how wrong or annoying another comment might be?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: