One thing that bothers me is the 'relational databases are good enough' statement, that is repeated in other contexts as well.
But especially here, where we're talking about reducing complexity, it feels off to me. PostgreSQL and MySQL seem to me like incredibly complex packages. SQL, the language, is not easy to master either; most programmers I meet know mostly basics. On top of that, there's a long ongoing history of security malpractice.
When talking about reducing complexity, CouchDB and Redis are far easier alternatives, in my humble opinion, though they go slightly against 'use the tools developers know'.
The implementation of PostgreSQL is complex, no doubt about that. But if you need strong data consistency and durability guarantees, it provides a rock-solid foundation.
SQL might take some getting used, but it is also not rocket science. It shouldn't take more than a week's study to master the basics. There is of course a lot of awful SQL code out there, exactly because most programmers don't even know the basics. You can do incredibly powerful things in it that would take 10x the code in an OO/procedural language. In my opinion dumping an ORM on top is also not the best way to leverage the strengths of an RDBM.
It is slightly ironic that you bring up security malpractice in the context of PostgreSQL, when in the next sentence you advocate Redis as a far easier alternative. As was recently in the news the Redis defaults were for a long time insecure (google for Fairware ransomware).
> In my opinion dumping an ORM on top is also not the best way to leverage the strengths of an RDBM.
I agree. Unfortunately, the way I see people usually using them is pretty bad - you should not let ORM-generated stuff dictate your business model. Database is a database. A storage layer. Business objects will not map 1:1 to ORM objects. Approaches like "let's inherit from ORM class and add business-related methods", in my experience, lead to total disaster. One has to respect the boundary between storage layer and business model layer.
I'm aware of the different approaches (mostly from reading Fowler's PoEAA), but currently we use an Active Record-style ORM with a few extra features (like Class Table Inheritance) and we haven't found any major issues with this approach. What was the worst case scenario you experienced with the 1:1 approach?
Over the past 5 years I've been in two projects using 1:1 ORM Active Record == business model base approach. One completely failed in part because of this, second is barely manageable, but I managed to save it by moving business code mostly to the outside of Active Record classes.
The problem I encountered in those projects is the mismatch between storage mental model and business mental model, which lead to explosion of crappy code (AKA technical debt). In particular:
1. the classes I need for business model may have initially mapped well to database tables, but over time they stop; business logic and model changes much faster than you'd like your DB schema to
2. since many things in AR can fire SQL queries, you have to keep in mind the workings of your database when doing almost every operation on your model; it's an abstraction leak
3. code shooting off SQL queries is randomly called from all over your codebase; it's harder to keep track of it and, if needed, optimize those queries
I like AR as a convenient API to get data from/to database, but given the point 1., I eventually learned to isolate AR layer as something below business model layer, so that the pattern is that business model is explicitly serialized and deserialized from database, instead of the database being coupled with the logic of your program.
Now I vaguely recall complaining about this before on HN and getting my ass handed back to me by someone who pointed out that these are all ORM n00b mistakes. I wish I could find that comment (pretty sure I noted the link down somewhere). Yeah, I admit - in those two projects I mentioned, we were all ORM noobs. So we've learned those lessons the hard way.
> Approaches like "let's inherit from ORM class and add business-related methods", in my experience, lead to total disaster.
I don't disagree, in fact I'd go further and say that data and logic should not be coupled, but this is the Active record pattern which is far from the only way to use an ORM, most ORM's won't even support this pattern by default.
The best ORM ever, F#'s FSharp.Data.SqlClient. A very thin layer that let's you statically program in SQL in your app. But I typically just use Functions/Stored Procs. But sometimes, for one off things and experimentation it can be nice to write SQL directly in your app.
Relational databases are not simple systems as you say, but they do seem to me simpler to use - especially in the 95% case where a single, large enough machine hosting postgresql/mysql is entirely sufficient.
Key-value stores are "easy", but what I think isn't easy is to reduce your business domain to a simple key-value model without sacrificing promises and gaurantees offered by a good relational database system.
>But especially here, where we're talking about reducing complexity, it feels off to me. PostgreSQL and MySQL seem to me like incredibly complex packages. SQL, the language, is not easy to master either; most programmers I meet know mostly basics. On top of that, there's a long ongoing history of security malpractice.
PostgreSQL and MySQL are very complex, but the complexity is entirely contained. Both are well-tested and reliable, so developers can deploy them without worrying much about them.
I would disagree about SQL being difficult to master. The basics are all most people need, and are not at all difficult to learn. The more advanced stuff (e.g. CROSS APPLY) is not necessarily standard across implementations, and can usually be replaced with application code.
>When talking about reducing complexity, CouchDB and Redis are far easier alternatives, in my humble opinion, though they go slightly against 'use the tools developers know'.
I can't say I'm that familiar with CouchDB, but Redis is entirely inappropriate for most SQL use cases. It's a key value store, and is not meant to do any sort of advanced queries.
Minor aside; I don't think the complex bits of sql are things like cross apply - that's almost no different from a join, especially if you're from a non-sql background where "joins" are typically statements+loop-equivalents and typically hierarchical and ordered. If people have difficulty with cross apply, they're just not trying.
Of course, if you regard sql as something you'd rather not "waste" time on, of course you're going to find those kind of subtle distinctions confusing - sort of like how people think css is difficult.
The more reasonably "complex" bits are the update visibility semantics, i.e. which transaction isolation levels mean what in various scenarios.
That's really complex, and it's truly somewhat unique to sql in that most alternatives simply don't bother trying to solve those problems at all - that can be a bad thing, but it is simpler.
Postgres and MySQL aren't overly complicated for simple use cases.
A lot of other technologies drop complications like foreign keys which you won't miss when you start developing software but it gives guarantees you will miss sorely when you start seeing inconsistent data 6 months in.
The expensive and complicated thing about these is deployment and maintenance. But then, you could instead pick sqlite and switch to big one when needed.
Sometimes it is good to get a pickup truck ahead of time, but often a smaller less versatile car will suffice. But not quite a motorbike.
Go with SQLite if you can get away with it. It's a library, not an external engine, and databases are stored inside normal files, which makes a lot of things easier if you're building a standalone app (as opposed to server-side software).
SQL is well worth its time to learn. It's a good DSL for relational data. Most programming languages used for regular code are not very convenient with relational data. As for its security issues, this is actually simple - one has to respect SQL as a real programming language with its own syntax and grammar, instead of resorting to idiocies like gluing strings together in an ad-hoc manner.
I would say SQL is easier to master than most functional or procedural languages. The true issue, in my experience, is that it is different enough most developers don't want to take the time to learn it beyond the basics, much to their detriment.
I think you're forgetting about practicality and not reinventing the wheel.
I'm going to use a car analogy here. Modern cars are incredibly complicated machines. They're also generally very reliable, thanks to about a century of development and engineering, and (relatively) inexpensive thanks to economies of scale.
If I want to transport myself to work on weekdays, and transport my girlfriend and maybe another friend on a weekend trip, and carry a bunch of groceries once a week, I can do all of that with a standard 5-person car. It isn't completely optimal for any of those tasks: it has more space than it needs for any of them, especially the weekday commuting. For my daily commute, I don't even reach highway speeds because I live close to work, so the engine is seriously overkill.
So I could have a custom-designed vehicle for each of these use-cases. Each vehicle would be a little less complex than my current car. But that's a lot to maintain, and would surely be far more expensive and less reliable, since each one is a one-off, requiring custom design and engineering, special parts, etc., and not benefiting from the economies of scale and engineering resources that a mass-market car gets. So instead, I just go buy a ready-made car and use it, and it works great and I'm happy.
Is PostgreSQL overkill for a lot of uses? Probably so. But it's designed to be used for all kinds of different tasks, and while it may not be quite as efficient for any of those tasks as some custom-designed solution, it's far more flexible, and it's readily available, plus it's benefitted from an enormous amount of engineering and debugging that a custom-designed solution would not. Things like CouchDB don't have nearly the number of users and amount of development, so while they may make sense for some tasks, the fact that PostgreSQL has more lines of code does not necessarily mean it's less reliable, in fact the opposite is likely true, just like an off-the-lot Honda or Toyota is likely much more reliable than some custom-designed car that someone built in their garage or some high-end limited-production exotic car like a Ferrari or Bentley.
The reason for using an off-the-shelf solution is because it's fast and easy and reliable. It doesn't matter if you're not making use of 80% of the features or capabilities. And software isn't like cars or engines; hard drive space is nearly free, and except for certain applications you're not likely to see a significant downside to just using a standard SQL database versus something more tailor-made. The main problem is cost (like with Oracle), but with PostgreSQL or MySQL this isn't an issue since they're free (and Free). It also helps that they use a standardized query language which makes them much more accessible.
I've found that as soon as you change your data model and now need to deal with old data, document stores get just as complicated but require more custom solutions.
But especially here, where we're talking about reducing complexity, it feels off to me. PostgreSQL and MySQL seem to me like incredibly complex packages. SQL, the language, is not easy to master either; most programmers I meet know mostly basics. On top of that, there's a long ongoing history of security malpractice.
When talking about reducing complexity, CouchDB and Redis are far easier alternatives, in my humble opinion, though they go slightly against 'use the tools developers know'.