Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
All of NoSQL is because of this (maxdemarzi.com)
16 points by maxdemarzi on Feb 27, 2023 | hide | past | favorite | 17 comments


My personal theory is that NoSQL happened because most people found relational database schema changes far too difficult.

I've been protected from this by the Django migrations system for over a decade at this point (that landed in Sep 2014, but predecessor South was usable for quite a few years before that point).

But if you don't have a good migrations system in place, changing the schema of your relational database (and keeping those changes synced across development machines, staging, production etc) is enormously painful - even more so for applications with a lot of production traffic.

I've seen many situations where engineers have made sub-par design decisions in order to avoid having to add or modify a column on a large, existing table for a production system.


I think it is more to do with the so called "Object–relational impedance mismatch", which I guess is related to schema migrations.

NoSQL would still have migration issues as the types change over time, much like the table schemas. You would need to be able to read/write older objects stored, or transform them all to the newer object schema in one go.

I remember when Mongo DB first came out, it seemed the most liked feature was that you could just JSON.stringify and JSON.parse things into and out of the db.

I personally prefer SQL tables, as the act of designing the tables and their relations seems to half way solve a problem, and removes the issue of "which tree structure should this value live in, how do I copy it to both of these types?".


Django is such a blessing in many regards. Protected me from SPA, JWT, Database Migrations, GrapghQL (realized I don't need it with Djangos ORM) to name a few, while never really stopping me to use these technologies if I really wanted to.

Thanks for your work!


What? No. NoSQLs don't help with joins, as they don't have them at all. Meaning every "join-like" has to be client side and entails transmitting stuff to the client, which is even worse than temporary tables.

There are multiple reasons to use NoSQL and non of them is this one.

Having said that - cool, I like relational databases and love my multi-joins going faster.


They don't have joins at all because of how expensive binary joins are to do. NoSQL pre-joins relations (graphs dbs), pre-joins foreign keys (document dbs) , pre-joins everything (queries) (wide column dbs). Saying "all" is a bit of a hyperbole but it gets to the point of the matter.


> What? No. NoSQLs don't help with joins, as they don't have them at all. Meaning every "join-like" has to be client side and entails transmitting stuff to the client, which is even worse than temporary tables.

I have worked with two NoSQL databases and both have support for basic joins. Not that it's needed very often.

https://www.mongodb.com/docs/manual/reference/operator/aggre...

https://www.arangodb.com/docs/stable/aql/examples-join.html


Many popular NoSQL DBs do support joins now. However, the fact that with NoSQL, you tend to structure your data so joins aren't necessary IS one way of solving the problem relational DBs have with "join performance".

The problem is basically:

- Say you have a query filtering on 2 fields, e.g. `WHERE x = ... AND y = ...`. If you have separate indexes on x and y, if both filters return a lot of records, you'll have perf problems. Internally, the DB needs to fetch all of the records where `x = ...` AND all of the records where `y = ...`, them and then combine in memory. This is very, very slow with large data volumes, even if there are very few records that satisfy the combo (e.g. lots of records where `x = ...`, lots of records where `y = ...`, but few records where both `x = ... AND y = ...`)

- The same is true for many related problems, like filtering on 1 index, then sorting on another (e.g. the common `WHERE x = ... ORDER BY y LIMIT ...` queries)

- If your data is all in a single table/collection/whatever, then you can fix these performance issues with a compound index on (x,y), which allows you to efficiently filter/sort on the combination of x and y, without having to fetch huge intermediate record sets. But if your data is in different tables/collections/whatever, I'm not aware of any popular DBs that allow the creation of cross-table compound indexes, so that's out

NoSQL's hierarchical, join-free approach means you CAN create that compound index most of the time, while with relational DBs, you much more often can't. With NoSQL, it's much more likely that when these slow queries emerge, fields x and y are already in the same "collection". Traditionally, with relational DBs, people have solved these perf issues via denormalizing - if fields x and y live in different tables, you decide to keep a copy of field y on table x as well, to allow you to speed up the query with a compound (x,y) index. Works, but denormalizing sucks. NoSQL DBs are another approach, basically eliminating joins with more hierarchical data, but that has issues too (including, very often, requiring you to denormalize!).

"Worst-case optimal joins" are a way to have relational DBs solve this for you, without needing to resort to denormalization and compound indexes, which is very exciting! Instead of saying "fetch all the data where x = ..., and all the data where y = ..., then find the intersection in memory", they try hard to avoid this. e.g. intelligently fetch a little from x, a little from y, see if that's enough. It's a great concept, but only recently seeing much serious attention, and most big relational DBs don't support them yet.


Are join algorithms like this really the alternative to compound indexes? I understand that they reduce the memory required for joins, but the amount of data involved in a join is the same.


The real reason NoSQL took off was ease of onboarding for writing MVP applications really quickly at startups. No need to come up with a detailed and performant table structure in advance -- and that's a good thing because your schemas are going to change every week! The languages you use in your stack are probably not typesafe anyway! Just extend the REST API all the way to the database and stick your JSON in it.

There are two main reasons it continues to thrive: most developers who entered the industry since ~2010 have not used anything else, and the traditional SQL databases are painful to set up in multi-region cloud topologies without a dedicated DBA team.


It took about 10 years, but worst case optimal joins and multi-way joins in general are finally fixing the Join problem in databases that led to the proliferation of NoSQL systems over the past decade.


The reason NoSQL exists is that many programmers want a persistent hash table with the guarantees (and man-hour investment) of a database.

(Excluding graph database use-cases, of course.)


At a lot of startups - at least a few years ago - it seemed more like people just not understanding databases at all and wanting to try something new.

I guess since NoSQL isn't new anymore, maybe the database choice at startups is sensible again - as they try something new in another area?


To be fair, whether or not you understand SQL, it does integrate spectacularly poorly with most programming languages. Sure you can use ORM, but a massive cost of capability and performance.

Database rows simply aren't objects. You have to squint so hard you risk retinal detachment before the two are even a little bit the same.

Other database paradigms exist, and have existed for a long time. File systems spring to mind. The Windows registry is another.

A case can at least be constructed for using something other than a relational database if all you want is to persist objects in a consistent manner.


I seemingly had a disagreement with a hiring VP last year who asked why I thought NoSQL became so popular. This was not the answer I gave, nor is it the answer he apparently wanted.


I don't really think this is the main reason people use NoSQL.


Consider changing the title to "Worst case optimal joins"? Original title sounds clickbait-y.


All? Seems a stretch...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: