All of NoSQL is because of this

simonw · on Feb 27, 2023

My personal theory is that NoSQL happened because most people found relational database schema changes far too difficult.

I've been protected from this by the Django migrations system for over a decade at this point (that landed in Sep 2014, but predecessor South was usable for quite a few years before that point).

But if you don't have a good migrations system in place, changing the schema of your relational database (and keeping those changes synced across development machines, staging, production etc) is enormously painful - even more so for applications with a lot of production traffic.

I've seen many situations where engineers have made sub-par design decisions in order to avoid having to add or modify a column on a large, existing table for a production system.

justsomeuser · on Feb 27, 2023

I think it is more to do with the so called "Object–relational impedance mismatch", which I guess is related to schema migrations.

NoSQL would still have migration issues as the types change over time, much like the table schemas. You would need to be able to read/write older objects stored, or transform them all to the newer object schema in one go.

I remember when Mongo DB first came out, it seemed the most liked feature was that you could just JSON.stringify and JSON.parse things into and out of the db.

I personally prefer SQL tables, as the act of designing the tables and their relations seems to half way solve a problem, and removes the issue of "which tree structure should this value live in, how do I copy it to both of these types?".

number6 · on Feb 28, 2023

Django is such a blessing in many regards. Protected me from SPA, JWT, Database Migrations, GrapghQL (realized I don't need it with Djangos ORM) to name a few, while never really stopping me to use these technologies if I really wanted to.

Thanks for your work!

globalreset · on Feb 27, 2023

What? No. NoSQLs don't help with joins, as they don't have them at all. Meaning every "join-like" has to be client side and entails transmitting stuff to the client, which is even worse than temporary tables.

There are multiple reasons to use NoSQL and non of them is this one.

Having said that - cool, I like relational databases and love my multi-joins going faster.

maxdemarzi · on Feb 27, 2023

They don't have joins at all because of how expensive binary joins are to do. NoSQL pre-joins relations (graphs dbs), pre-joins foreign keys (document dbs) , pre-joins everything (queries) (wide column dbs). Saying "all" is a bit of a hyperbole but it gets to the point of the matter.

frankjr · on Feb 27, 2023

> What? No. NoSQLs don't help with joins, as they don't have them at all. Meaning every "join-like" has to be client side and entails transmitting stuff to the client, which is even worse than temporary tables.

I have worked with two NoSQL databases and both have support for basic joins. Not that it's needed very often.

https://www.mongodb.com/docs/manual/reference/operator/aggre...

https://www.arangodb.com/docs/stable/aql/examples-join.html

yashap · on Feb 27, 2023

Many popular NoSQL DBs do support joins now. However, the fact that with NoSQL, you tend to structure your data so joins aren't necessary IS one way of solving the problem relational DBs have with "join performance".

The problem is basically:

- Say you have a query filtering on 2 fields, e.g. `WHERE x = ... AND y = ...`. If you have separate indexes on x and y, if both filters return a lot of records, you'll have perf problems. Internally, the DB needs to fetch all of the records where `x = ...` AND all of the records where `y = ...`, them and then combine in memory. This is very, very slow with large data volumes, even if there are very few records that satisfy the combo (e.g. lots of records where `x = ...`, lots of records where `y = ...`, but few records where both `x = ... AND y = ...`)

- The same is true for many related problems, like filtering on 1 index, then sorting on another (e.g. the common `WHERE x = ... ORDER BY y LIMIT ...` queries)

- If your data is all in a single table/collection/whatever, then you can fix these performance issues with a compound index on (x,y), which allows you to efficiently filter/sort on the combination of x and y, without having to fetch huge intermediate record sets. But if your data is in different tables/collections/whatever, I'm not aware of any popular DBs that allow the creation of cross-table compound indexes, so that's out

NoSQL's hierarchical, join-free approach means you CAN create that compound index most of the time, while with relational DBs, you much more often can't. With NoSQL, it's much more likely that when these slow queries emerge, fields x and y are already in the same "collection". Traditionally, with relational DBs, people have solved these perf issues via denormalizing - if fields x and y live in different tables, you decide to keep a copy of field y on table x as well, to allow you to speed up the query with a compound (x,y) index. Works, but denormalizing sucks. NoSQL DBs are another approach, basically eliminating joins with more hierarchical data, but that has issues too (including, very often, requiring you to denormalize!).

"Worst-case optimal joins" are a way to have relational DBs solve this for you, without needing to resort to denormalization and compound indexes, which is very exciting! Instead of saying "fetch all the data where x = ..., and all the data where y = ..., then find the intersection in memory", they try hard to avoid this. e.g. intelligently fetch a little from x, a little from y, see if that's enough. It's a great concept, but only recently seeing much serious attention, and most big relational DBs don't support them yet.

weird_user · on Feb 28, 2023

Are join algorithms like this really the alternative to compound indexes? I understand that they reduce the memory required for joins, but the amount of data involved in a join is the same.

_skel · on Feb 27, 2023

The real reason NoSQL took off was ease of onboarding for writing MVP applications really quickly at startups. No need to come up with a detailed and performant table structure in advance -- and that's a good thing because your schemas are going to change every week! The languages you use in your stack are probably not typesafe anyway! Just extend the REST API all the way to the database and stick your JSON in it.

There are two main reasons it continues to thrive: most developers who entered the industry since ~2010 have not used anything else, and the traditional SQL databases are painful to set up in multi-region cloud topologies without a dedicated DBA team.

maxdemarzi · on Feb 27, 2023

It took about 10 years, but worst case optimal joins and multi-way joins in general are finally fixing the Join problem in databases that led to the proliferation of NoSQL systems over the past decade.

acuozzo · on Feb 27, 2023

The reason NoSQL exists is that many programmers want a persistent hash table with the guarantees (and man-hour investment) of a database.

(Excluding graph database use-cases, of course.)

onlyrealcuzzo · on Feb 27, 2023

At a lot of startups - at least a few years ago - it seemed more like people just not understanding databases at all and wanting to try something new.

I guess since NoSQL isn't new anymore, maybe the database choice at startups is sensible again - as they try something new in another area?

marginalia_nu · on Feb 27, 2023

To be fair, whether or not you understand SQL, it does integrate spectacularly poorly with most programming languages. Sure you can use ORM, but a massive cost of capability and performance.

Database rows simply aren't objects. You have to squint so hard you risk retinal detachment before the two are even a little bit the same.

Other database paradigms exist, and have existed for a long time. File systems spring to mind. The Windows registry is another.

A case can at least be constructed for using something other than a relational database if all you want is to persist objects in a consistent manner.

ergonaught · on Feb 27, 2023

I seemingly had a disagreement with a hiring VP last year who asked why I thought NoSQL became so popular. This was not the answer I gave, nor is it the answer he apparently wanted.

below43 · on Feb 27, 2023

I don't really think this is the main reason people use NoSQL.

smilekzs · on Feb 27, 2023

Consider changing the title to "Worst case optimal joins"? Original title sounds clickbait-y.

malfist · on Feb 27, 2023

All? Seems a stretch...