Simplifying Join Syntax

david_p · on May 17, 2024

I love the idea of a simplified SQL syntax for joins.

I have been working with graph databases for years now: these databases had to solve this problem from day one, because of the focus on relationships between entities.

I must point out that Neo4j was the first to propose a syntax that made traversal feel simple and natural again: the Cypher query language.

Neo4 and other industry players have spent years working on a new standard query language for graph databases that was released in April this year: GQL. GQL is the first database query language normalized by ISO since SQL, so it’s a big deal.

Anyway, if you wanna learn more about GQL, that a look at https://www.gqlstandards.org/

ghusbands · on May 17, 2024

Graph databases aren't particularly relevant to a novel syntax for relational databases. The part of the related standard that is about relational databases (SQL/PGQ) demands that every table hold either edges or vertices, but not both [1], which is very limiting.

[1] https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-fin...

aidos · on May 17, 2024

I struggled to find the actual language at the link you posted - is there an obvious example somewhere?

Given that graphql is often shortened to gql is feel this is going to get confusing!

kirmerzlikin · on May 17, 2024

This paper [1] explains some basics

[1] https://arxiv.org/abs/2112.06217

atombender · on May 17, 2024

I think InterSystems Caché did it first. Its SQL join syntax lets you do `WHERE employee->manager->manager->name`. I'm not sure it has arbitrary recursion like you can do with graph databases, however.

twic · on May 17, 2024

> I must point out that Neo4j was the first to propose a syntax that made traversal feel simple and natural again

Was it the first? Does it predate path expressions in Hibernate Query Language / Java Persistence Query Language?

funcDropShadow · on May 17, 2024

The presented idea shortens the given examples, but is not composable. What happens if you have 1:N instead of 1:1 relation? Or even a N:M relation. Where do you specify whether you want an innner / outer / left / right join? This proposal works for some simple queries but fails to capture the generality of the relational model.

So, from a language development point of view, one has to ask: Is this special case worth the extra syntactical sugar? It has the downside, that when a query evolves and falls out of the special case you have to reformulate the syntactical sugar yourself. This creates friction, and it is still necessary to understand the relational model.

Other commenters mentioned Neo4j as an example where similar ideas have been implemented. From my limited experience with Neo4j, I'd say it makes a lot of sense there, because graph queries will often fall into the sub class of queries, that can benefit from the syntactical sugar.

All in all, I would not call this a simplification. Syntactical sugar never is a simplification. It is an "easification". It makes certain examples easy and hides what is going on, without really abstracting it away.

shiandow · on May 17, 2024

They explain 1:N relations in the article, so I won't repeat that here. A N:M relation will be presented the same way as in all other relational databases, it requires a separate table with rows containing both keys.

And if I'm honest I think this captures more of the relational model than SQL does, you might be confusing the two. This synctactic sugar is actually using the properties relations have, rather than SQL which allows you to compute all kinds of things whether they make sense or not.

Note that they talk about using foreign keys for this purpose. Really what this is doing is turning the relation formed by this column and the primary key into a function (see [1] for an article I wrote on the subject), which allows for some nicer syntax because functions are nicer than general relations. This means most of the problems you mention can be resolved through simply enforcing that constraint. In a sense this moves the problem, but it does mean you can't accidentally invalidate the query. And frankly having some syntactic sugar for foreign keys in SQL is a feature that's long overdue.

The main downside is the inability to do anything other than equijoins, and the inability to specify new relations on the fly. The latter is a bit of a problem, but not insurmountable. I can't figure out a reasonable way to do anything other than equijoins, but that might be for the best.

Also, ironically what I'm really missing is how to do an actual join, it's nice to explicitly specify functions but if you've got two foreign keys to the same table (functions with the same codomain) how do you calculate the join (pullback)?

[1]: https://pragmathics.nl/2023/10/24/putting-the-relational-bac...

ghusbands · on May 17, 2024

> All in all, I would not call this a simplification.

It is a simplification, as it gives you less that you need to understand and decode. Working with a query with ten to twenty joins and join conditions, you have to juggle a lot of intermixed concerns and you probably have to read a lot of query text and table/column definitions. And you have to look back and forth to know which tables are pulled in for checks and which for data. For example, this with just four tables:

  SELECT alpha.*, epsilon.zoo FROM beta INNER JOIN alpha ON beta.id=alpha.foo LEFT JOIN epsilon ON alpha.boing=epsilon.id INNER JOIN gamma ON gamma.id = beta.bar WHERE gamma.baz='bar'

Requires you to understand more about everything and spreads things out more than this equivalent:

  SELECT *, boing.zoo FROM alpha WHERE foo.bar.baz='qux'

(This is based on a real query I was given to work with.)

asplake · on May 17, 2024

“As all joins involve the primary key…”

Often but not always the case

Edit: at least not in SQL, and therefore SQL databases. If the language being described here isn't actually SQL, that could still be a problem.

maweki · on May 17, 2024

This implicit JOIN feature is already present in HQL and also in Jooq. It's very convenient, especially when you use a star-like table structure where the important data table is basically just IDs and references.

https://www.jooq.org/doc/latest/manual/coming-from-jpa/from-...

twic · on May 17, 2024

Yes, this is a lot like path expressions in HQL / JPQL.

There is a key difference, i think - in JPA land, relationships between classes are named at both ends. So an OrderDetail has an "Order order" field, but an Order also has a "Set<OrderDetail> details" field. That means you already have a name to use when going from an order to its details - you can say "sum(order.details.price)". Whereas the language in the article has to make it implicit, with (IMHO weird!) syntax like "OrderDetail.sum(price)".

This starts to hurt more when you have multiple collections of the same type on an entity. Consider:

  City
    city_id
  Person
    birth_city_id
    residence_city_id

What would

  SELECT city_id, Person.COUNT()
  FROM City

do here?

fjfaase · on May 17, 2024

I think it is important to realize that relational, object-oriented and structured data (such as XML and JSON) are all implementation of an often far more richer data model. Would it not be better to implement something for the richer data model, such that querying relational data models to produce JSON (for example) would become trivial?

A long time ago, I made some attempts about this: https://www.iwriteiam.nl/AoP_spec_stat.html Not so long ago, I worked a bit on a data oriented language with cursors, compounds and components: https://github.com/FransFaase/DataLang.

bionhoward · on May 17, 2024

Cool idea! I always thought this was the purpose of cypher match statements but your syntax is much shorter and more familiar dot-pathing

One idea would be for the x.y.z to use functions like

x.y(argument).z

This way you could parameterize the traversals and it’d wind up looking like gremlin in sql

Judyrabbit · on May 17, 2024

x.y(argument).z should be OK in syntax, but it is hard to implement. we can not predict which data type y(argument) would return, and if it can perform dot operation further. We have implemented DQL(The syntax mentioned here) by translating to SQL. But we priovide only Chinese version now, we will release the English version serveral months later, please keep an eye out if you're interested

twic · on May 17, 2024

> Both tables use id field as their primary keys. Managers are also employees, so the two tables share the ids. Since managers have more attributes, their information is stored in a separate table.

> Now we want to find the total income (including the allowance) of each employee (including every manager).

> A JOIN operation is necessary for SQL to do it:

  > SELECT employee.id, employee.name, employy.salary+manager.allowance
  > FROM employee
  > LEFT JOIN manager ON employee.id=manager.id

> But for two tables having a one-to-one relationship, we can treat them like one table:

  > SELECT id,name,salary+allowance
  > FROM employee

What about employees who aren't managers? I assume they have no entry in the manager table. The SQL would ignore them, because it's a left join, which is not what was asked for. Does the proposed query do the same?

What happens if there is also

  salesperson table
    id
    allowance

? Which table is joined?

This language seems a little half-baked.

beagle3 · on May 17, 2024

A 30 year old implementation of the same idea (even slightly better IMHO) can be found in kdb+/ksql/shakti, you can see examples and comparison in https://shakti.com/ (press document, then sql.d - no idea how to link directly). And yes, it is by Arthur Whitney and related to the [in]famous K

ghusbands · on May 17, 2024

The direct link is https://shakti.com/document/sql.d

zeronone · on May 17, 2024

jOOQ has similar feature

https://www.jooq.org/doc/latest/manual/code-generation/codeg...

ghusbands · on May 17, 2024

Many ORMs have similar features - jOOQ is not unique in this. The article is talking about an addition to SQL syntax, not a DSL embedded in another language.

alexisread · on May 17, 2024

Didn't https://www.odata.org/ do this already? If the Odata server understands the table relations, you can navigate through table relations in the REST query. I found that most parts of the business prefer filtering more than joining, learning another DSL other than SQL was a real barrier, and that many BI tools only support a subset of Odata, so are pretty much useless.

I definitely prefer concatenative (monadic) syntax a la Linq though, as it allows better scoping of efficient joins without a planner - it allows you to duck-tape (allusion intended) together a platform service easily.

pavi2410 · on May 17, 2024

SOQL in Salesforce supports "relationship queries" that works on both directions, viz, parent-to-child and child-to-parent.

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl....

PeterZaitsev · on May 17, 2024

There are many good ideas to improve SQL, though it is not easy to get traction as it is such an old standard with so many implementations

atemerev · on May 17, 2024

See “natural join”, which already exists in SQL; I use it all the time.

ghusbands · on May 17, 2024

Natural joins are terrible in that they match on any equal column names. Let's hope you don't have employees with names and departments with names.

In any case, even with natural join syntax, you end up with queries longer than those in the article.

doctor_eval · on May 17, 2024

Yeah I thought natural join was the same, but I keep forgetting to use it!

tucnak · on May 17, 2024

More geniuses trying to "improve" SQL, and yet the only party to succeed in doing this is Postgres people, of actual relational SQL database—by means of dialect.