I love the idea of a simplified SQL syntax for joins.
I have been working with graph databases for years now: these databases had to solve this problem from day one, because of the focus on relationships between entities.
I must point out that Neo4j was the first to propose a syntax that made traversal feel simple and natural again: the Cypher query language.
Neo4 and other industry players have spent years working on a new standard query language for graph databases that was released in April this year: GQL. GQL is the first database query language normalized by ISO since SQL, so it’s a big deal.
Graph databases aren't particularly relevant to a novel syntax for relational databases. The part of the related standard that is about relational databases (SQL/PGQ) demands that every table hold either edges or vertices, but not both [1], which is very limiting.
I think InterSystems Caché did it first. Its SQL join syntax lets you do `WHERE employee->manager->manager->name`. I'm not sure it has arbitrary recursion like you can do with graph databases, however.
The presented idea shortens the given examples, but is not composable. What happens if you have 1:N instead of 1:1 relation? Or even a N:M relation. Where do you specify whether you want an innner / outer / left / right join?
This proposal works for some simple queries but fails to capture the generality of the relational model.
So, from a language development point of view, one has to ask: Is this special case worth the extra syntactical sugar? It has the downside, that when a query evolves and falls out of the special case you have to reformulate the syntactical sugar yourself. This creates friction, and it is still necessary to understand the relational model.
Other commenters mentioned Neo4j as an example where similar ideas have been implemented. From my limited experience with Neo4j, I'd say it makes a lot of sense there, because graph queries will often fall into the sub class of queries, that can benefit from the syntactical sugar.
All in all, I would not call this a simplification. Syntactical sugar never is a simplification. It is an "easification". It makes certain examples easy and hides what is going on, without really abstracting it away.
They explain 1:N relations in the article, so I won't repeat that here. A N:M relation will be presented the same way as in all other relational databases, it requires a separate table with rows containing both keys.
And if I'm honest I think this captures more of the relational model than SQL does, you might be confusing the two. This synctactic sugar is actually using the properties relations have, rather than SQL which allows you to compute all kinds of things whether they make sense or not.
Note that they talk about using foreign keys for this purpose. Really what this is doing is turning the relation formed by this column and the primary key into a function (see [1] for an article I wrote on the subject), which allows for some nicer syntax because functions are nicer than general relations. This means most of the problems you mention can be resolved through simply enforcing that constraint. In a sense this moves the problem, but it does mean you can't accidentally invalidate the query. And frankly having some syntactic sugar for foreign keys in SQL is a feature that's long overdue.
The main downside is the inability to do anything other than equijoins, and the inability to specify new relations on the fly. The latter is a bit of a problem, but not insurmountable. I can't figure out a reasonable way to do anything other than equijoins, but that might be for the best.
Also, ironically what I'm really missing is how to do an actual join, it's nice to explicitly specify functions but if you've got two foreign keys to the same table (functions with the same codomain) how do you calculate the join (pullback)?
> All in all, I would not call this a simplification.
It is a simplification, as it gives you less that you need to understand and decode. Working with a query with ten to twenty joins and join conditions, you have to juggle a lot of intermixed concerns and you probably have to read a lot of query text and table/column definitions. And you have to look back and forth to know which tables are pulled in for checks and which for data. For example, this with just four tables:
SELECT alpha.*, epsilon.zoo FROM beta INNER JOIN alpha ON beta.id=alpha.foo LEFT JOIN epsilon ON alpha.boing=epsilon.id INNER JOIN gamma ON gamma.id = beta.bar WHERE gamma.baz='bar'
Requires you to understand more about everything and spreads things out more than this equivalent:
SELECT *, boing.zoo FROM alpha WHERE foo.bar.baz='qux'
(This is based on a real query I was given to work with.)
This implicit JOIN feature is already present in HQL and also in Jooq. It's very convenient, especially when you use a star-like table structure where the important data table is basically just IDs and references.
Yes, this is a lot like path expressions in HQL / JPQL.
There is a key difference, i think - in JPA land, relationships between classes are named at both ends. So an OrderDetail has an "Order order" field, but an Order also has a "Set<OrderDetail> details" field. That means you already have a name to use when going from an order to its details - you can say "sum(order.details.price)". Whereas the language in the article has to make it implicit, with (IMHO weird!) syntax like "OrderDetail.sum(price)".
This starts to hurt more when you have multiple collections of the same type on an entity. Consider:
City
city_id
Person
birth_city_id
residence_city_id
I think it is important to realize that relational, object-oriented and structured data (such as XML and JSON) are all implementation of an often far more richer data model. Would it not be better to implement something for the richer data model, such that querying relational data models to produce JSON (for example) would become trivial?
x.y(argument).z should be OK in syntax, but it is hard to implement. we can not predict which data type y(argument) would return, and if it can perform dot operation further.
We have implemented DQL(The syntax mentioned here) by translating to SQL. But we priovide only Chinese version now, we will release the English version serveral months later, please keep an eye out if you're interested
> Both tables use id field as their primary keys. Managers are also employees, so the two tables share the ids. Since managers have more attributes, their information is stored in a separate table.
> Now we want to find the total income (including the allowance) of each employee (including every manager).
> A JOIN operation is necessary for SQL to do it:
> SELECT employee.id, employee.name, employy.salary+manager.allowance
> FROM employee
> LEFT JOIN manager ON employee.id=manager.id
> But for two tables having a one-to-one relationship, we can treat them like one table:
> SELECT id,name,salary+allowance
> FROM employee
What about employees who aren't managers? I assume they have no entry in the manager table. The SQL would ignore them, because it's a left join, which is not what was asked for. Does the proposed query do the same?
A 30 year old implementation of the same idea (even slightly better IMHO) can be found in kdb+/ksql/shakti, you can see examples and comparison in https://shakti.com/ (press document, then sql.d - no idea how to link directly). And yes, it is by Arthur Whitney and related to the [in]famous K
Many ORMs have similar features - jOOQ is not unique in this. The article is talking about an addition to SQL syntax, not a DSL embedded in another language.
Didn't https://www.odata.org/ do this already?
If the Odata server understands the table relations, you can navigate through table relations in the REST query.
I found that most parts of the business prefer filtering more than joining, learning another DSL other than SQL was a real barrier, and that many BI tools only support a subset of Odata, so are pretty much useless.
I definitely prefer concatenative (monadic) syntax a la Linq though, as it allows better scoping of efficient joins without a planner - it allows you to duck-tape (allusion intended) together a platform service easily.
More geniuses trying to "improve" SQL, and yet the only party to succeed in doing this is Postgres people, of actual relational SQL database—by means of dialect.
I have been working with graph databases for years now: these databases had to solve this problem from day one, because of the focus on relationships between entities.
I must point out that Neo4j was the first to propose a syntax that made traversal feel simple and natural again: the Cypher query language.
Neo4 and other industry players have spent years working on a new standard query language for graph databases that was released in April this year: GQL. GQL is the first database query language normalized by ISO since SQL, so it’s a big deal.
Anyway, if you wanna learn more about GQL, that a look at https://www.gqlstandards.org/