Maybe you'd like to check FunSQL.jl, my library for compositional construction of SQL queries. It also follows algebraic approach and covers many analytical features of SQL including aggregates/window functions, recursive queries and correlated subqueries/lateral joins. One thing where it differs from dlpyr and similar packages is how it separates aggregation from grouping (by modeling GROUP BY with a universal aggregate function).
FunSQL.jl requires Julia to run (obviously as it is a Julia library) but it
produces standard SQL so Julia in this case is just an implementation language.
I have re-implemented parts of FunSQL in Python and OCaml (the one I have ended
up using) and have added a concrete syntax similar to what you have in PRQL.
from employees
define
salary + payroll_tax as gross_salary,
gross_salary + benefits_cost as gross_cost
where gross_cost > 0 and country = 'usa'
group by title, country
select
title,
country,
avg(salary) as average_salary,
sum(salary) as sum_salary,
avg(gross_salary) as average_gross_salary,
sum(gross_salary) as sum_gross_salary,
avg(gross_cost) as average_gross_cost,
sum(gross_cost) as sum_gross_cost,
count() as count
order by sum_gross_cost
where count > 200
limit 20
But, in my mind, the biggest difference between PRQL and FunSQL is the way
FunSQL treats relations with `GROUP BY` - as just another kind of namespaces,
allowing to defer specifying aggregates. A basic example:
from users as u
join (from comments group by user_id) as c on c.user_id = u.id
select
u.username,
c.count() as comment_count,
c.max(created_date) as comment_last_created_date
The `c` subrelation is grouped by `user_id` but it doesn't specify any
aggregates - they are specified in the `select` below so you have all selection
logic co-located in a single place.
I think this approach is very powerful as it allows you to build reusable query
fragments in isolation but then combine them into a single query which fully
specifies what's being selected.
Writing an alternative syntax is straight forward. Perhaps prototype PRQL using xi's excellent FunSQL backend? This way it's working out of the gate. Once syntax+semantics are pinned, writing another backend in the language of your choice would then be easier. Getting the backend correct is non-trivial work, and xi has done this already. Besides, we need a sandbox syntax anyway, so it might be fun to collaborate.
I like the explicit pipelining idea, seems much easier to reason about. Some comments:
I found the "# `|` can be used rather than newlines." a bit odd. So when using let, you're only transforming one column? I think the example would look weird with returns instead of |.
Depending on your intended target, it might help adoption if you stay closer to the naming conventions of that target. If you're targeting mainstream Java/Python/C#/Javascript etc. then functions need parentheses, "take 20" may be worse than slice, etc.
I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to do backwards-compatible changes that cannot be handled by clever hacks in the runtime.
Also I think you should try writing one or more native wrappers in your intended target languages to make sure it's easy to interface between the two, even if it means you'd have to use dots in that language.
I could imagine an end game where the ergonomics were so good that a database like Postgres ends up with a native PRQL frontend. Not sure you're there yet, though. IMHO SQL as a query language suffers from a) sometimes really bad ergonomics, b) it's hard to wrap in another programming language (also ergonomics), c) it has far too many concepts - it's not orthogonal.
> I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to do backwards-compatible changes that cannot be handled by clever hacks in the runtime.
Thanks good idea, I just changed this to remove the microversions. If we use SemVer, then before `1`, we'd hold versions compatible to the 0.X, and then to the X.
IMO you are at the forefront of where query languages need to and will go.
Some programmers like you see that SQL ordering is backwards to human thinking, except in the simplest cases. But many people with practice and sunk costs in their SQL expertise will be resistant. The resistance usually wins the day.
But sometimes, a useful tool gets created by one person, and a rift is created in that resistance. Think John Resig creating jQuery, leading to LINQ and many other similar patterns. You could be that person for database query languages, but how do you ensure that?
Maybe imagine what made jQuery easy to adopt and indispensable for programmers: easy availability as a simple .js download; solved the problem of DOM differences between browsers. Good luck to you, and thanks for sharing.
Let me know any feedback — as you can see it's still at the proposal stage. If it gains some traction I'll write an implementation.