Hacker News new | past | comments | ask | show | jobs | submit login

I wrote this over the holidays, because I find SQL wonderfully elegant in its function, but really frustrating in its form.

Let me know any feedback — as you can see it's still at the proposal stage. If it gains some traction I'll write an implementation.




Maybe you'd like to check FunSQL.jl, my library for compositional construction of SQL queries. It also follows algebraic approach and covers many analytical features of SQL including aggregates/window functions, recursive queries and correlated subqueries/lateral joins. One thing where it differs from dlpyr and similar packages is how it separates aggregation from grouping (by modeling GROUP BY with a universal aggregate function).


This is awesome! I'll add a link to it on PRQL.

I guess the biggest difference between FunSQL (and similarly dbplyr) and PRQL is that the former needs a Julia (or R) runtime to run.

I really respect the library and keen to see how it develops.


FunSQL.jl requires Julia to run (obviously as it is a Julia library) but it produces standard SQL so Julia in this case is just an implementation language.

I have re-implemented parts of FunSQL in Python and OCaml (the one I have ended up using) and have added a concrete syntax similar to what you have in PRQL.

    from employees
    define
      salary + payroll_tax as gross_salary,
      gross_salary + benefits_cost as gross_cost
    where gross_cost > 0 and country = 'usa'
    group by title, country
    select
      title,
      country,
      avg(salary) as average_salary,
      sum(salary) as sum_salary,
      avg(gross_salary) as average_gross_salary,
      sum(gross_salary) as sum_gross_salary,
      avg(gross_cost) as average_gross_cost,
      sum(gross_cost) as sum_gross_cost,
      count() as count
    order by sum_gross_cost
    where count > 200
    limit 20
But, in my mind, the biggest difference between PRQL and FunSQL is the way FunSQL treats relations with `GROUP BY` - as just another kind of namespaces, allowing to defer specifying aggregates. A basic example:

    from users as u
    join (from comments group by user_id) as c on c.user_id = u.id
    select
      u.username,
      c.count() as comment_count,
      c.max(created_date) as comment_last_created_date
The `c` subrelation is grouped by `user_id` but it doesn't specify any aggregates - they are specified in the `select` below so you have all selection logic co-located in a single place.

I think this approach is very powerful as it allows you to build reusable query fragments in isolation but then combine them into a single query which fully specifies what's being selected.


Writing an alternative syntax is straight forward. Perhaps prototype PRQL using xi's excellent FunSQL backend? This way it's working out of the gate. Once syntax+semantics are pinned, writing another backend in the language of your choice would then be easier. Getting the backend correct is non-trivial work, and xi has done this already. Besides, we need a sandbox syntax anyway, so it might be fun to collaborate.


I like the explicit pipelining idea, seems much easier to reason about. Some comments:

I found the "# `|` can be used rather than newlines." a bit odd. So when using let, you're only transforming one column? I think the example would look weird with returns instead of |.

Depending on your intended target, it might help adoption if you stay closer to the naming conventions of that target. If you're targeting mainstream Java/Python/C#/Javascript etc. then functions need parentheses, "take 20" may be worse than slice, etc.

I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to do backwards-compatible changes that cannot be handled by clever hacks in the runtime.

Also I think you should try writing one or more native wrappers in your intended target languages to make sure it's easy to interface between the two, even if it means you'd have to use dots in that language.

I could imagine an end game where the ergonomics were so good that a database like Postgres ends up with a native PRQL frontend. Not sure you're there yet, though. IMHO SQL as a query language suffers from a) sometimes really bad ergonomics, b) it's hard to wrap in another programming language (also ergonomics), c) it has far too many concepts - it's not orthogonal.


> I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to do backwards-compatible changes that cannot be handled by clever hacks in the runtime.

Thanks good idea, I just changed this to remove the microversions. If we use SemVer, then before `1`, we'd hold versions compatible to the 0.X, and then to the X.


"LIMIT 10" from PostgreSQL sounds nicer than "take 10" because "take" for me sounds like a random selection is made (take 10 from the bag).

Also want "OFFSET 20" from PG.


IMO you are at the forefront of where query languages need to and will go.

Some programmers like you see that SQL ordering is backwards to human thinking, except in the simplest cases. But many people with practice and sunk costs in their SQL expertise will be resistant. The resistance usually wins the day.

But sometimes, a useful tool gets created by one person, and a rift is created in that resistance. Think John Resig creating jQuery, leading to LINQ and many other similar patterns. You could be that person for database query languages, but how do you ensure that?

Maybe imagine what made jQuery easy to adopt and indispensable for programmers: easy availability as a simple .js download; solved the problem of DOM differences between browsers. Good luck to you, and thanks for sharing.


Also PRQL/Prequel is a great name. Just that can take you far.


This looks great! Clear docs and rationale, and the syntax is well thought. I'm definitely following this.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: