Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

HAVING is less understood? There’s nothing strange about HAVING, it’s just like WHERE, but it applies after GROUP BY has grouped the rows, and can use the grouped row values. (Obviously, this is only useful if you actually have a GROUP BY clause.) If HAVING did not exist, you could just as well do the same thing using a subselect (i.e. doing SELECT * FROM (SELECT * FROM … WHERE … GROUP BY …) WHERE …; is, IIUC, equivalent to SELECT * FROM … WHERE … GROUP BY … HAVING …;)


FWIW I just tested a somewhat complex query using HAVING vs a sub-select as you indicated and Postgres generated the same query plan (and naturally, results) for both.


A DB wizard I used to work for showed me HAVING after looking at this nasty sub-sub select I did, with some app-layer-loop after.

Caesar if you see this thanks for being a great mentor.


Snowflake SQL also has the interesting feature QUALIFY.

Their docs go into more detail (https://docs.snowflake.com/en/sql-reference/constructs/quali...), but the short version is that typically SELECT is evaluated in the order FROM, WHERE, GROUP BY, HAVING, WINDOW, DISTINCT, ORDER BY, LIMIT.

But what happens if you want to filter on the result of a WINDOW? Sorry, time to write a nested query and bemoan the non-composability of SQL.

Snowflake adds QUALIFY, which is executed after WINDOW and before DISTINCT. Therefore you can write some pretty interesting tidy queries, like this one I've been using at work:

    SELECT
      *,
      row_number() OVER (
        PARTITION BY quux.record_time
        ORDER BY quux.projection_time DESC
      ) AS record_version
    FROM foo.bar.quux AS quux
    WHERE (NOT quux._bad OR quux._bad IS NULL)
    QUALIFY record_version = 1
Without QUALIFY, you'd have to nest queries (ugh):

    SELECT *
    FROM (
      SELECT
        *,
        row_number() OVER (
          PARTITION BY quux.record_time
          ORDER BY quux.projection_time DESC
        ) AS record_version
      FROM foo.bar.quux AS quux
      WHERE (NOT quux._bad OR quux._bad IS NULL)
    ) AS quux_versioned
    WHERE quux_versioned.record_version = 1
or use a CTE (depending on whether your database inlines CTEs).

I definitely pine for some kind of optimizing Blub-to-SQL compiler that would let me write my SQL like this instead:

    (query (from foo.bar.quux :as quux)
           (select *)
           (where (or (not quux._bad)
                      (null quux._bad))
           (select (row-number :over (:partition-by quux.record-time
                                      :order-by (:desc quux.projection-time))
                               :as record-version)
           (where (= 1 record-version)))


> I definitely pine for some kind of optimizing Blub-to-SQL compiler

I've been playing with malloy[1] that lets you pipeline/nest queries like you are describing here.

  source: quux as from_sql(..) {
    where: record_version = 1 
    where: _bad != null  
  }
1. https://looker-open-source.github.io/malloy/documentation/la...


But HAVING can also act on aggregate results. In fact, the example in the article is not the most important use for HAVING. Subselects can't do something like.

SELECT year, COUNT(*) sales, SUM(price) income FROM sales HAVING sales > 10 AND income < 1000;


Why wouldn't

    SELECT *
    FROM (SELECT year, COUNT(*) as nb_sales, SUM(price) as income
          FROM sales)
    WHERE nb_sales > 10 AND income < 1000;
work just like your example?


For more complicated examples, HAVING can produce easier to read/understand/maintain statements.

There may also be performance differences depending on the rest of the query, but for simple examples like this exactly the same plan will be used, so the performance will be identical.

Unfortunately, the simplest examples do not always illustrate the potential benefits of less common syntax.


I totally agree with that, but I was responding to this statement:

> Subselects can't do something like: (…)

which is wrong.


This should work, right? Just a bit more unnecessary text.

SELECT year, sales, income FROM ( SELECT year, COUNT(*) sales, SUM(price) income FROM sales ) AS innerquery WHERE sales > 10 AND income < 1000;




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: