Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Malloy writes SQL you can't (I'm pretty sure of it).

Here you go. Malloy looks like a fine tool but it's an unnecessary mental burden for people who understand SQL and I think the below is more maintainable long-term.

  with
  aggregates as (
    select
        state,
        name,
        sum(sum(number)) over (partition by state, name) as state_name_births,
        sum(sum(number)) over (partition by state) as state_births,
        sum(sum(number)) over (partition by name) as name_births,
        sum(sum(number)) over () as total_births
    from bigquery-public-data.usa_names.usa_1910_current
    group by state, name
    qualify state_name_births > 1000
  ),
  
  scoring as (
    select
      state,
      name,
      (state_name_births / state_births) / (name_births / total_births) as popularity_score,
      rank() over (partition by state order by (state_name_births / state_births) / (name_births / total_births) desc) as rnk
    from aggregates
    qualify rnk <= 20
  )
  
  select
    state,
    string_agg(name order by rnk) as names
  from scoring
  group by state
  order by state


I wrote a similar one (forgot about partitioning because I need it so rarely, adapted yours for SQL Server to compare), only 1/3 as fast compared to your solution (1.672 seconds vs 0.545), but 1/4th the reads (82,280 vs 292,066) :D

And yeah, I agree. That wasn’t actually that hard and is far more readable if you know SQL.

For the sake of completeness, here’s mine:

    WITH CTE AS (SELECT main.Name,
                        main.State,
                        main.Births,
                        (CAST(main.Births AS float) / sb.births_in_state) /
                        (CAST(nb.births_of_name AS float) / total.all_births) AS popularity,
                     ROW_NUMBER() OVER (PARTITION BY State
               ORDER BY (CAST(main.Births AS float) / sb.births_in_state) /
                        (CAST(nb.births_of_name AS float) / total.all_births) DESC) AS RankPerState
                 FROM BirthCount main
                          CROSS APPLY (SELECT SUM(Births) as all_births FROM BirthCount) total
                          CROSS APPLY (SELECT SUM(Births) as births_in_state FROM BirthCount WHERE main.State = State) sb
                          CROSS APPLY (SELECT SUM(Births) as births_of_name FROM BirthCount WHERE main.Name = Name) nb
                 WHERE main.Births > 1000
                 GROUP BY main.Name, main.State, main.Births, total.all_births, sb.births_in_state, nb.births_of_name)
    
    SELECT State, STRING_AGG(Name + ' (' + CAST(ROUND(popularity, 3) AS varchar(50)) + ')', ', ')
    FROM CTE
    WHERE RankPerState <= 10
    GROUP BY State
    ORDER BY State

edit: Forgot to show the small view I created to make it more readable:

    CREATE VIEW BirthCount AS
    (
    SELECT Name, State, SUM(Number) AS Births
    FROM Names
    GROUP BY Name, State
    )




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: