> Malloy writes SQL you can't (I'm pretty sure of it).
Here you go. Malloy looks like a fine tool but it's an unnecessary mental burden for people who understand SQL and I think the below is more maintainable long-term.
with
aggregates as (
select
state,
name,
sum(sum(number)) over (partition by state, name) as state_name_births,
sum(sum(number)) over (partition by state) as state_births,
sum(sum(number)) over (partition by name) as name_births,
sum(sum(number)) over () as total_births
from bigquery-public-data.usa_names.usa_1910_current
group by state, name
qualify state_name_births > 1000
),
scoring as (
select
state,
name,
(state_name_births / state_births) / (name_births / total_births) as popularity_score,
rank() over (partition by state order by (state_name_births / state_births) / (name_births / total_births) desc) as rnk
from aggregates
qualify rnk <= 20
)
select
state,
string_agg(name order by rnk) as names
from scoring
group by state
order by state
I wrote a similar one (forgot about partitioning because I need it so rarely, adapted yours for SQL Server to compare), only 1/3 as fast compared to your solution (1.672 seconds vs 0.545), but 1/4th the reads (82,280 vs 292,066) :D
And yeah, I agree. That wasn’t actually that hard and is far more readable if you know SQL.
For the sake of completeness, here’s mine:
WITH CTE AS (SELECT main.Name,
main.State,
main.Births,
(CAST(main.Births AS float) / sb.births_in_state) /
(CAST(nb.births_of_name AS float) / total.all_births) AS popularity,
ROW_NUMBER() OVER (PARTITION BY State
ORDER BY (CAST(main.Births AS float) / sb.births_in_state) /
(CAST(nb.births_of_name AS float) / total.all_births) DESC) AS RankPerState
FROM BirthCount main
CROSS APPLY (SELECT SUM(Births) as all_births FROM BirthCount) total
CROSS APPLY (SELECT SUM(Births) as births_in_state FROM BirthCount WHERE main.State = State) sb
CROSS APPLY (SELECT SUM(Births) as births_of_name FROM BirthCount WHERE main.Name = Name) nb
WHERE main.Births > 1000
GROUP BY main.Name, main.State, main.Births, total.all_births, sb.births_in_state, nb.births_of_name)
SELECT State, STRING_AGG(Name + ' (' + CAST(ROUND(popularity, 3) AS varchar(50)) + ')', ', ')
FROM CTE
WHERE RankPerState <= 10
GROUP BY State
ORDER BY State
edit: Forgot to show the small view I created to make it more readable:
CREATE VIEW BirthCount AS
(
SELECT Name, State, SUM(Number) AS Births
FROM Names
GROUP BY Name, State
)
Here you go. Malloy looks like a fine tool but it's an unnecessary mental burden for people who understand SQL and I think the below is more maintainable long-term.