Why not SQL for pure declarative queries? Here's an LLM-hallucinated SQL query for the Polars example:
SELECT country, SUM(amount - discount) AS total
FROM purchases
WHERE amount <= (
    SELECT MEDIAN(amount) * 10
    FROM purchases
    WHERE country = purchases.country
)
GROUP BY country;
It might just be a matter of familiarity, but SQL seems the most straightforward and easiest to understand for me.
E.g., we can use regexes to query text. Python is a general-purpose language, so you can query text without regexes, but it would be insanity to ignore them completely (I don't know how easy it is to invoke regexes from R). Another example: a bash pipeline can be embedded in Python ("generate --flag | filter arg | sink") without reimplementing it in pure Python (you can do that, but it would be ugly). No idea how easy it is to invoke shell commands from R. SQL is just another DSL in this case -- use it in Python when it makes the solution more readable.
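To make the "embedded DSL" point concrete, here is a minimal sketch using only the standard library. The helper names (find_amounts, run_pipeline) and the sample pipeline are made up for illustration; the pipeline is a stand-in for "generate --flag | filter arg | sink" and assumes a POSIX shell with printf, grep and sort available.

```python
import re
import subprocess


def find_amounts(text):
    """Regex as an embedded DSL: pull decimal amounts out of free text."""
    return [float(m) for m in re.findall(r"\d+\.\d+", text)]


def run_pipeline(pipeline):
    """Shell pipeline as an embedded DSL (assumes a POSIX shell is available)."""
    done = subprocess.run(
        pipeline, shell=True, capture_output=True, text=True, check=True
    )
    return done.stdout.splitlines()


# Hypothetical stand-in for "generate --flag | filter arg | sink".
PIPELINE = r"printf 'beta\nalpha\ngamma\n' | grep -v gamma | sort"

if __name__ == "__main__":
    print(find_amounts("paid 12.50 then 3.25"))
    print(run_pipeline(PIPELINE))
```

Only pass trusted strings to shell=True; anything built from user input should go through an argument list instead.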
It looks like the LLM hallucinated a query that doesn't group by country when computing the median. Here's the version generated after asking it to fix that:
SELECT p.country, SUM(p.amount - p.discount) AS total
FROM purchases p
JOIN (
    SELECT country, MEDIAN(amount) * 10 AS median_amount
    FROM purchases
    GROUP BY country
) m ON p.country = m.country
WHERE p.amount <= m.median_amount
GROUP BY p.country;
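One way to sanity-check the fixed query from Python is to run it against a tiny in-memory table. This is a rough sketch, not the setup from the original example: it uses the standard-library sqlite3 module, registers a user-defined MEDIAN aggregate (SQLite builds typically don't ship one), adds an ORDER BY so the output is deterministic, and uses made-up sample rows.

```python
import sqlite3
import statistics


class Median:
    """User-defined aggregate: collects non-NULL values, returns their median."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.median(self.values) if self.values else None


# The fixed query from above, plus ORDER BY for a stable row order.
QUERY = """
SELECT p.country, SUM(p.amount - p.discount) AS total
FROM purchases p
JOIN (
    SELECT country, MEDIAN(amount) * 10 AS median_amount
    FROM purchases
    GROUP BY country
) m ON p.country = m.country
WHERE p.amount <= m.median_amount
GROUP BY p.country
ORDER BY p.country;
"""


def totals_by_country(rows):
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("MEDIAN", 1, Median)
    conn.execute(
        "CREATE TABLE purchases (country TEXT, amount REAL, discount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Made-up sample data, purely for illustration.
SAMPLE = [
    ("US", 10.0, 1.0),
    ("US", 20.0, 2.0),
    ("US", 30.0, 0.0),
    ("US", 5000.0, 0.0),  # above 10x the US median (25), so it is filtered out
    ("DE", 100.0, 10.0),
    ("DE", 200.0, 0.0),
]

if __name__ == "__main__":
    print(totals_by_country(SAMPLE))
```

In an engine with a built-in MEDIAN (DuckDB, for instance), the custom aggregate would not be needed.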
> A very nice feature of BeautifulSoup is its excellent support for encoding detection which can provide better results for real-world HTML pages that do not (correctly) declare their encoding.
Progress is being tracked on GitHub Discussions[3].
[1]: https://duckdb.org/docs/api/python/spark_api.html
[2]: https://duckdb.org/docs/api/python/relational_api.html
[3]: https://github.com/duckdb/duckdb/discussions/14525