
Which things did you decide to move to duckdb?


A lot of aggregation, pivot and join logic, like:

    import duckdb

    df = duckdb.query("from tbl1 join tbl2 using (id) where id is not null").pl()

The .pl() call is an Arrow-based conversion from DuckDB to Polars. It's in-memory and, I believe, zero-copy, so it happens almost instantaneously.

I go back and forth between DuckDB and Polars functions in the same scope because it's so cheap to convert between the two.


I'm genuinely curious, if you are already using (and assuming that you like) dataframe APIs, why you would use SQL?


It's the opposite; I prefer DuckDB and generally work with DuckDB's friendly SQL interface. SQL is declarative and is (for me) more intuitive than method-chaining -- especially for complex analytic operations that happen in one go.

(Software people might beg to differ about the intuitive bit because they are more used to an imperative style. To my surprise, even the best software engineers struggle with SQL, because it requires thinking in set and relational operations rather than function calls, which many software folks are not used to.)

I actually don't use the Polars dataframe APIs much except for some operations which are easier to do in dataframe form, like applying a Python function as a UDF, or transposing (not pivoting) a dataframe.

Also Polars is good for materializing the query into a dataframe rapidly, which can then be passed into methods/functions. It's also a lot easier to unit test dataframes than SQL tables. There's a lot more tooling for that.



