
Congratulations to the DuckDB team! Can't wait to try some of the newly released features and performance improvements.

I am quite curious about the plans for a Python dataframe-like API for DuckDB, and for the Python ecosystem in general.



There is Ibis [0], a fairly mature package. It recently adopted DuckDB as its default execution engine, and it gives you a nice Python dataframe API on top of DuckDB, with hot-swappability to heavier engines.

With tools like this providing a comprehensive Python API and the ability to always fall back to raw SQL, I am not sure the DuckDB devs should focus on the Python API at all beyond basic (to_table, from_table) features.

Impressive progress and a real chance to shake up the data tool market, but there is still a way to go: much remains to be done, especially on large table formats (Iceberg/Delta) and memory management when running on bigger boxes in the cloud. E.g., the elusive "Failed to allocate ..." bug [1] undermines the claim that big data is dead [2]. As it is, we tried and abandoned DuckDB as a cheaper replacement for some Databricks batch jobs.

[0] https://github.com/ibis-project/ibis

[1] https://github.com/duckdb/duckdb/issues/12667, https://github.com/duckdb/duckdb/issues/9880, https://github.com/duckdb/duckdb/issues/12528

[2] https://motherduck.com/blog/big-data-is-dead/


Last I read, the Spark API was to become the focal point.

https://duckdb.org/docs/api/python/spark_api

Not sure what the current status is.

ref: https://github.com/duckdb/duckdb/issues/2000#issuecomment-18...



