How do you insert into DuckDB quickly, and what settings ("indices") do you use? As far as I understand, DuckDB builds statistics for each "block" of data (number of distinct values, ...), so I assume inserting is slow. There is a blog post [0] and a paper [1] that mention that DuckDB is 10-500 times slower in a write-heavy workload.

[0] https://simonwillison.net/2022/Sep/1/sqlite-duckdb-paper/
[1] https://vldb.org/pvldb/volumes/15/paper/SQLite%3A%20Past%2C%...




I have a large number of small, frequent batches. Think of it as discrete ETL, where each process operates on a pandas DataFrame. The frame is written to disk as Parquet, and immediately afterwards a DuckDB database file is created that imports the Parquet file. From then on the DuckDB file is only opened for reading, with no further writes.

I use a Python OData library to convert users' REST queries into Postgres-like SQL, and run that SQL against these DuckDB files to apply any filters where needed.



