
I think that might make sense on the ingest side, but it's very expensive to deal with if you're doing anything remotely large.

I think sinking into something like delta-lake or iceberg probably makes sense at scale.

But yeah, I definitely agree that CSV is not great.




JSONL is a replacement for CSV. You shouldn't be using CSV as a format for long-term storage or querying; it has so many downsides and nearly zero upsides.
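One of those downsides, sketched with just the stdlib: CSV silently stringifies everything, while JSONL round-trips types and nested data intact (the sample record here is made up for illustration).

```python
import csv
import io
import json

row = {"id": 7, "tags": ["a", "b"], "active": True}

# CSV round-trip: every field comes back as a string, and the
# nested list is flattened into its Python repr.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=row.keys())
writer.writeheader()
writer.writerow(row)
buf.seek(0)
csv_row = next(csv.DictReader(buf))
print(csv_row)  # {'id': '7', 'tags': "['a', 'b']", 'active': 'True'}

# JSONL round-trip: one JSON document per line, types preserved.
jsonl_row = json.loads(json.dumps(row))
print(jsonl_row)  # {'id': 7, 'tags': ['a', 'b'], 'active': True}
```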

When JSONL is compressed with zstd, most of the "expensive if large" problem disappears as well.

Generating and consuming JSONL can easily be in the GB/s range.
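A minimal streaming sketch of that pattern, using stdlib gzip as a stand-in (the third-party `zstandard` package exposes a similar file-like stream API and is considerably faster):

```python
import gzip
import json

records = [{"id": i, "value": i * 1.5} for i in range(1000)]

# Write: one JSON document per line, streamed through the compressor.
with gzip.open("events.jsonl.gz", "wt", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read: decompress and parse line by line, never holding the
# whole file in memory at once.
total = 0.0
with gzip.open("events.jsonl.gz", "rt", encoding="utf-8") as f:
    for line in f:
        total += json.loads(line)["value"]

print(total)  # 749250.0
```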


I mean on the querying side. Parquet's ability to skip row groups and even pages, especially when paired with Iceberg or Delta, can make the difference between being able to run your queries at all and needing to scale up dramatically.
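The mechanism behind that skipping, sketched as a toy model in plain Python (not the actual Parquet reader API): each row group carries min/max column statistics, so a reader can discard whole groups from metadata alone, without decoding them.

```python
# Toy model of Parquet row-group pruning: each row group stores
# min/max statistics for a column, and the reader skips any group
# whose range cannot possibly contain a match for the predicate.
rowgroups = [
    {"min_ts": 0,    "max_ts": 999,  "rows": list(range(0, 1000))},
    {"min_ts": 1000, "max_ts": 1999, "rows": list(range(1000, 2000))},
    {"min_ts": 2000, "max_ts": 2999, "rows": list(range(2000, 3000))},
]

def query(groups, lo, hi):
    """Return rows with lo <= ts <= hi, counting groups actually scanned."""
    scanned = 0
    out = []
    for g in groups:
        if g["max_ts"] < lo or g["min_ts"] > hi:
            continue  # pruned purely from metadata, no decode needed
        scanned += 1
        out.extend(r for r in g["rows"] if lo <= r <= hi)
    return out, scanned

rows, scanned = query(rowgroups, 1500, 1600)
print(len(rows), scanned)  # 101 1 -- two of three groups never touched
```

A JSONL file offers no equivalent: every query re-reads and re-parses every line.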


Totally agree.

I am saying JSONL is a lower-bound format: if you can use something better, you should. It's for data interchange, archiving, transmission, etc. It shouldn't be repeatedly queried.

Parquet, Arrow, SQLite, etc. are all better formats for that.
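For the "repeatedly queried" case, even stdlib sqlite3 beats re-scanning JSONL, since you can index the columns you filter on (the table and column names here are made up for illustration):

```python
import json
import sqlite3

# A small JSONL payload, the kind of thing you might have ingested.
jsonl = "\n".join(
    json.dumps({"user": f"u{i % 10}", "amount": i}) for i in range(100)
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    ((r["user"], r["amount"]) for r in map(json.loads, jsonl.splitlines())),
)
# The index makes repeated lookups cheap; a JSONL file would be
# fully re-parsed on every query.
conn.execute("CREATE INDEX idx_user ON events (user)")

(total,) = conn.execute(
    "SELECT SUM(amount) FROM events WHERE user = ?", ("u3",)
).fetchone()
print(total)  # 480.0
```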



