> A Data Lakehouse is fine but what benefit does it give you over a much more simple solution of ETL/ELTing the data in batches (weekly, daily, hourly, etc) and letting it sit in some kind of DB.

Lots of engines like Polars, PyTorch, Spark, and Ray can read structured data from databases, but reading a Lakehouse's open columnar files (e.g. Parquet in object storage) is usually more efficient for analytical and ML workloads: many readers can scan the files in parallel, prune columns, and push down filters without funneling everything through the database's connection layer.
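
To make that concrete, here's a rough sketch in Polars (the bucket path, table, and connection string are hypothetical) of the same analytical read against Parquet files in object storage vs. a relational database:

    import polars as pl

    # Lakehouse-style read: lazily scan Parquet files in object storage.
    # Only the selected columns and matching row groups get fetched/decoded.
    lakehouse_df = (
        pl.scan_parquet("s3://my-bucket/events/*.parquet")  # hypothetical path
        .select(["user_id", "event_type", "ts"])
        .filter(pl.col("event_type") == "purchase")
        .collect()
    )

    # Database read: the query runs on the DB and rows come back over a
    # row-oriented protocol, which tends to be slower for big analytical
    # scans and adds load to the operational database.
    db_df = pl.read_database_uri(
        query="SELECT user_id, event_type, ts FROM events "
              "WHERE event_type = 'purchase'",
        uri="postgresql://user:pass@host:5432/mydb",  # hypothetical URI
    )

That "scan the files directly" pattern is also part of what lets Spark, Ray, or PyTorch data loaders read the same data in parallel without hammering a single database.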

Databases also aren't as good for storing unstructured data like images, audio, or free-form text, whereas a Lakehouse can keep those files right alongside the tabular data.

Databases can also be much more expensive than a Data Lakehouse, since you're paying for always-on database compute and storage rather than cheap object storage with compute spun up on demand.

Databases are awesome and have lots of amazing use cases, of course. Like you mentioned, data lakehouses are great for high data volume and throughput, but there are other use cases as well IMO.


