
> a data lake with 40,000 Parquet files. You need to list the files before you can read the data. This can take a few minutes.

Sounds like this data lake could use a Parquet file listing the Parquet files.

Butter



Yeah, that's exactly what Delta Lake does. The table metadata is written as JSON commit files and periodically compacted into Parquet checkpoint files. These tables are sometimes so huge that the table metadata is big data in its own right.
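For the curious: the core idea is simple to sketch. Delta Lake keeps a `_delta_log/` directory of numbered commit files containing `add` and `remove` actions, and a reader replays them in order to get the current file list without ever listing the data directory. Below is a minimal, simplified sketch of that replay (it ignores checkpoints, partition values, and all the other fields a real log carries; the file names are made up for illustration):

```python
import json
import os
import tempfile

def replay_delta_log(log_dir):
    """Replay Delta-Lake-style JSON commit files in commit order,
    returning the set of data files in the current table snapshot.
    Simplified: a real reader would also consume Parquet checkpoints
    instead of re-reading every JSON commit from version 0."""
    live = set()
    for name in sorted(os.listdir(log_dir)):  # commits are zero-padded, so lexical sort = commit order
        if not name.endswith(".json"):
            continue
        with open(os.path.join(log_dir, name)) as f:
            for line in f:  # one JSON action per line
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live

# Build a toy log: commit 0 adds two files, commit 1 replaces one of them.
with tempfile.TemporaryDirectory() as log_dir:
    commits = {
        "00000000000000000000.json": [
            {"add": {"path": "part-00000.parquet"}},
            {"add": {"path": "part-00001.parquet"}},
        ],
        "00000000000000000001.json": [
            {"remove": {"path": "part-00000.parquet"}},
            {"add": {"path": "part-00002.parquet"}},
        ],
    }
    for name, actions in commits.items():
        with open(os.path.join(log_dir, name), "w") as f:
            f.write("\n".join(json.dumps(a) for a in actions))
    files = replay_delta_log(log_dir)
```

So a reader does a handful of small reads on the log instead of a 40,000-entry directory listing, and the compaction step exists precisely because this log itself grows without bound.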



