
There is no such thing as a Parquet table. Parquet is a compressed file format, like a zip archive. Parquet can be read into table formats like Hive, Delta, etc. That is why this comparison makes no sense.


Lots of Parquet files in the same directory are typically referred to as a "Parquet table".

Yes, Parquet can be compressed with gzip, but snappy is much more common because it's splittable.

Parquet tables can be registered in a Hive metastore. Delta metadata can be added to a Parquet table to make it a Delta table.
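
For illustration, here's a minimal PySpark sketch of that last point (paths and data are made up; assumes the delta-spark package is installed, with the session configured per the Delta Lake docs):

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    # Hypothetical setup: a SparkSession with Delta Lake enabled.
    spark = (
        SparkSession.builder
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
        .getOrCreate()
    )

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

    # A "Parquet table": just a directory of snappy-compressed Parquet files.
    df.write.mode("overwrite").option("compression", "snappy").parquet(
        "/tmp/events_parquet")

    # Adding Delta metadata (a _delta_log directory) in place turns that
    # Parquet table into a Delta table.
    DeltaTable.convertToDelta(spark, "parquet.`/tmp/events_parquet`")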


> Lots of Parquet files in the same directory are typically referred to as a "Parquet table".

That's my point, though: this is an apples-to-oranges comparison. A directory of Parquet files is not a table format. Comparing Delta to Hive or Iceberg would be a more apt comparison. I have worked with all kinds of companies, and I have yet to see one that just uses a directory of Parquet files and calls it a day without something like Hive on top.


Yeah, comparing Delta Lake to Iceberg is more apt, but I've been shying away from that content because I don't want to start a flame war. Another poster is asking for that post, though, so maybe I should write it.

I don't really see how a Delta vs. Hive comparison makes sense. A Delta table can be registered in the Hive metastore, or it can be unregistered. If you persist a Spark DataFrame in Delta with save, it's not registered in the Hive metastore; if you persist it with saveAsTable, it is. I've been meaning to write a blog post on this, so you're motivating me again.
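
A quick sketch of that distinction, reusing the spark session and df from the sketch above (paths and table names are again hypothetical):

    # save(): Delta files land at the path, but nothing is registered
    # in the Hive metastore.
    df.write.format("delta").mode("overwrite").save("/tmp/events_delta")

    # saveAsTable(): the same Delta files, plus an entry in the Hive
    # metastore, so the table can be queried by name.
    df.write.format("delta").mode("overwrite").saveAsTable("events")

    spark.table("events").show()  # works: the table is registered
    # The path-only table has no metastore entry; read it by path instead:
    spark.read.format("delta").load("/tmp/events_delta").show()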

I've seen a bunch of enterprises that are still working with Parquet tables that aren't registered in Hive. I worked at an org like this for many years and didn't even know Hive was a thing, haha.


> I don't really see how a Delta vs. Hive comparison makes sense. A Delta table can be registered in the Hive metastore, or it can be unregistered.

You are right about Delta tables in the Hive metastore, but if you are writing from the perspective of "there are companies that don't know what Hive is," then I feel the next step up is "there are companies that just stuff files in S3 and query them with Athena" (which handles all the Hive stuff for you when you create tables). What Delta gives those companies over that setup is worth explaining.
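
A rough sketch of that pattern with boto3 (bucket, database, and table names are all made up; assumes AWS credentials, a Glue database named analytics, and an existing results bucket):

    import boto3

    athena = boto3.client("athena")

    # Registering the S3 Parquet files as a table: Athena handles the
    # "Hive stuff" (the metastore entry, via Glue) for you.
    athena.start_query_execution(
        QueryString="""
            CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
                id INT, val STRING
            )
            STORED AS PARQUET
            LOCATION 's3://my-bucket/events_parquet/'
        """,
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )

    # From then on, the files can be queried by table name.
    athena.start_query_execution(
        QueryString="SELECT COUNT(*) FROM analytics.events",
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )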


I agree with the points you make above.



