rxin on Feb 17, 2015 | on: Introducing DataFrames in Spark for Large Scale Da...

Indeed, DataFrames give Spark more semantic information about the data transformations, so Spark can optimize them better. We envision this becoming the primary API that users work with. You can still fall back to the vanilla RDD API (after all, a DataFrame can be viewed as an RDD[Row]) for anything that is not expressible with DataFrames.
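
To make the "more semantic information" point concrete, here is a toy sketch in plain Python. It is not Spark code and does not use Spark's API: `GreaterThan`, `columns_needed_declarative`, and `columns_needed_opaque` are hypothetical names invented for illustration. The assumption it models is that a declarative predicate can be inspected by an engine (for example, to read only the columns it needs), while an arbitrary function passed to an RDD-style filter is a black box.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]


@dataclass(frozen=True)
class GreaterThan:
    """Hypothetical declarative predicate: `column > value`."""
    column: str
    value: Any

    def evaluate(self, row: Row) -> bool:
        return row[self.column] > self.value


def columns_needed_declarative(pred: GreaterThan, select: List[str]) -> Set[str]:
    # The predicate is plain data, so the engine can see which column it
    # touches and skip every column that is neither filtered nor selected.
    return {pred.column} | set(select)


def columns_needed_opaque(func: Callable[[Row], bool], schema: List[str]) -> Set[str]:
    # An arbitrary function cannot be inspected, so the engine has to
    # assume it may read any column in the schema.
    return set(schema)


SCHEMA = ["name", "age", "city", "bio"]
ROWS = [
    {"name": "ann", "age": 34, "city": "Oslo", "bio": "text"},
    {"name": "bob", "age": 19, "city": "Rome", "bio": "text"},
]

pred = GreaterThan("age", 21)

# Both styles compute the same answer...
declarative_result = [r["name"] for r in ROWS if pred.evaluate(r)]
opaque_result = [r["name"] for r in ROWS if (lambda row: row["age"] > 21)(row := r)]

# ...but only the declarative form tells the engine what it can skip.
declarative_columns = columns_needed_declarative(pred, ["name"])
opaque_columns = columns_needed_opaque(lambda row: row["age"] > 21, SCHEMA)
```

In this sketch both filters return the same rows, but only the declarative version lets the planner narrow the read to `name` and `age`; the opaque version has to keep all four columns. Real query optimizers do considerably more than column pruning, so treat this only as an illustration of the principle.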

rkrzr on Feb 18, 2015

Could you give an example of something that could not be expressed with DataFrames? Would tree-structured data, for example, be a bad fit, since it doesn't map well onto the tabular nature of DataFrames?
