
Indeed, DataFrames give Spark more semantic information about the data transformations, so they can be optimized better. We envision this becoming the primary API that users use. You can still fall back to the vanilla RDD API (after all, a DataFrame can be viewed as an RDD[Row]) for anything that is not expressible with DataFrames.
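
To illustrate the "more semantic information" point, here is a toy sketch in plain Python. It is not Spark code and not how Spark's optimizer is implemented; `ColumnEquals`, `scan`, and the `partitions` layout are hypothetical names made up for this example. It only shows the general idea: a declarative predicate can be inspected by the engine (here, to skip partitions), while an opaque function has to be run against every row.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical storage layout: rows are partitioned by the "country" column.
PARTITION_COLUMN = "country"
partitions: Dict[str, List[Row]] = {
    "US": [{"country": "US", "amount": 10}, {"country": "US", "amount": 25}],
    "DE": [{"country": "DE", "amount": 7}],
    "JP": [{"country": "JP", "amount": 42}],
}


@dataclass(frozen=True)
class ColumnEquals:
    """Declarative predicate: the engine can see the column and value involved."""
    column: str
    value: Any

    def __call__(self, row: Row) -> bool:
        return row[self.column] == self.value


def scan(predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Return the matching rows and how many partitions had to be read."""
    matched: List[Row] = []
    partitions_read = 0
    for key, rows in partitions.items():
        # Pruning is only possible when the predicate is inspectable.
        if (
            isinstance(predicate, ColumnEquals)
            and predicate.column == PARTITION_COLUMN
            and predicate.value != key
        ):
            continue
        partitions_read += 1
        matched.extend(row for row in rows if predicate(row))
    return matched, partitions_read


# Declarative style: the engine knows this is an equality test on "country".
declarative = scan(ColumnEquals("country", "DE"))

# Opaque style: an arbitrary function, analogous to a lambda passed to an RDD.
opaque = scan(lambda row: row["country"] == "DE")

print(declarative)
print(opaque)
```

Both calls return the same rows, but the declarative one reads a single partition while the opaque one reads all three. The opaque form is still the more general one, which is why a fallback of that kind remains useful for logic that cannot be written as column expressions.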



Could you give an example of something that could not be expressed with DataFrames? Would e.g. tree-structured data be a bad fit for DataFrames, since it doesn't fit well with the tabular nature?



