
Disclaimer: Clouderan here :D

I wouldn't necessarily agree that the same feature set against Hive would also apply to Impala. For example, Impala uses HDFS short-circuit reads to read data directly from disk, which gives it full disk throughput. Combined with highly efficient parallel reads, this yields some impressive numbers.

I've seen queries speed up anywhere from 2x to 100x (especially when data sets can fit in memory). Since it's designed for low-latency queries, results can be returned in under a second.

With that being said, Impala does not currently support UDFs (slated for post-GA).

Hive does do JOIN order optimizations after 0.7.0 though (https://issues.apache.org/jira/browse/HIVE-1642); you can set "hive.auto.convert.join = true" to enable it. I believe this will be enabled by default eventually. By GA, Impala will have a cost-based optimizer for optimizing JOINs as well.
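
For anyone who wants to try the Hive setting, here is a minimal sketch of turning it on for a single session. The table and column names (page_views, users, user_id, id, country) are made up for illustration, and whether the join is actually converted depends on your table sizes and Hive version.

```sql
-- Enable automatic join conversion for this session only
SET hive.auto.convert.join=true;

-- Hypothetical tables: page_views (large) joined to users (small)
SELECT u.country, COUNT(*) AS views
FROM page_views pv
JOIN users u ON (pv.user_id = u.id)
GROUP BY u.country;
```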

PS: Congrats on the release, I'm looking forward to giving it a go :)


