Thanks for bringing this up. A lot of what we say in the FAQ for "How does CitusDB's feature set compare against Apache Hive?" also applies to Impala, and we'll update that question shortly. The fundamental difference is that Citus builds on top of Postgres, and leverages its many features and performance optimizations.
We are also working on thorough performance numbers that compare Hive, Impala, and Citus, and we'll share our methodology and results in the coming months.
I wouldn't necessarily agree that the feature comparison against Hive also applies to Impala. For example, Impala uses HDFS short-circuit reads to read data directly from disk, which gives it full disk throughput. Combined with highly efficient parallel reads, this yields some impressive numbers.
I've seen queries speed up anywhere from 2x to 100x (especially when data sets fit in memory). Since it's designed for low-latency queries, results can come back in under a second.
That said, Impala does not currently support UDFs (slated for post-GA).
Hive does do JOIN order optimizations after 0.7.0 though (https://issues.apache.org/jira/browse/HIVE-1642); you can set "hive.auto.convert.join = true" to enable it. I believe this will be enabled by default eventually. By GA, Impala will have a cost-based optimizer for optimizing JOINs as well.
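For anyone who wants to try that setting, here is a rough sketch of how it might look in a Hive session. The table and column names (fact_large, dim_small, dim_id) are made up for illustration, and whether the join is actually converted depends on your Hive version and on table sizes, so treat this as an assumption to verify rather than a guarantee.

```sql
-- Session-level setting quoted above; applies to the queries that follow.
SET hive.auto.convert.join = true;

-- Hypothetical tables: with the setting on, Hive may turn this into a
-- map-side join if dim_small is small enough to be loaded into memory.
SELECT f.id, d.name
FROM fact_large f
JOIN dim_small d ON f.dim_id = d.id;
```

Running EXPLAIN on the query before and after changing the setting is one way to check whether the plan changed.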
PS: Congrats on the release, I'm looking forward to giving it a go :)
Hive is not meant for real-time queries, so it would merely serve as a baseline. What will be interesting is how Citus compares against Impala, and against Hadapt, as @mwexler points out, and maybe also Redshift and BigQuery :)