"Profiling a simple speed-of-light topology shows that a good chunk of the time in SpoutOutputCollector.emit()
is spent in the Clojure reduce() function, which is part of the ACK-ing logic. Re-implementing this reduce() logic in Java gives a big performance boost in both Spout.nextTuple() and Bolt.execute()."
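To illustrate the kind of rewrite the quote describes, here is a minimal sketch, not Storm's actual code. It assumes the reduce() folds 64-bit tuple ids together with XOR (Storm's acker tracks tuple trees by XOR-ing ids); the class and method names are hypothetical. A plain Java loop over primitives avoids the boxing and per-element function calls that a generic reduce incurs on a hot path.

```java
// Hypothetical helper, not taken from the Storm code base.
final class AckFold {
    // Fold all ids into one value with XOR, using a plain loop over primitives.
    static long xorAll(long[] ids) {
        long acc = 0L;
        for (long id : ids) {
            acc ^= id;
        }
        return acc;
    }

    public static void main(String[] args) {
        // XOR-ing the same id twice cancels it out, which is the property
        // the acker relies on; here only 9 remains.
        System.out.println(xorAll(new long[] {5L, 9L, 5L}));
    }
}
```

This only shows the shape of the change (a tight loop instead of a functional fold); the real gain would depend on what the original reduce() was doing per tuple.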
Apache Storm 1.0.0 introduced batching in its internal Disruptor queues. This adds a little latency (the flush timeout is set to 1 ms per queue, so it doesn't hurt much) but gives a huge throughput improvement.
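The batching trade-off can be sketched roughly as follows. This is an illustration under my own assumptions, not Storm's Disruptor code: `Batcher`, `publish` and `tick` are hypothetical names, the class is not thread-safe, and the clock is passed in explicitly to keep the example deterministic. Items are buffered and handed to the consumer either when the batch is full or when a partial batch has waited longer than the timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical single-threaded batcher, for illustration only.
final class Batcher<T> {
    private final int batchSize;
    private final long timeoutNanos;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();
    private long firstEnqueueNanos;

    Batcher(int batchSize, long timeoutNanos, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.timeoutNanos = timeoutNanos;
        this.sink = sink;
    }

    // Buffer one item; flush immediately once the batch is full.
    void publish(T item, long nowNanos) {
        if (buffer.isEmpty()) {
            firstEnqueueNanos = nowNanos; // start the timeout clock for this batch
        }
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Meant to be called periodically: flushes a partial batch that has
    // waited at least timeoutNanos, which bounds the added latency.
    void tick(long nowNanos) {
        if (!buffer.isEmpty() && nowNanos - firstEnqueueNanos >= timeoutNanos) {
            flush();
        }
    }

    private void flush() {
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        sink.accept(batch);
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        Batcher<String> b = new Batcher<>(3, 1_000_000L, batches::add); // 1 ms timeout
        b.publish("a", 0L);
        b.publish("b", 10L);
        b.tick(1_000_000L);          // timeout reached: partial batch [a, b] is flushed
        b.publish("c", 2_000_000L);
        b.publish("d", 2_000_010L);
        b.publish("e", 2_000_020L);  // batch full: [c, d, e] is flushed
        System.out.println(batches.size());
    }
}
```

The consumer pays the hand-off cost once per batch instead of once per item, which is where the throughput comes from, while the timeout caps how long a lone item can sit in the buffer.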
Anyone privy to what changes they made to yield the performance improvements?