Spark, Flink, and a few proprietary tools can operate in both batch and streaming modes, which means you can share your codebase between the two if you do both. Before these tools it was typically Hadoop for batch and Storm for streaming, but I suspect those days are ending (which goes some way to explaining Spark's success), unless you require exceptional throughput, in which case do your own research and benchmarks.
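For a concrete feel of the shared-codebase point, here is a minimal sketch using Spark Structured Streaming, where the same transformation runs over a static input and an unbounded one. The paths, Kafka topic, and schema are made up for illustration, not from any particular deployment:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._

    object SharedPipeline {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("shared-pipeline").getOrCreate()
        import spark.implicits._

        // The business logic is written once as a function over DataFrames.
        def countByUser(events: DataFrame): DataFrame =
          events.groupBy($"userId").count()

        // Batch mode: read a static directory of JSON event files.
        val batchEvents = spark.read.json("/data/events/")
        countByUser(batchEvents).write.parquet("/data/counts/")

        // Streaming mode: the same logic over an unbounded Kafka source
        // (broker address and topic name are hypothetical).
        val streamEvents = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .select(from_json($"value".cast("string"), batchEvents.schema).as("e"))
          .select("e.*")

        countByUser(streamEvents).writeStream
          .outputMode("complete")
          .format("console")
          .start()
          .awaitTermination()
      }
    }

Flink's DataStream/Table APIs allow a similar arrangement, treating a batch job as a bounded stream.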
Thanks for that link. However, they appear to conclude Flink and Storm are quite similar in performance.
If you really need low latency, it is quite likely that none of these will work for you anyway and you would have to build a specialized CEP (complex event processing) style system.
Absolutely. I find Google's "The Dataflow Model" paper a good read on this: http://research.google.com/pubs/pub43864.html (it generalizes streaming vs. batch vs. micro-batch frameworks so the choice becomes a straightforward cost-based decision).
Financial trading algos, where low latency is important, were the initial big market for stream processing/CEP. As another example, imagine you're trying to do real-time fraud detection based on click-stream analysis. If your latency is low enough, you can potentially prevent suspicious transactions from ever happening instead of having to allow them and then somehow recover from them later.
Probably never.