
> when do I need this

Probably never.

Spark, Flink and a few proprietary tools can operate in both batch and streaming modes, which means you can share your codebase between the two if you do both. Before these tools it used to be Hadoop for batch and Storm for streaming, but I suspect those days are ending now (which goes some way to explaining Spark's success), except if you require exceptional throughput, in which case do some research and benchmarks.
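
To make "share your codebase" concrete, here is a rough sketch in plain Python. It deliberately does not use the Spark or Flink APIs; the "user,action" event format and the function names are made up for illustration. The point is only that the per-record logic is written once and reused by a batch entry point and a streaming one.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def extract_user(event: str) -> str:
    # Shared logic: events are hypothetical "user,action" strings.
    return event.split(",")[0].strip()


def batch_counts(events: Iterable[str]) -> Counter:
    # Batch mode: the whole input is available, so return final totals.
    return Counter(extract_user(e) for e in events)


def streaming_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Streaming mode: emit an updated count as each event arrives.
    counts: Counter = Counter()
    for e in events:
        user = extract_user(e)
        counts[user] += 1
        yield user, counts[user]
```

Both functions call the same `extract_user`, so a change to the parsing rule lands in the batch job and the streaming job at once. Real frameworks add distribution, windowing and fault tolerance on top, none of which this sketch attempts.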




I disagree. If low latency is a priority, Storm has little competition. Check out the benchmarks done by Yahoo engineers.

https://yahooeng.tumblr.com/post/135321837876/benchmarking-s...


Thanks for that link. However, they appear to conclude Flink and Storm are quite similar in performance.

If you really need low latencies, it is quite likely that none of these will work for you anyway and you would have to build a specialized CEP-style system.


Yes, this release is all about performance. So Storm must now be far ahead.


Absolutely, I find Google's "The Dataflow Model" paper about this to be a good read: http://research.google.com/pubs/pub43864.html (it talks about generalizing streaming vs. batch vs. micro-batch frameworks so it becomes an easy cost-based decision).


Storm is a few years late. It has largely been eclipsed by the projects you mentioned and others.

In theory, Storm can be set up to run with much lower latency than the others, but it is rare that an app will need that kind of response time.


What's the range of latencies here? What types of projects really need the lowest latencies?


Financial trading algos, where low latency is important, were the initial big market for stream processing/CEP. As another example, imagine you're trying to do real-time fraud detection based on clickstream analysis. If your latency is low enough, you can potentially prevent suspicious transactions from ever happening instead of having to allow them and then recover them somehow later.


Storm was actually a few years earlier. It was the first real-time processing framework, before other projects showed up.


True, it showed up first. And then it stagnated for several years. Now it is behind the other alternatives.


I don't see why this was downvoted. The points may be debatable, but isn't that the point of a discussion?



