You would use this where you need to take in a lot of data from multiple sources and turn it into some kind of information, under soft real-time constraints. Web analytics, for instance. Or Natural Language Processing, where the communication is between humans and machines and the responses need to arrive in real time.
Think about getting a terabyte of data a day, or more.
Or take a different approach: look at a system that does not use Storm, and imagine at what scale that system would break. Consider the system Matthias Nehlsen wrote about here:
http://matthiasnehlsen.com/blog/2014/11/07/Building-Systems-...
Using Redis as a universal bus is a fine architecture for small amounts of data, where "small amounts of data" means a gigabyte a day, or maybe a little bit more.
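To make the pattern concrete, here is a minimal sketch of a producer on such a bus, using the Jedis client; the host, the "events" channel, and the JSON payload are all invented for illustration:

    import redis.clients.jedis.Jedis;

    public class BusPublisher {
        public static void main(String[] args) {
            // One shared channel: every producer publishes here,
            // and every consumer subscribes here.
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.publish("events",
                    "{\"type\":\"pageview\",\"url\":\"/home\"}");
            }
        }
    }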
But what about 10 gigabytes of data per day?
What about 100 gigabytes of data per day?
Universal bus architectures break down when each node sees too many messages that are not meant for it. There is a limit to how far you can go with an architecture where every node gets every message: if each message is only supposed to be read by certain nodes, then delivering it to all the others is pure waste. With fifty consumers on a 100-gigabyte-a-day bus, you ship 5 terabytes a day across the network even if each node only wants a fiftieth of the traffic.
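In code, every consumer on the bus ends up with something like this sketch (again the Jedis client; the "type" check is a stand-in for whatever routing logic a real node would apply):

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPubSub;

    public class BusSubscriber {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // subscribe() blocks and delivers every message on
                // the channel, whether this node wants it or not.
                jedis.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        // We pay the network and parsing cost for every
                        // message, then discard most of them.
                        if (message.contains("\"type\":\"pageview\"")) {
                            System.out.println("handling " + message);
                        }
                    }
                }, "events");
            }
        }
    }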
But there is a wonderful flexibility to this design, and I've built similar systems myself. Nevertheless, at a certain point you have to give up that flexibility for the sake of scale.
Again, Matthias Nehlsen says this well:
http://matthiasnehlsen.com/blog/2014/10/30/Building-Systems-...
"What comes to mind immediately when regurgitating the requirements above is Storm and the Lambda Architecture. First I thought, great, such a search could be realized as a bolt in Storm. But then I realized, and please correct me if I’m wrong, that topologies are fixed once they are running. This limits the flexibility to add and tear down additional live searches. I am afraid that keeping a few stand-by bolts to assign to queries dynamically would not be flexible enough."
I would do (and have done) exactly what Matthias Nehlsen suggests: use something like Redis for as long as you can. But I would also do what Montalenti eventually did and adopt Storm at a certain level of scale -- when you reach a point where a universal bus architecture no longer works, sacrifice flexibility for performance, and switch to Apache Storm.
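For contrast, here is roughly what a Storm topology looks like, written against the org.apache.storm API; the spout and bolt here are toy stand-ins I made up, not anything Parse.ly runs. The point is the one Nehlsen makes in the quote above: the graph is declared up front and fixed once submitted.

    import java.util.Map;
    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;

    public class AnalyticsTopology {

        // Toy spout: stands in for a real feed (e.g. a Kafka spout).
        public static class EventSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            @Override
            public void open(Map conf, TopologyContext ctx,
                             SpoutOutputCollector collector) {
                this.collector = collector;
            }
            @Override
            public void nextTuple() {
                Utils.sleep(100); // throttle the toy feed
                collector.emit(new Values("/home"));
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("url"));
            }
        }

        // Toy bolt: counts pageviews per URL.
        public static class CountBolt extends BaseBasicBolt {
            private final Map<String, Long> counts = new java.util.HashMap<>();
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                counts.merge(tuple.getStringByField("url"), 1L, Long::sum);
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // terminal bolt, emits nothing
            }
        }

        public static void main(String[] args) throws Exception {
            // The processing graph is declared up front...
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("events", new EventSpout(), 4);
            builder.setBolt("count", new CountBolt(), 8)
                   .fieldsGrouping("events", new Fields("url"));

            Config conf = new Config();
            conf.setNumWorkers(4);

            // ...and fixed at submission time: adding or removing a
            // step means killing the topology and submitting a new one.
            StormSubmitter.submitTopology("analytics", conf,
                builder.createTopology());
        }
    }

Rebalancing can change a running topology's parallelism, but the shape of the graph itself stays put.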
If I recall correctly, Montalenti and Parse.ly experimented with MongoDB, Cassandra, and a bunch of other technologies before they found that Kafka/Storm was what they needed. Maybe he's written about the whole journey somewhere. It would certainly be interesting to read why all the other systems failed, and why moving to Kafka/Storm was the right choice for Parse.ly.