
> In our case, we had pushed the limits of the largest EC2 box we could find. We have to keep up with tens of thousands of events per second, and do non-trivial amounts of CPU processing on each of them.

I understand there are many reasons for sticking with AWS, but looking at reserved instances[1], I find: r3.8xlarge, which has 32 cores, 244 GB of RAM, and 2x 320 GB of SSD storage, for $2.66 per hour -- or just under $2K a month.
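
Quick back-of-the-envelope on that monthly figure (my math, assuming a ~730-hour month):

    # r3.8xlarge at the quoted hourly rate, averaged over a month
    hourly_rate_usd = 2.66
    hours_per_month = 730  # 24 * 365 / 12
    print(f"~${hourly_rate_usd * hours_per_month:,.0f}/month")  # ~$1,942/month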

For just a little more (USD 2099/month, no minimum term, no setup), you can get something like:

Dell R930, 4x Intel Xeon E7-4820 v3 (that's 40 cores at a base frequency of 1.9 GHz), 384 GB DDR4, 2x 480 GB SSD in HW RAID 1 (although I suspect 6x 120 GB, possibly in RAID 0, might be better), with 1 Gbps full-duplex and 10 TB of egress bandwidth included, from Leaseweb: https://www.leaseweb.com/dedicated-server/configure/22651

Granted, one might want 10 Gbps -- and managing your own server isn't free, even when the hosting company takes care of the physical hardware (not that managing an EC2 instance is free either). But I'm curious if you tested on dedicated hardware as well? 10,000 events/s at 4 KB/event is "just" ~330 Mbps (call it 1 Gbps including overhead). I'm certainly not claiming it's easy to do any kind of processing at a sustained ~1 Gbps -- but I'm curious whether the kind of workload you're discussing could be handled by a single (relatively cheap) dedicated server?
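
Rough math behind that throughput figure (assuming ~4 KB per event):

    # 10,000 events/s at an assumed ~4 KB each
    events_per_second = 10_000
    bytes_per_event = 4 * 1024
    mbps = events_per_second * bytes_per_event * 8 / 1e6
    print(f"{mbps:.0f} Mbps")  # ~328 Mbps; call it 1 Gbps with protocol overhead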

[1] http://aws.amazon.com/ec2/pricing/#reserved-instances



We did run our own colo for about 18 months from 2012-2014. You are right: dedicated hardware is cheap. It saved us some money for a while. We used Dell servers not unlike the one you spec'ed.

But even when we ran our own colo and hardware, we couldn't do all of our processing on a single node.

Also, in 2014 the business became critical enough that even if we could run everything on one node, we wouldn't want to risk it. So even when we had dedicated hardware, we still had to run Storm to spread the load onto multiple nodes.

The processing we do in real-time these days lights up over 300 EC2 vCores to 70%+ utilization during peak processing hours. A nightly job we run using pyspark lights up about 3x as many cores to near-100% utilization for several hours. And jobs we run during schema changes or backup restores can use more than 3,000 cores at once to churn through hundreds of terabytes of data.
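
To give a sense of the shape of that nightly job, here's an illustrative PySpark sketch (made-up paths and columns, not our actual pipeline) of a wide batch aggregation that fans out across many cores:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("nightly-rollup").getOrCreate()

    # Hypothetical input path and columns; the partition count drives parallelism.
    events = spark.read.parquet("s3://example-bucket/events/")

    rollup = (events
              .repartition(900)  # roughly 3x the real-time core count
              .groupBy("account_id", "event_type")
              .agg(F.count("*").alias("event_count"),
                   F.sum("payload_bytes").alias("total_bytes")))

    rollup.write.mode("overwrite").parquet("s3://example-bucket/rollups/")
    spark.stop()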

I know it's very easy to say "YAGNI" to tools like Storm/Spark, and in most cases, I am with you. But when you outgrow the alternatives, as much as I'd love to just "throw a single machine at it", it simply isn't an option. YAGNI doesn't apply when you've exhausted all alternatives!


Thank you for the detailed reply. I didn't mean to imply AWS/Spark was the wrong tool for your workload -- I was just curious, since your comment about pushing the limits of the largest EC2 box didn't mention dedicated hardware.

Judging from comments I've seen online, many people appear to be unaware of the overhead AWS introduces. That doesn't mean one shouldn't use AWS, just that one should be aware of the tradeoffs :-)


I know you get this, but when you're on AWS you don't evaluate a single piece of standalone hardware for virtually any reason. It's not even worth bothering to think about. People (again, probably not you) always have this misconception that AWS is just an expensive VPS. But anyone who's really invested in the platform (and I'm guessing they are, if they have this scale as a business problem) is using a dozen different services that play really well together in the AWS ecosystem. And that's what your engineers know, and your ops team knows. It's what finance knows. You'd need to be saving massive amounts of money to even bother considering it.



