
> In our case, we had pushed the limits of the largest EC2 box we could find. We have to keep up with tens of thousands of events per second, and do non-trivial amounts of CPU processing on each of them.

I understand there are many reasons for sticking with AWS, but looking at reserved instances[1], I find: r3.8xlarge, which has 32 cores, 244 GB of RAM, and 2x 320 GB of SSD storage, for $2.66 per hour -- or just under $2K a month.
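
Quick back-of-the-envelope on that monthly figure (my math, assuming a ~730-hour month):

    # r3.8xlarge at the quoted hourly rate, averaged over a month
    hourly_rate_usd = 2.66
    hours_per_month = 730  # 24 * 365 / 12
    print(f"~${hourly_rate_usd * hours_per_month:,.0f}/month")  # ~$1,942/month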

For just a little more (USD 2099/month, no minimum term, no setup), you can get something like:

Dell R930, 4x Intel Xeon E7-4820 v3 (that's 40 cores at a base frequency of 1.9 GHz), 384 GB DDR4, 2x 480 GB SSD in HW RAID 1 (although I suspect 6x 120 GB, possibly in RAID 0, might be better), with 1 Gbps full-duplex and 10 TB of egress bandwidth included, from Leaseweb: https://www.leaseweb.com/dedicated-server/configure/22651

Granted, one might want 10 Gbps -- and managing your own server isn't free, even when the hosting company takes care of the physical hardware (not that managing an EC2 instance is free either). But I'm curious if you tested on dedicated hardware as well? 10,000 events/s at 4 KB/event is "just" ~330 Mbps (call it 1 Gbps including overhead). I'm certainly not claiming it's easy to do any kind of processing at a sustained ~1 Gbps -- but I'm curious whether the kind of workload you're discussing could be handled by a single (relatively cheap) dedicated server?
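
Rough math behind that throughput figure (assuming ~4 KB per event):

    # 10,000 events/s at an assumed ~4 KB each
    events_per_second = 10_000
    bytes_per_event = 4 * 1024
    mbps = events_per_second * bytes_per_event * 8 / 1e6
    print(f"{mbps:.0f} Mbps")  # ~328 Mbps; call it 1 Gbps with protocol overhead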

[1] http://aws.amazon.com/ec2/pricing/#reserved-instances



We did run our own colo for about 18 months from 2012-2014. You are right: dedicated hardware is cheap. It saved us some money for a while. We used Dell servers not unlike the one you spec'ed.

But even when we ran our own colo and hardware, we couldn't do all of our processing on a single node.

Also, in 2014 the business became critical enough that even if we could run everything on one node, we wouldn't want to risk it. So even when we had dedicated hardware, we still had to run Storm to spread the load onto multiple nodes.

The processing we do in real-time these days lights up over 300 EC2 vCores to 70%+ utilization during peak processing hours. A nightly job we run using pyspark lights up about 3x as many cores to near-100% utilization for several hours. And jobs we run during schema changes or backup restores can use more than 3,000 cores at once to churn through hundreds of terabytes of data.
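
To give a sense of the shape of that nightly job, here's an illustrative PySpark sketch (made-up paths and columns, not our actual pipeline) of a wide batch aggregation that fans out across many cores:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("nightly-rollup").getOrCreate()

    # Hypothetical input path and columns; the partition count drives parallelism.
    events = spark.read.parquet("s3://example-bucket/events/")

    rollup = (events
              .repartition(900)  # roughly 3x the real-time core count
              .groupBy("account_id", "event_type")
              .agg(F.count("*").alias("event_count"),
                   F.sum("payload_bytes").alias("total_bytes")))

    rollup.write.mode("overwrite").parquet("s3://example-bucket/rollups/")
    spark.stop()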

I know it's very easy to say "YAGNI" to tools like Storm/Spark, and in most cases, I am with you. But when you outgrow the alternatives, as much as I'd love to just "throw a single machine at it", it simply isn't an option. YAGNI doesn't apply when you've exhausted all alternatives!


Thank you for the detailed reply. I didn't mean to imply AWS/Spark was the wrong tool for your workload -- I was just curious, since your comment about pushing the limits of the largest EC2 box didn't mention dedicated hardware.

Judging from comments I've seen online, many people appear to be unaware of the overhead AWS introduces. That doesn't mean one shouldn't use AWS, just that one should be aware of the tradeoffs :-)


I know you get this, but when you're on AWS you don't evaluate a single piece of standalone hardware for virtually any reason. It's not even worth bothering to think about. People (again, probably not you) always have this misconception that AWS is just an expensive VPS. But anyone who's really invested in the platform (and I'm guessing they are, if they have this scale as a business problem) is using a dozen different services that play really well together in the AWS ecosystem. And that's what your engineers know, and your ops team knows. It's what finance knows. You'd need to be saving massive amounts of money to even bother considering it.



