Hacker News

I've heard this elsewhere, and when Twitter was originally founded in 2006 it wouldn't have been a particularly hard problem.

Twitter processes roughly 10K tweets per second. Even if you bloat out the text quite a lot to account for encoding overheads, metadata, etc, etc... and assume that each one is 10KB, that's still under 1 Gbps. A single NIC on an old server.
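The arithmetic behind that claim is easy to check; a quick sketch using the figures above (10K tweets/s, a generous 10 KB each):

```python
# Back-of-envelope write bandwidth for the numbers cited above.
tweets_per_second = 10_000        # rough Twitter write rate
bytes_per_tweet = 10 * 1024       # generous 10 KB per tweet, incl. metadata

bits_per_second = tweets_per_second * bytes_per_tweet * 8
print(f"{bits_per_second / 1e9:.2f} Gbps")  # prints 0.82 Gbps
```

So the raw ingest stream fits comfortably on a single 1 GbE link.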

Okay, I get it, Twitter needs a lot of data too. Lists of users, etc...

Twitter has 450 million monthly active users, which sounds like a lot, but even if there's 1MB of profile data per user such as who they are following, that's just 450 TB.
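Same sanity check for the storage estimate, using decimal units:

```python
# Back-of-envelope profile storage: 450M users at 1 MB each.
users = 450_000_000
bytes_per_user = 1_000_000   # 1 MB (decimal) of profile/graph data

total_tb = users * bytes_per_user / 1e12
print(f"{total_tb:.0f} TB")  # prints 450 TB
```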

That's... not that much these days. A large-but-not-enormous database cluster.

Sure, there's "historical" data, but that can be compressed and put on cheap cold storage, like S3 or whatever.

Give me a few million annually as an opex budget and a small team of decent developers, and I could whip up a cloud-hosted service that processes tweets at Twitter's scale.

Obviously, what I can't replicate is the much larger set of tools and systems behind the scenes that are used for moderation, analytics, ad sales, etc...



That's looking at Twitter from the other end, as a Write-Only service with non-realtime requirements on the read layer.

Twitter is both sides. Tweets are replicated out in milliseconds. Twitter search is lightning fast, comprehensive, and powerful. Old tweets are not just glaciered into S3!
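The read side is the hard part precisely because of that replication latency. The pattern usually described for fast timeline reads is fan-out-on-write: push each new tweet to every follower's timeline at write time, so a read is just a list fetch. A toy sketch (names and data structures here are illustrative, not Twitter's actual internals):

```python
from collections import defaultdict, deque

# Toy fan-out-on-write: writes do the work so reads are O(1) list fetches.
followers = defaultdict(set)    # author -> set of follower ids
timelines = defaultdict(deque)  # user   -> most-recent-first tweets

def follow(user, author):
    followers[author].add(user)

def post(author, text):
    # Replicate the tweet into every follower's timeline immediately.
    for user in followers[author]:
        timelines[user].appendleft((author, text))

follow("alice", "bob")
post("bob", "hello world")
print(timelines["alice"][0])  # prints ('bob', 'hello world')
```

The catch is celebrity accounts: a user with tens of millions of followers turns one write into tens of millions of timeline inserts, which is exactly the kind of amplification the parent's write-only estimate leaves out.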



