Hacker News

I've heard this elsewhere, and when Twitter was originally founded in 2006 it wouldn't have been a particularly hard problem.

Twitter processes roughly 10K tweets per second. Even if you bloat out the text quite a lot to account for encoding overheads, metadata, etc, etc... and assume that each one is 10KB, that's still under 1 Gbps. A single NIC on an old server.
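The arithmetic behind that claim is easy to check; a quick sketch using the figures above (10K tweets/s, a generous 10 KB each):

```python
# Back-of-envelope write bandwidth for the numbers cited above.
tweets_per_second = 10_000        # rough Twitter write rate
bytes_per_tweet = 10 * 1024       # generous 10 KB per tweet, incl. metadata

bits_per_second = tweets_per_second * bytes_per_tweet * 8
print(f"{bits_per_second / 1e9:.2f} Gbps")  # prints 0.82 Gbps
```

So the raw ingest stream fits comfortably on a single 1 GbE link.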

Okay, I get it, Twitter needs a lot of data too. Lists of users, etc...

Twitter has 450 million monthly active users, which sounds like a lot, but even if there's 1MB of profile data per user such as who they are following, that's just 450 TB.
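Same sanity check for the storage estimate, using decimal units:

```python
# Back-of-envelope profile storage: 450M users at 1 MB each.
users = 450_000_000
bytes_per_user = 1_000_000   # 1 MB (decimal) of profile/graph data

total_tb = users * bytes_per_user / 1e12
print(f"{total_tb:.0f} TB")  # prints 450 TB
```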

That's... not that much these days. A large-but-not-enormous database cluster.

Sure, there's "historical" data, but that can be compressed and put on cheap cold storage, like S3 or whatever.

Give me a few million annually as an opex budget and a small team of decent developers, and I could whip up a cloud-hosted service that processes tweets at Twitter's scale.

Obviously, what I can't replicate is the much larger set of tools and systems behind the scenes that are used for moderation, analytics, ad sales, etc...



That's looking at Twitter from the other end, as a Write-Only service with non-realtime requirements on the read layer.

Twitter is both sides. Tweets are replicated out in milliseconds. Twitter search is lightning fast, comprehensive, and powerful. Old tweets are not just glaciered into S3!
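The read side is the hard part precisely because of that replication latency. The pattern usually described for fast timeline reads is fan-out-on-write: push each new tweet to every follower's timeline at write time, so a read is just a list fetch. A toy sketch (names and data structures here are illustrative, not Twitter's actual internals):

```python
from collections import defaultdict, deque

# Toy fan-out-on-write: writes do the work so reads are O(1) list fetches.
followers = defaultdict(set)    # author -> set of follower ids
timelines = defaultdict(deque)  # user   -> most-recent-first tweets

def follow(user, author):
    followers[author].add(user)

def post(author, text):
    # Replicate the tweet into every follower's timeline immediately.
    for user in followers[author]:
        timelines[user].appendleft((author, text))

follow("alice", "bob")
post("bob", "hello world")
print(timelines["alice"][0])  # prints ('bob', 'hello world')
```

The catch is celebrity accounts: a user with tens of millions of followers turns one write into tens of millions of timeline inserts, which is exactly the kind of amplification the parent's write-only estimate leaves out.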



