I've read before that Twitter uses NoSQL (or non-relational databases) and that each tweet has a copy pushed to each follower every time someone tweets (same for updates or deletes). I am of a very much traditional RDBMS mindset, though I am willing to bend a bit :) ... but this idea of fanning out (copying) tweets to each and every follower just sounds grossly inefficient. The number of writes must be enormous.
The way I see it, here are all the WRITES they'd ever need:
1:User -> n:Users (Followers) (Temporal links with effective dates for follow history)
1:User -> n:Tweets (Temporal with tweet datetime) - this includes Retweets.
The rest are just READS (pulls instead of fanning out the same data).
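For concreteness, here's roughly the read I'm imagining, computed on request. It's only a sketch; the table and column names are made up, and SQLite just stands in for whatever RDBMS would actually be used:

```python
import sqlite3

# Hypothetical schema for the two write paths listed above:
#   follows(follower_id, followee_id, effective_from, effective_to)
#   tweets(tweet_id, author_id, body, created_at)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER,
                          effective_from TEXT, effective_to TEXT);
    CREATE TABLE tweets  (tweet_id INTEGER PRIMARY KEY, author_id INTEGER,
                          body TEXT, created_at TEXT);
""")

def home_timeline(user_id: int, limit: int = 50):
    """Compute the home timeline at read time by joining follows -> tweets."""
    return conn.execute("""
        SELECT t.tweet_id, t.author_id, t.body, t.created_at
        FROM follows f
        JOIN tweets  t ON t.author_id = f.followee_id
        WHERE f.follower_id = ?
          AND f.effective_to IS NULL     -- only currently-active follows
        ORDER BY t.created_at DESC
        LIMIT ?
    """, (user_id, limit)).fetchall()
```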
What am I missing here folks? :-) Educate me please.
The entire tweet is not copied; what's stored is a per-user list [0] of 16B-to-28B records [1] -- the IDs of the tweet and the tweet's author, sometimes some ad-hoc bits, and the retweet ID.
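As a rough illustration of how a 16-to-28-byte entry could break down (this assumes 8-byte tweet/author IDs and a 4-byte flags word; the actual layout is Twitter's and isn't spelled out here):

```python
import struct

# Minimal entry: tweet ID + author ID, 8 bytes each -> 16 bytes.
def pack_entry(tweet_id: int, author_id: int) -> bytes:
    return struct.pack("<QQ", tweet_id, author_id)

# Larger entry: add a 4-byte flags word and an 8-byte retweet ID -> 28 bytes.
def pack_retweet_entry(tweet_id: int, author_id: int,
                       flags: int, retweeted_id: int) -> bytes:
    return struct.pack("<QQIQ", tweet_id, author_id, flags, retweeted_id)

assert len(pack_entry(1, 2)) == 16
assert len(pack_retweet_entry(1, 2, 0b1, 3)) == 28
```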
As for what you might be missing: a user will need the latest, say, 50 tweets from only the 200 or so people they follow...
Do we pull in 200 [cached] latest-per-user lists of 50 tweets each? Now we might have network I/O issues; it's also slower, and that cost is paid on every read; it could also create hot spots at certain read sites. We can solve these things, but they are just different problems, not necessarily easier or simpler ones.
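A sketch of what that pull-on-read path looks like (the per-author cache and the names here are hypothetical):

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical per-author cache: author_id -> newest-first list of
# (created_at, tweet_id) pairs; in practice this would live in memcache/Redis.
recent_tweets_by_author: Dict[int, List[Tuple[int, int]]] = {}

def home_timeline(followee_ids: List[int], limit: int = 50):
    """Fetch ~200 cached per-author lists and merge out the newest `limit`."""
    per_author = (recent_tweets_by_author.get(a, []) for a in followee_ids)
    # Each list is already newest-first, so an N-way merge works; but the
    # fetch + merge cost is paid on every single timeline read.
    merged = heapq.merge(*per_author, reverse=True)
    return list(itertools.islice(merged, limit))
```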
Do we walk a [timestamp] range index composited with user IDs? At 200M tweets/day we'll have to shard that index, and there's no easy way to keep the necessary joins or unions local, or to know how many shards to query asynchronously before discarding the extras. I'd probably have to make several round trips, with a supervising process between synchronous fetches.
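And a sketch of the scatter-gather that the sharded range-index approach forces onto reads (the time-bucket sharding and the query_shard RPC are invented for illustration):

```python
import asyncio
from typing import Iterable, List, Tuple

async def query_shard(bucket: int, author_ids: Iterable[int],
                      limit: int) -> List[Tuple[int, int]]:
    """Hypothetical RPC to the shard owning one time bucket; returns
    (created_at, tweet_id) pairs for the given authors, newest first."""
    return []  # stand-in for the real network call

async def home_timeline(author_ids: List[int], limit: int = 50,
                        newest_bucket: int = 0, max_buckets: int = 12):
    results: List[Tuple[int, int]] = []
    bucket = newest_bucket
    # The supervising process: scatter to a few recent buckets, gather,
    # and keep making round trips until we have enough rows, then trim.
    while len(results) < limit and bucket > newest_bucket - max_buckets:
        batch = await asyncio.gather(
            *(query_shard(b, author_ids, limit)
              for b in range(bucket, bucket - 3, -1)))
        for rows in batch:
            results.extend(rows)
        bucket -= 3
    return sorted(results, reverse=True)[:limit]
```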
Then there is paging...
There is probably a sweet spot where you skip the fanout for accounts with very large followings, which they touch on at the bottom of "The Future". Ultimately, you have to pay the price somewhere, on read or on write; they've simply picked write, as it is apparently more predictable and manageable for their working set.
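One common way that sweet spot gets sketched is a hybrid: fan out on write for ordinary accounts, and pull at read time for accounts over some follower threshold. This is a generic illustration, not necessarily what Twitter actually does; the threshold and data structures are made up:

```python
from collections import defaultdict, deque

FANOUT_THRESHOLD = 100_000   # hypothetical cutoff for "too many followers"
TIMELINE_LEN = 800           # cap on each precomputed per-user list

followers = defaultdict(set)                                 # author -> follower IDs
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LEN))  # precomputed lists
recent_by_author = defaultdict(list)                         # pull path for huge accounts

def on_tweet(author_id: int, created_at: int, tweet_id: int) -> None:
    entry = (created_at, tweet_id, author_id)
    recent_by_author[author_id].append(entry)
    if len(followers[author_id]) <= FANOUT_THRESHOLD:
        # Ordinary account: pay the cost at write time, once per follower.
        for f in followers[author_id]:
            timelines[f].appendleft(entry)
    # Huge account: skip the fan-out; readers pull these tweets instead.

def read_timeline(user_id: int, followed_celebs: set, limit: int = 50):
    """Merge the precomputed list with pulled tweets from huge accounts."""
    pulled = [e for c in followed_celebs for e in recent_by_author[c]]
    return sorted(list(timelines[user_id]) + pulled, reverse=True)[:limit]
```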
I think what you're missing is simply that they have explicitly decided not to compute the home timeline on request; they've decided that they need it to be precomputed and constantly updated.