It adds some complexity, but consider that we know very well how to scale this type of service: e-mail + reflectors (mailing lists), and we know how to do parallel mass delivery for the small proportion of accounts with huge numbers of followers.
Scaling this is easily done with decomposition and sharding, coupled with a suitable key->value mapping of external id to current shard. I first sharded e-mail delivery and storage for millions of users 23 years ago. It was neither hard nor novel to do then, with individual servers slower than my current laptop each handling hundreds of thousands of users.
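As a rough illustration of that kind of id -> shard routing (a sketch only; the shard count, the hash choice and the override table are assumptions for illustration, not anything Twitter or any particular mail system actually uses):

    import hashlib

    NUM_SHARDS = 1024  # assumed fixed shard count; real systems size and rebalance this

    def default_shard(user_id: int) -> int:
        # Stable hash so a given user's data always routes to the same shard.
        digest = hashlib.sha1(str(user_id).encode()).digest()
        return int.from_bytes(digest[:4], "big") % NUM_SHARDS

    # The "key->value mapping of external id to current shard": an explicit
    # override table (e.g. in a small KV store) lets you migrate individual
    # users between shards without rehashing everyone.
    shard_override: dict[int, int] = {}

    def shard_for(user_id: int) -> int:
        return shard_override.get(user_id, default_shard(user_id))

Reads and writes for a given user then only ever touch that one shard, which is what makes per-shard capacity planning simple.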
I have no idea if that is how Twitter ended up doing it. But building it that way is vastly easier to scale than trying to do some variation of joining the timelines of everyone you follow "live" on retrieval, because in models like this the volume of reads tends to massively dominate the volume of writes.
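A minimal sketch of that write-time delivery ("reflector") approach, as opposed to joining timelines at read time; the data structures and the 800-entry cap are purely illustrative assumptions:

    from collections import defaultdict, deque

    TIMELINE_LEN = 800  # assumed per-user cap on precomputed timeline entries

    followers: dict[int, set[int]] = defaultdict(set)  # author id -> follower ids
    timelines: dict[int, deque] = defaultdict(lambda: deque(maxlen=TIMELINE_LEN))

    def post(author_id: int, tweet_id: int) -> None:
        # Fan out at write time: the cost is paid once per follower on post...
        for follower_id in followers[author_id]:
            timelines[follower_id].appendleft(tweet_id)

    def home_timeline(user_id: int) -> list[int]:
        # ...so the far more frequent reads are a single cheap list fetch,
        # with no joins across everyone the user follows.
        return list(timelines[user_id])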
You also don't need to store every tweet in each timeline, just the IDs of the tweets (a KV store mapping tweet ID to full tweet is also easy to shard), and since they're reasonably chronological the IDs can be compressed fairly efficiently (tweets posted around the same time share most of their leading digits).
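To illustrate the compression point: because the IDs are roughly time-ordered, storing per-timeline deltas rather than full 64-bit IDs already shrinks them a lot. A toy delta/varint encoder, assuming nothing about any real storage format:

    def encode_deltas(tweet_ids: list[int]) -> bytes:
        # Store each id as the difference from the previous one, as a varint.
        # Nearby ids have small deltas, so most entries fit in a few bytes
        # instead of eight.
        out = bytearray()
        prev = 0
        for tid in sorted(tweet_ids):
            delta = tid - prev
            prev = tid
            while True:  # unsigned LEB128-style varint
                byte = delta & 0x7F
                delta >>= 7
                out.append(byte | (0x80 if delta else 0))
                if not delta:
                    break
        return bytes(out)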
You also have straightforward options for "hybrid" solutions, e.g. for dealing with extreme outliers. Have someone followed by more than X% of the total userbase? Cache the most recent N tweets from those accounts on your frontends, and do joins over that small set at read time for the users who follow them.
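A hedged sketch of that hybrid read path: ordinary authors are fanned out at write time as above, while the few extreme outliers are merged in from a small hot cache at read time. The cache, thresholds and names are assumptions for illustration:

    import heapq

    # Hypothetical hot cache: celebrity author id -> recent tweet ids, newest first.
    celebrity_cache: dict[int, list[int]] = {}

    def read_timeline(precomputed: list[int], followed_celebrities: list[int],
                      limit: int = 50) -> list[int]:
        # k-way merge of the precomputed timeline with the handful of cached
        # celebrity streams; ids are roughly chronological, so merging by id
        # gives an approximately newest-first timeline.
        streams = [precomputed] + [celebrity_cache.get(c, []) for c in followed_celebrities]
        merged = heapq.merge(*streams, reverse=True)
        return list(dict.fromkeys(merged))[:limit]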
Most importantly, it's a pattern that has been tested extensively, over decades at this point, in a multitude of systems with follower/following graphs where reads dominate. The behaviours and failure modes are well understood, and there are straightforward, well tested solutions for most challenges you'll run into, which matters when asking whether it'd be possible to build with a small team.
Put another way: I know from first-hand experience that you can scale this to millions of users per server on modern hardware, so the number of shards you'd need to manage for Twitter-level volume is lower than the number of servers I've had ops teams manage. You'd need more servers in total, because the read load means you'd want extensive caching, as well as storage systems for images and the like. There's lots of other complexity, but scaling the core timeline functionality is not a complex problem.
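For a sense of the arithmetic (with deliberately made-up numbers, not Twitter's actual figures):

    # Back-of-envelope shard count; every figure here is an illustrative assumption.
    active_users = 300_000_000
    users_per_timeline_shard = 3_000_000   # "millions of users per server"
    print(active_users // users_per_timeline_shard)   # 100 -> on the order of a hundred shards

That count covers only the core timeline shards; the caching tiers, media storage and replication are what push the total server count up.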
I might be underestimating how hard it is to scale microblogging. I most certainly am.
But have you looked at the scale of what Telegram provides, both in breadth of features and in user numbers?
Certainly there are celebrities with more followers on Twitter than the largest Telegram channels, but Telegram scales surprisingly far, and I haven't seen it struggle more than once or twice since the start.
They're different problems, but messaging isn't some trivial thing. And giving each user a single unified public view of their own tweets is pretty much an O(n) problem.
WhatsApp is not 'vastly more innovative', and they solved different kinds of problems.
Twitter is a 'universe of 100M connected people'.
WhatsApp mostly connected individual people to each other.
Take, for example, 'real time search' and 'relevant updates'.
Imagine taking a firehose of 100M people's random thoughts, putting it into an index and making it instantly searchable. Now surface the most relevant thoughts from those 100M people to each and every one of the other 100M users.
Now moderate all of it in really subtle ways, where most of the 'negative activity' amounts to spam or annoying behaviour rather than anything we would normally consider 'abuse'.
Those are incredibly different challenges, and they're only two small parts of what Twitter is doing.
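To make the search example concrete, here's a toy "firehose into an index" sketch; the genuinely hard parts, real-time relevance ranking and per-user personalization, are deliberately left out, and none of the names reflect Twitter's actual systems:

    from collections import defaultdict

    postings: dict[str, list[int]] = defaultdict(list)  # term -> tweet ids, in arrival order

    def index_tweet(tweet_id: int, text: str) -> None:
        # Append the tweet id to the posting list of every distinct term.
        for term in set(text.lower().split()):
            postings[term].append(tweet_id)

    def search(query: str, limit: int = 10) -> list[int]:
        terms = query.lower().split()
        if not terms:
            return []
        ids = set(postings[terms[0]])
        for term in terms[1:]:
            ids &= set(postings[term])
        # Ids are roughly chronological, so descending order is roughly newest first.
        return sorted(ids, reverse=True)[:limit]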
Twitter is not rocket science, but it's not trivial either.
Also consider that R&D is usually maybe only 20% of overhead - yes, it takes 'all those other jobs and expenses' to run a company.
Wasn't WhatsApp on track to be cash flow positive?
I know I, at least, was shouting at them to take my money: it was the perfect HN product - reasonably priced, technically superior, and with no ads or tracking.
Twitter has ad-serving infra, recommendation systems (timeline, notifications, events, users), user-generated events, prediction systems (ads), and user graphs. The complexity comes from processing and persisting exabytes of data in company-owned datacenters: Twitter stores images, videos, user events, user data, and tweets/replies.

WhatsApp has little persistence outside of metadata, maybe? Your messages are not stored in an FB datacenter, and if they are I'd be concerned. You can read about their infra on their blog. Comparing p2p messaging with a distributed social media site holding mountains of data and years of iteration on ML systems does not make sense.
WhatsApp was vastly more innovative and scaled crazy fast, without fail whales or anything.
Or look at Telegram today, delivering a vastly more complex product with, it seems, a fraction of the company size.