
I never worked at this scale, but could it also be that different subs are scaled horizontally, and with so many people flocking to the subs that are still open, the load is unevenly balanced?


Good question! And few people get to work at this scale, so it's not an unreasonable guess. I'll join you in speculating wildly about this, since, hey, it's kind of fun.

IMHO sharding traffic by subreddit doesn't pass the smell-test, though. Different subreddits have very different traffic patterns, so the system would likely end up with hotspots continuously popping up, and it'd probably be toilsome to constantly babysit hot shards and rebalance. (Consider some of the more academic subreddits vs. some of the more meme-driven subreddits — and then consider what happens when e.g. a particular subreddit takes off, or goes cold.)

Sharding on a dimension that has a more random/uniform distribution is usually the way to go. Off the top of my head (still speculating wildly and basically doing a 5-minute version of the system-design question just for fun), I'd be curious to shard by a hash of the post ID, or something like that. The trick is always to have a hashing algorithm that's stable when it's time to grow the number of shards (otherwise you're kicking off a whole re-balancing every time), and of course I'm too lazy to sort that out in this comment. I vaguely remember the Instagram team had a really cool sharding approach that they blogged about in this vein. (This would've been pre-acquisition, so ancient history by Silicon Valley standards.)
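To make the "stable when you grow the shard count" part concrete, here's a rough sketch of one common trick (fixed hash buckets; every name and number below is invented, not anyone's real setup): hash the post ID into a fixed bucket space, and grow capacity by reassigning whole buckets to new shards instead of re-hashing every post.

    import hashlib

    NUM_BUCKETS = 4096  # fixed forever; only the bucket -> shard map below changes

    # Hypothetical assignment of buckets to physical shards. Adding a shard
    # means moving some buckets in this map, not re-hashing every post ID.
    BUCKET_TO_SHARD = {b: "db%d" % (b % 4) for b in range(NUM_BUCKETS)}

    def bucket_for(post_id):
        # Stable hash of the post ID into the fixed bucket space.
        digest = hashlib.md5(post_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_BUCKETS

    def shard_for(post_id):
        return BUCKET_TO_SHARD[bucket_for(post_id)]

    print(shard_for("t3_abc123"))  # the same post always lands on the same shard

(IIRC the Instagram post I'm thinking of went further and embedded the shard ID into the generated ID itself, which sidesteps the lookup map entirely.)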

As for subreddit metadata (public, private, whatever), I'd really expect all of that to be in a global cache at this point. It's read-often/write-rarely data, and close-to-realtime cache-invalidation when it does change is a straightforward and solved problem.
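Just to sketch what "global cache plus invalidate on write" looks like for that kind of metadata (cache-aside pattern; the cache and db handles below are hypothetical stand-ins, not Reddit's actual interfaces):

    import json

    class SubredditMetaCache:
        """Cache-aside for read-often/write-rarely subreddit metadata (sketch)."""

        def __init__(self, cache, db, ttl=3600):
            self.cache, self.db, self.ttl = cache, db, ttl

        def get(self, name):
            key = f"sr_meta:{name}"
            cached = self.cache.get(key)
            if cached is not None:
                return json.loads(cached)
            meta = self.db.load_subreddit(name)  # hypothetical DB call
            self.cache.set(key, json.dumps(meta), self.ttl)
            return meta

        def on_settings_changed(self, name):
            # Called from a pub/sub listener when a mod flips public/private;
            # deleting the key forces the next read to repopulate from the DB.
            self.cache.delete(f"sr_meta:{name}")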


For really, really large data sets, you'll still eventually want to reduce read compute costs by limiting specific tenants to specific shards, in order to reduce request fan-out for every single read request. If, say, I have a super quiet forum, does it make sense to query 2 shards or 6000? Clearly there's a loss of performance when every read request has unbounded fan-out.
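One way to bound that fan-out is to pin each tenant to a small, stable subset of shards, e.g. with rendezvous hashing, so a quiet forum's reads only ever touch a couple of shards. Rough sketch (the shard count, replica count, and names are invented):

    import hashlib

    ALL_SHARDS = ["shard%d" % i for i in range(6000)]  # hypothetical fleet
    REPLICAS_PER_TENANT = 2  # a quiet forum lives on just 2 of them

    def shards_for_tenant(tenant_id, n=REPLICAS_PER_TENANT):
        # Rendezvous hashing: score every shard against the tenant and keep
        # the n best-scoring ones. Adding or removing a shard only moves the
        # tenants that had it in their top n.
        scored = sorted(
            ALL_SHARDS,
            key=lambda s: hashlib.md5(f"{tenant_id}:{s}".encode()).hexdigest(),
        )
        return scored[:n]

    print(shards_for_tenant("quiet_forum"))  # reads fan out to 2 shards, not 6000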


A reasonable, but wrong, assumption is that Reddit's engineers know what they're doing.

The founding Reddit team was non-technical (even spez: I've been to UVA; it's not an engineering school, and spez had never done any real engineering before joining Reddit). They ran a skeleton crew of some 10ish people for a long time, none from exceptional backgrounds (one of them was from Sun & Oracle, after their prime).

Same group that started with a Node front-end and Python back-end monolith, with a document-oriented database structure (i.e., two un-normalized tables in Postgres holding everything). Later they switched to Cassandra and kept that monstrosity -- partly because back then no one knew anything about databases except sysadmins.

Back then they ran a cache for listing retrieval. Every "reddit" (each subreddit and its various pages: front page, hot, top, controversial, etc.) keeps its listing in memcache. Inside, you have your top-level listing information (title, link, comments, id, etc.). The asinine thing is that cache invalidation has always been a problem. They originally handled it with RabbitMQ queues: votes come in, they're processed, and then the cache is updated. Those queues always got backed up, because no one thought about batching updates on a timer (or about lock-free approaches), and no one knew how to do vertical scaling -- when they tried, it made things even harder to reason about. You know what genius plan they had next to solve this? Make more queues. Fix throughput issues by adding more queues, instead of fixing the root cause of the back-pressure. Later they did "shard"/partition things more cleanly (and tried lock-free), but they never did any real research into fixing the underlying problem: how to handle millions of simple "events" a day... which is laughable thinking back on it now.
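For illustration only -- this is not what Reddit ran, just a sketch of the batch-on-a-timer idea that was missing -- the consumer drains vote events into an in-memory aggregate and writes each affected cache entry once per tick, instead of once per vote:

    import time
    from collections import defaultdict
    from queue import Queue, Empty

    FLUSH_INTERVAL = 1.0   # seconds between cache writes
    vote_events = Queue()  # stand-in for the RabbitMQ consumer

    def batch_and_flush(update_listing_cache):
        # Coalesce all votes for the same post within one interval, then
        # touch the listing cache once per post per tick.
        while True:
            deltas = defaultdict(int)
            deadline = time.monotonic() + FLUSH_INTERVAL
            while time.monotonic() < deadline:
                try:
                    post_id, delta = vote_events.get(timeout=0.05)
                    deltas[post_id] += delta
                except Empty:
                    pass
            for post_id, delta in deltas.items():
                update_listing_cache(post_id, delta)  # hypothetical cache write

Ten thousand votes on one post in a second become one cache update instead of ten thousand.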

That's just for listings. The comment trees are another big, bad piece. Again, stored un-normalized -- but this time actually batched (not properly, but it's a step up). One great thing about un-normalized databases and trees is that there are no constraints on vertices. So a common issue was that the queue for computing comment trees would back up (again), because messages would never get processed properly, and the entire site could slow to a crawl while the message broker wasted its time on erroneous messages.
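To make the "no constraints on vertices" point concrete (a made-up sketch, not their code): nothing stops a comment from referencing a parent that was never stored, so the tree-builder has to quarantine those instead of letting the broker retry them forever.

    from collections import defaultdict

    def build_comment_tree(comments):
        # comments: list of (comment_id, parent_id or None for top-level).
        # Returns (children_by_parent, orphans). With no FK-style constraints,
        # a comment can point at a parent that doesn't exist; set those aside
        # rather than letting them clog the queue.
        ids = {cid for cid, _ in comments}
        children = defaultdict(list)
        orphans = []
        for cid, parent in comments:
            if parent is None or parent in ids:
                children[parent].append(cid)
            else:
                orphans.append(cid)
        return children, orphans

    tree, orphans = build_comment_tree([("c1", None), ("c2", "c1"), ("c3", "c9")])
    # tree: {None: ['c1'], 'c1': ['c2']}, orphans: ['c3']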

Later, they had the bright idea to move from a data center to AWS and break everything up into microservices. Autoscaling there has always been bungled.

There was no strong tech talent, and no strong engineering culture -- ever.

-------

My 2 cents: it's the listing caches. The architecture around them was never designed to handle checking so many "isHidden" subreddits (even though those subreddits are still getting updates to their internal listings) -- and it's coming undone.


I read this as a pretty scathing dressing down of incompetent engineering at reddit. But after having breakfast, what I'm realizing again is that perfect code and engineering are not required to make something hugely successful.

I've been part of 2 different engineering teams that wrote crap I would cuss out but were super successful, and most recently I joined a team that was super anal about every little detail. I think business success only gets hindered at the extremes. If you're somewhere in the middle, it's fine. I'd rather have a buggy product that people use than a perfect codebase that only exists on GitHub.


Agreed. In fact, I believe success stories skew the other way: those who actually build something that gets off the ground and succeeds will, in many cases, not have the time to write perfect code.





