One I've been wondering about is user overview pages. People use those a lot (it's actually my bookmark for getting onto reddit), and yesterday I noticed that a post I made wasn't in my overview, because that sub had gone dark early.
What happens when a user has 99% of their posting in subs that are now hidden, and the API is programmed to produce a fixed 30 comments of history on the overview page? The answer is extremely deep database pulls... you might pull a year of comment history to get 30 comments that aren't hidden. And depending on how they do that, it may actually pull the whole comment history for that timespan, since most of the time posts aren't hidden like this.
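To make the depth problem concrete, here's a minimal sketch of "filter after fetch" pagination. The function names and data shapes here are assumptions for illustration, not Reddit's actual code: to fill one 30-item page, the loop keeps scanning backwards through comment history and throws away anything from a hidden sub.

```python
def overview_page(fetch_batch, is_hidden, page_size=30, batch_size=100):
    """fetch_batch(cursor, n) -> (comments, next_cursor), newest first.
    Keeps pulling batches until page_size visible comments are found."""
    visible, cursor = [], None
    while len(visible) < page_size:
        batch, cursor = fetch_batch(cursor, batch_size)
        if not batch:
            break  # ran out of history entirely
        visible.extend(c for c in batch if not is_hidden(c["subreddit"]))
    return visible[:page_size]

# Toy demo: 99% of this user's comments are in a hidden sub, so filling
# one 30-item page forces a scan 100x deeper than the page size.
comments = [{"id": i, "subreddit": "dark" if i % 100 else "open"}
            for i in range(10_000)]

def fetch_batch(cursor, n):
    start = cursor or 0
    return comments[start:start + n], start + n

page = overview_page(fetch_batch, lambda sub: sub == "dark")
print(len(page))  # 30 visible comments, found by scanning 3000 rows
```

The key point is that the database work scales with the *hidden* fraction, not the page size: the query cost blows up exactly when lots of subs go dark at once.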
I worked on a backend team with some very overburdened legacy tables in Mongo, and this is the kind of thing we'd think about. Yes, you can use an index, but then you have to maintain the index for every record and update it every time a sub goes private/public (and we were literally hitting practical limits on how many indexes we could keep; we finally instituted a 1-in-1-out rule). And how often does that happen? Even deleted comments are probably a small enough minority overall that an index isn't worth it. But this is relational data: you have to know which subreddits are closed before you can filter their results, and Mongo is bad at joins. The Mongo instance can also become a hotspot, so you just filter in the application instead for those "rare" cases. And even if they are doing the filtering in Mongo, the index/collection they're joining against may suddenly be 100x the size, which could blow things up anyway.
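The "filter it in the application" approach can be sketched like this (names like `CLOSED_SUBS` and `refresh_closed_subs` are hypothetical, and this assumes the set of closed subs is small enough to cache in the app tier):

```python
CLOSED_SUBS = set()  # refreshed periodically, e.g. from the subreddit table

def refresh_closed_subs(fetch_closed):
    """One cheap, cacheable query instead of a per-request join."""
    global CLOSED_SUBS
    CLOSED_SUBS = set(fetch_closed())

def visible_comments(rows):
    # The DB query stays a simple indexed scan on (user, created_at);
    # the relational "join" against closed subs happens here, in memory.
    return [r for r in rows if r["subreddit"] not in CLOSED_SUBS]

refresh_closed_subs(lambda: ["darksub1", "darksub2"])
rows = [{"id": 1, "subreddit": "darksub1"},
        {"id": 2, "subreddit": "opensub"}]
print(visible_comments(rows))  # [{'id': 2, 'subreddit': 'opensub'}]
```

This is a sensible trade-off when closures are rare: the database stays simple and the app eats the cost of discarded rows. It's exactly that assumption, "closures are rare", that a mass blackout breaks.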
edit: for me, one overview page is now taking me back one month in comment history. And I comment a lot in subs that are currently closed, so it could easily be throwing away 5-10 comments for every comment it displays.