Actually, if you use the same interface for recommendation systems for content (say RSS feed items, videos, etc.) you get much better results than western content recommenders, which don't get accurate negative samples. It finally hit me that the reason I didn't understand a lot of the recommender literature is that it doesn't make sense, and that it's transformative to treat recommendation as a classification problem the way TikTok does.

The real Dark Pattern is that YouTube and every other western app looks like https://marvelpresentssalo.com/wp-content/uploads/2015/09/id... where you can't conclude anything at all because a user didn't click on an item. (That and the scan-ignore-repeat UI paradigm that has kept RSS readers thoroughly out of the mainstream forever.)
Even if you want a search-based UI, it would still make sense to show you one match at a time, let you thumbs-up or thumbs-down it, and have the system remember your choices, so you can come back next week and look at new items without the considerable cognitive burden of ignoring everything you've already seen. But say “see something once, why see it again?” and people look at you like you’re a psycho killer. The whole advertising economy, though, is fundamentally based on spamming you with the same crap over and over, and a ‘dislike’ button that works (negative sampling) would end it overnight. I guess there’s a reason why you can’t explain something to a person whose paycheck depends on not understanding it.
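To make the classification framing concrete, here is a minimal sketch, assuming hypothetical item embeddings and a plain logistic regression, of what explicit like/dislike feedback buys you: clean 1/0 labels to train an ordinary classifier on, instead of guessing what a non-click meant. This is just the shape of the idea, not anyone's actual pipeline.

```python
# Minimal sketch: recommendation as binary classification with explicit
# negative samples. Item features, the feedback log, and the model choice
# are all hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feedback log: one feature vector per item, label 1 = liked,
# label 0 = explicit dislike/skip (the negative sample a click-only UI never gets).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                   # e.g. item embeddings
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # stand-in for real feedback

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Rank unseen candidates by P(user likes it) and surface the top few.
candidates = rng.normal(size=(50, 16))
scores = model.predict_proba(candidates)[:, 1]
top = np.argsort(-scores)[:5]
print(list(top), scores[top].round(2))
```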
> it's transformative to treat recommendation as a classification problem the way TikTok does
Treating recommendation as a classification problem has all sorts of flaws, mostly down to normalizing the user's indications into a larger model. People will be more passive sometimes and more engaged at other times. Sometimes they'll blindly accept and watch everything that comes along (or blindly swipe right on every match, etc.); other times they'll be very choosy.
It's kind of the same problem that asking users for star ratings has: people interpret the meaning of "one star" or "three stars" or "five stars" differently, and there are selection biases in when people will bother to vote at all. (The Netflix Prize was a challenge for a reason!)
You can avoid all these temporal and selection biases, though, because content recommendation is ultimately a ranking problem: any recommendation algorithm wants to generate a subjective ranking of its universe of items, putting everything in a sequence with the thing the user would most like to see next first.
And conveniently, we already have a mathematically well-understood way to do ranking: Elo. Or, in app-UX terms: the classical HotOrNot design, where you're presented with exactly two options, and you have to pick one as being better than the other. And you get to see each option several times, paired against different other options each time.
In user-engagement terms, a HotOrNot interface would be just as much of a "dark pattern" as a Tinder interface. But in recommendation-tuning terms, the backend of a HotOrNot-UX app could build you a subjective scoring function that's much more accurate, much sooner.
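For the curious, here's a toy version of such a backend, assuming a HotOrNot-style log of "user picked A over B" events; the item names and K-factor are invented, and a real system would obviously do more.

```python
# Elo-style pairwise ranking built from "picked A over B" choices.
from collections import defaultdict

K = 32  # classical Elo update step
ratings = defaultdict(lambda: 1500.0)

def expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_choice(winner: str, loser: str) -> None:
    """Update both items after the user picked `winner` over `loser`."""
    e_w = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1.0 - e_w)
    ratings[loser]  -= K * (1.0 - e_w)

# Simulated choices from the two-option UI.
for w, l in [("cat_video", "tax_tutorial"),
             ("cat_video", "unboxing"),
             ("unboxing", "tax_tutorial")]:
    record_choice(w, l)

# The subjective ranking is just the items sorted by rating.
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```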
> you can't conclude anything at all because a user didn't click on an item
The real negative signal that sites like YouTube (and to a degree, TikTok) use is a user landing on something through a more passive engagement action (e.g. auto-play next), and that content then inducing engagement in a previously passively-consuming user to "get away" from it.
Classification gives you calibrated scores that you can use together with other information. A classifier may not be the ideal recommender by itself but it is a good component.
A probability score for “will the user like it?” works as a ranking score in my experience, with some caveats which aren’t so much about the score itself but about recommendation really being a sequential problem. That is, if I get different versions of the same news article that all score 0.9, it might be OK to show me one or two articles from that list, but not all of them. I believe people’s satisfaction with an article is greatly influenced by being spammed with too much of the same thing, and that is not captured by ranking scores alone.
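Roughly what that caveat looks like in code: rank by the classifier's probability, but penalize items too similar to what has already been picked, so three near-identical 0.9 articles don't all make the feed. The embeddings, similarity measure, and penalty weight below are made up for illustration (an MMR-style greedy selection).

```python
# Sketch: probability score as the ranking signal, plus a diversity penalty
# to handle the "sequential" nature of a feed. All inputs are synthetic.
import numpy as np

def select_feed(scores, embeddings, k=5, diversity=0.5):
    """Greedily pick k items, trading score against similarity to items already picked."""
    chosen, remaining = [], list(range(len(scores)))
    while remaining and len(chosen) < k:
        def adjusted(i):
            if not chosen:
                return scores[i]
            max_sim = max(float(embeddings[i] @ embeddings[j]) for j in chosen)
            return scores[i] - diversity * max_sim
        best = max(remaining, key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(1)
emb = rng.normal(size=(20, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit vectors, so dot product = cosine
probs = rng.uniform(0.5, 1.0, size=20)             # "will the user like it?" scores
print(select_feed(probs, emb))
```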
I’ll go so far as to say full text search should also be treated as a classification problem; in particular, you need a probability score if you want to build a service like “Google Alerts” where you tell people about new matching documents. Also, if you are trying to combine several radically different searchers (like IBM Watson did back in the day), the probability score is essential.
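As a sketch of why the probability matters there: something like Platt scaling maps an engine-specific relevance score onto a common probability scale, which you can then threshold for an alerts service or use to combine heterogeneous searchers. The scores and relevance judgments below are invented.

```python
# Sketch: calibrate a raw search score into P(relevant) with a tiny logistic
# model, then use that probability to decide whether to send an alert.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Past (raw_score, was_it_relevant) judgments -- hypothetical.
raw = np.array([[0.2], [1.1], [2.5], [3.0], [0.7], [2.8], [1.9], [0.4]])
rel = np.array([0, 0, 1, 1, 0, 1, 1, 0])

calibrator = LogisticRegression().fit(raw, rel)

def p_relevant(score: float) -> float:
    """Calibrated probability that a hit with this raw score is relevant."""
    return float(calibrator.predict_proba([[score]])[0, 1])

# Only notify on new matching documents when the calibrated probability is high.
for s in (0.5, 1.8, 2.9):
    p = p_relevant(s)
    print(s, round(p, 2), "alert" if p > 0.8 else "skip")
```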