100% this. There is a glass ceiling to the quality of a search engine if it's free; it starts with G.
The paid option hasn't been explored yet, and for good reason I think: in principle, you need training data for it to be any good. And, again in principle, the only way to amass user data is for the service to be free, leveraging that to sharpen the tool.
So in principle, I reckon this is doomed to fail. But I might be wrong. I HOPE I'm wrong. And that's enough.
Personally, I don't have a problem with a service using aggregated usage data to improve their algorithms, even if that is technically "tracking" me. It's the selling of personalized segment data that bothers me.
ohduran probably means that there is no a priori logical reason for the two to go together. In theory they could be separated. However, once a company has aggregated data en masse, selling personalized data is far too enticing a profit opportunity to pass up.
I happen to disagree; almost any for-profit business is going to be collecting some sort of aggregated usage data. I mean, at the most basic level they've got to be tracking the number of customers they have. That doesn't mean all for-profit businesses ultimately devolve into data-selling businesses.
Although perhaps ohduran is advancing a more nuanced argument. In particular, perhaps the more detailed the usage data you track, the more attractive the siren call of selling that data becomes. In order to compete with Google on search quality, perhaps you do need sufficiently detailed usage data that the call becomes irresistible.
I'm still not convinced that's true, but I could see how it plays out.
Oh wow, perhaps I was too terse and left too much room for interpretation. I meant that there is no way for a for-profit company not to eventually sell personalized segment data once it has it, even if there were initial promises not to do so.
In that regard, the "siren call", as dwohnitmok says, is a very appropriate way of encapsulating what I meant. You can be bold and not do it, but as soon as you have investors, they are going to demand it, pressure you into doing it, and, if you do not comply, replace you with someone who will not sit on a potentially profitable line of business and do nothing.
That's not really true. Google & Facebook only sell targeting for a reason: it's more profitable than selling the data itself. Why would you sell the user data you worked so hard to collect when you can sell targeting on it again and again? It's actually in Google & Facebook's interest that no one except them have data on you.
What kind of training are the users providing that makes G better? I thought their secret was that they have better infrastructure to crawl and organize information?
I don't see how a paid search engine has a disadvantage here.
One very simple metric to improve search results is testing how long a user visits a site. When users search for something, click a link and return to Google seconds later, you can assume that the result did not match what the user was searching for.
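To make that concrete, here is a rough sketch of how such a "short click" signal might be computed from a click log. Everything in it is my own assumption for illustration: the `Click` record, the field names and the 10-second cutoff are hypothetical, and this is not a description of how Google or anyone else actually does it.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff: coming back to the results page within this many seconds
# is treated as "the result did not match the query".
SHORT_CLICK_SECONDS = 10.0


@dataclass
class Click:
    """One hypothetical click-log record."""
    query: str
    url: str
    clicked_at: float             # seconds, when the result was clicked
    returned_at: Optional[float]  # seconds, when the user came back; None if never


def dwell_seconds(click):
    """Time spent on the result, or None if the user never returned."""
    if click.returned_at is None:
        return None
    return click.returned_at - click.clicked_at


def short_click_rate(clicks):
    """Fraction of clicks per (query, url) pair that bounced back quickly."""
    totals = {}
    shorts = {}
    for c in clicks:
        key = (c.query, c.url)
        totals[key] = totals.get(key, 0) + 1
        d = dwell_seconds(c)
        if d is not None and d < SHORT_CLICK_SECONDS:
            shorts[key] = shorts.get(key, 0) + 1
    return {key: shorts.get(key, 0) / n for key, n in totals.items()}
```

A result with a high short-click rate for a given query would be a candidate for demotion. Note that a user who never returns is counted as satisfied here, which is exactly the kind of ambiguity that makes this data noisy.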
Because they're so dominant they can make changes to the system that make it worse. Haven't you noticed the decline in quality of Google search results over these past few years?
I find Google is useless at this. They throw out irrelevant results that the Wise Men of Google think you want to see, or that they'd like you to see. DDG pays more attention to your wording. The drawback is they have fewer indexed pages.
I'd also wager this is the most useful metric you can use, or close to it. With this metric, plus the user's persona (male or female, teen or elderly, and so forth), you have a fairly accurate user-driven ranking system.
Because the underlying assumption is that what they'll tell you is the truth, and that's not necessarily the case. Think of a Firefox plugin, AdNauseam style, that always says NO.
But there's nothing stopping the same people from gaming existing logic that tracks user behavior except security through obscurity. You also get dirty data via tracking: from the backend, for example, a user who found what they wanted is indistinguishable from one who just gave up trying.
It's a good point. I'm no expert, so take this with a grain of salt, but if it were just a matter of infrastructure, Bing wouldn't suck so much. Microsoft has the means, the engineering power and the incentive to crush a direct competitor. And yet, it sucks.
So in practice, the more data you have, the better the engine is. I don't have a theoretical reason for why that is the case, but the thing is I don't actually need it.
Every time you click a result link, and every time you bounce back from that link, probably also scroll position and hovering, you are providing potentially useful training data.
One possible upside is the Metafilter principle: If you charge $5, you get a higher quality signal by excluding a lot of chaff. The probability that your search engine user is human gets much closer to 1, and you save a lot (but not all) of the anti-abuse effort. This gives you better signal on which websites are interesting, so you need possibly orders of magnitude less data to do a good job.