>Google makes $40bn...If I can create something that just a tiny fraction of people find useful, then I can create a huge amount of value.
You conflate two meanings of value: monetary value and intrinsic value. Search engines are intrinsically but not monetarily valuable to users. Search engines are monetarily, but not intrinsically, valuable to advertisers. You can get into trouble when you conflate these two meanings of "value".
In fact, right here is the pivot on which the internet goes from an idealistic Shangri-La for geeks to a commercial hellscape for the unwashed masses. It is surprisingly easy to create intrinsic value with computers! You see it all day every day on HN: some geek has a thought, spends a weekend making it, and then deploys a solution.
It is surprisingly hard to extract monetary value from an intrinsically valuable solution. In fact, I believe that creating artificial scarcity is the hardest part of building an internet business, requiring invention on par with the intrinsically valuable part - and yet it's the very thing that idealists rail against.
(And making something artificially scarce does seem morally repugnant. And yet I don't see any other way to pay developers. Full stop. Open source software + consulting fees is a good way to go, but that can't apply to hosted search for the public. Well I guess it could, you could teach businesses how to game your own engine!)
> Search engines are intrinsically but not monetarily valuable to users. Search engines are monetarily, but not intrinsically, valuable to advertisers.
I'd like to offer a different view, one that thousands of subscribers at Kagi search can hopefully stand behind too. (Kagi founder here)
Searching has a monetary value and a cost, the only question is whether the user is paying for it or a third party is paying for the user (a cringy thought when you think about it).
This question is answered by the very business model of the search engine, which determines who its customer is. It can either be its users (as in the case of Kagi and mwmbl) or advertisers (as with most other search engines).
Although it is really hard to break through the habit of getting search for "free", we are at least happy to be able to offer this choice to the consumer today. I am expecting to see many more paid search engines in the future.
As someone who has been architecting a paid "deep search" tool for a couple of years now (my approach is trying to enable very deep search flexibility, like being able to grep files on your hard drive, but balancing that against usability at internet scale is very hard), I have been on board with the idea that "search is valuable enough to end users that they will pay" for a while. The issue isn't value, it's marketing it successfully enough to gather users at a rate that covers costs. I haven't launched my project even in private alpha to anyone because it still costs too much per search; for other search engines it will cost less, and hopefully we get a healthy ecosystem of search tools eventually.
Based on what I’m seeing in google search quality… their days of unquestioned dominance died a few years ago. The king is dead, the prince (bing) is on vacation somewhere, and the crown is up for grabs.
I have no idea what gp means by this. It can be surprisingly hard, but it's not always. Go buy a cooler of water bottles and sell them for 50c in the summer on the side of the street
It seems pretty obvious in an internet context. It's very difficult to make money with a product if the competition is giving their products away for free because it gets paid in a different way.
Even your "sell water on a hot day" idea probably won't sell a lot if you set up shop right next to an enormous promo stand from a global bottled water company that gives away bottled water for free. (And due to the magic of the internet, every spot you pick is right next to a global competitor with deep pockets)
No, even then it is the same concept. The big global bottled water company reckons that the advertising value of the giveaway is worth more than 50c, or maybe more than $50, per bottle. Now you have a bunch of people carrying its water bottles around, giving it even more advertising. People think "that water bottle company is what I should get" based on those carrying it around. So in fact, they are giving away bottled water for free, which compared to competitors might be worth 50c or $50, but they think it is worth losing that for the ad.
Ok, your explanation makes sense, but I don't see how what you said relates to the single qualifier presented: being intrinsically valuable. And gp states it as if it were an accepted rule of thumb for all economics, not specific to the internet.
It is. Capitalism relies on the basic system of supply and demand. When the marginal cost of supply is zero, the only proper price is zero - anything else isn't pure capitalism. And since free would mean no incentives, we deal with it using an even worse hack: ownership.
A better way to deal with this is to discover what the intrinsic value for the product is, allow the creator to give it away, and then subsidize the creator the value they generated.
The nice thing with this system is that we can transition to it really easily. There are already many Open Source projects that exist; all we need to do is earmark a certain sized pot, figure out the weights of existing products, and hand it out. And as the economy gets stronger, allow the pot to grow, and eventually there might no longer be such a thing as closed software, as open software will always be able to generate more value than closed.
There exist grant programs at most institutions designed to support such work, and precisely for that reason. Sadly, writing grant applications is a job in and of itself. Consider the case of Justine Tunney, the creator of redbean. If anyone deserves a grant or three, it's her. But she's too busy making cool shit to a) research which grants are available, and b) satisfy their onerous application requirements.
Obviously the problem is that those in charge of giving away money are very passive, and in fact put up large barriers (the onerous application process) for anyone wanting the money. This, to me, is absurd. The admins of such a fund should themselves be active in one or more areas of interest, such that they themselves should be approaching people like Justine and telling her, "Here, please take my money."
I don't know much about grants, but like most things that are broken I suspect there are perverse incentives here. For example, someone is probably investing that grant money, and if they give it away the money leaves the market, and that will make the admins sad. (I don't know if that's the case, but that's the kind of thing I would expect to see.)
Oh, another surprisingly effective thing people are doing is subscriptions to people you like. You see this particularly on Twitch, and sometimes on YouTube. People can and will give money to the creators they like! It's kind of amazing to me. I suppose the analog for people like Justine is Patreon. It's fascinating though that people will happily subscribe to an entertaining personality, but will happily ignore someone who toils at making incredible things behind closed doors (but still use their tools). The obvious solution is that all open source projects should become a source of entertainment.
Yep, you have outlined the problems with grants. As they look forward, and do so in a one-time lump sum, they hold considerable risk, and thus the providers of the grant money need to ensure they pick correctly - but this just leaves us with a centrally controlled economy and bureaucratic nightmare.
Instead, you should seek to discover the intrinsic value as it is occurring, so there is no risk to the granters. I believe it is possible to do so through a modified Vickrey auction. A Vickrey auction is one in which the winner pays the price the second-place bidder offered. In this modified system, the top 90% (or some other optimizing number we can work out) of bidders win, and they all pay the price of the highest non-winner. A certain subset of users of the open source software would be asked to give their true value for some subset of open source products as they use them, and using statistics we could extrapolate that to the rest of the population.
We would then use these weights to give away all grant money. This might mean that Firefox can be funded exclusively through this pot, and they would no longer be beholden to Google for default Search. It would mean the grant writing process would go away. It would mean, as ads make products less valuable, that ads go away.
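To make the mechanism concrete, here is a rough sketch of the modified Vickrey auction described above (all numbers are made up for illustration; this is a shape, not a worked-out design):

    # Sketch of the modified Vickrey auction: sampled users state their "true value"
    # for a project, the top 90% of bidders win, and each pays the highest losing bid.
    def modified_vickrey(bids, winner_fraction=0.9):
        ranked = sorted(bids, reverse=True)
        n_winners = int(len(ranked) * winner_fraction)
        if n_winners == 0 or n_winners == len(ranked):
            return ranked, 0.0                  # degenerate case: no losing bid to set the price
        winners = ranked[:n_winners]
        clearing_price = ranked[n_winners]      # highest non-winning bid
        return winners, clearing_price

    # Example: ten sampled users state what the project is worth to them per month.
    bids = [20, 15, 12, 10, 9, 8, 7, 5, 3, 1]
    winners, price = modified_vickrey(bids)
    print(len(winners), "winners each pay", price)   # 9 winners each pay 1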
So, fund people based on github stars. I could think of worse solutions, honestly!
Oh, something else I remembered: the Institute for Advanced Study in Princeton! This is an interesting idea. Basically they tell a scientist "You've changed human history, the least we can do is give you an income for the rest of your life that has no strings attached. You can teach, or not, research, or not." It would be cool if each of FAANG had one, or they pooled together to make one, or had a virtual distributed version. "Dear Fabrice Bellard, we will be depositing 10k euros a month into your account until you die. Yours, Google."
I would also add that human attention is inherently scarce. Advertising businesses like google and facebook leverage this fact.
Google provides unlimited searches to users without charge. On the other side, google provides human attention to advertisers, and charges for it due to the scarce nature of attention. Sort of like a marketplace business.
I don't think artificial scarcity makes sense as a mental model here. A better way to make money is probably to just find some way where people need you. When you contribute your work to the commons through open source, it earns you a lot of love and admiration. However people don't need you. It's kind of the whole point. The biggest technological services in our society that are the most crucially needed, were all built on open source. Since the whole point of open source is to give you the power and opportunity to build things you can control yourself.
The people who build stuff with open source are very smart, so they usually don't need foss devs as consultants to explain to them how to do it. I make some of the most popular projects on this website. Been doing it for years. I get plenty of donations because my work makes people so happy. But no one has ever offered to pay me for something in return, since I don't have anything with any economic value, other than me myself. So I get plenty of job offers from people who would love to be able to say I'm their employee. Since controlling people is about as valuable as controlling the services people need.
It'd be nice if our cultural mythology about earning your keep doing an honest day's work through fair trade was how the system worked, rather than me needing to depend on the gift economy that's funded by control. But I think there was just so much abuse of the classic models of human cooperation that they just couldn't transition to the digital era, and as such we need to find a way to adapt.
IMHO only, as an idealistic aside: It would be lovely to see a society in which programming was done as civil service, in a program similar to Americorps or Peace Corps or what the old CCC did: provide public infrastructure using public funds.
Programming reminds me of other professions with high inputs but low per-unit-cost outputs: teaching, music, movies, art, journalism, etc. -- basically anything that is create-once, share-often. All of those (for the most part) are things that America's shortsightedly capitalist economy fails to adequately incentivize/reward unless you happen to become a celebrity.
There is already, and will continue to be, a class of human labor output that is intrinsically valuable but which our economy is unable to adequately price. I'd argue that's more an issue with the speculation-driven economy that we have than with the labor in question. It'll only get more drastic as we automate more and more and further amplify human creativity.
It'd be cool to see a nonprofit search engine/email service/office suite/whatever funded in a similar style as NPR, but of course that'd run into political issues at every level.
In a utopian cyberpunk future, what if there were multiple voluntary nonprofit "shadow governments" that you can choose to tithe every month, almost like churches? You can choose between e-governments red, blue, green, yellow, purple, gray, whatever... give them 1% of your income a year in exchange for a suite of services run and staffed by professionals who are salaried but own no equity; they work as a form of civil service, not a wealth-building scheme. Syndicates for the public good, I suppose. Lol, in all the video games, these usually turn into private military companies with killer androids, but what if they just, uh, provided really good email (and automatic online driver's license renewals) instead?
Such systems would probably never be able to attract the best talent (unless they turn into something like Mozilla, which is a big enterprise masquerading as a nonprofit), but you often don't NEED the best. Wikipedia, the sum total of human knowledge, is also the sum total of human mediocrity (with plenty of redundant efforts, infighting, unoptimized problems, etc.). But having stable collaborative communities is something that is critical for producing works of intrinsic value, and developing slow-trickle funding streams for such communities -- the kind that can sustain without turning them into potential get-rich-quick schemes, or subject them to violent boom-bust cycles -- is what allows them to both keep working effectively, AND keeps away the exploitative get-rich-quick types looking to subvert public labor for personal gain.
We need a funding model that provides enough income to attract people who want to do it for the public good, but not so much that it also attracts people who want to turn it into personal wealth. In high-input, low-per-unit-cost services, optimizing for maximum (as opposed to sufficient) monetization will too often mean that the users themselves become the product, as we see time and again with Google, Facebook, Twitter, and basically the entire modern online economy. It's the difference between Amazon and your local library, Oracle and Postgres, EvilOS and Linux, etc.
Things don't have to be artificially scarce if we can learn to ask, "How can I make this as widely accessible as possible while ensuring I have my own basic needs covered?" instead of "How can I make this as monetarily valuable as possible?"
There are always going to be people who want to just do things to make the world a better place, just as surely as there are always going to be people who want to optimize for personal profit. We have every system to fund the latter, but not so much the former right now. Is the for-profit business model the only, or best, way?
Perhaps a better approach would be building an open source www index or even a full current cache - as an enabler for people to build their own search engines?
Right now it is extremely difficult to build your own web crawler that would compete with Google. And that is not because of the technology, but because many sites will prevent your bot from accessing them if you're not Google or Bing - either through robots.txt, or by directly banning your IP if it's trying to crawl and isn't a confirmed Googlebot.
Having a non-profit, open source, crawler that keeps an up to date index (or web cache) of the web would help competition spring up.
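To make the robots.txt part of that concrete, here's a minimal sketch (with a placeholder URL) of what a small crawler runs into before it has fetched a single page - many sites whitelist Googlebot and disallow everyone else:

    # Minimal illustration of the robots.txt problem for small crawlers.
    # The URL is a placeholder; imagine a site that allows Googlebot but disallows unknown bots.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    for agent in ("Googlebot", "MyTinyCrawler"):
        allowed = rp.can_fetch(agent, "https://example.com/some-article")
        print(agent, "allowed:", allowed)

A shared, openly licensed index or cache would let new engines skip that fight entirely.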
Isn't that more of a data set for ML and other research purposes than a highly up-to-date search index (for example, with news from a few minutes ago)?
Roughly speaking, yep - Common Crawl provides a sizable chunk of web data (420 TiB uncompressed, over 3 billion unique URLs, as of May 2022; historic statistics here[1]), and is updated on a monthly basis. Not near-real-time, true, albeit relatively fresh.
A question to ask could be: how often do users care about information from a few minutes ago, compared to information that has been available for a longer duration of time?
I mean, any time someone wants information on current or recent events is your use case right there. If you exclude news entirely, you could maybe disregard recent websites but I imagine that's statistically a pretty large portion of search.
I built something with an API that uses Selenium to image a site. It works on a large percentage of the sites I feed it.
I don't recursively call links found in the pages. I expect the user to give me the URLs to crawl and save.
In order to "find" new content, I let the user specify where they want to search for things the engine hasn't "crawled" yet. So, a search for scooters to buy might end up searching Amazon directly, then lets the user "save" the site by passing the Amazon URL for the scooter they like to the system for imaging.
I use GPT-3 or other ML models to do some of the heavy lifting for adding labels to the pages or documents the user uploads.
This ends up being a "curated" list of documents important to the individual user, not an exhaustive crawl of all things which are important to all users.
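A minimal sketch of the imaging step described above (assuming Selenium with a local Chrome driver; this is illustrative, not the actual code): fetch the user-supplied URL, save a screenshot, and pull out the text for later labelling.

    # Rough sketch: image a user-supplied URL with Selenium and grab its text.
    # Assumes Selenium 4 and a local Chrome/chromedriver; the real system differs.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def snapshot(url, out_path="page.png"):
        opts = Options()
        opts.add_argument("--headless")              # no visible browser window
        driver = webdriver.Chrome(options=opts)
        try:
            driver.get(url)
            driver.save_screenshot(out_path)         # the "image" of the page for the user's archive
            text = driver.find_element("tag name", "body").text   # raw text, later labelled by an ML model
        finally:
            driver.quit()
        return out_path, text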
> Perhaps a better approach would be building an open source www index or even a full current cache - as an enabler for people to build their own search engines?
That's an excellent idea! In the spirit of open data - people can do with it what they want.
I think this is a great idea. How does this work with copyright? Search engines seem to be able to download and reproduce content from scraped pages (and wrap it in ads, and derive content from it); this is called "indexing" when they do it, but scraping when everyone else does it.
E.g. on Google if you search for "how to tie a tie", a little info box may pop up with step by step instructions. This content is taken from some website, but that website gets no page hits or ad revenue. Instead, Google gets to serve ads on the search engine results page.
(I don't know if this happens for this specific example, but Google does this for some searches)
Part of why sites participate in the infobox program is that in practice you do get quite a lot of hits from it: many people click through to see the answer in context.
I think they're referring to how Google "extracts" answers from your website and shows it on the search results page. Effectively meaning that the user doesn't even need to go to your site to get the answer, because Google extracted it and gave it to them directly.
It seems to me that what they usually extract is some junk only vaguely related to the query and often cut apart and reassembled in a way that's just wrong.
I'm curious if you think a co-op would be feasible between Mwmbl and other like-minded crawlers who are interested in taking a divide-and-conquer approach to crawling the web.
Disclaimer: I work for Google, though far away from Search.
Regardless of search engine design, there's HUGE money in SEO. Any successful search engine will be gamed. Do you have the developer power to go red-queen against all the large companies in the world?
For clarification, “red queen” means a conflict between two or more entities where the cost of engagement grows, but the relative advantage does not change.
Simply put, search engines have been at war with SEO for over 30 years, which has significantly raised the bar not only for being a search engine, but for producing content - not to mention knowing how to search for information. With the introduction of machine-generated content, information wars between countries, global dependence on online commerce and information, etc., the speed of change shows no signs of letting up.
In my opinion, for the average person, knowing how to search for information is the real issue, not that the quality of information available has become worse or that Google has become a worse search engine. If anything, Google has reduced its advanced search capabilities not for financial gain, but because the average user is just too lazy to learn how to search and keep up with the changes required to remain an advanced searcher.
To your last point, I don’t quite agree. Google’s incentives are misaligned such that keeping you on Google.com just a little longer is better than not because you are more likely to click on an ad.
But also yes the users and the UI both fail. When I used to search for something I would type in something like “gutter clog clean” but slowly started noticing that Google likes longer sentences like “how do I clean a clog in my gutters?”. In pursuit of making Knowsmore (from Ralph Breaks the Internet), Google lost the power user features. Search would be infinitely better if they actually fucking respected literal mode and stopped trying to treat me like an idiot with no attention span. Having search results that contain one out of like 8 words in my query and asking me if I want to include others and then when I say I do still showing me results without them is broken UI and not a user problem.
Google search, from very early on, considered it a success metric when users went quickly to a result. I have no idea how that factors into the current surely hideously complex ranking algorithms, though.
As for the parsing of queries, that's probably based on how most users use search. Not everyone is familiar with keyword-based search. I expect they've done tons of A/B tests to determine what kind of query interpretation makes most users get better results. We're just not "most users".
>Search would be infinitely better if they actually fucking respected literal mode and stopped trying to treat me like an idiot with no attention span. Having search results that contain one out of like 8 words in my query and asking me if I want to include others and then when I say I do still showing me results without them is broken UI and not a user problem.
Agreed - verbatim is likely the answer in this specific situation, though I did not point it out since they clearly think they understand how to search; verbatim has been an option for as long as Google has offered non-verbatim search.
Beyond that, complaining Google does not do XYZ misses the point. Google is a search engine designed for the average user and the average user does not want verbatim search. They also do not want: advanced search operators, true Boolean search, regular expressions, API access to search, open source code, real-time streams of pages Google’s crawling, etc.
What they do want, and always have, is natural-language searches in their language of preference, with clarifying responses from the search engine in natural language; that is, they want to treat a search engine like a person and be treated like a person. Which is why it was odd that they referenced Knowsmore, since Knowsmore [1] used keyword-based searches, not plain-language searches.
Google is not the primary problem, the average user is the issue. Unless people realize that — they’re fighting in a war they do not even understand.
To make it even more clear: Google is easily able to detect and block users who block ads, but they do not. More than 60% of users still don't block ads - not because they love ads, but because the effort to figure it out simply is not worth it to them, or they don't mind ads, etc.
>What they do want, and always have, is natural-language searches in their language of preference, with clarifying responses from the search engine in natural language; that is, they want to treat a search engine like a person and be treated like a person
I agree with you but Google is not yet at that point where it can act and serve people like an Answer Machine that knows everything; both the people's preferences and the perfect answers.
>Google is not the primary problem, the average user is the issue. Unless people realize that — they’re fighting in a war they do not even understand.
Again, I agree that casual users are the problem, but how can we help them? This is The Innovator's Dilemma[0], where if we ask casual users what new stuff they want from Google Search, they will answer "nothing". Because even they themselves don't know how their UX can or should be improved, and on top of that they are satisfied with Google's mediocrity. They would just respond "Google is Google".
>Beyond that, complaining Google does not do XYZ misses the point. Google is a search engine designed for the average user and the average user does not want verbatim search. They also do not want: advanced search operators, true Boolean search, regular expressions, API access to search, open source code, real-time streams of pages Google’s crawling, etc.
The complexity of constructing "complex" search queries needs to be reduced so casual users can use such features and queries.
>To your last point, I don’t quite agree. Google’s incentives are misaligned such that keeping you on Google.com just a little longer is better than not because you are more likely to click on an ad.
I agree, which leads me to the conclusion that subscription is the best way to avoid this conflict of interest. Unfortunately, most of the world won't subscribe to a search engine, and doesn't seem to mind ads - to a degree. With Google looking more and more like AltaVista before its demise (to Google), my conclusion is that Google will strangle itself out of existence and give way for the next "new, streamlined, not-full-of-ads" competitor.
In the 20-30 searches that I do in a day, I still have to google about half of them. Either because it's stuff Google does well (currency conversion, for example), or Kagi just doesn't get what I'm trying to search.
I remember starting out with the Internet searching on Altavista and Yahoo and Lycos. The information that was present was nowhere near what it is now, and searching was more "exploratory". Nowadays people just kind of know what they want and just want to quickly get there.
> In the 20-30 searches that I do in a day, I still have to google about half of them. Either because it's stuff Google does well (currency conversion, for example), or Kagi just doesn't get what I'm trying to search.
Currency conversion is not technically a search. It is question answering, and Kagi's capabilities there are still being built - Google only has a 20-year head start. Can you report all such cases to kagifeedback.org so they are on our radar?
Thanks, currency conversion was an example off the top of my head only. I am active on orionfeedback and kagifeedback, I find that they're really prompt and effective in answering to feedback.
The other examples are a bit harder to describe and I can't quite describe how Google gets it right. I think I might need more time to describe it out, as it involves search in another language.
Currency conversion is nothing you have to sell yourself to Google for. Just bookmark a bank, a financial or an academic research site that seems trustworthy. I have used the same ones for over 20 years, probably found them using Altavista at the time...
What about one funded by universities or libraries as a research project?
There have been lots of no ads (for now) attempts. DDG had like one small ad at one point. But people didn’t leave in droves. It’s almost like people are ok with ads.
Allow users to blacklist sites.
Share blacklisted sites.
Have the option of instead of hiding blacklisted sites entirely - show them in a different column.
99%+ of people in the medical industry did not sanitize hands or equipment 100 years ago.
99%+ of people in the tech industry currently do not care to do the extra steps required for data neutrality, and privacy.
99%+ of people are lazy to the point of harming themselves and others.
1%- of people examine how the 99%+ do things and pioneer harm reduction tactics in spite of everyone constantly reminding them that no one wants their help.
Why should we care about 99% of people? They are people who upload all their personal data to social networks agreeing to "we can do whatever we want" terms, they pay with a card, use Apple's and Microsoft's spyware (which marketing people call "telemetry") ridden operating systems, and install Chrome that sends a signal to Google every time they open a new tab and sends data about every form on every website they visit (which Google developers call "crowdsourcing" in the code [1]).
Make a customizable and privacy-respecting search for us, power users.
Also, I have noticed that Firefox's internal search works well for sites you have visited. So when I want to visit a page I have seen earlier, I can go straight to it, skipping Google.
Also, you can click on any search box and add it with a prefix, so that you can search MDN or Wikipedia directly, again, without informing Google.
A quick, entirely unrepresentative look at the user count for uBlock Origin and uBlacklist in Firefox make me somewhat less optimistic than you are. Or is there a more popular way of blocking sites from search results than uBlacklist out there, which I simply don't know about?
More and more search engines are now giving you the option to customize search results. Brave has Goggles, You.com has thumbs up / down icons, and other alternative search engines have similar capabilities too. I have been really enjoying the ability to tailor my search to how I like it and e.g. rank reddit (although it seems like it is not liked by some folks here) higher.
It could work like spam filtering. Once a certain number of users have marked a site as spammy (or put it on their blacklist), it gets downranked for everyone. But jstummbillig is right. It could easily get gamed.
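Roughly what I have in mind, as a sketch (the threshold and penalty below are made up): a domain only gets globally downranked once enough distinct users have blacklisted it, which is also exactly where the gaming risk lives.

    # Sketch of spam-filter-style downranking driven by user blacklists.
    # Threshold and penalty are invented numbers, purely for illustration.
    from collections import defaultdict

    reports = defaultdict(set)            # domain -> set of user ids who blacklisted it

    def blacklist(user_id, domain):
        reports[domain].add(user_id)      # counting distinct users resists one user spamming reports

    def adjusted_score(base_score, domain, threshold=50, penalty=0.25):
        # Downrank a domain for everyone once enough users have blacklisted it.
        if len(reports[domain]) >= threshold:
            return base_score * penalty
        return base_score

The hard part is deciding which reports to trust, since sock-puppet accounts can hit any fixed threshold.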
I love how this is a problem that a huge number of people have worked on for around 20 years and have thrown stupid amounts of money at and people just jump in and boldly proclaim that they have the answers that just popped into their mind. Peak HackerNews.
I don’t think these ideas just popped into his mind. They’ve been discussed for a long time on HN, and some alternative search engines have started to incorporate them.
Google has actually followed a similar model for about ten years now, with something like 10,000 temp workers following a 172-page manual, essentially as the basis to train a lot of their ranking models.
Gmail's spam filter is basically community trained by people clicking spam/not spam (at least, that was what they said many years ago, it might have silently changed). What's different about search?
That’s the same question as “how do you ensure complete trust” which is, of course, not possible. That doesn’t mean that “distributed trust-based blacklisting” still isn’t better than what Google is offering today, which is nothing.
Some people couldn't find their butt with both hands and a map. Best not to take advice from them. Unless you find yourself surrounded. In which case smile and nod.
I think Google search is using the wrong approach. When I'm looking for e.g. a new camera, I want to use my network which I trust. E.g. I want to ask "what camera would HN recommend?" We should think more about how we can use trust as a basis for how we explore the internet.
Given that enough people use this heuristic, there will be companies focusing on earning karma on HN, writing comments and voting for products they are getting paid for.
While true, it is also not a given that you'd need to trust _all_ of HN. I visit sufficiently regularly that I see some people post and recognize their username. I often think -- I wish I could _follow_ them. Not so much of a stretch to think of rings of trust built around particular users. Bringing people in and kicking them out of these trust circles could play a role. PageRank -> TrustRank? Of course it would also be only one metric, among many possible trust rankings and many possible other signals and settings.
I bet the (niche) product as such wouldn't be as hard to build as it would be to scale. Imagine every user constantly tweaking (directly or indirectly) their search result settings, and having that impact millions (or more) indexed items, for every user.
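One way to picture a TrustRank like that: run something like personalized PageRank over the follow/endorsement graph, seeded only on the accounts you personally trust. Everything in this sketch (the graph, damping, iteration count) is made up just to show the shape of the idea:

    # Toy personalized-PageRank-style trust propagation over a follow graph.
    follows = {                      # who endorses whom (all hypothetical)
        "me":    ["alice", "bob"],
        "alice": ["carol"],
        "bob":   ["carol", "dave"],
        "carol": [],
        "dave":  [],
    }

    def trust_rank(graph, seeds, damping=0.85, iters=30):
        trust = {u: (1.0 if u in seeds else 0.0) for u in graph}
        for _ in range(iters):
            nxt = {u: (1 - damping) * (1.0 if u in seeds else 0.0) for u in graph}
            for u, outs in graph.items():
                for v in outs:
                    nxt[v] += damping * trust[u] / max(len(outs), 1)
            trust = nxt
        return trust

    print(trust_rank(follows, seeds={"me"}))   # carol ends up more trusted than dave

Search results could then be weighted by the trust of whoever endorsed them, which is exactly the per-user scaling problem mentioned above.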
Wouldn't the world become a little bit better if they did? Earning karma on HN is not the easiest thing. I would hate to speak for all of us but collectively, don't you think we have a pretty good marketspeak alert system here?
I think you are kind of ignoring the issue that parent brought up.
The hn machine is not all that smart and can easily be gamed, and would, if the stakes were to become high enough. Farming hn karma by making pointed statements on crowd favorites (parent named two, oss licensing or privacy also spring to mind) gets you your 10k, no originality or honesty required, in no time.
What's protecting hn is a lot of moderation + relative irrelevance. If those 10k were to systematically bring you enough eyes (by driving search results), you are in effect printing money. There is no reason to assume the number of people doing it would not scale with the return attached to doing it.
Anything (points/karma/coins) which is free & unlimited will find a way to get exploited. "What if" there's a barter system for karma/points: you do a +1 and get a -1.
Though I do not know how the initial allotment of karma/points could be distributed for the pioneers and for the new, growing community - maybe allot 'n' points for a new user after a year...
Trust can work both ways. They'd have to be really careful to not lose the earned trust. On the way, they would have to write a lot of high quality HN posts. I'd say this model is a win over current spamming and fake reviewing practices.
The other side of them having to write a lot of high quality HN posts is me having to read and evaluate more HN posts trying to game me. That is work, and if I sense a lot of it (like I did on Reddit), I will leave, and I suspect others would too.
Playing defense is exhausting when playing offense is extremely cheap.
Personally I don't ask my network or friends for opinions on such things, because people tend to have a positive bias towards things they have invested money in.
Some approach like this could still work, but it’s incredibly hard to maintain/define the right “network”, and across different domains. (E.g. HN probably not so good a community for latest fashions or sports trivia)
I put your search into SearX with a lot of the major engines enabled (Google, Bing, Qwant, Brave, DDG, etc.). Arguably, Google did a better job giving me HN results (but it's a very small sample set).
Edit: if you're not familiar with SearX, everyone who visits will get a slightly different result based on dynamic results. Even if the same person refreshes a few times the exact results and ordering will vary; I've learned to try the same search a few times to get better results, it's just a quirk of how it works and how each remote engine reacts at that given instant.
Google's end goal is an Answer Machine, but we are decades away from that. An Answer Machine would be something like God-like software which gives you the ultimate answer to your query, and that answer would be 100% accurate, true and personalized/suited for you. Again a blackbox solution, but it will be so advanced with the help of AI that it would be trustworthy by default/design. Hard to achieve, but looking at Moore's Law and the advances in AI it will eventually be achieved.
> Do you have the developer power to go red-queen against all the large companies in the world?
Let's try? It's also an interesting research topic in itself and might be a topic for academic research. At the moment Google is a black box and their incentives are not really aligned to stop SEO. It's good for them to show more ads, it's good for them to show you copycat pages of github/stackoverflow with ads. Not saying that Google is doing this on purpose - I doubt it - but we don't know. It's surely possible to create an index and ranking that prefers different things than Google.
Let's try something at least. It will be gamed, will be worse probably but it's open and can be a playground for academic research.
The best that can happen is that there turn out to be ways to get a better ranking and Google was being dishonest to maximize profit. If it's still gamed, at least the mechanics can be studied and analysed, and maybe someone can figure out a switch like 'be unfair in ways Google can't' - I would love a 'no ads on the page' switch; that would probably solve quite a few problems.
"Trust our black box" and "you won't have enough devs for this" are just bad answers to such an important problem. The amount of stupid simple redirection spam in my results over the last few years also makes it look like Google just doesn't care a lot about this anymore.
I think this is mostly a problem if you are in Google's position, near total market dominance.
Not only do you exert massive selection pressure on the shape of websites; the SEO spammers don't need to be good or know what they are doing, they just need to be lucky once. If they get it right, they float to the top, and can iterate on that design. This effectively means that no matter how secret or smart your algorithms are, it doesn't matter if you're in Google's position. The numbers are stacked against you.
To make matters worse, any company with that sort of a market share has serious handcuffs in how heavy handed and "unfair" you can be without risking litigation for anti-competitive practices.
I think the best thing that could happen for Google is ironically serious competition in the search market. This would help both problems at once.
Only because it's profitable for them to allow it to be gamed - like all the spam sites now when you search for SO; Google allows them to be ranked because they're filled with Google ads. But it'd be trivial to just delist them all; that'd be beneficial for the user but not the search engine.
It's not a matter of 'developer power' - just flip a boolean somewhere and delist the site.
Actually, it is entirely possible to create search criteria that cannot be gamed. That is what Google Research's various arms ought to be working on. But we all know it is far more profitable to have a manipulable system.
Could you expand on this a bit more? I don't see any obvious way to develop non-manipulable search criteria, I was suspecting that there might even be an impossibility theorem about this (which would depend a lot on the exact formulation, though), and I'd like to know what you have in mind.
Because of the filter bubble highlighted by Eli Pariser, there are massive opportunities for SEO companies to trick their customers into thinking they have gotten high up in the search results, when all they are seeing is their own filter bubble!
And that money goes to support many industries beyond search itself. The author really needs to get off a computer for a minute and understand the economics of the web as it stands today, and how the free, ad-supported model supports millions of people's livelihoods, before jumping to "This is all bullshit".
While I agree that they may need to better consider the economics - both of the engine and of the websites that may SEO it - that doesn't mean we should just assume ad-supported is the way to go, or the only way to support people. The economy of today looks different than 20 years ago, or 20 years before that. Doesn't mean we shouldn't grow and change.
I respect that. I guess my point is "Grow and change with a decent understanding of what the present state enables"
A search engine like Google isn't just a search engine as the author describes it. It is a very integral part of the economy of the internet and just labelling a simplistic interpretation of the present state as "Evil" with an academically poor write up of what a viable alternative is does little good.
I want to write articles and read articles written for me by others. Ideally as few as possible should profit from this process.
Google is now a turd: not just no longer capable of delivering this service, but actively destroying the good part of the web by refusing to index it.
It is my attention, it doesn't belong to anyone else.
My access to information and educated opinion is a far more integral part of the greater economy.
Google is like a screaming man at a town meeting making sure no one else can get a word in. The meeting is now pointless.
Google is a catalogue. There is no physical analogy that comes close.
Your point about wanting to read articles written for you by others is certainly possible. The very fact that such a desirable outcome drives you to Google and nowhere else should suggest the complexity of the problem they’re solving and how there isn’t really anything else out there doing so well.
>Ideally as few as possible should profit from this process.
Why?
When you accepted a job offer in the software industry, did you stipulate that your mission is to write code for your employer and you will be charging as little as possible for that privilege? Minimum wage should get you by just fine, right?
I hate fully grown adults behaving as though anyone except them making a profit is somehow evil.
> The very fact that such a desirable outcome drives you to Google and nowhere else should suggest the complexity of the problem they’re solving and how there isn’t really anything else out there doing so well.
Yes and I'm not impressed.
> > Ideally as few as possible should profit from this process.
> Why?
> When you accepted a job offer in the software industry, did you stipulate that your mission is to write code for your employer and you will be charging as little as possible for that privilege? Minimum wage should get you by just fine, right?
I'm not a good example as I indeed live wonderfully on minimum wage and write software for free.
> I hate fully grown adults behaving as though anyone except them making a profit is somehow evil.
Don't worry, my philosophy is not that superficial. We have people who make things, people who organize the making of things and people who organize the things made.
It can be true that the metadata is more valuable than the data itself, and organizing an effort can be much more intense than any of the tasks involved. But let's not pretend that is always the case.
Before money and before the written word we had the exchange of thoughts, observations and ideas. I believe this to be somewhat like the foundation on which everything else we did is built. I want to see this process benefit from technology.
You wrote your comment perhaps a bit limited by the ropes of the platform, but sincerely, free from any agenda; you wrote pretty much what you think.
Now if we [beyond HN] add additional layers of agendas between our exchange, each interested in maximizing their profit from it, then perhaps not you, but many others, will resort to self-moderation.
You won't be able to state it simply like: "I hate fully grown adults behaving as though anyone except them making a profit is somehow evil."
It could become something like "I don't understand why some people don't like others making money", stripped of how strongly you feel about the subject. You could also choose not to say anything.
At that point we are messing with the very fabric of our collective reality.
If I had to choose between freely communicating and the economy, it wouldn't be a hard choice.
Lying in a privacy policy and breaching privacy-related regulations can also be fraud and illegal. Think about why there's so much pushback against the GDPR despite it only primarily mandating transparency with regards to data usage (if they were doing things above-board why would they be afraid?).
> Instead of looking at how long people spend on a site, we would encourage users to give explicit feedback on rankings and use this to improve our ranking system.
While they're not wrong that the way Google determines ranking has its issues, this approach has its own set of problems. If you explicitly use user ratings as part of your rankings in some way, people can punish sites they don't like, a la review bombing on Yelp, Steam, etc.
Not saying it's necessarily a bad idea because of that, but I hope they don't fall victim to the mentality of, "let's just trust the users" as an ironclad rule, because that doesn't always work out well.
Bombing is one thing, but you also have a whole SEO industry now that will exploit any way possible to get to the top of the rankings.
The moment you have community rankings on search, and your search gets popular, you land in a war zone with bots trying to mangle those. Reddit is kind of good dealing with that, but it is very resource intensive.
What if you limit accounts to real people and then keep track of their credibility? It's some initial effort but how could the ranking be manipulated when all dishonest people have burned their credibility?
I've always wondered if you could combat this via referral-only sites. To get in, someone needs to attest to your humanness from their own account. If an account is found to invite too many spammers, robots, or otherwise, it is banned or disallowed from inviting more accounts.
I'm sure you could still manage to make "fake" accounts but it would be much more difficult, and linking them together would be much easier.
Of course starting a site like this would be very difficult. But maybe you could start without it then add it in once you get to a decent popularity such that many people can find a referral if they need to.
Of course it is a much smaller site so it isn't clear how effective this strategy would be at a large scale. Even a referral-based site approaching HN levels would be very interesting to see.
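A sketch of the accountability part (all structures and thresholds invented, just to show the shape): keep the invite tree, and once too many of someone's invitees get flagged, revoke their invite rights or ban them.

    # Toy invite-tree accountability: inviters are responsible for who they bring in.
    invited_by = {}                 # new_account -> inviter
    flagged = set()                 # accounts flagged as spammers/bots

    def invite(inviter, new_account):
        invited_by[new_account] = inviter

    def flag(account):
        flagged.add(account)

    def inviter_is_banned(inviter, max_bad_invites=3):
        bad = sum(1 for acct, inv in invited_by.items() if inv == inviter and acct in flagged)
        return bad >= max_bad_invites   # too many bad invitees: ban, or at least revoke invite rights

Linking fake accounts together also becomes easier, because they tend to share an ancestor in the tree.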
In China, all social accounts must be associated with a phone number, and phone numbers are tied to government identities. It doesn't stop any manipulation of scores and rankings.
> then keep track of their credibility?
It is very likely China will do that too soon. I think you can already imagine the ramifications.
You can’t limit to real people. If you managed to, I would make a service where people can sign up and I’ll pay to use their account. No cost to them and they make some money from it, seems totally reasonable.
That's just another group of people with reduced credibility. At some point, the price to incentivize the next person to offer their account is more expensive than the benefits of link manipulation. Every corrupted account can be discovered because the manipulated content stands out and will be reported.
I went back to Reddit a week back after a year or so, and what I saw on the main feed was just disgusting. One click, a bit of scroll, and I was watching people calling Republicans terrorists and being praised/upvoted/gifted. It's not just about bullshit at the top, it's also about echo chambers and the online equivalent of mob-behavior.
> it's also about echo chambers and the online equivalent of mob-behavior.
This particular example might be about the Republican party-wide support of echo chambers and offline mob behavior that led to an invasion of the building containing politicians certifying the vote of a newly elected leader and fueling disinformation about elections to weaken the trust and integrity of the system?
Regardless of this particular example, there has always been a pretty strong bias towards one side of US (and Australian, for that matter) partisan politics on the front page of Reddit; with over-the-top accusations towards the Republican Party as well as the LNP both receiving thousands of upvotes despite their poor quality.
I use DuckDuckGo. Recently someone showed me their screen where they were using Google to do a search. I was absolutely aghast. The last time I used Google, when you searched for something you saw a simple text list of sites (which is how DDG still works). Instead the Google results were… a disaster. You had to scroll through so much garbage before finding actual search results - a list of sites. It was like Google was saying, "here, look at all this trash instead of clicking a link and going to a different site". When did Google become so bad?
Yes? Offering ad space (and placing affiliate links) is how they make money.
Difference to Google is how they position themselves in regards to privacy, and that Google actually built a search engine. Both make their money by providing ad space.
Bing still refuses to index one of my pages, telling me to follow their rules. They won't tell me what rule I'm in violation of, though, and I can't tell that I'm in violation of any of them.
And this is educational content, text only, no ads or popups, no SEO hacking. Bing's analysis tool told me only that I was missing the "lang" attribute from my HTML tag. So I added it, but of course that wasn't the issue.
I reached out to them, and they replied saying that the page didn't meet the requirements for listing, but didn't elaborate.
It certainly makes me wonder what content their broken algorithm is missing.
And it sucks because it means DDG is missing that content, too.
No. It's Bing in the same way that Uber is Elasticsearch. It's built on Bing and other tools and adds, tweaks, adjusts, etc. Calling DDG a Bing proxy is somewhere between misleading and dishonest.
Whoogle is best in class, but doesn't provide much benefit unless combined with a rotating VPN. It also doesn't solve the "GBY" problem, where the majority of search engines rely on Google/Bing/Yandex's indices instead of using their own.
This is especially dangerous because it propagates an illusion that there's dozens of engines to choose from. The reality is these three companies control more and more of humanity's ingress to information, censoring what they see fit for political/financial gain.
Funny, I find them to be identical to Google's results localized for Sweden. Are you possibly using Google Search logged in or saving cookies between closing tabs (i.e. not using Cookie AutoDelete)?
I do too. However, I am responsible for a couple of sites that Bing absolutely refuses to even index. Google has no issues with them, not for 10 years running. Those sites are effectively invisible in Bing and therefore DuckDuckGo. I’m not talking about low ranking, the entire domain is ignored.
My own personal website for example is not even listed in the search engine I use. Microsoft support see the issue but have no explanation and escalate and quote “quality requirements” for weeks now. A human review yielded positive results regarding quality. Meanwhile, I see lots of SEO spam on DDG when searching for generic technical terms.
Sure. A very simple one would be https://ipbl.herrbischoff.com, a public blocklist page referencing a resource used by a couple dozen users. The HTML doesn't get a lot simpler than that and is entirely valid markup. It's (unsurprisingly) low ranked in Google but it's there.
Not so in Bing, it's simply not there. The page exists since March 2021. Bing Webmaster Tools reads "Discovered but not crawled. URL cannot appear on Bing", giving no further reason. Also: "Last crawl attempted 01 Feb 2022 at 19:35", which means that Bing did not bother to retry for months, despite me submitting it manually on a regular basis. Clicking the "Live URL" tab results in entirely green checkmarks along with "URL can be indexed by Bing".
Another example would be my personal site: https://herrbischoff.com. Same issue. That one is listed on Google for more than 10 years.
This is fascinating, thanks. Have you experimented with allowing the Bing ad bot that you have blocked, to see if they have some kind of retaliatory non-crawling?
Interesting theory. But the IPBL doesn’t even have a robots.txt and a different, larger site from a German celebrity I host does have the same directives and is indexed, although incompletely.
My working theory is that Bing’s selection algorithm is biased towards large and already popular sites. In the server logs, I don’t see Bing even attempting to crawl the sites I mentioned, except requesting robots.txt and the root page. Bing appears to be excruciatingly slow to update anything but high traffic sites.
Again, Microsoft Support was unable to explain this behavior even after manual, human review found everything to be in order.
I tried deleting robots.txt entirely and got only Chinese crawlers and SEO bots, but still no Bing crawl. All organic traffic comes from blogs linking directly and Google.
Bing constantly prioritizes the content to be indexed that will drive highest users satisfaction. Please follow Bing Webmaster Guidelines to better understand criteria for most valuable content.
Honestly that is probably safer. Having a typo in the URL could easily give you a phishing link. However, I have also gotten phishing ads when looking websites up, so it's not cut and dried at all.
For a good while, if one searched for a (Dutch) government institution or business, Google showed the (free to call) phone number as a clickable link, but the anchor pointed to a paid-per-minute redirecting service. I know plenty of people who found the weird 15-50 euro entry on their bill.
Oh no, it's not. Google's ads have been used to do phishing a lot. And - at least a few years ago, it was extremely difficult to report such ads.
Perhaps it has improved recently, but it used to be a plague in crypto - people getting ads for phishing sites instead of legitimate ones, losing money, and Google being unresponsive to reports.
Honestly I think while this seems absurd if you're relatively knowledgeable about the internet, it's really not something that should be surprising or even particularly shameful? Like, why wouldn't they if it works, is easier to remember, and makes sense to them?
The reasons this is actually potentially bad are pretty deep in the internet-wonk weeds, where you get into questions of gatekeepers and provenance of information, and it shouldn't be surprising most people don't care about those things: those of us who do have failed to provide them with better tools.
On some level it's a little like saying "my dad sent me an email and he didn't use pgp! Can you believe it!??"
I know where DDG gets the results from, but in all the years using it, it has never failed to find what I am looking for. Or, I’ve never needed to check google because DDG didn’t find what I was looking for.
I had a guy in digital marketing tell me that his friend does SEO and works wonders with obscure keywords and shit. That friend is a freelancer and earns a good payday.
When you want to insert your brand into every fucking imaginable keyword, as opposed to reaching people who are genuinely "searching for something" -
why does internet advertising revolve around everyone assuming every person googling something "WANTS TO BUY SOMETHING"?
If every heavily SEO’d result was produced by a company that produced a directly relevant product, I don’t think we’d be as disappointed with the content.
The truly garbage content is produced as cheaply as possible (scraped, generated from a data source, or generated via "AI") to capture advertising revenue, often via subprime advertising networks (or a number of middleman networks).
But to your point, not everyone wants to buy something, and not everyone needs to.
Much of the content out there is simply trying to capture your attention and make you available to some of the worst advertising and ad networks (read scams, lead gen, fake buttons, affiliate crap).
> why does internet advertising revolve around everyone assuming every person googling something "WANTS TO BUY SOMETHING"?
Rather I think it’s because everyone who buys ads has something to sell.
Search ads are “direct action”. You click a link to do something. Ads on eg. TV are more about “brand memory” - reminding you they exist. When you watch tv you’re passively taking in information, but when you’re searching you’re actively trying to click something already. It’s a better fit behaviorally.
One feature that I really wish more search engines would have is the ability to blocklist certain domains, particularly ones whose results are never relevant or helpful to the query itself (Pinterest, Quora, etc). It could even be used as a factor in the site’s search rankings.
I think Kagi does that; I used it while it was in beta. You can also assign a priority to sites, like normal or boost. Kagi is a paid service and doesn't show you ads.
I've been thinking in this same direction. Especially the community-driven part. Google seems to be more interested in what corporations and advertisers want, rather than what users want. With their tendency to crowd-source their AI training, I'm surprised they don't let users vote on search results.
If I were to make a search engine, I'd definitely give users more control over their results. Block crap sites, vote up your favourite sites, vote down questionable sites, maybe different context profiles, because if you're searching for Java in the context of vacation or news events you want different results than if you're searching for it in a programming context.
There's so much that search could do better than what Google is doing, but I'm not doing it because it's way too much work, and it requires serious resources to index everything.
He's doing that thing [1] where he's writing about a thing and presumably wants me - the interested reader - to know more about that thing because it's the thing he's spending all his time on, but he gives zero navigational options to his thing. So as that interested reader, it's down to me to find the name of his thing [Mwmbl] and then (hilariously, given the context!) use a search engine (probably The Evil Google) to find HIS thing.
Seriously, people, if you're writing about anything at all, making assumptions is always a bad idea. If you're writing about a product, make it more than easy to get to it. Provide plentiful CTAs (that's Calls To Action, defined so as not to make the same mistake of assumption) - links, buttons, a big banner at the top ("I'm building a non-profit search engine called Mwmbl! Find out more").
IMO you also need an about page. Why are you doing this? Who are you? Why should the world pay attention? And stick them in your navigation. Overstate your cause rather than assume that people will get it. It's a rad product you're making! Don't undersell yourself! :-)
Honestly, the easy part is building a search engine, like just document retrieval stuff and domain ranking, SEO-mitigation etc. Anyone can build a Google '98 and get it to work well, not that hard, doesn't require all too much hardware. I have done that and got one running out of my living room.
The tricky part, if you want people to use your search engine for more than the novelty factor, and what most Google competitors struggle with is drawing the rest of the damn owl. For example, commercial searches, local businesses, that sort of thing. As much as Google flounders with some queries, the overall package is still really good.
I agree, the whole package is the difference.
I'm trying to use DDG as much as possible, but I live in Italy and for local stuff Google services are still unbeatable.
E.g. Apple Maps, which DDG uses, is a no-go where I live for anything but plain directions. The OSM data used by many map apps is good, and the non-commercial data is sometimes even better than GMaps, but business data is only on GMaps; there's no way around that.
Quite a neat way to crawl websites using a browser extension. That by itself is a form of donation to the search engine. Maybe in the future you can have dedicated software for self-hosted clients that users can run to crawl and index websites for mwmbl? Kinda like folding@home.
How are the batches of URLs to be crawled generated/discovered and posted at your API?
I have also thought that distributed crawling with the help of browser extensions, and/or clients like folding@home, could be a good idea. But how to deal with "spam injections"?
Get 3 people to scrape it and see if there are significant differences.
Some pages will differ, because of A/B testing or news updates, but even an updating news page will still come back broadly similar, and those that don't should probably fall into an exceptions category until it can be determined what can be done about them. Maybe a flag in the URL to request a static page, or just accept that a page changing that often means even faked versions won't last long?
Then I'll just add 3 million bots to the network (or just enough to have about 50%) and I can guarantee to win the A/B test against an honest client most of the time.
It's an arms race, but this is mostly a question of rate limiting account creation, assigning a trustworthiness score to different accounts, some network analysis to detect coordinated accounts, and having some trusted accounts (run by the project) that can help double check results. After an account loads poisoned data, you can detect this after the attack (user reported spam), and then block (or probably shadow ban) the malicious account.
You make it sound easy but companies have been trying to fight this stuff for ages.
You can buy a trustworthy residential IP for low cost, and you can buy them in bulk in the thousands. All of them are real residential IPs from any ISP of your choosing in any country. You can rent Chrome browsers running over those IPs, directed via remote desktop and accessibility protocols (good luck banning that without running afoul of anti-discrimination laws). You can do all that for under $1k a month for something like 1 million clients.
My workplace has been at the other end of DDoS attacks directed by such services, best you can do is ban specific Chrome versions they use but that lasts until they update.
It's an uphill battle that you will lose in the long term if you rely on client trust.
In terms of spam injection (the concern from up thread) I don't think DDoS is relevant. If the core project manages asking clients to process URLs, they'd just IP ban any client that returns too many results. DDoS is a concern for other reasons though.
I think in this specific case, the spammer is on poor footing. The spammer wants to inject specific content, ideally many times. With double processing of URLs, if the spammer controls 50% of the clients, there's still a 50% chance that a simple diff would show the injected spam. The problem is that the spammer needs to do this many times, so their injection becomes statistically apparent. If the spammer can only inject a small number of messages before they are detected, then the cost per injected spam will be quite high. Long-running spam campaigns could eventually be detected by content analysis, so the spammer also needs to rotate content.
Obviously you can play with the numbers, the attacker could try to control >>50% of the clients. The project could process URLs >2x. The project could re-process N% of URLs on trusted hardware, etc. It's not easy by any means, but you can tune the knobs to increase the cost for spammers.
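To make the knob-tuning concrete, a minimal sketch of the comparison step (the function names and the 0.9 similarity threshold are invented, and a real system would need smarter normalisation of dynamic content):

    import difflib
    import random

    def assign(url, clients, redundancy=2):
        # Hand the same URL to several independently chosen clients.
        return random.sample(clients, redundancy)

    def looks_injected(copy_a, copy_b, threshold=0.9):
        # Near-identical copies are fine (A/B tests, rotating ads); big
        # divergences get re-crawled on hardware the project controls.
        similarity = difflib.SequenceMatcher(None, copy_a, copy_b).ratio()
        return similarity < threshold

Raising redundancy or lowering the threshold raises the spammer's cost at the price of more crawling work, which is exactly the trade-off described above.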
> but this is mostly a question of rate limiting account creation, assigning a trustworthiness score to different accounts, some network analysis to detect coordinated accounts, and having some trusted accounts (run by the project) that can help double check results.
Then OP has to do things that don't scale: Review some pages and identify a subset that can be trusted. Then OP can compare their downloads to new accounts and mark the bots.
Then the botnet will just be honest for like a year before it abuses the network. Even better because now honest new clients can be kicked as they disagree with the bot majority. So now the network bleeds users.
Checking which account is honest isn't too hard, you detect that there is a "problematic mismatch" between two clients. So the project runs their own client to check. If one has an exact match, then you'd question the other.
There is a challenge for sites that serve different content based on GeoIP, A/B testing, dynamic content, etc. So some human review of the diff may help check for malice. If there's literally spam, human review would clearly detect this and that bot is distrusted.
Then I'll simply use more bots to get 80% of the network, then I can almost always win any disagreements and your "problematic mismatch" never triggers.
Plus I can now cause you to have to run your own crawler anyway and either slow progress or cost you a lot of money.
Maybe I misunderstand, but doesn't that mean you lose the benefit of having distributed crawlers if everything has to be crawled (again) locally somewhere?
YaCy can do distributed crawling and exchange the indexes (in peer-to-peer mode). I have some nodes that just receive and send indexes without crawling (much less storage intensive).
The "funding options" part has the unsurprising blind spot that, maybe, a search engine is the kind of basic infrastructure that ought to be paid (at least in part) by... The taxpayers ?
I have a long standing bet that, at some point, some company will be "globalized" (operated under some common funding by many different countries, like many research projects or defense organization or aid funds, etc...), and the "search engine" part of google is the prime candidate.
That being said, I'm from Europe, so "sharing the cost of something useful" is not culturally untolerable.
Far fetched and controversial opinion, I know. We'll see.
Considering their goal was to reach £50 a month to upgrade their server… I think any hope at convincing major governments is a few years off.
But I do wonder if the national archives or the library of congress could be a good host for this sort of project. Not sure I agree it should be run by a government… most don’t have great histories when put in charge of gatekeeping access to information.
> The technology for organizing the world’s knowledge should be owned by everyone.
This is what really nails it for me.
There's far too much black box in pretty much every major search engine out there. Maybe it's by design "so that people can't game it". Even so, it's not working very well.
I'm excited for the next 10 years to see what we (humans) come up with to solve the state of the internet, because something's gonna give at some point.
Why though? 99.9% of people don't care, and the internet (Facebook, Instagram, TikTok, YouTube, Gmail, the occasional Wikipedia) works fine for them.
The overwhelming vast majority of people just don’t care about the internet outside those major few silos, so as far as “humanity” is concerned the internet is working as intended.
It pisses me off what’s become of the internet, but I don’t personally see it changing.
You are right about people's current opinions, but you seem to be assuming that, given a better option, people would continue to make sub-optimal choices.
If so, I don't agree. Your small minority's job is to deliver those alternatives, and to feed the flames while the rest of the world makes the transition. Which they will do because the next thing is clearly so much better.
Have more faith in most of humanity and doubt yourself and similar others for having failed so far to disrupt this industry with better technology.
I guess it’s a good thing 99% of people don’t run the world. If products were dictated by the majority we wouldn’t have macOS, nor even iOS which is technically a minority by raw users. Obviously some users use more intensely than others, which is why catering to power users is a totally viable and legit strategy.
I get an incredible value from search engines. Google even (their shopping and book search features are very helpful). But right now, I am liking paid search as the way forward. Kagi is doing pretty good things right now. I love how I can up the weight of certain domains so that their results come in at the top without having to add site:awesomesite.com at the end of every search string. In fact, I can have 20 sites that I trust a lot that show up pinned at the top of the search for every query. It's 10 bucks a month, but I find it valuable.
I think that we are past the point the search engine could just crawl the web and rank results based on some heuristics. We need both community curation and get librarians involved with their classification systems, because in 3 years the results are going to be dominated by automated GPT-xy content farms.
Case in point: www.forkandspoonkitchen.org
The first search engine that provides community curation and manages to get most tech-savvy people on board, classifying the content for free, is going to reign in the upcoming decade as Google loses its grip.
This seems really really naive. Do you really think a non-profit is going to fight the hordes of spammers, scammers, seo masses, mechanical turk hordes, etc that are going to game your system?
Not sure if I missed the sarcasm here.
Why would a non-profit not fight spam etc.? And why would a for-profit be any more motivated to do so?
I see the largest internet companies, including Facebook, Twitter, and also Google, fight spam and other harmful content only to the degree absolutely necessary to stay somewhat usable. Which makes sense because it's costly and does not generate profit.
I would expect a non-profit, however, to focus much more on fighting harmful content because it centers around the user experience, hence quality of the content.
I don't see a guarantee this works in practice, but the respective incentives seem clear.
The general answer here is that a nonprofit cannot maintain the (massive) amount of resources it takes to address spam/abuse/blackhat SEO etc, while a for-profit entity ostensibly has a profit motive to do a decent job, and the resources to do so.
When a for profit entity is more successful at fighting abuse, their users are happier and they sell more ads and so can devote more resources to fighting spam. When a nonprofit successfully fights spam, they don't get more resources, and the spammers upgrade their toolboxes, because they do have a profit incentive.
Like @daoudc writes, Wikipedia shows that the nonprofit can maintain the resources because they attract volunteers.
If you create a search engine where users can report spam and get some form of karma for valid reports that is shown in their social network, then it's quite likely that the users have enough momentum to get ahead of the spam.
Worth noting that attempts at non-profit search engines are not new. In 2015, the Wikimedia Foundation attempted to start one called the "Knowledge Engine", using at least $250,000 from a grant. Wikimedia likely started the project as a response to Google's use, since 2012, of "knowledge panels" built from Wikipedia's Creative Commons-licensed content alongside search results, which reduced traffic to Wikipedia.
Also worth noting that Google is a significant donor to (and now enterprise customer of) Wikipedia, but it is unclear if this had any impact on Wikimedia's choice not to continue the project.
I don't think this accurately represents Wikipedia's relationship with Google. Wikipedia is thrilled with the knowledge panels. It has dramatically cut Wikipedia's hosting costs and spread its reach. You're making it sound like Wikimedia and Google were antagonists over the knowledge panels, but as I understand things it was the opposite.
Which is the problem with non-profits: the transparency requirements are at best minimal, and at a super high level all "non-profit vs for-profit" means is that there is no equity distribution, the government approves of the mission, and there is no distribution of excess cash flow.
There are frequently non-profits that use excess funds to unnecessarily expand beyond the original mission (Wikipedia, for example), or that pay staff, especially executives, way beyond what most donors realize.
To me, being a non-profit is what it is, I don’t read too much into organization being a non-profit.
As the underlying project discussed by this post is a search engine, I searched for “mwmbl” on mwmbl.org [0], and no results were found! Relevant results like the main site and GitHub repo show up when searched on Google or Kagi.
I searched for "Google" on mwmbl, and while the first page of results found many results, including Google Patents, Google Bug Hunters, and a Google Books page on a 2004 book about Anarchism, it did not find Google's home page.
I searched for "Elephant," and I got a Wikipedia page about a specific Elephant statue at Coney Island, a UK elephant charity, and a blog post about Haskell ("the elephant in the room").
It's unfair to poke fun at a very small project that admits that it is far from done yet, but it's gotta figure out a way to crack the "which pages are most likely to be relevant" problem or else it's not going to be useful.
To be honest, I didn’t entirely mean it as a comment on the search quality. But you would expect a search engine to return results about itself, and it was amusing it didn’t!
I don't know what the world needs, but I need a personalized search engine. I would like to filter out anything to do with sports. I would like to filter out articles that contain marketing jargon and technobabble. I would like to filter out articles written below a high-school reading level. And so on.
I totally like the idea, but I dare to doubt that this would solve the SEO problem. Website owners who participate in those notorious affiliate programs or earn money with ads will still use the search engine to drag people onto their generic websites, using methodologies that fit the search engine's ranking mechanisms, no matter whether those are public or not.
SEO and all its consequences seem to be inherent to the system.
You can have a semi-manual system, with user input. So websites filled with SEO get manually flagged by users. Of course, you need to trust your users as well... this can be done with a Web of Trust-style reputation system: users endorse each other, and you can build a reputation graph/tree that traces reputation and easily cull bad subtrees. If this endorsement system conserves reputation (you give a fraction of your reputation away each endorsement, and new reputation is never created), then it becomes sybil-proof, where it's not advantageous to create say millions of users to increase reputation.
Why can't all the users that are being paid to promote spam endorse each other? And won't distributed trust systems just make some people "trust billionaires" and others "trust impoverished" through no fault of their own? If everyone trusts Oprah, she'll have the same type of power people complain about billionaires having now. Basically influence. Also, anyone who's close to her gets the blessing of her influence, whereas if you're far removed you get nothing. Seems like it's just reproducing the exact thing so many are trying to counter.
Well, you need to define your objectives. No system is robust to a failure of all its actors. If every user (and even developer) is ill-intended, no system will give good results. So we need some "hopeful" (and accurate) assumptions.
One might be that the typical user can sensibly elect a few individuals to trust -- these could range from developers (who are a natural choice for trust) to activists and publicly visible individuals (even close friends). Then presumably you could adopt their trust model (such individuals could be roots in independent conservative trust webs/graphs). I think a very large number of such webs might be computationally expensive, but hopefully you'd be able to find someone you trust or start your own independent graph (if you trust no one, you'd effectively lose all anti-SEO measures, I guess). This very naturally leads to a decentralized reputation system!
Mutual endorsement, if reputation is conservative, is like giving each other reputation. If both users' reputations are equal, there's no net gain (total reputation is always constant).
If one user's reputation is higher, there's redistribution, but algorithms would need to carefully weight reputation linearly in all decisions, making redistribution not advantageous and solving any issues with sybil attacks[1].
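A toy sketch of what "conservative" means here (names and numbers invented): endorsing moves a slice of your own reputation to the endorsee, so the system-wide total never grows and a ring of sock puppets gains nothing on net.

    reputation = {"alice": 10.0, "bob": 5.0}

    def endorse(endorser, endorsee, fraction=0.1):
        # Endorsing transfers a fraction of your own reputation; nothing is
        # minted, so creating a million puppet accounts creates no reputation.
        amount = reputation[endorser] * fraction
        reputation[endorser] -= amount
        reputation[endorsee] = reputation.get(endorsee, 0.0) + amount

    endorse("alice", "bob")   # alice: 9.0, bob: 6.0 -- the total is still 15.0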
I'm hoping that having community moderated search results will limit this problem. The problem will be SEO people trying to infiltrate the community, which may be tricky to solve. But Wikipedia has had to grapple with similar problems, so I think it is solvable.
Sell weekly SEO guides to rank well and punish last week's tips. (joking)
What you want to do is put the SEO monster in front of your cart and make it do useful work. You've basically got an army of hard-working people with money to burn who will do anything. What is there to complain about?
Worth mentioning is the Alexandria.org project [0]. It is a non-profit search engine built on data from Common Crawl. The coverage is limited because of Common Crawl, but the relevance is decent. They also provide an API.
I believe one of the biggest impacts toward breaking up Google's monopoly on search is making them open up access to their index, even requiring Google to provide direct API search access for others to build alternative search products. They have a search API today, but it is prohibitively expensive to build on ($5/1000 calls).
I built a fairly popular search engine a couple of years back, but the cost of Google's search API and an increasing number of bot attacks make it difficult to justify keeping it online.
With all the great progress in large language models lately, and them being excellent text compressors, I've started to wonder if you couldn't just replace a search engine with a like 100mb file full of weights that let you query essentially google scale results except all locally.
Yeah, you picked the hugest SOTA one of them all, but there are smaller ones like this https://bellard.org/libnc/gpt2tc.html that run fine even on CPUs and might work well fine-tuned specifically on search results (or at least just code queries).
The only significant difference between those models is the amount of data, and the main (or second most important) reason you use a search engine is how much data is available on it.
If you want to search through an incredibly limited % of the web then yeah it can be a solution, but even the lamest search engine company out there would outperform a GPT-2 like model running from your laptop.
I'd quite like a browser extension that records all my searches and where I end up, just for my own review. I feel like many of my searches aren't actually searches, but I can't quantify that at the moment.
Feels like that would be good info to share, once it's depersonalised.
You could also build your own, most search engines have a specific pattern to how they encode the search term in the URL. Although, I suppose that doesn't support auto-complete
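Most engines put the term in a single query-string parameter, so pulling your own searches back out of a browsing history is not much code. A rough sketch (the parameter table below covers only a few common engines and is easy to extend):

    from urllib.parse import urlparse, parse_qs

    QUERY_PARAM = {
        "www.google.com": "q",
        "duckduckgo.com": "q",
        "www.bing.com": "q",
    }

    def extract_search(url):
        # Return the search term if the URL looks like a known engine's
        # results page, otherwise None.
        parts = urlparse(url)
        param = QUERY_PARAM.get(parts.netloc)
        if not param:
            return None
        values = parse_qs(parts.query).get(param)
        return values[0] if values else None

    extract_search("https://duckduckgo.com/?q=non-profit+search")  # 'non-profit search'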
The way I see it, Google is no longer in the business of searching websites, but in the business of ranking them, and has been for at least a decade.
I still remember helping a friend find information on the accounting balance of the public transport company of Rome, Italy, and finding the most relevant link buried deep at around page 20. The first 15 pages were almost entirely news websites with news completely irrelevant to the search query, yet they consistently ranked much higher.
The main enemy of a better search engine is casual users who are satisfied with Google's mediocrity and don't seek anything more advanced and better. Power users are the ones who suffer the most.
Google will have to reinvent itself or it will eventually destroy itself through negligence of its core business. There isn't yet a critical mass of casual users who think Google sucks; all they think is that Google is the internet. That's their intellectual level.
The internet, somewhat ironically, really needs a search engine that works in the current day. You can't find anything anymore. It's like Google has been un-invented.
Hopefully some day soon the internet will be searchable again.
Thanks to everyone involved in attempting to make this happen (preferably in a way that isn't profit-maximizing).
While monetization is certainly a great thing for a group of people wanting to go after things they and others enjoy, it's also exactly what incentivizes people to game the system, and pretty much what brought us to our current place of spam, spam, and more spam, with barely any way to discover things that are actually good.
I tend to think the quality of the content tends to be better when people don't think about stuff like user retention or subscriptions, but rather how it will actually reach people that care. Good search/curation is a key component for that.
Of course such a world free of implicit monetization will require it to be explicit (Patreon-style), but that should massively realign incentives.
The biggest challenge with making a search engine is to combat adversarial SEO. It's an issue that's very easy to be overlooked when you are small, but at Google scale, your enemies have billions of dollars to make from your visitors.
I bet Google spends at least as much to combat that, and it's extremely hard to deal with while being open-source. It's useless to call for a non-profit search engine without tackling this very core issue.
They haven't given up, they are staying exactly where they need to be.
All the mainstream search engines' priority is to maximize ad clicks/impressions (or collect data to target future impressions), either directly on their own property, or indirectly when linking to websites that embed their ads.
There's no reason why they can't detect ads or analytics and use that as a negative ranking factor (so that all other factors being equal, a non-ad-infested result would rank higher than the ad-infested one), but this would go contrary to their business model.
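As a back-of-the-envelope sketch of what "all other factors being equal" could look like (the host list and the weight are placeholders, not a real spam model):

    AD_HOSTS = ("doubleclick.net", "googlesyndication.com", "taboola.com")

    def ad_penalty(html, weight=0.05):
        # Count references to known ad/analytics hosts in the raw HTML.
        return weight * sum(html.count(host) for host in AD_HOSTS)

    def adjusted_score(base_score, html):
        # All else being equal, the less ad-infested page ranks higher.
        return base_score - ad_penalty(html)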
I think it's possible - red flags (for example blog spam or commercial sites) seem easy enough to catch for a human; probably in an automatic way too, at least as long as you're small enough that they don't specifically target you.
Google's search results are so bad that I can't really allege incompetence here, but I have to wonder whether there's some different motivation. Maybe it's that low-quality search results tend to be plastered with ads, which they get a cut of.
> The paid subscription model
> Donation funded, non-profit model
No! There is a third! You could do a search app ecosystem, where you leave the unlimited, overly complicated puzzles a search engine could address as an exercise for the user.
I always have a bazillion ideas but couldn't think of a single good phone app before mobile phones. I mean, should I want my phone to be a gaming console? It seems ridiculous. Writing is writing books, all other kinds are watered down. Do I want to write books with an onscreen keyboard? It all sounded idiotic, nothing worth using.
But the idea you mention, typing an overly popular domain name without extension should take you to the website directly... What you are trying to say, IMHO, is CLI! Search is just the fallback if the provided query/instruction doesn't make sense to any of the apps.
I can't think of many, but there are no doubt thousands of activities that could benefit from an at least somewhat themed search engine. An app could be a biochemistry web directory that ranks results from a chosen sub-folder above the normal results.
Any FOSS project or other company could create a web directory tree with the few or many pages about itself. A check box lets you pick the ones you want to query. Normal results go under those results. The biochem won't bother you when searching for pokemon.
People love my stores. What they really want is to see illustrated results from my inventory above all other results. Uncheck the box if you are not in the mood. (edit: I'm joking of course, but I do have a good few shopping apps that I actually use)
This is the kind of idiocy that makes me despise the developer community every time i see something like this. It is one of my pet peeves, so if you're going to have an opinion on this comment, please, read the whole thing.
The ad-supported free internet is one of the most important business models the world has arguably ever seen. Very few can argue with the fact that poor kids in developing countries over the past two decades and longer have had their lives changed beyond anyone's wildest dreams thanks to the free resources at their fingertips.
On the same note, much of the wealth accumulation in the developer community has been on the backs of this very business model. The immense demand for dev talent and the astronomical salaries paid out is a consequence of the difficult financial choices made by so many before us.
When I read absolutely low-effort activism such as the text in the link about how (paraphrasing) "sEaRcH eNgEnEs mAkE mOnEyY" and thus they are bad, I'm astounded at how intelligent people who can write code can simultaneously be so fucking moronic in their grasp of economics.
The web is an ecosystem. There are always going to be incentives that don't fit your moral compass that are getting optimized for and against. The answer isn't to burn it all down and shit all over a business model because it apparently doesn't fit your childish understanding of the ideal. By all means, compete, but at least try to understand the various actors and participants in this complex web of entities and what role they're playing in the flow of investment, content, data and economic activity that is far more nuanced than "wEb rEsUlTs wIll B beTtTeR iF nOT oPtImiZeD fUr $$$".
> I'm astounded at how intelligent people who can write code can simultaneously be so fucking moronic in their grasp of economics.
Absolutely nothing surprising about that. Intelligence without knowledge is not very helpful. If all you know is how to write code, you will suck with other things, even if you're intelligent.
The bigger problem is that people tend to downplay the knowledge that's required to do something, simply because they do not know how much they don't know. It gets worse the more intelligent you are because you're more confident in yourself then.
Case in point: name of this project. I've read it like 10 times on this page already yet I still can't spell it from memory. I could paraphrase:
> I'm astounded at how intelligent people who can write code can simultaneously be so fucking moronic in their grasp of marketing.
(but won't, since, as I said, it's not astounding at all; just for illustration purposes)
Great idea, and "instant search the web" would probably be a better pitch than "non-profit search engine". Interesting argument that Google doesn't do this because it isn't compatible with their ad model, but that doesn't mean a new ad-funded search engine can't do it. For Google it might mean billions of dollars in lost revenue while they adjust their ad model; a new ad-funded search engine wouldn't have this problem.
> Frictionless ... For example if you are typing “facebook” or “hmrc login” you could go straight there from the address bar.
No thanks. I sometimes do search for "company name" looking for the wikipedia article for the company, or news about the company, or information about the company in general. If you used facebook before, then it's going to autocomplete as soon as you type "face" in your addressbar, and you won't need the search engine. So if someone searches for facebook, they're either using the browser for the first time, or they're looking for information about facebook. Latter seems more likely.
- on the one hand I really want free, open and non profit services to succeed
- at the same time I greatly value the user experience
Don't get me wrong: These two things can go hand in hand. There are tons of good examples out there.
But, the closer you get to classic user-centric applications and leave the software developer bubble, the greater the discrepancy becomes in my experience. Brave, DuckDuckGo, Firefox and so on are desirable. But I always feel like I am missing out on the UX.
Google still yields better search results FOR ME (even with all those ads and clickbait).
Firefox still feels a bit dated and slow compared to Chrome.
I value the positive effects of free software so much that I am willing to accept limitations in usability in the hope that it will improve over time. But I feel like it should not be this way.
I can't support every project financially or contribute to its success as a contributor. My time and financial resources are limited.
I haven't really found a solution for this problem. My best guess is that the government should intervene in the free market and install market barriers to tame giants like Google. But this is repugnant to the liberal in me.
Tell me how this doesn't quickly devolve into a consensus-rules hellscape, where minority views are either ignored or certain minorities are artificially boosted.
There is no way that design choices (especially the ordering of results) can be made in a way that pleases everyone. So either you dumb it down to the point of meaninglessness OR you enforce a mainstream-only ruleset.
The cost of building a "general" search engine for the "whole" web is astronomically high, in the tens to hundreds of billions. It's not achievable; Google were only able to do it by growing as a business at the same time as the internet itself. I don't believe it's possible to compete with Google (or Bing) by starting at zero.
The route forward, and what should be advocated for, is a distributed network of search engines, each for a specific vertical. If it operated as a cooperative they could share expertise and technology, they could then build a “meta” search engine for the co-op that combined all the results from the specialist niches. Each member basically “owning” the “franchise” for a specific type of search or category.
So, I don’t believe a single non-profit is the answer. More a co-op type arrangement where the co-op organisation (which may be a non-profit) has a mission to advance internet search through it’s network and strategic investment.
That the world needs a non-profit search engine is near trivially true at this point. So good luck Daud.
I think the pertinent question though, is what's the best way to demonopolize search. Maybe the answer to that is non profit, maybe something else.
Google has most search users. They have an even higher (much higher) portion of search revenue, and essentially all of the sector's profits. One advantage a non-profit might have is going after the low-profit parts of search: use cases where Google is likely to be under-serving users.
Also, search isn't just websearch anymore. It's a way of calling a calculator, translating, etc. It's a text box that does stuff. The newest gen of language models may be the technical catalyst for some rapid evolution in the "clever text box" space. Google is obviously super active in this space, but shifts are a good time to get in.
Where would you skate, if you were skating towards where the search puck is going?
A recent comment here mentioned search in early browsers (1991ish). The browser would fetch all links from the current page n levels deep in the background and use that to build a local index.
I wonder if something like that could work today, only with the index being shared across the user base.
The benefit would be that it’s a decentralized system. No giant infrastructure required which needs to be paid for by a big corporation. Basically, the infrastructure needs would be outsourced to millions of devices. And for websites, users and crawlers would be the same thing. Which is to say, you cannot block one without also blocking the other.
It could also add feedback mechanisms. Active ones, such as commenting on pages and discussing them, as we do on HN. But also passive ones such as tracking how long the user interacted with the page, to score the value of pages/domains and improve the ranking algorithm.
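A toy version of the "fetch links n levels deep and index them locally" part might look like this (ignoring robots.txt, politeness, deduplication and sharing; it assumes the third-party requests library is installed):

    import re
    from collections import defaultdict

    import requests

    LINK_RE = re.compile(r'href="(https?://[^"]+)"')

    def crawl(url, depth=2, index=None):
        # Map word -> set of URLs containing it, following links depth levels down.
        index = index if index is not None else defaultdict(set)
        if depth == 0:
            return index
        try:
            html = requests.get(url, timeout=5).text
        except requests.RequestException:
            return index
        for word in set(re.findall(r"[a-z]{3,}", html.lower())):
            index[word].add(url)
        for link in set(LINK_RE.findall(html)):
            crawl(link, depth - 1, index)
        return index

The interesting (and hard) part is the sharing step: merging millions of these small local indexes without letting any one participant poison the result.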
I think Google's results could be a lot better but I'm relatively ok with my search being provided by a for profit company. Their incentive is to get me to want to use their product. A non profit with that much power might be more tempted to manipulate search in ways that suit their personal preferences.
I don't mind profits but monopolies are always bad for the consumer. I wonder what innovation would have happened in search if it had been more competitive.
It's unfortunate that with all the immense value that search engines provide the idea of paying a small monthly or annual fee to use a search engine is incomprehensible for most people.
A small monthly fee could mean a lot of money in developing countries, plus there are countries where most people don't have access to the international banking system. Such a system would make access to information nearly impossible for a lot of people.
How should I pronounce this search engine? I know naming is hard, but if you want something to be easily adopted, having a sticky and pronounceable name is paramount!
This search engine is supported by The Bill (and Melinda?) Gates Foundation, The Organization for Promotion of Democracy, The Organization for Prosperity, The Organization For Truth And Transparency And Against Fake News, The Organization Against Renegade Knowledge, The Organization For Helping Silly Citizens Think Better, The Organization For The Truth About Qatar, The Organization For Freedom And Good Things And Not At All Tied to the CIA, and some other folks.
I hope I'm not too late and this doesn't get buried - anyone interested should check out https://www.findhelp.org/ ! I work here and we are super hiring for engineers :)
Edit - ah, he means the search engine should be a non-profit. Not what I thought he meant.
> [google gets 40 billion a year from search.] I can’t even conceive how big it is. Just 1% of 1% of this would be more money than I’d know what to do with ($4m).
Ouch. I wish you the best, but that statement makes me lose hope. Employees are expensive. Servers aren't exactly cheap either. And unexpected mistakes along the way cost a lot.
I wouldn't say Google Search isn't improving because of the number of employees. Google Search is exactly what it needs to be to serve Google's interests - it's just that their interests don't align with yours.
It does, I just tried it with 102.1. You can add a "keyword" to use a search engine, and once you've done that, you can add it as a search engine to the url+search bar (the main one on top, I don't know what it's actually called) as well, and you can set it as your default search engine.
I mean, I used to have that, but I don't seem to now. Did it move? This is 102.0.
Edit: You can make a bookmark and add a keyword, but that doesn't help me use it as a search engine. It used to be you could just create one from the contextual menu, what is this convoluted process.
> Google tries to work out which sites are interesting by how long you spend on the site.
How would Google know how long you spend on the site? It only sees what links you clicked and doesn't know what happens next. (Unless the website uses Analytics, but Analytics doesn't affect search ranking.)
Let's not forget they also have a very popular browser that itself collects and sends back home a lot of usage information from most of its user base:
I think this is a bit misleading. Chrome gets access to various data to enable various features -- for example to pass your location to the website (when you allow it to do so). Browsing history is used e.g. to power URL auto-completion.
This doesn't mean that this data can be used to inform Google Search ranking. That would be very shady and potentially illegal. I work for Google, and even though I do not work specifically on Search ranking, this doesn't sound like something that could be happening.
I'd be surprised if they don't have the capability to enable "linger time" statistics that collate frequency of new web site loads, memory demands, etc for performance.
This relates to the "how can they know how long I look at a web site for" asked above - if not specifically they do at least know the answer stochastically.
> Google has an incentive to rank pages that contain Google ads because it makes them more revenue. Google has an incentive to rank profit-making sites higher so that they make more money.
They don't have to intentionally rank them higher, but this fact could prevent them from choosing to rank ad-filled sites lower, even though that would do wonders for search result quality, because ad quantity is usually a good proxy for spamminess and trash.
Appears you’re in the UK, is that where you intend to registered the non-profit? If so, in the UK, what are the real costs of forming a non-profit, keeping records, generating reports, (shut it down), etc.?
When using a VPN to access Youtube, the adverts played to you will be in the local language of the VPN destination, yet Youtube can deliver the appropriate language content. Strange that!
How many advertisers advertising in a VPN destination where a particular language is dominant would advertise in another language? YouTube doesn't care whether you can understand it. Those advertisers may not be looking to advertise to people who can't understand their language either.
How many advertisers outside of that country are going to ask YouTube to play their ads in that country in a language different from the dominant language? Probably not many.
First result is startrek.com, second result is Star Trek into Darkness IMDb but 3rd is xkcd.
It then goes off into Q and William Shatner Wikipedia links and Muppet Movie IMDB in Russian.
I tried putting a plus in front of IMDB and quoting Star Trek. It doesn't seem to be able to find Star Trek on IMDB. I admire the concept, and it is extremely fast.
I wouldn't be surprised if you don't want to talk about it, so that nobody can work out how to get around the safeguards, but what is your plan for stopping bots from trying to overwhelm the search scraper bot?
Are you looking to build a trusted network of users who can verify and validate other users' responses over some undefined period?
Not for profit? Ecosia is one. They do make money, but in general it's a not-for-profit organization that uses the majority of the money they make to plant trees.
I bet two-thirds of the HN crowd is somehow affiliated with the progress of search, ads, lead generation, analytics, user tracking (FAANG), etc. Think of their children...
For an information system built on standards --- HTML as a document markup language, HTTP as a transport layer, TLS/SSL for security, TCP/IP as an underlying networking protocol, among others --- one standard that is conspicuously missing is an indexing standard.
That is, even if a site wanted to, there's no way for it to declare "I have content related to X". Even better would be if these indices could then be distributed in a cache-and-forward model similar to how DNS (another distributed discovery index) works. There was some exceedingly rudimentary attempt at this through elements such as keyword meta tags, but even at best these referenced a vanishingly small fraction of the actual content of a site or article. Sitemaps also address a component of the problem, but again, only in part.
Some might see a few immediate issues. One is that not all sites are sufficiently dynamic to know what content they actually contain. To an extent this might be addressable through extensions to the webserver protocol, such that a server would be aware, or become aware, of what content it contained.
Another is that a site might in some instances be inclined to misrepresent what it contained. This may be hard for some to believe, but I'm given to understand it occasionally does occur. To help guard against this, there might be vetted indices, in which one or more third parties vouch for the validity of an index. These reputation-sources could of course themselves be assessed for accuracy.
But if sites were responsible for reporting on what content they actually contained, and could be constrained to doing so accurately, a huge part of the overhead in creating an independent search engine, and breaking the search-engine monopoly, would be eliminated.
One might imagine why certain existing gatekeepers over Web standards might oppose such an initiative.
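To make the idea concrete, such a self-declared index could be as small as a machine-readable file served from a well-known path, the way robots.txt and sitemap.xml work today. Everything below, including the path and the field names, is invented for illustration; nothing like it is standardized:

    import json

    # Hypothetical /.well-known/site-index.json
    site_index = {
        "site": "https://example.org",
        "updated": "2022-07-15",
        "topics": ["biochemistry", "protein folding"],
        "documents": [
            {"url": "/papers/folding.html", "keywords": ["folding", "kinetics"]},
        ],
        "vouched_by": ["https://registry.example/attestations/123"],
    }

    print(json.dumps(site_index, indent=2))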
There would still remain other problems to solve within the search space. It's possible to divide General Web Search into a set of specific problems (a toy skeleton tying a few of these stages together follows the list):
- Site crawling: this includes determining search targets, any exclusions from such lists, and performing the actual crawling. Self-indexing addresses part of this problem.
- Indexing: Mapping of actual contents to keyword and query terms which might address that content.
- Ranking: Assigning a preference / deprecation to specific sites. This is essentially a trust / reputation assessment, with a canonicity / authenticity assessment (e.g., where did a specific item or document first appear).
- SEO: This is the Red Queen's Race issue of addressing insincere / malicious actors. Strong and durable penalties for abuse, and long-term reputational accrual, should be useful here.
- Query interpretation: There's a considerable art to figuring out what a question actually means. In some cases queries should be taken strictly verbatim. Quite often, however, interpretation is necessary. How those alternatives are posed might vary, with an option not often employed presently being to suggest a range of potential interpretations or related queries which might produce better results for specific query scenarios.
- Presentation: This is generation of the search engine result page itself, incorporating several of the other considerations listed, but also addressing usability, accessibility, clarity, and other concerns.
- Revalidation: As the editors of the Hitchhiker's Guide observed, the Universe is not static, and circumstances change. Revalidating, revisiting, and revising results and reputational assessments is necessary.
- Monetisation/Funding: I'm partial to a public-goods model, or perhaps a farebox model via ISPs, pro-rated to general income/wealth within a region. Advertising, as a famous Stanford research paper prophetically observed, forces misalignment with searchers' interests and objectives.
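As promised above, a toy skeleton tying a few of these stages together. Crawling is stubbed out as a prebuilt pages dict, "reputation" stands in for ranking, and every stage is deliberately naive:

    def build_index(pages):                      # Indexing
        index = {}
        for url, text in pages.items():
            for word in set(text.lower().split()):
                index.setdefault(word, set()).add(url)
        return index

    def rank(urls, reputation):                  # Ranking / reputational assessment
        return sorted(urls, key=lambda u: reputation.get(u, 0), reverse=True)

    def search(index, reputation, raw_query):    # Query interpretation + presentation
        terms = raw_query.lower().split()
        hits = set.intersection(*(index.get(t, set()) for t in terms)) if terms else set()
        return rank(hits, reputation)

    pages = {"https://a.example": "elephant statue coney island",
             "https://b.example": "elephant charity uk"}
    print(search(build_index(pages), {"https://b.example": 2}, "elephant uk"))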
It's perhaps a bit on the side but still part of the topic of search.
Have you noticed how newspapers systematically do not supply a clear source for their articles? It's especially prevalent on political cases where there are easy-to-link paper trails. This makes it a lot harder to find the source for their article, so you end up just taking their word for their angle on the story.
A great recent example is Biden's Executive Order on the protection of women. When the newspapers write about his EO, they're never doing it from a neutral standpoint. In this case they're either pro- or anti-abortion. But if you want to know the contents of Biden's EO for yourself, then you're forced to search for it. And depending on the search engine, that might also be hard, because search engines are politically biased too.
Just so we're clear, this post isn't pro or anti abortion. Instead it's an example on how newspapers systematically force you to take their word for their angle on any given news story. So if you want to know source material, then you're forced to search for it. And when you do search for it, you're then at the mercy of the political bias of the search engine.
For that reason I'm not so sure a non-profit search engine will make political biases go away, especially when you consider what happened to Wikipedia. While not a search engine, it is a non-profit and communal project that set out with the ideal of being truly neutral, but in the end it failed at that, and some would say spectacularly. And the main reason is exactly bullshit, or rather the BS that comes with political bias.
Don't get me wrong, it's still a great source for information, but when you search for any topic that is in any shape or form politically sensitive, then you have to know about Wikipedia's clear political bias beforehand, or else you might take their angle as gospel.
This is especially insidious when it comes to search engines and also social networks, because most people assume that what is shown to them there is neutral, or at least coming from a friendly party. But then it turns out, that's not always the case.
When you systematically get biased information, it's a democratic problem, because it prevents people from making up their own minds about political topics. Thus when people finally vote, the risk is that we get a society that does not reflect people's actual opinions.
I think most people in here have been on the receiving end of that, no matter which side of the aisle you're on. And the result is always resentment and bitterness, which in turn does not make for a healthy democratic environment.
Instead, the political bias should be more clearly visible and out in the open on newspapers, encyclopaedias and search engines alike. And while a non-profit search engine would certainly save you from corporate interests, it still won't save you from political ones, though it might be a good trade-off to save privacy.
I've noted that many newspapers don't provide external links as a rule, though that reluctance seems to be slipping somewhat. Unfortunately, that's accompanied by a tendency for those external links to rot with extreme speed and prejudice, which ... might actually justify the reluctance.
Linking an archived copy of the content has merits, though increasingly The Usual Suspect (the Internet Archive's Wayback Machine) itself has difficulty preserving complex content. (The rather less transparent Archive.Today is often superior in quality, if not necessarily in trust or reputation, and I say that relying on it heavily myself.)
I know that I've noted news organisations which do seem to link sources relatively freely, and thought I'd commented on it previously. I'm not finding any previous mentions by me here at HN (my usual personal quips trove, amongst other benefits). Though if memory serves, the New York Times tends not to provide links where it really ought to. I believe the Los Angeles Times might have a greater tendency to. Perhaps also NPR and/or The Guardian which I tend to rely on, though I'm not certain that's been my earlier observation.
In the 19th century and through a good part of the early 20th, newspapers would not only reference specific documents or speeches but publish them in whole. In large part, this is because that was the only way to distribute and reference the material. That practice has waned tremendously, and we often see material only through commentary and reference, rather than in original form. I've come to view this quite dimly.
I'm also fairly certain that copyright is a major consideration and factor here, and another way in which it's proving a disservice to the public good.
Copyright shouldn't be a consideration for the press. Not when dealing with newsworthy items, anyway. When it comes to public resources and government, I see no reason to not just link the real thing. Instead I see outlets like Vox post articles titled, "Biden's executive order on abortion, briefly explained" and so on, where we'll just have to trust that what they say is indeed true. It most likely is, of course, or they'd make fools of themselves. But when it comes from Vox, or Fox, it almost always comes with a slant.
For numerous reasons, it seems that there's an "is-ought" disconnect here, as is often the case.
Business tends to be exceedingly spooked by risk, especially long-tail unconstrained risk. And copyright litigation presents an excellent example of same.
There are also other concerns. In an era of physical print and shrinking "news holes", the actual textual content of newspapers tended to shrink, perhaps establishing a tradition of no longer printing speeches verbatim. With the attention economy of the Web, the risk of sending readers off-site is a concern I've heard voiced many times both in print and public discussions and privately amongst people I know in the press. It's quite unfortunate, but real.
The case of US Government documents, in which there is no copyright concern, is especially inexcusable. I'd agree with you strongly there.
Increasingly when I find such an opinion piece, regardless of the publication, I look for the source document and try to read it first. (I don't always follow through, but it is if nothing else an aspirational goal.)
I haven't noticed any particular bias on Wikipedia, but I acknowledge that could be because it aligns with my own biases. Can you point out an example or two?
About the funding model: it would be great if normal donations worked.
If not, I think the way some WEB3 projects are funded may be an interesting inspiration (not talking about Ponzi schemes here). Many projects are "non profit" and sell tokens before the service is 100% ready. This funds the improvement and scaling of the project. The possibility of reselling the tokens at a higher price in the future sometimes attracts token holders and often increases the "motivation" of the token holders / supporters of the project... it fuels the community (money is only part of the motivation). Here, tokens could be associated with symbolic "privileges" (badges, access to early releases), or governance (taking part in some votes).
This system clearly has some drawbacks, but it can sometimes increase the number of early users and supporters, and bring in more funding while staying a non-profit.