chris_f's comments

Any specific sites? Happy to spin one of these up for you focused on Web3. These work the best when the search engine creator has domain expertise to showcase the best sources.

I could try to find some sources, but my guess at the best Web3 sites would probably miss the mark.


Yeah, that's a good point. As new ones get built, we will need a way to make them discoverable.

Also, just as a heads up, there is a link at the bottom of the search page [0] if anyone would like to build one of these search pages on any topic that might be interesting. It's just a form for now because we are still working through the process.

[0] https://form.asana.com/?k=13NZtQkfNVw3CTE1jxngvg&d=119356944...


Thank you, Chris. If I may add, it might help to have a way to gather community feedback on niche searches like this as well; that could help narrow down the list of sources. Great work. I'm going to use this search for my work.


Good stuff! GitHub is one of the sources, but not specific repos. I actually think we can break GitHub out into individual repo sources pretty easily.



For the first example, the snippet isn't correct; it just says `conda list`. The website (TutorialPoint) it links to is also wrong and useless.


Appreciate the feedback, it is helpful! The relevance tuning is a continuous work in progress.


Sounds like a good idea, but I'd like to see how it is better than googling these phrases together with the 'pytorch' keyword. Not saying it isn't better; it's just not clear whether it is.


In the broadest sense, a highly targeted search engine (like this) can provide better results because Google has to determine user search intent AND return the right results from trillions of webpages. The advantage of this search engine is that all of the users are looking for the same type of information, and the result sources can be curated to ensure high quality and relevant results.

The more targeted the topic, the harder a time Google has providing quality signal through the noise of SEO and the sheer volume of content on the web.

In addition to the above, the UI provides some cool features, like allowing horizontal scrolling of sources to provide higher information density (important for discovery), and some source content can be viewed in the side pane without leaving the page.

But ultimately it would be good to hear whether this approach does make it easier to find relevant and higher-quality PyTorch info.


I just tried searching for: pytorch Distributed model parallel, and Google fails completely to return relevant results in the first few pages, even if I put 'Distributed model parallel' in quotes. You.com also mostly fails, but it returns a section specifically from the Discuss PyTorch forums which does have relevant results, so yes, this is a good sign.
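
For context, the kind of result I was hoping to surface looks roughly like this minimal sketch of manual model parallelism in PyTorch. This is my own illustration, not anything from the search results, and it assumes a machine with two CUDA devices (the device IDs are placeholders):

    import torch
    import torch.nn as nn

    # Manual model parallelism: put different layers of one model on
    # different devices and move activations between them in forward().
    # Assumes two CUDA devices are available; device IDs are illustrative.
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")

    class SplitModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Linear(1024, 512).to(dev0)
            self.stage2 = nn.Linear(512, 10).to(dev1)

        def forward(self, x):
            x = torch.relu(self.stage1(x.to(dev0)))
            return self.stage2(x.to(dev1))

    out = SplitModel()(torch.randn(8, 1024))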


That's great! And I think the results even improve a little more if you just search 'distributed model parallel' without 'pytorch' [0].

So still some work to do, but I've personally found this type of search engine really valuable for some of my interests, where I am able to curate sites that might not normally rank but that I know are higher quality.

[0] https://you.com/niche/pytorch?q=distributed+model+parallel


Worth mentioning is the Alexandria.org project [0]. It is a non-profit search engine built on data from Common Crawl. The coverage is limited because of Common Crawl, but the relevance is decent. They also provide an API.

I believe one of the biggest steps toward breaking up Google's monopoly on search would be making them open up access to their index, even requiring Google to provide direct API search access for others to build alternative search products. They have a search API today, but it is prohibitively expensive to build on ($5/1,000 calls).
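
To make that cost concrete, here is a minimal sketch of a single query against what I believe is the API in question, Google's Custom Search JSON API (the key and engine ID are hypothetical placeholders). At roughly $5 per 1,000 calls, every request like this carries a marginal cost, which adds up quickly behind a search frontend:

    import requests

    # One query against Google's Custom Search JSON API.
    # API_KEY and ENGINE_ID (the "cx" parameter) are placeholder values.
    API_KEY = "YOUR_API_KEY"
    ENGINE_ID = "YOUR_CX_ID"

    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": ENGINE_ID, "q": "example query"},
    )
    for item in resp.json().get("items", []):
        print(item["title"], item["link"])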

I built a fairly popular search engine a couple of years back, but the cost of Google's search API and an increasing number of bot attacks made it difficult to justify keeping it online.

[0] https://www.alexandria.org/


I just finished reading the PaLM [0] blog post and was genuinely impressed with all the NLP advancements. This search result was a fun juxtaposition.

SERP: https://imgur.com/a/YbEtrLZ

Note: The answer 9.6076 is not wrong; it's for an imperial gallon, but it is an odd result to show in the US.
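
My guess is the query asked how many pints are in a gallon; if so, the number checks out for an imperial gallon measured in US pints. That mixed-unit reading is my assumption, but the arithmetic is easy to verify:

    # Exact definitions: 1 imperial gallon = 4.54609 L,
    # 1 US liquid pint = 0.473176473 L.
    IMPERIAL_GALLON_L = 4.54609
    US_PINT_L = 0.473176473

    print(IMPERIAL_GALLON_L / US_PINT_L)  # ~9.6076 US pints per imperial gallon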

[0] https://ai.googleblog.com/2022/04/pathways-language-model-pa...


Credit to them for trying some new things on the UI front, but it looks like the organic results are from Bing (like most other alt/privacy search engines). It would be interesting to learn more about how/if they plan to build their own index, or set themselves apart.

IMO, Kagi and Brave search are the two best alternative general search engines right now.

Runnaroo was pretty good as well ;-)


I think Mojeek is a better alternative "general" engine than Brave because Brave's ranking algorithm was optimized against Google SERPs (back when it was called Cliqz), making many SERPs too similar to Google's.

Kagi uses a mix of Teclis and other engines (it claims to use Bing and Google), but the ability to adjust ranking yourself is its wild card. Neeva is similar, combining its own index with some ability to influence ranking.

But personally, I'm trying to reduce my use of "one engine to rule them all" and instead use specific engines for specific tasks they're good at.


Mojeek is excellent, and because they use 100% their own index, they have a much higher hill to climb.

When I say "best" for a general search engine, my definition is that it would fulfill the needs of both myself and my non-technical family members. Kagi and Brave Search both do that while being different enough not to be just another Bing clone. I use Mojeek often, think it is great, and having their own index is a tremendous asset, but it doesn't quite meet that full definition yet.


One additional thing of interest: the text of the results, such as the heading and summary, on andisearch is not the same as on the other search engines. I think it displays more of each web page's content from its index.


Interesting. With the search 'java lambda function equivalent' I see results different from Bing, Google, and DuckDuckGo. The difference is greater still for "elon musk latest news". My guess is they use Bing where they lack their own index.


I thought it was interesting that this is a centralized platform, especially considering the coalition of companies involved. This type of use case has been a typical example of something to which a blockchain-based solution could be applicable.


Doing this in a decentralized yet private way would take 10x longer to develop.


This is really an analysis of the use of biased language in news articles, which is interesting but only one dimension of potential bias.

It is very possible to use non-"charged" language but still report a topic with a strong bias. For example, Slate is left leaning by most measures, but the landscape chart below from the study has them dead center. Maybe they are better at using neutral terms?

https://space.mit.edu/home/tegmark/phrasebias.jpg


> dead center

That kind of center is a bias that gets tugged around by the extremes rather than by the sensible. Sorry, I hate that term because it is often used to imply that you can average a wrong and a right and get something more correct. Often enough, one side[1] is decently close to right on an issue and the other is pretty much wrong. Picking the center of that isn't more right.

Often the quest for "neutrality" is bunk. In flat-world vs. round-world, the flat-world side does not have equal standing, but the "neutral"-seeking approach would often present it as if the flat-world idiots have a case. While issues may have some subjectivity, we should not constantly pretend that there's an even distribution.

A neutral-language meter can't ever hope to be right. It's not an analysis of what's well supported; rather, it's an analysis of assertiveness. I'm very assertive about the world being fucking round. I'm a red flag for such bias meters.

Sorry if I'm going off, but--I'll just go ahead and say it--that term triggers the fuck out of me for getting undeserved validity.

[1] On a per issue basis. This is not to imply that one side is more consistently correct across issues.


Can someone explain this chart to me? What does the position on the chart indicate? Slate is left leaning, but it is correctly marked in blue.

EDIT: from the paper "Our method locates newspapers into this two-dimensional media bias landscape based only on how frequently they use certain discriminative phrases, with no human input regarding what constitutes bias. The colors and sizes of the dots were predetermined by external assessments and thus in no way influenced by our data. The positions of the dots thus suggest that the two dimensions can be interpreted as the traditional left-right bias axis and establishment bias, respectively"


It's a projection of the NLP model's vectors into 2D space. Remember the illustrations for the king - man + woman = queen example for word embeddings? They also often used a 2D space. You can sometimes, but rarely, intuit a sense for these dimensions, but they don't come with any natural definition or unit.
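
If it helps to see the mechanics, here is a toy sketch of that kind of projection using PCA. The random vectors are stand-ins for the paper's phrase-frequency data, and the paper may well use a different dimensionality-reduction method:

    import numpy as np
    from sklearn.decomposition import PCA

    # 20 hypothetical outlets, each described by a 300-dim phrase-usage vector.
    rng = np.random.default_rng(0)
    outlet_vectors = rng.normal(size=(20, 300))

    # Project to 2D. The resulting axes carry no built-in meaning or units,
    # which is why any "left-right" reading of them is post hoc.
    coords = PCA(n_components=2).fit_transform(outlet_vectors)
    print(coords.shape)  # (20, 2): one x/y position per outlet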


I still don’t get it. Is the chart supposed to show axes in addition to the left/right and pro-establishment/critical, currently represented by colors and sizes? How do the “lack of human input” and “external assessments” fit into the explanation?


As an exercise, I once went through the process of manually requesting that my information be removed from most of the top brokers.

It would be difficult to automate because the opt-out processes usually aren't as straightforward as unsubscribing from an email list. Many sites make it purposely difficult, requiring multiple steps, verification like a driver's license, and email confirmations.

Here is a really good checklist of different brokers: https://inteltechniques.com/data/workbook.pdf


And you're not worried that all you've done is fill in the gaps in their data with your driver's license, etc.?

"Delete" is a boolean tag in a databse that never actually gets deleted.


The key is to only submit information that they already have, not anything new. For documents that need to be uploaded in some cases, the options are either to use a heavily redacted real document with everything blacked out except the essential info, or to just upload a random file, because most of the time no one checks and it is just a required field to submit the form.

You are correct that it is not actually "deleted", but it will stop your information from showing up on the website.

IMO, the "best" free people search is https://www.truepeoplesearch.com/. Their opt-out process is also pretty easy.


And if your information is truly deleted, then when they come across your information again, they'll have no record of you, so they'll just fill in the missing gaps.

For them not to share your information, they need to know what information not to share.


Any online process that endeavors to be secure will be indistinguishable from a malicious system that tries to make your task purposefully difficult.

I just got refused by a government site recently because, of the six different forms of info they wanted in order to sign in again, one wasn't accepted. Namely, my cell phone number was prepaid, which apparently isn't acceptable anymore. So now I need to mail in forms and wait weeks for return mail.


I also tried to do this manually using the same resources from Bazzell and eventually just got tired of dealing with it and paid for Deleteme. To their credit, they did a pretty solid job removing my public data from data broker sites.

