Hacker News

I don't think any of the current consumer LLM tools use embeddings for web search. Instead they do it at the text level.

The evidence for this is the CoT summary in ChatGPT - I have seen cases where the LLM uses quoted terms to "grep" the web.

Embeddings seem good in theory, but in practice it's probably best to ask an LLM to do a deep search instead by giving it instructions like "use synonyms and common typos and grep".
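To make the "grep" approach concrete, here is a toy sketch of text-level retrieval: query variants (synonyms, near-misses) are matched literally against documents. The variant list is hand-written here; in practice an LLM would propose it, and a real engine would tokenize rather than substring-match.

```python
# Toy text-level ("grep"-style) search: match query variants literally.
documents = [
    "Brazil's auto industry produces over two million cars a year.",
    "Embeddings map text to dense vectors for semantic search.",
    "Argentina and Brasil both host large vehicle plants.",
]

# Hypothetical variants an LLM might generate for "cars".
query_variants = ["car", "cars", "automobile", "vehicle"]

def text_level_search(docs, variants):
    # Keep any document containing at least one variant, case-insensitively.
    return [d for d in docs if any(v in d.lower() for v in variants)]

hits = text_level_search(documents, query_variants)
```

The obvious weakness, and the usual argument for embeddings, is that this misses documents that talk about the same concept in different words.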

Does anyone know of a live example of a consumer product using embeddings?




My understanding is that modern search engines are using embeddings / vector search under the hood.

So even if LLMs aren't directly passing a vector to the search engine, my assumption is that the search engine is converting the query to a vector and searching.

"You interact with embeddings every time you complete a Google Search" from https://cloud.google.com/vertex-ai/generative-ai/docs/embedd...


Fair, and maybe the key point here is that it uses embeddings to help rank search results alongside many manual heuristics. I hardly think Google Search works by just dumping embeddings, doing k-NN, and calling it a day.
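For reference, the "dump embeddings, run k-NN" baseline being dismissed here is just brute-force cosine similarity over a vector matrix. A minimal sketch, with random vectors standing in for a real embedding model:

```python
import numpy as np

# Brute-force k-NN over embeddings via cosine similarity.
# Vectors are random stand-ins; a real system would use a trained
# embedding model plus many ranking heuristics on top.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(1000, 64))  # 1000 docs, 64-dim vectors
query = rng.normal(size=64)

def knn(query_vec, matrix, k=5):
    # Cosine similarity = dot product of L2-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q
    return np.argsort(scores)[::-1][:k]  # indices of the k most similar docs

top = knn(query, doc_embeddings)
```

At web scale, exact brute force is replaced by approximate nearest-neighbor indexes, which is part of why pure embedding retrieval alone isn't the whole story.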


I believe they use the LLMs to generate a set of things to search for and then run those through existing search engines, which are totally opaque and use whatever array of techniques SOTA search engines use. They are almost certainly not "grepping" the internet.


Yes, that's what I meant - thanks for clarifying. The grepping part is definitely done, at least in spirit, where the CoT includes quotes. If I were searching for the top 10 cars manufactured in South America, for example, the CoT might show:

"Brazil" car manufacture

This forces "Brazil" to be included in the keywords - at least that's how Google works (or used to?).
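In spirit, a quoted term acts as a hard filter before ranking: only documents containing it are scored at all. A hedged sketch (the keyword-count scoring is purely illustrative, not how Google ranks):

```python
# Quoted term = hard filter; remaining keywords rank the survivors.
documents = [
    "Brazil manufactures more cars than any other South American country.",
    "Top 10 sports cars of the decade, ranked by horsepower.",
    "Car manufacture in Brazil dates back to the 1950s.",
]

def search(docs, required, keywords):
    # Step 1: keep only documents containing the quoted term.
    pool = [d for d in docs if required.lower() in d.lower()]
    # Step 2: rank survivors by a crude keyword-overlap score.
    return sorted(pool, key=lambda d: sum(k in d.lower() for k in keywords),
                  reverse=True)

results = search(documents, required="Brazil", keywords=["car", "manufacture"])
```

The sports-car listicle is excluded outright, even though it matches the other keywords - exactly the effect of quoting "Brazil" in the query.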



