
If you know that your search queries will be actual questions (like in the example you listed), you could use HyDE[0] to create a hypothetical answer, which will usually have an embedding closer to the RAG chunks you are looking for.

It has the downside that an LLM (rather than just an embedding model) is needed in the query path, but it has helped me several times in the past to strongly reduce RAG problems like the ones you outlined, where retrieval latches onto individual words.

[0]: https://arxiv.org/abs/2212.10496
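
For anyone curious, here's a minimal sketch of the idea (my own illustration, not code from the paper), assuming a sentence-transformers embedder and pre-computed, L2-normalized chunk embeddings; the LLM call is passed in as a plain callable so you can plug in whatever model or API you have:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

    def hyde_search(question, llm_generate, chunks, chunk_embeddings, k=5):
        # llm_generate: any callable str -> str backed by an LLM
        # (hypothetical helper, plug in your own API or local model).
        # 1. Ask the LLM for a plausible (possibly hallucinated) answer.
        hypothetical = llm_generate(
            "Write a short passage answering this question:\n" + question
        )
        # 2. Embed the hypothetical answer instead of the raw question.
        q_vec = embedder.encode(hypothetical, normalize_embeddings=True)
        # 3. Rank chunks by cosine similarity
        #    (chunk_embeddings assumed L2-normalized).
        scores = chunk_embeddings @ q_vec
        top = np.argsort(-scores)[:k]
        return [(chunks[i], float(scores[i])) for i in top]

The hypothetical answer can be completely wrong factually; it only needs to "look like" the passages you want to retrieve, which is why its embedding tends to land closer to them than the bare question does.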



Thanks, sounds interesting, not dissimilar to some of the query expansion techniques. But in my case (open source, zero budget) I'm doing (slow) CPU inference, so an LLM in the query chain isn't really viable. As it is, there is a near-instant "Source: [url]" returned by the vector search, followed by the LLM-generated "answer" quite some time later. So I think the next steps will be "traditional" techniques such as re-ranking and hybrid search, in line with the original "Build a search engine, not a vector DB" article.
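
In case it helps, a rough sketch of what hybrid search could look like without an LLM in the query path: combine BM25 keyword scores with the vector scores via reciprocal rank fusion. This is just an illustration, assuming the rank_bm25 and sentence-transformers libraries and pre-computed, normalized chunk embeddings:

    import numpy as np
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

    def hybrid_search(query, chunks, chunk_embeddings, k=5, rrf_k=60):
        # Keyword ranking: BM25 over whitespace-tokenized chunks.
        bm25 = BM25Okapi([c.lower().split() for c in chunks])
        bm25_rank = np.argsort(-bm25.get_scores(query.lower().split()))

        # Vector ranking: cosine similarity, embeddings assumed normalized.
        q_vec = embedder.encode(query, normalize_embeddings=True)
        vec_rank = np.argsort(-(chunk_embeddings @ q_vec))

        # Reciprocal rank fusion: sum 1 / (rrf_k + rank) across both lists.
        scores = {}
        for ranking in (bm25_rank, vec_rank):
            for rank, idx in enumerate(ranking):
                scores[idx] = scores.get(idx, 0.0) + 1.0 / (rrf_k + rank + 1)

        top = sorted(scores, key=scores.get, reverse=True)[:k]
        return [(chunks[i], scores[i]) for i in top]

Both rankings are cheap on CPU, so this keeps the near-instant "Source: [url]" response while catching the keyword matches that pure vector search misses.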



