You're right and I was imprecise. 2. is how RAG is implemented in most cases. (I...

simonw · on May 22, 2024

I'd definitely call 1 (the FTS version) RAG. It's how Bing and Google Gemini and ChatGPT Browse work - they don't have a full vector index of the Web to work with (at least as far as I know), they use the model's best guess at an appropriate FTS query instead.
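To make that concrete, here is a rough sketch of the FTS flavour of RAG. Everything in it is an assumption for illustration: `guess_search_query` is a hypothetical stand-in for the LLM call that turns a question into search terms, and the toy keyword scorer stands in for a real full-text engine. It is not how any of those products actually implement it.

```python
import re
from collections import Counter

# Toy corpus (made up for this example).
DOCS = {
    "doc1": "SQLite FTS5 provides full-text search over tables.",
    "doc2": "Vector indexes store embeddings for semantic search.",
    "doc3": "Bananas are rich in potassium.",
}

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def guess_search_query(question):
    """Hypothetical stand-in for the model's best guess at an FTS query.

    A real system would ask the LLM; here we just drop a few stopwords.
    """
    stopwords = {"how", "do", "i", "the", "a", "in", "what", "is", "with"}
    return " ".join(w for w in tokenize(question) if w not in stopwords)

def keyword_search(query, docs, k=2):
    """Toy full-text search: rank documents by how often query terms occur."""
    terms = query.split()
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def build_prompt(question, docs, doc_ids):
    """Stuff the retrieved documents into the prompt sent to the model."""
    context = "\n".join(f"- {docs[d]}" for d in doc_ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How do I do full-text search in SQLite?"
hits = keyword_search(guess_search_query(question), DOCS)
prompt = build_prompt(question, DOCS, hits)
```

The retrieval step never touches an embedding; the only "smart" part is the query guess, which is the point being made above.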

bigfudge · on May 22, 2024

HyDE (Hypothetical Document Embeddings) is a related technique. Ask the model to generate a response with no context, then use that response for semantic search against the actual data, and respond by summarising the retrieved documents.