You don't have to use vector search to implement RAG. You can use other search mechanisms instead or as well - whatever it takes to populate the context with the material most likely to help answer the user's question.
Two common alternatives to vector search:
1. Ask the LLM to identify key terms in the user's question and use those terms with a regular full-text search engine
2. If the content you are answering questions about is short enough - an employee handbook, for example - just jam the whole thing in the context. Claude 3 supports 200,000 tokens and Gemini Pro 1.5 supports a million, so this can actually be pretty effective (sketched below).
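As a very rough sketch of option 2, here's what stuffing a whole handbook into the context could look like with the Anthropic Python SDK - the prompt wrapping and the specific model name are just illustrative choices:

```python
import anthropic  # assumes the official Anthropic Python SDK is installed


def answer_from_handbook(question: str, handbook_text: str) -> str:
    """Answer a question by placing the entire handbook in the context window."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    prompt = (
        "Answer the question using only the handbook below.\n\n"
        f"<handbook>\n{handbook_text}\n</handbook>\n\n"
        f"Question: {question}"
    )
    message = client.messages.create(
        model="claude-3-opus-20240229",  # any Claude 3 model with a 200K window
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```

No retrieval step at all here - the "search" is just the model reading the whole document, which only works while the document fits comfortably in the window.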
I'd definitely call 1 (the FTS version) RAG. It's how Bing, Google Gemini and ChatGPT Browse work - they don't have a full vector index of the Web to work with (at least as far as I know), so they use the model's best guess at an appropriate FTS query instead.
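Here's a rough sketch of that FTS flavour of RAG using SQLite's FTS5 - the `documents` table and the `llm()` helper are hypothetical stand-ins for whatever index and model client you are actually using:

```python
import sqlite3


def fts_rag_answer(question: str, db_path: str, llm) -> str:
    """RAG over a SQLite FTS5 index: the LLM picks the search terms,
    full-text search retrieves documents, the LLM answers from them.

    `llm` is a hypothetical callable that takes a prompt string and
    returns the model's text response.
    """
    # Step 1: ask the model for a full-text search query
    terms = llm(
        "Suggest a short full-text search query (just the keywords, "
        f"no explanation) for answering: {question}"
    ).strip()

    # Step 2: run that query against an FTS5 table called `documents`
    db = sqlite3.connect(db_path)
    rows = db.execute(
        "SELECT body FROM documents WHERE documents MATCH ? "
        "ORDER BY rank LIMIT 5",
        (terms,),
    ).fetchall()

    # Step 3: answer the question using only the retrieved documents
    context = "\n\n".join(body for (body,) in rows)
    return llm(
        f"Context:\n{context}\n\nUsing only that context, answer: {question}"
    )
```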
HyDE (Hypothetical Document Embeddings) is a related technique: ask the model to generate an answer with no context at all, use that hypothetical answer for semantic search against your actual documents, then respond by summarising the documents it retrieves.
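A minimal sketch of HyDE might look like this - `llm()`, `embed()` and the pre-computed `doc_embeddings` array are hypothetical stand-ins for your model client and embedding index:

```python
import numpy as np


def hyde_search(question: str, llm, embed, doc_texts, doc_embeddings, k=5):
    """HyDE: embed a hypothetical answer rather than the question itself.

    `llm(prompt)` returns text, `embed(text)` returns a unit-length embedding
    vector as a NumPy array, and `doc_embeddings` is an (n_docs, dim) array of
    pre-computed, normalised document embeddings.
    """
    # Step 1: generate a hypothetical answer with no retrieved context at all
    hypothetical = llm(f"Write a short passage that answers: {question}")

    # Step 2: use its embedding to find the most similar real documents
    query_vec = embed(hypothetical)
    scores = doc_embeddings @ query_vec  # cosine similarity for unit vectors
    top = np.argsort(scores)[::-1][:k]
    retrieved = [doc_texts[i] for i in top]

    # Step 3: answer from the retrieved documents, not the hypothetical one
    context = "\n\n".join(retrieved)
    return llm(f"Context:\n{context}\n\nAnswer using only that context: {question}")
```

The idea is that a made-up answer is often closer, in embedding space, to the real documents you want than the short question itself is.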