
The parent's description isn't quite correct; it's kinda sorta describing the implementation. RAG is often implemented via embeddings, but in practice you generally get better results with a mix of vector search and keyword-based retrieval, e.g., TF-IDF.

An example of RAG could be: you have a great LLM that was trained at the end of 2023. You want to ask it about something that happened in 2024. You're out of luck.

If you were using RAG, then that LLM would still be useful. You could ask it

> "When does the tiktok ban take effect?"

Your question would be converted to an embedding and then compared against a database of other embeddings, generated from a corpus of up-to-date information and useful resources (Wikipedia, news, etc.).

Hopefully it finds a detailed article on the TikTok ban. The input to the LLM could then be something like:

> CONTEXT: <the text of the article>

> USER: When does the tiktok ban take effect?

The data retrieved by the search process allows for relevant in-context learning.

You have augmented the generation of an LLM by retrieving a relevant document.
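In code, the retrieval step might look roughly like the sketch below. This is a minimal illustration, not how any particular system does it: `embed` is a toy stand-in for a real embedding model, and the corpus snippets and prompt format are just made up to mirror the example above.

    import numpy as np

    def embed(text):
        # Toy stand-in: hash words into a fixed-size bag-of-words vector.
        # A real RAG system would call an embedding model here instead.
        vec = np.zeros(256)
        for word in text.lower().split():
            vec[hash(word) % 256] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    # Corpus of up-to-date documents, pre-embedded (contents are illustrative).
    corpus = [
        "A detailed article about when the TikTok ban takes effect ...",
        "An unrelated article about something else entirely.",
    ]
    doc_vecs = np.array([embed(d) for d in corpus])

    # Embed the user's question and pick the nearest document by cosine
    # similarity (vectors are already normalized, so a dot product suffices).
    question = "When does the tiktok ban take effect?"
    scores = doc_vecs @ embed(question)
    best_doc = corpus[int(scores.argmax())]

    # Augment the prompt with the retrieved document before generation.
    prompt = f"CONTEXT: {best_doc}\n\nUSER: {question}"

In a real system the corpus would be chunked, embedded offline, and stored in a vector index rather than re-embedded per query, but the overall flow is the same: embed the query, retrieve, prepend the result as context.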



> TF-IDF

"Term Frequency - Inverse Document Frequency" apparently: https://www.learndatasci.com/glossary/tf-idf-term-frequency-...



