
I'm still wondering why, if RAG is conceptually simple and easy to implement, the foundation models haven't incorporated it into their base functionality. The lack of that strikes me as a point against RAG and its variants, because if any of them really worked, the capability would be in the models directly and wouldn't need to be added afterwards.


Why would a proper documents-at-hand inquiry be «simple»?

Information is at paragraph #1234 of book B456; that paragraph acquires special meaning in light of its neighbours, its chapter, the book. Further information is in other paragraphs of other books. You can possibly encode information (data) with some "strong" compression, but not insight. The information a query may point to can be a big cloud of fuzzy concepts. What do you input, and how? How big should that input be? "How much" of past reflection does the Doctor use to build a judgement?

RAG seems simple because it has simpler cases ("What is the main export of Bolivia").


Well, even if we assume for a moment that we aren’t talking about non-public data…

Then RAG that serves up knowledge already in the model’s pretraining data is still useful, because it primes the model for the specific context you want to engage it with. I can maybe see what you’re saying: why can’t the model just do a good job without being reminded? But even in that sense, any intelligence, artificial or otherwise, will do better given more context.

And that ignores the reality of data outside the model’s pretraining corpus, like every single business’ internal data.


It still makes sense to use external data storage for smaller local models. They just can't hold that much.


One problem is the lack of datasets that use RAG. Training a foundation model requires a lot of samples, and there aren't many. The only option is to use other models to generate them, i.e. so-called distillation.

BTW, RAG is similar to web search. Models can already do web search as a tool, and a web server for RAG can be implemented the same way.
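
A rough sketch of what such a retrieval server could look like, assuming FastAPI, a toy in-memory corpus, and crude keyword-overlap scoring standing in for a real embedding index (all names here are illustrative, not a particular product's API):

    # Hypothetical sketch: a minimal retrieval endpoint a model could call as a
    # tool, the same way it would call a web-search API.
    from fastapi import FastAPI

    app = FastAPI()

    # Toy corpus; in practice this would be indexed internal documents.
    CORPUS = [
        "Paragraph #1234 of book B456: ...",
        "Bolivia's main exports include natural gas and minerals.",
        "Internal memo: Q3 pricing changes take effect in October.",
    ]

    def score(query: str, doc: str) -> int:
        # Count shared lowercase terms; a real system would use vector similarity.
        q_terms = set(query.lower().split())
        d_terms = set(doc.lower().split())
        return len(q_terms & d_terms)

    @app.get("/search")
    def search(q: str, k: int = 2):
        # Return the top-k documents; the caller splices them into the prompt.
        ranked = sorted(CORPUS, key=lambda d: score(q, d), reverse=True)
        return {"query": q, "results": ranked[:k]}

    # Assuming the file is named rag_server.py: uvicorn rag_server:app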


RAG is a prompting technique; how could they possibly incorporate it into the pre-training?


CoT is a prompting technique too, and it's been incorporated.


IIUC, CoT is "incorporated" into training just by providing better-quality training data which steers the model towards "thinking" more deeply in its responses. But at the end of the day, it's still just regular pre-training.

RAG (retrieval-augmented generation): how can the retrieval be done during training? RAG will always remain external to the model. The whole point is that you can augment the model by injecting relevant context into the prompt at inference time, bringing your own proprietary/domain-specific data.
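
To make the inference-time part concrete, here is a minimal sketch of context injection: retrieve a couple of documents, splice them into the prompt, and hand that to whatever model you call. The `retrieve` scoring and the `generate` stand-in are placeholders, not any particular library's API:

    # Toy corpus standing in for proprietary/domain-specific data.
    DOCS = [
        "Policy 7.2: refunds on proprietary widgets require manager approval.",
        "Bolivia's main exports include natural gas, zinc, and soybeans.",
    ]

    def retrieve(query: str, k: int = 1) -> list[str]:
        # Rank documents by shared terms; a real system would use an embedding index.
        terms = set(query.lower().split())
        return sorted(DOCS, key=lambda d: len(terms & set(d.lower().split())), reverse=True)[:k]

    def build_prompt(query: str) -> str:
        # The retrieved text is spliced into the prompt; the model weights are untouched.
        context = "\n".join(retrieve(query))
        return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

    def generate(prompt: str) -> str:
        # Hypothetical stand-in: replace with a call to whichever LLM you use.
        return f"[model response to: {prompt[:60]}...]"

    print(generate(build_prompt("What is our refund policy for widgets?")))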


These things with <think> and </think> tokens are actually trained using RL, so it's not like GSM8k or something like that where you just train on some reasoning.

It's actually like QuietSTaR but with a focus on a big thought in the beginning and with more sophisticated RL than just REINFORCE (QuietSTaR uses REINFORCE).


Who says "during training"? RAG could be built into the functionality of the LLM directly: give it the documents you want it to incorporate, and it ingests them as a temp mini fine-tune. That would work just fine.
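
For what it's worth, a hedged sketch of what that "temp mini fine-tune" might look like with a throwaway LoRA adapter, assuming Hugging Face transformers + peft; the model name, hyperparameters, and one-document corpus are purely illustrative:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "gpt2"  # placeholder small model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Wrap the base model with a small, throwaway LoRA adapter.
    config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
    model = get_peft_model(model, config)

    docs = ["Internal doc: the new API key rotates every 30 days."]
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-4)

    # A few causal-LM steps over the ingested documents ("mini fine-tune").
    model.train()
    for _ in range(3):
        for text in docs:
            batch = tokenizer(text, return_tensors="pt")
            out = model(**batch, labels=batch["input_ids"])
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    # The adapter can be discarded or unloaded once the session ends.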


The same way developers incorporate it now. Why are you thinking "pre-training"? This is a feature of the deployed model: it ingests documents and generates a mini fine-tune right then.



