I’ve actually found the opposite. At work, we moved from a fine-tuned model to a RAG system for internal and external documentation, plus a generic coding-focused model for code.
Fine-tuning against in-house code seems like a small gain over a base model plus search. It’s unlikely your code is so unique, special, and large that a base model can’t get good results on it. You’ll be pinned to one version of one model, and you won’t be able to upgrade to future models nearly as quickly. Of course, every commit changes the code, so you’re also fighting time unless you continually re-fine-tune.
A RAG model might still struggle with a super vague question like “where does foo call bar with baz set”, but it’s unlikely fine-tuning would handle that any better. This is where static code search by symbols really should be used.
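As a rough illustration of what I mean by symbol search, here's a minimal sketch that shells out to ripgrep; the rg dependency and the foo/bar names are just assumptions carried over from the example question, not any particular tool's API:

    # Hypothetical sketch: answer "where does foo call bar" with plain symbol
    # search instead of retrieval or fine-tuning. Assumes ripgrep ("rg") is
    # installed; "foo" and "bar" are the made-up symbols from the question.
    import subprocess

    def find_call_sites(symbol: str, path: str = ".") -> list[str]:
        """Return matching lines where 'symbol(' appears, a rough proxy for call sites."""
        result = subprocess.run(
            ["rg", "--line-number", "--with-filename", rf"{symbol}\s*\(", path],
            capture_output=True, text=True,
        )
        return result.stdout.splitlines()

    # Narrow to files that mention foo, then look for bar calls inside them.
    foo_files = {line.split(":")[0] for line in find_call_sites("foo")}
    for f in sorted(foo_files):
        for hit in find_call_sites("bar", f):
            print(hit)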
There are frameworks for graph-based RAG that mix both approaches. One LLM encodes information as a knowledge graph, gradually building up an ontology. Another LLM queries this knowledge graph by emitting speculative queries. As the database grows, the second LLM is fine-tuned again and again on example queries that use the ontology the first LLM came up with.
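In rough Python, the build/query split looks something like the sketch below; the llm() helper, the prompts, and the triple format are all assumptions, and the periodic fine-tuning of the query model is left out:

    # A minimal sketch of the two-LLM graph-RAG loop described above.
    import json
    import networkx as nx

    def llm(prompt: str) -> str:
        """Placeholder for whatever chat-completion client is actually used."""
        raise NotImplementedError

    def extract_triples(document: str) -> list[tuple[str, str, str]]:
        # First LLM: encode the document as (subject, relation, object) triples.
        raw = llm("Extract knowledge-graph triples as a JSON array from:\n" + document)
        return [tuple(t) for t in json.loads(raw)]

    def build_graph(documents: list[str]) -> nx.MultiDiGraph:
        graph = nx.MultiDiGraph()
        for doc in documents:
            for subj, rel, obj in extract_triples(doc):
                graph.add_edge(subj, obj, relation=rel)
        return graph

    def answer(question: str, graph: nx.MultiDiGraph) -> str:
        # Second LLM: emit a speculative query (here simplified to "which
        # entities to look up"), then answer from those entities' neighborhoods.
        entities = json.loads(llm("List relevant entities as a JSON array for: " + question))
        facts = [
            f"{u} -[{d['relation']}]-> {v}"
            for e in entities if e in graph
            for u, v, d in graph.edges(e, data=True)
        ]
        return llm("Answer using only these facts:\n" + "\n".join(facts) + "\n\nQ: " + question)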
RAG definitely is helpful! Fine-tuning imo is extremely powerful, but it's still relatively close to alchemy - technically GPT-4, Claude, or any large model is a fine-tune of a base model! Reasoning fine-tuning is also very powerful!
Tbh the hardest part is the lifecycle - i.e. new data, updating, serving, etc. - that seems to be the biggest issue.
Is anyone having success with iteratively feeding chunks of code (or other documents) to an LLM for search? I understand 'needle in a haystack' issues with LLMs are quite bad, but RAG is quite bad too, and a lot of that haystack research seems to be about feeding in very large contexts at once.
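For what it's worth, the version of this I've seen sketched is a map/reduce loop over chunks rather than one giant context; everything below (the llm() placeholder, the chunk size, the prompts) is an assumption rather than a tested recipe:

    # Hedged sketch of "iteratively feed chunks to the LLM" search.
    def llm(prompt: str) -> str:
        """Placeholder for an actual chat-completion call."""
        raise NotImplementedError

    def chunked(text: str, size: int = 8000) -> list[str]:
        return [text[i:i + size] for i in range(0, len(text), size)]

    def search(question: str, documents: list[str]) -> str:
        # Map step: ask about each chunk separately, keeping only relevant
        # notes, so no single call has to survive a huge haystack.
        notes = []
        for doc in documents:
            for chunk in chunked(doc):
                reply = llm(
                    "Question: " + question + "\n\nChunk:\n" + chunk +
                    "\n\nIf the chunk is relevant, summarize the relevant parts; "
                    "otherwise reply IRRELEVANT."
                )
                if "IRRELEVANT" not in reply:
                    notes.append(reply)
        # Reduce step: answer from the collected notes only.
        return llm("Question: " + question + "\n\nNotes:\n" + "\n\n".join(notes))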
Well, why not both? If you've already got a tuned model, why not use RAG on top of it to get even better results? It already knows the big picture; it just needs the details so it doesn't have to hallucinate them.