See, if the article had said this, I would have agreed - fine-tuning is a tool and it should be used thoughtfully. I personally believe that in this funding climate it makes sense to make data collection and model training a core capability of any AI product, though that will only be available and wise for some founders.
Agreed, model training and data collection are great!
The subtle bit is that it doesn't have to be for the LLMs, as these are typically part of a system-of-models. E.g., we <3 RAG, and GNNs for improving your KG are fascinating. Likewise, dspy's explorations in optimizing prompts, vs the LLMs themselves, are very cool.
Yeah, I would recommend sticking to RAG on naively chunked data for weekend projects by one person. Likewise for a consumer tool like Perplexity's search engine, where you minimize spend per user task or go bankrupt: same thing, do the cheap thing and move on, it's good enough
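For context, the "naive chunking" baseline is roughly this (a minimal sketch only; the fixed chunk size and the hashed bag-of-words `embed` are illustrative stand-ins for a real embedding model, not anyone's production code):

```python
import numpy as np

def chunk(text, size=500):
    # Naive fixed-size character chunks; no sentence or section awareness.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text, dim=256):
    # Stand-in for a real embedding model: hashed bag-of-words, L2-normalized.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, chunks, k=3):
    # Cosine similarity over normalized vectors is just a dot product.
    q = embed(query)
    return sorted(chunks, key=lambda c: -float(embed(c) @ q))[:k]

docs = "..."  # concatenated source documents
top_chunks = retrieve("what changed in Q3?", chunk(docs))
prompt = "Answer using only this context:\n" + "\n---\n".join(top_chunks)
```

For a one-person weekend project, swapping in a hosted embedding API and a vector store is about all the extra sophistication you need.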
Once RAG projects become important and good answers matter - we work with governments, manufacturers, banks, cyber teams, etc - working through data quality, data representation, & retrieval quality helps
Note that we didn't start here: we began with naive RAG, then relevancy filtering, then agentic & neurosymbolic querying, then dynamic example prompt injection, and now we're getting into cleaning up the database/KG itself (the example-injection step is sketched below)
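As a rough illustration of the dynamic example injection step (a sketch under assumptions: the word-overlap scorer and the hand-written example bank are placeholders for whatever retrieval and eval machinery you already run):

```python
# Pick the few-shot examples most similar to the incoming question and
# splice them into the prompt, instead of hard-coding one static set.
EXAMPLE_BANK = [
    {"q": "Which suppliers had late shipments last month?", "a": "...worked answer..."},
    {"q": "Show failed logins by country for the past week.", "a": "...worked answer..."},
    # ...ideally hundreds more, mined from past sessions that passed evals
]

def overlap(a: str, b: str) -> float:
    # Jaccard similarity on lowercase tokens; a real system would use embeddings.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def inject_examples(question: str, k: int = 2) -> str:
    best = sorted(EXAMPLE_BANK, key=lambda ex: -overlap(ex["q"], question))[:k]
    shots = "\n\n".join(f"Q: {ex['q']}\nA: {ex['a']}" for ex in best)
    return f"{shots}\n\nQ: {question}\nA:"

print(inject_examples("Which vendors shipped late in March?"))
```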
For folks doing investigative/analytics projects in this space, happy to chat about what we are doing with Louie.AI. These are the kind of implementation details we don't normally write about.
We tried dspy and a couple of others like it. They're neat and I'm happy those teams are experimenting with these frameworks. At the same time, they try to do "too much" by taking over the control flow of your code and running autotuning everywhere over it. We needed to write our own agent framework because even tools like langchain are too insecure and inefficient for an enterprise platform, and frameworks like dspy are even further out there.
A year+ later, the most interesting kernel of insight for us from dspy is autotuning a single prompt: it's an optimizable model just like any other. As soon as you have an eval framework in place for your prompts, having something like dspy tune them on a per-LLM basis would be very cool. I'm not sure where they are on that; it seems against the grain of their focus. We're only now reaching the point where we would see ROI on that kind of thing, and it took a long time to get here.
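To make the "prompt as an optimizable model" idea concrete, the loop we have in mind looks roughly like this (a sketch under assumptions: `call_llm` and `grade` stand in for your model client and your eval judge, and the search is just picking among hand-written variants rather than anything dspy does):

```python
import random

PROMPT_VARIANTS = [
    "You are a precise analyst. Answer concisely.\n\n{task}",
    "Think step by step, then give a one-line answer.\n\n{task}",
    "Answer using only the provided context; say 'unknown' otherwise.\n\n{task}",
]

def call_llm(model: str, prompt: str) -> str:
    # Placeholder for a real client (hosted API, local model, ...).
    return f"[{model} output for: {prompt[:40]}...]"

def grade(output: str, expected: str) -> float:
    # Stub judge; in practice an exact-match check, rubric, or LLM grader.
    return random.random()

def eval_score(model: str, template: str, eval_set) -> float:
    # Your eval framework: render the template per case, call the model, grade it.
    scores = [grade(call_llm(model, template.format(task=c["task"])), c["expected"])
              for c in eval_set]
    return sum(scores) / len(scores)

def tune_prompt(model: str, eval_set) -> str:
    # Pick the best-scoring template *for this specific model*.
    return max(PROMPT_VARIANTS, key=lambda t: eval_score(model, t, eval_set))

eval_set = [{"task": "Summarize alert #123", "expected": "..."}] * 20
best_for_model_a = tune_prompt("model-a", eval_set)
best_for_model_b = tune_prompt("model-b", eval_set)
```

The point is that once the eval harness exists, the autotuner itself is a small, auditable search loop rather than a framework that owns your runtime.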
We do run an agentic framework, so doing cross-prompt autotuning would be neat too -- especially for how the orchestrator (e.g., CoT) composes with individual agents. We call this the "composition problem" and it's frustrating. However, again, dspy and friends do "too much" by trying to also be the agent framework & runtime, while we just want the autotuner.
The rest is neat but scary for most production scenarios, while a prompt autotuner can give significant lift + resilience in a predictable & maintainable way to most typical LLM apps
Again... I'm truly happy and supportive that academics are exploring a wild side of the design space. Just, as we are in the 'we ship code people rely on' side of the universe, it's hard to find scenarios where its potential benefits outweigh its costs.
Entity resolution - RAG often mixes vector & symbolic queries, and ER improves reverse indexing, which is a starting point for a lot of the symbolic ones
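A toy version of what I mean by ER feeding the reverse index (illustrative only; real ER uses blocking, fuzzy matching, and graph signals rather than a hand-written alias table):

```python
# Map surface forms to canonical entity IDs, then build a reverse index
# from canonical entity -> documents, so symbolic queries hit every alias.
ALIASES = {
    "IBM": "org:ibm",
    "International Business Machines": "org:ibm",
    "Big Blue": "org:ibm",
}

def resolve(mention: str) -> str:
    return ALIASES.get(mention, f"unresolved:{mention.lower()}")

def reverse_index(doc_mentions: dict) -> dict:
    index: dict = {}
    for doc_id, mentions in doc_mentions.items():
        for m in mentions:
            index.setdefault(resolve(m), set()).add(doc_id)
    return index

docs = {"d1": ["IBM"], "d2": ["Big Blue"], "d3": ["International Business Machines"]}
print(reverse_index(docs))  # all three docs land under "org:ibm"
```

Without that resolution step, a symbolic query for one alias silently misses the documents that used the others.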
Identifying misinfo - Ranking & summarization based on internet data should be a lot more careful, and sometimes the controversy is the interesting part