See, if the article had said this, I would have agreed - fine-tuning is a tool and it should be used thoughtfully. I personally believe that in this funding climate it makes sense to make data collection and model training a core capability of any AI product, though that will only be available and wise for some founders.
Agreed, model training and data collection are great!
The subtle bit is that it doesn't have to be for the LLMs, as these are typically part of a system-of-models. E.g., we <3 RAG, and GNNs for improving your KG are fascinating. Likewise, dspy's explorations in optimizing prompts, vs the LLMs themselves, are very cool.
Yeah, I would recommend sticking to RAG on naively chunked data for weekend projects by one person. Likewise for a consumer tool like Perplexity's search engine, where you minimize spend per user task or go bankrupt: same thing, do the cheap thing and move on, it's good enough
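For context, the "naive chunking" baseline is roughly this (a minimal sketch only; the fixed chunk size and the hashed bag-of-words `embed` are illustrative stand-ins for a real embedding model, not anyone's production code):

```python
import numpy as np

def chunk(text, size=500):
    # Naive fixed-size character chunks; no sentence or section awareness.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text, dim=256):
    # Stand-in for a real embedding model: hashed bag-of-words, L2-normalized.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, chunks, k=3):
    # Cosine similarity over normalized vectors is just a dot product.
    q = embed(query)
    return sorted(chunks, key=lambda c: -float(embed(c) @ q))[:k]

docs = "..."  # concatenated source documents
top_chunks = retrieve("what changed in Q3?", chunk(docs))
prompt = "Answer using only this context:\n" + "\n---\n".join(top_chunks)
```

For a one-person weekend project, swapping in a hosted embedding API and a vector store is about all the extra sophistication you need.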
Once RAG projects become important and good answers matter - we work with governments, manufacturers, banks, cyber teams, etc - working through data quality, data representation, & retrieval quality helps
Note that we didn't start here: we began with naive RAG, then relevancy filtering, then agentic & neurosymbolic querying, then dynamic example prompt injection, and now we're getting into cleaning up the database/KG itself (the example-injection step is sketched below)
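As a rough illustration of the dynamic example injection step (a sketch under assumptions: the word-overlap scorer and the hand-written example bank are placeholders for whatever retrieval and eval machinery you already run):

```python
# Pick the few-shot examples most similar to the incoming question and
# splice them into the prompt, instead of hard-coding one static set.
EXAMPLE_BANK = [
    {"q": "Which suppliers had late shipments last month?", "a": "...worked answer..."},
    {"q": "Show failed logins by country for the past week.", "a": "...worked answer..."},
    # ...ideally hundreds more, mined from past sessions that passed evals
]

def overlap(a: str, b: str) -> float:
    # Jaccard similarity on lowercase tokens; a real system would use embeddings.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def inject_examples(question: str, k: int = 2) -> str:
    best = sorted(EXAMPLE_BANK, key=lambda ex: -overlap(ex["q"], question))[:k]
    shots = "\n\n".join(f"Q: {ex['q']}\nA: {ex['a']}" for ex in best)
    return f"{shots}\n\nQ: {question}\nA:"

print(inject_examples("Which vendors shipped late in March?"))
```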
For folks doing investigative/analytics projects in this space, happy to chat about what we are doing with Louie.AI. These are the kind of implementation details we don't normally write about.
We tried dspy and a couple of others like it. They're neat and I'm happy those teams are experimenting with these frameworks. At the same time, they try to do "too much" by taking over the control flow of your code and running autotuning everywhere over it. We needed to write our own agent framework because even tools like langchain are too insecure and inefficient for an enterprise platform, and frameworks like dspy are even further out there.
A year+ later, the most interesting kernel of insight for us from dspy is autotuning a single prompt: it's an optimizable model just like any other. As soon as you have an eval framework in place for your prompts, having something like dspy tune them on a per-LLM basis would be very cool. I'm not sure where they are on that; it seems against the grain of their focus. We're only now reaching the point where we would see ROI on that kind of thing, and it took a long time to get here.
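To make the "prompt as an optimizable model" idea concrete, the loop we have in mind looks roughly like this (a sketch under assumptions: `call_llm` and `grade` stand in for your model client and your eval judge, and the search is just picking among hand-written variants rather than anything dspy does):

```python
import random

PROMPT_VARIANTS = [
    "You are a precise analyst. Answer concisely.\n\n{task}",
    "Think step by step, then give a one-line answer.\n\n{task}",
    "Answer using only the provided context; say 'unknown' otherwise.\n\n{task}",
]

def call_llm(model: str, prompt: str) -> str:
    # Placeholder for a real client (hosted API, local model, ...).
    return f"[{model} output for: {prompt[:40]}...]"

def grade(output: str, expected: str) -> float:
    # Stub judge; in practice an exact-match check, rubric, or LLM grader.
    return random.random()

def eval_score(model: str, template: str, eval_set) -> float:
    # Your eval framework: render the template per case, call the model, grade it.
    scores = [grade(call_llm(model, template.format(task=c["task"])), c["expected"])
              for c in eval_set]
    return sum(scores) / len(scores)

def tune_prompt(model: str, eval_set) -> str:
    # Pick the best-scoring template *for this specific model*.
    return max(PROMPT_VARIANTS, key=lambda t: eval_score(model, t, eval_set))

eval_set = [{"task": "Summarize alert #123", "expected": "..."}] * 20
best_for_model_a = tune_prompt("model-a", eval_set)
best_for_model_b = tune_prompt("model-b", eval_set)
```

The point is that once the eval harness exists, the autotuner itself is a small, auditable search loop rather than a framework that owns your runtime.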
We do run an agentic framework, so doing cross-prompt autotuning would be neat too -- especially for how the orchestrator (e.g., CoT) composes with individual agents. We call this the "composition problem" and it's frustrating. However, again, dspy and friends do "too much" by trying to also be the agent framework & runtime, while we just want the autotuner.
The rest is neat but scary for most production scenarios, while a prompt autotuner can give significant lift + resilience in a predictable & maintainable way to most typical LLM apps
Again... I'm truly happy and supportive that academics are exploring a wild side of the design space. Just, as we are in the 'we ship code people rely on' side of the universe, it's hard to find scenarios where its potential benefits outweigh its costs.
Entity resolution - RAG often mixes vector & symbolic queries, and ER improves reverse indexing, which is a starting point for a lot of the symbolic ones
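A toy version of what I mean by ER feeding the reverse index (illustrative only; real ER uses blocking, fuzzy matching, and graph signals rather than a hand-written alias table):

```python
# Map surface forms to canonical entity IDs, then build a reverse index
# from canonical entity -> documents, so symbolic queries hit every alias.
ALIASES = {
    "IBM": "org:ibm",
    "International Business Machines": "org:ibm",
    "Big Blue": "org:ibm",
}

def resolve(mention: str) -> str:
    return ALIASES.get(mention, f"unresolved:{mention.lower()}")

def reverse_index(doc_mentions: dict) -> dict:
    index: dict = {}
    for doc_id, mentions in doc_mentions.items():
        for m in mentions:
            index.setdefault(resolve(m), set()).add(doc_id)
    return index

docs = {"d1": ["IBM"], "d2": ["Big Blue"], "d3": ["International Business Machines"]}
print(reverse_index(docs))  # all three docs land under "org:ibm"
```

Without that resolution step, a symbolic query for one alias silently misses the documents that used the others.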
Identifying misinfo - Ranking & summarization based on internet data should be a lot more careful, and sometimes the controversy is the interesting part