"A neural network layer is just a matrix. Why abstract that matrix and learn it?" Well, because it's not your job to figure out how to hardcode delicate string or floats that work well for a given architecture & backend.
We want developers to iterate quickly on system designs: How should we break down the task? Where do we call LMs? What should they do?
---
If you can guess the right prompts right away for each LLM, tweak them well for any complex pipeline, and rarely have to change the pipeline (and hence all prompts in it), then you probably won't need this.
That said, it turns out that (a) prompts that work well are very specific to particular LMs, large & especially small ones, (b) prompts that work well change significantly when you tweak your pipeline or your data, and (c) prompts that work well may be long and time-consuming to find.
Oh, and often the prompt that works well changes for different inputs. Thinking in terms of strings is a glaring anti-pattern.
I agree with you on all of those points - but my conclusion is different: those are the reasons it's so important to me that the prompts are not abstracted away from me!
I'm working with Llama 2 a bunch at the moment and much of the challenge is learning how to prompt it differently from how I prompt GPT-4. I'm not yet convinced that an abstraction will solve that problem for me.
> People seem to underestimate and overlook the importance of prompts.
We do this to each other as well. Being able to communicate clear, concise, and complete requests will produce better results with both humans and LLMs. What is interesting is that we can experiment with prompts against machines at a scale we cannot with other people. I'd really like to see more work towards leveraging this feature to improve our human interactions, kind of like empathy training in VR.
I think we can agree on:
1] when prototyping, it's useful not to have to tweak each prompt by hand, as long as you can inspect the prompts easily (see the sketch after this list)
2] when the system design is "final", it's important to be able to tweak any prompts or finetunes with full flexibility
But we may or may not agree on:
3] automatic optimization can make #2 above necessary only very rarely
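To illustrate #1, here's a minimal sketch of what inspecting an auto-generated prompt looks like; the API names are as I recall them from the DSPy docs, so treat this as illustrative rather than exact:

```python
# Minimal sketch (DSPy API names as I recall them from the docs; adjust to your version).
import dspy

# Point DSPy at an LM; here an OpenAI model, but a local Llama works the same way.
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

# Declare *what* the step should do, not the prompt string itself.
qa = dspy.ChainOfThought("question -> answer")

pred = qa(question="What is the capital of France?")
print(pred.answer)

# You never wrote a prompt, but you can always inspect the one that was actually sent.
lm.inspect_history(n=1)
```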
---
Anyway, in case you're worried that we're doing some opinionated prompting on your behalf: the entire DSPy project has zero hard-coded prompts for tasks. Everything is bootstrapped and validated against your own logic.
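Concretely, here's a rough sketch of what that compilation step looks like; the module and teleprompter names follow the docs as I recall them, so take it as illustrative rather than canonical:

```python
# Rough sketch of "bootstrapped and validated for your logic" (illustrative, not canonical).
import dspy
from dspy.teleprompt import BootstrapFewShot

# Your metric defines what "validated" means: any Python function over (example, prediction).
def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

# The teleprompter runs your program on a small trainset, keeps the traces that pass
# your metric, and turns them into demonstrations -- no hand-written prompt anywhere.
trainset = [dspy.Example(question="2+2?", answer="4").with_inputs("question")]
teleprompter = BootstrapFewShot(metric=exact_match)
compiled_qa = teleprompter.compile(QA(), trainset=trainset)
```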
It sounds fascinating! Is there anything one could read to learn more about how this is being done? (From reading the docs, it's the "Teleprompter"s that do it, right?)
One time (ironically, after I had learned about model-free methods) I got sucked into writing a heuristic for an A* algorithm. It turned into a bottomless pit of manually tuning various combinations of rules. I learned the value of machine learning the hard way.
If prompts can be learned, then eventually it will be better to learn them than to tune them by hand. However, these ideas need not be mutually exclusive. If we reject the tyranny of the “or”, we can have a prompt prior that we tune manually and then update with a learning process, right?
P.S. whoever wrote the title, I think it’s pretty silly to write “The Framework…” for anything, because it presumes yours is the only member of some category, which is never true!