Should we be checking our prompt history into version control as a kind of source code or requirements spec? It seems a history of prompt revision and improvement would be valuable to keep.
It's not even that. Those showcasing the power of LLMs to code up a whole project should be disclosing their prompts. Otherwise it's quite meaningless.
The responses to those prompts are not independently reproducible without the weights, the seed and software used to run the LLM, and in many cases, none of those are available.
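To make the seed point concrete, here's a toy sketch (not a real LLM, just seeded sampling; the names `sample_tokens` and the tiny vocabulary are invented for illustration). With the same software and the same seed, the output is reproducible; without the seed, it generally isn't:

```python
import random

def sample_tokens(seed=None, n=5):
    """Toy stand-in for LLM sampling: draw n tokens from a tiny vocabulary."""
    rng = random.Random(seed)  # seeded RNG makes the draw deterministic
    vocab = ["the", "cat", "sat", "on", "mat"]
    return [rng.choice(vocab) for _ in range(n)]

# Same seed, same software -> same output, every time:
assert sample_tokens(seed=42) == sample_tokens(seed=42)

# No seed -> the runs are not reproducible in general.
```

Real inference adds further sources of divergence (weights, temperature, hardware nondeterminism), so the bar for reproducing a prompt's response is even higher than this sketch suggests.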
So that's all part of the "source code" for projects like this.
It's amusing that on one hand, there's been a push for "reproducible builds", where we try to ensure that a given set of inputs (source code files, configuration, libraries) yields an identical output. On the other hand, we have what we see here, where, without a huge amount of external context, no two "builds" will ever be identical.
Unless the AI model is being run at build time to generate inputs to the build, I disagree that the model, its prompts, or its weights and seeds constitute part of “the build” for reproducibility purposes. Static source files generated by an AI and committed to source control are indistinguishable, from a build perspective, from source files generated by a human and committed to that same source control. We (reasonably) don’t consider the IDE, the auto-complete tools, or the reference manuals consulted to be requirements for a “reproducible” build, and I don’t think the AI models or prompts are any different in that respect. They might be interesting pieces of information, and they might be useful for documenting the intent of a given piece of code, but if you can take the same source files, run them through the same compile process, and get the same outputs, that’s “reproducible”.
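The "indistinguishable from a build perspective" claim can be sketched in a few lines (the `digest` helper and the example byte strings are invented for illustration): a build system only ever sees file contents, so identical bytes hash identically regardless of whether a human or a model typed them.

```python
import hashlib

def digest(data: bytes) -> str:
    """Content hash of a source file; this is all a build input amounts to."""
    return hashlib.sha256(data).hexdigest()

# The same bytes, regardless of provenance, are the same build input:
human_written = b"int add(int a, int b) { return a + b; }\n"
ai_generated  = b"int add(int a, int b) { return a + b; }\n"

assert digest(human_written) == digest(ai_generated)
```

This is exactly why tools like Git store blobs by content hash: provenance lives in commit messages and history, not in the bytes the compiler consumes.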