Having only a basic knowledge of how GPT works under the hood: is it not computationally expensive to prepend these instructions to every single prompt? I mean, is there a way to build the model with these instructions already "built in" somehow?
It is expensive, yes. Fine-tuning is a way to encode instructions without having to resubmit them every time. You also have to resubmit _past turns_ of the conversation so that the agent has “memory”, so that’s also quite wasteful.
OpenAI is allegedly launching some big changes on Nov 6 that’ll make that less wasteful, but I don’t think there’s a ton of info out there yet on what exactly that’ll be.
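To make the resubmission point concrete, here's a rough sketch of a chat loop using the openai Python client; the model name and prompts are just placeholders, but the shape is the same for any chat API: every request resends the system prompt plus all earlier turns.

```python
# Sketch of per-turn resubmission: each request carries the system prompt
# plus every previous turn, so the model reprocesses the same tokens again
# and again. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a terse assistant."}]

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # The entire history is sent every time; only the last turn is new.
    response = client.chat.completions.create(model="gpt-4", messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("Summarize attention in one line."))
print(ask("Now do it again, but shorter."))  # this call also resends turn 1
```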
Not really. Most of it can be cached, and prompt processing is quite fast anyway. See vLLM for an open-source implementation that has most of the optimizations needed to serve many users.
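If you want to see the caching side, vLLM's offline API exposes it directly. A minimal sketch, assuming automatic prefix caching is enabled and using an example model name:

```python
# Minimal prefix-caching sketch with vLLM: requests that share the same
# instruction prefix reuse its KV cache instead of recomputing it.
# The model name is just an example.
from vllm import LLM, SamplingParams

SYSTEM_PROMPT = "You are a helpful assistant. Answer in one sentence.\n\n"

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_prefix_caching=True)
params = SamplingParams(max_tokens=64)

prompts = [SYSTEM_PROMPT + q for q in (
    "What is the capital of France?",
    "What is the capital of Japan?",
)]

# The shared prefix is processed once; only the differing question suffixes
# need fresh prefill work.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```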
Yes, you fine-tune the model on your example conversations, and the probability of the model replying in the style of those examples increases.
You'll need to feed it roughly 1,000 to 100,000 example conversations covering various styles of input and output to have a noticeable effect, though, and that's not cheap.
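For a sense of what that looks like in practice, here's a rough sketch using OpenAI's fine-tuning endpoints from the Python client; the file name, base model, and toy example are placeholders:

```python
# Sketch of chat fine-tuning: each training example is one full conversation,
# written out as JSONL, then uploaded and used to start a fine-tuning job.
# File name, base model, and the toy example are placeholders.
import json
from openai import OpenAI

examples = [
    {"messages": [
        {"role": "system", "content": "You answer like a pirate."},
        {"role": "user", "content": "Where is the treasure?"},
        {"role": "assistant", "content": "Arr, beneath the old oak, matey."},
    ]},
    # ...repeat for the ~1,000+ conversations mentioned above
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

client = OpenAI()
uploaded = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-3.5-turbo")
print(job.id)  # poll this job until the fine-tuned model is ready
```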