You can just feed that output into another call and have the next call continue it, since you have more than 28k of extra context to spare. Output per token is faster anyway, so speed isn't an issue. It's just slightly more dev work (really only a couple of lines of code).
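As a sketch of that "couple of lines": this assumes a hypothetical chat-style API that accepts a trailing `assistant` message as a prefill and resumes from it (which is exactly the behavior questioned below). `call_api` and `generate_long` are made-up names for illustration, not a real client library.

```python
# Hypothetical continuation loop. The message shape mimics a
# chat-completions-style API; call_api is a stand-in stub that returns
# (text_chunk, stopped_naturally) -- not a real SDK call.

def continue_request(messages, partial_output):
    # Append the truncated output so far as a trailing assistant
    # message, so the next call (in APIs that support assistant
    # prefill) resumes from exactly where the last one stopped.
    if not partial_output:
        return messages
    return messages + [{"role": "assistant", "content": partial_output}]

def generate_long(messages, call_api, max_rounds=5):
    # Keep requesting continuations until the API reports a natural stop.
    text = ""
    for _ in range(max_rounds):
        chunk, stopped = call_api(continue_request(messages, text))
        text += chunk
        if stopped:
            break
    return text
```

A stubbed `call_api` that emits a fixed string four characters at a time is enough to exercise the loop; with a real endpoint you would also want to mind token limits and stop reasons.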
Is there a study or any documentation guaranteeing this? That is, if you add an incomplete `assistant` response as input, will the API pick up in exactly the same way from exactly that position?
It’s how LLMs work - they are effectively recursive at inference time: after each token is sampled, it’s fed back in as input. You end up with the same model state (sampling noise aside) as if that text had been the original input prompt.
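A toy model makes the claim concrete. Here a deterministic `next_token` function stands in for greedy decoding: because each new token depends only on the token sequence so far, generating N tokens in one pass is identical to stopping partway and restarting with the partial output as the "prompt".

```python
def next_token(context):
    # Toy deterministic "model": next token is the sum of the last two
    # tokens mod 7. Stands in for greedy argmax decoding, which is
    # likewise a deterministic function of the context.
    return (context[-1] + context[-2]) % 7

def generate(prompt, n):
    # Autoregressive loop: sample a token, feed it back in, repeat.
    ctx = list(prompt)
    for _ in range(n):
        ctx.append(next_token(ctx))
    return ctx

# Generating 10 tokens in one pass...
full = generate([1, 2], 10)
# ...matches generating 4, then restarting with those 4 appended to the
# prompt and generating 6 more. The model state depends only on the
# token sequence, not on how it was split across calls.
resumed = generate(generate([1, 2], 4), 6)
```

Whether a hosted API reproduces this exactly (temperature 0, identical tokenization of the prefill, no hidden prompt changes between calls) is a separate question, as the next reply points out.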
For LLMs in the abstract, sure. My question is whether it holds in practice for LLMs behind said API. As far as I can tell, there's no official documentation saying we'll get exactly the same result.
And no one here has touched on how high a multiple the cost is (you pay for the whole prompt again on every continuation call), so I assume it's pretty high.