
The main limit is that you can have 128k tokens of input but only 4k tokens of output per run, whereas gpt4-32k lets you have up to 32k tokens of output per run. Some applications need that much output, especially for token-dense things like code and JSON.


So it's a price concern? Because you could run for 4k output 8 times to get 32k? Or does the RLHF stuff prevent you from feeding the output back in as more input and still getting a decent result? The underlying transformers shouldn't care, because they're effectively doing that already.


I'd say it's less a price concern and more a consistency-of-output concern. I don't think it makes much sense to continue incomplete JSON like that. I need to do some more research.


You can just feed that output into another call and have the next call continue it, since you have more than 28k of extra context. Per-token output is faster anyway, so speed isn't an issue. It's just slightly more dev work (really only a couple lines of code), something like the sketch below.
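
A minimal sketch of that chaining with the OpenAI Python client. The model name, the "continue" prompt, and the stitching logic are my assumptions, not a documented continuation feature, and whether the model actually resumes seamlessly is exactly the question downthread:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def generate_long(prompt, model="gpt-4-turbo", max_rounds=8):
        """Chain up to max_rounds 4k-token completions into one long output."""
        messages = [{"role": "user", "content": prompt}]
        pieces = []
        for _ in range(max_rounds):
            resp = client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=4096,  # the per-call output cap being discussed
            )
            choice = resp.choices[0]
            pieces.append(choice.message.content)
            if choice.finish_reason != "length":
                break  # the model stopped on its own; the output is complete
            # Feed the partial output back in as an assistant message and ask
            # the next call to continue it. Whether it picks up token-for-token
            # is the open question in this thread.
            messages.append({"role": "assistant", "content": choice.message.content})
            messages.append({"role": "user", "content": "Continue exactly where you left off."})
        return "".join(pieces)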


How do you know it will have the same state of mind? And how much does that cost?


Because the state of mind is derived from the input tokens.


Is there a study or anything guaranteeing that, if you add an incomplete assistant response as input, the API will pick up in exactly the same way from the same position?


It’s how LLMs work: they are effectively recursive at inference time; after each token is sampled, you feed it back in. You will end up with the same model state (noise aside) as if that had been the original input prompt.
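
That loop, written out as a bare-bones greedy decoding sketch with Hugging Face transformers (gpt2 is just a stand-in model): each sampled token is appended to the sequence and the whole thing goes back through the model, so a sequence that arrived in the prompt and one that was generated token by token leave the model in the same state.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")        # stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    input_ids = tok("The 4k output limit", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(20):                            # generate 20 tokens greedily
            logits = model(input_ids).logits           # forward pass over the full sequence
            next_id = logits[:, -1, :].argmax(dim=-1)  # most likely next token
            # Feed the sampled token back in: the grown sequence is the next input,
            # exactly as if it had been part of the original prompt.
            input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

    print(tok.decode(input_ids[0]))

Stopping this loop after 10 tokens and restarting it with the longer sequence gives the same continuation (greedy decoding is deterministic), which is the "same state of mind" point above.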


For LLMs in general, sure. My question is whether it's the same in practice for the LLMs behind this API. As far as I can tell, there's no official documentation that we'll get exactly the same result.

And no one here has touched on how high a multiple the cost is, so I assume it's pretty high.



