
Not sure exactly what setup you are running. In theory yes, higher temperature for both models means more overlap between the two distributions and thus fewer rejections -> faster sampling (but worse quality overall).

However, if you have higher temperature but are still operating under top-k sampling with a small k, I'm not sure it's going to translate into any noticeable difference, since the truncation keeps your actual distributions very much non-uniform.
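
To make that concrete, here's a toy sketch (numpy, made-up logits, not Together's or anyone's actual implementation) of the standard speculative-sampling acceptance rule, accept a draft token with probability min(1, p_target/p_draft). As both models' temperatures go up, the two distributions flatten, overlap more, and the expected acceptance rate rises:

    import numpy as np

    def apply_temperature(logits, temperature):
        # logits -> probabilities at the given temperature (softmax of logits/T)
        scaled = logits / temperature
        scaled -= scaled.max()            # for numerical stability
        probs = np.exp(scaled)
        return probs / probs.sum()

    # Hypothetical draft/target logits over a 4-token vocab, chosen so the
    # two models disagree on the most likely token.
    draft_logits = np.array([2.0, 0.5, 0.0, -1.0])
    target_logits = np.array([0.5, 2.0, 0.0, -1.0])

    for temp in (0.5, 1.0, 2.0):
        p_draft = apply_temperature(draft_logits, temp)
        p_target = apply_temperature(target_logits, temp)
        # Expected acceptance rate under "accept with prob min(1, p_t/p_d)":
        # sum_x p_draft(x) * min(1, p_target(x)/p_draft(x)) = sum_x min(p_draft(x), p_target(x))
        accept_rate = np.minimum(p_draft, p_target).sum()
        print(f"temperature={temp}: expected acceptance rate ~ {accept_rate:.2f}")

With these made-up logits the expected acceptance rate climbs from roughly 0.11 at temperature 0.5 to roughly 0.74 at 2.0, which is the "fewer rejections -> faster sampling" effect.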





This is with Together's API via OpenRouter, running DeepSeek V3 0324 and Kimi K2 0905.

I didn't set a top-k. So it seems like Together must be doing something weird in their speculative decoding implementation.


Oh, in that case there is definitely a top-k or top-p behind the scenes; it might just not be exposed to the user as a param they can change through the API. I haven't heard of anyone running an LLM in prod with actual pure sampling.

I see. That's slightly unfortunate. In principle, increasing temperature flattens out the distribution but the ordering of the tokens' probabilities stays the same, so setting a top-k shouldn't break my test. Can't say the same for top-p, though. And all of this is probably too deep into the provider's implementation details for me to make assumptions about.
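
To illustrate that last point with a toy numpy sketch (made-up logits, unrelated to any provider's setup): dividing logits by a temperature is a monotone rescaling, so the ranking of tokens, and therefore any top-k cutoff, is unchanged, while the smallest set of tokens whose cumulative probability reaches p (the top-p / nucleus set) can grow or shrink:

    import numpy as np

    def softmax(logits):
        z = logits - logits.max()
        e = np.exp(z)
        return e / e.sum()

    def top_p_set(probs, p=0.9):
        # Smallest set of tokens, taken in descending-probability order,
        # whose cumulative mass reaches p (nucleus sampling).
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cumulative, p)) + 1
        return sorted(order[:cutoff].tolist())

    # Hypothetical logits over a 5-token vocab.
    logits = np.array([2.0, 1.5, 0.8, 0.2, -0.5])

    for temp in (0.7, 1.0, 1.5):
        probs = softmax(logits / temp)
        ranking = np.argsort(probs)[::-1].tolist()
        print(f"temperature={temp}:")
        print("  ranking (so any top-k set) unchanged:", ranking)
        print("  top-p(0.9) set can change:", top_p_set(probs))

With these particular logits the ranking stays [0, 1, 2, 3, 4] at every temperature, but the top-p(0.9) set grows from {0, 1, 2} at temperature 0.7 to {0, 1, 2, 3} at 1.0, which is why a hidden top-p could interfere with the test while a hidden top-k shouldn't.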


