You think they are caching? Even though one of the parameters is temperature? Ca... | Hacker News


PUSH_AX on Sept 12, 2023 | on: Fine-tune your own Llama 2 to replace GPT-3.5/4

You think they are caching? Even though one of the parameters is temperature? That's a can of worms, and it should be reflected in the pricing if true. Don't get me started if they are charging per token for cached responses. I just don't see it.

why_only_15 on Sept 12, 2023

You can keep around the KV cache from previous generations, which lowers the cost of processing prompts significantly.
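To make the distinction concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not how any provider actually serves requests: `toy_kv`, `PrefixKVCache`, and `sample` are hypothetical names, and the key/value computation is a stand-in for a real model's attention projections. What it illustrates is that the cache holds per-token state for the prompt prefix rather than finished responses, and that temperature only enters at the sampling step, so the same cached prefix can be reused across requests with different temperatures.

```python
from __future__ import annotations

import math
import random


def toy_kv(token: int) -> tuple[float, float]:
    # Hypothetical stand-in for a model's key/value projection of one token.
    # In a real transformer this is the expensive part of prompt processing.
    return (math.sin(token), math.cos(token))


class PrefixKVCache:
    """Keeps key/value entries for prompt tokens that were already processed."""

    def __init__(self) -> None:
        self.tokens: list[int] = []
        self.kv: list[tuple[float, float]] = []
        self.computed = 0  # how many tokens actually went through toy_kv

    def prefill(self, prompt: list[int]) -> int:
        """Process a prompt, reusing the longest cached prefix.

        Returns the number of tokens served from the cache.
        """
        shared = 0
        for cached_tok, new_tok in zip(self.tokens, prompt):
            if cached_tok != new_tok:
                break
            shared += 1
        # Entries past the shared prefix belong to a different prompt: drop them.
        self.tokens = self.tokens[:shared]
        self.kv = self.kv[:shared]
        # Only the unseen suffix needs fresh computation.
        for tok in prompt[shared:]:
            self.kv.append(toy_kv(tok))
            self.tokens.append(tok)
            self.computed += 1
        return shared


def sample(logits: list[float], temperature: float, rng: random.Random) -> int:
    # Temperature rescales logits at sampling time only (must be > 0 here);
    # it never touches the cached keys and values.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


cache = PrefixKVCache()
first_hit = cache.prefill([1, 2, 3, 4])          # cold start, nothing reused
second_hit = cache.prefill([1, 2, 3, 4, 5, 6])   # first four tokens reused
rng = random.Random(0)
logits = [0.1, 2.0, -1.0]  # made-up next-token scores for illustration
cold_pick = sample(logits, temperature=0.2, rng=rng)
hot_pick = sample(logits, temperature=1.5, rng=rng)
```

In this toy, the second call only computes two new tokens. The output tokens are still sampled fresh each time, which is why this kind of caching reduces prompt cost without returning a stored response.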
