Hacker News
PUSH_AX on Sept 12, 2023 | on: Fine-tune your own Llama 2 to replace GPT-3.5/4
You think they are caching? Even though one of the parameters is temperature? That's a can of worms, and it should be reflected in the pricing if true. Don't get me started if they are charging per token for cached responses.
I just don't see it.
why_only_15 on Sept 12, 2023
You can keep the KV cache from previous generations around, which significantly lowers the cost of processing prompts that share a prefix.
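To make that concrete, here is a minimal sketch of KV-cache reuse with Hugging Face transformers (the model name and prompts are illustrative, and this says nothing about what OpenAI actually does server-side): run a shared prompt prefix once, keep the returned past_key_values, and later calls that share the prefix only pay for their new tokens.

    # Minimal KV-cache reuse sketch (illustrative model and prompts).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Run the shared prefix once and keep the per-layer key/value cache.
    prefix_ids = tokenizer("You are a helpful assistant.",
                           return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(prefix_ids, use_cache=True)
    past_key_values = out.past_key_values

    # A later request sharing that prefix: only the new tokens are run
    # through the model; attention reads the cached keys/values for the
    # prefix. (attention_mask handling omitted for brevity.)
    suffix_ids = tokenizer(" Summarize this article.", return_tensors="pt",
                           add_special_tokens=False).input_ids
    with torch.no_grad():
        out = model(suffix_ids, past_key_values=past_key_values,
                    use_cache=True)

This is why a long shared system prompt can be close to free after the first request: the prefix's keys and values are already computed, so the forward pass only touches the new tokens.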