
Prompt caching is functionally identical to snapshotting the model after it has processed the prompt. And you need the KV cache for inference in any case, so it doesn't even cost extra memory to keep it around if every single inference task is going to share the same prompt prefix.
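
To make the point concrete, here is a minimal toy sketch of prefix-based KV reuse: a single attention head in numpy, with the prompt's K and V matrices memoized by the prompt tokens. All names (`embed`, `W_q`, `W_k`, `W_v`, `kv_for`) are made up for illustration; real serving stacks cache per layer, match the longest shared prefix, and extend the cache incrementally rather than caching whole prompts.

  import numpy as np

  rng = np.random.default_rng(0)
  D = 16                                           # toy model width
  W_q, W_k, W_v = (rng.standard_normal((D, D)) for _ in range(3))
  embed = {tok: rng.standard_normal(D) for tok in range(100)}  # toy embeddings

  prefix_cache = {}                                # prompt tokens -> (K, V)

  def kv_for(tokens):
      """Return K and V for a prompt, reusing the cache when it was seen before."""
      key = tuple(tokens)
      if key in prefix_cache:                      # prompt already processed once
          return prefix_cache[key]
      X = np.stack([embed[t] for t in tokens])     # (seq, D)
      K, V = X @ W_k, X @ W_v                      # only computed on a cache miss
      prefix_cache[key] = (K, V)
      return K, V

  def attend(query_token, prompt_tokens):
      """One attention step over the prompt using the (possibly cached) K and V."""
      K, V = kv_for(prompt_tokens)
      q = embed[query_token] @ W_q
      scores = K @ q / np.sqrt(D)
      weights = np.exp(scores - scores.max())
      weights /= weights.sum()
      return weights @ V

  shared_prompt = [1, 2, 3, 4, 5]                  # same system prompt every request
  attend(7, shared_prompt)                         # first request pays to build K, V
  attend(8, shared_prompt)                         # later requests reuse the cached K, V

The cache entry here is exactly the state the model would have after processing the prompt, which is why caching it and "snapshotting after the prompt" amount to the same thing.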

