Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> In theory today, I should be able to process <long quote from document> <specific query> and just stop after the long document and save the KV cache right?

People do this, it's called prefix caching.

There's also https://arxiv.org/abs/2506.06266 where they compress the context down to a smaller representation they call a "cartridge," and composing cartridges from different contexts seems to work reasonably well.





Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: