that's fair - i have some app ideas where i'd want explicit control over prefix caching. for example, you might want to prompt-cache entire chunks of enterprise data that don't change very often; a whole RAG application could be built on this concept, and paying per hour for the cache makes sense there. (see the sketch below.)
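
to make that concrete, here's a minimal sketch of the explicit style using Anthropic's cache_control parameter (one provider that exposes this knob); the corpus file, model id, and prompts are placeholders i made up, not anything from the thread:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # placeholder: a large, rarely-changing blob of enterprise data
    enterprise_chunk = open("enterprise_corpus.txt").read()

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=[
            {"type": "text", "text": "Answer questions using the corpus below."},
            {
                "type": "text",
                "text": enterprise_chunk,
                # everything up to and including this block becomes a cached
                # prefix; later calls sharing it pay the cheaper cache-read rate
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=[{"role": "user", "content": "Summarize last quarter's incidents."}],
    )
    print(response.content[0].text)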

>With OpenAI I don't have to do any planning or optimistic guessing at all: if my app gets a spike in traffic the caching kicks in automatically and saves me money.

i think these are completely different use cases. how is that any different from just putting a redis cache in front of the LLM provider? i.e. roughly the pattern sketched below.
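
something like this toy exact-match cache (the key scheme and call_llm are hypothetical, purely to illustrate the comparison):

    import hashlib
    import redis

    r = redis.Redis()  # assumes a local Redis instance

    def cached_completion(prompt: str, call_llm) -> str:
        # exact-match cache: only a byte-identical prompt ever hits
        key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
        hit = r.get(key)
        if hit is not None:
            return hit.decode()
        answer = call_llm(prompt)    # hypothetical provider call
        r.set(key, answer, ex=3600)  # keep the response for an hour
        return answer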

fundamentally, i feel like prompt caching is something i want to control, not something that happens automatically; i want to use what i know about my (future) access patterns to save costs. for instance, i might prompt-cache a whole PDF and then ask multiple questions against it. if i choose to cache the PDF, i save a non-trivial number of processed tokens on every follow-up question. how does OpenAI's automatic approach help me here? (sketch of the pattern below.)
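
the pattern i mean, in the same hedged Anthropic-style sketch as above (the file name and questions are made up; open() stands in for real PDF text extraction):

    import anthropic

    client = anthropic.Anthropic()

    pdf_text = open("contract.txt").read()  # stand-in for real PDF extraction

    # the big prefix is marked cacheable once...
    cached_system = [
        {"type": "text", "text": pdf_text,
         "cache_control": {"type": "ephemeral"}},
    ]

    questions = [
        "Who are the parties to this agreement?",
        "What is the termination clause?",
        "List all payment deadlines.",
    ]

    # ...and every question reuses it: the first call writes the cache,
    # the rest pay the discounted cache-read rate on the whole PDF prefix
    for q in questions:
        resp = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=512,
            system=cached_system,
            messages=[{"role": "user", "content": q}],
        )
        print(resp.content[0].text)

back-of-envelope: if the PDF is ~50k tokens and you ask 10 questions, that's ~450k input tokens billed at the cache-read price instead of being reprocessed in full.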



