The other day I was talking to Grok when it suddenly started outputting corrupt tokens and then printed its entire system prompt. I didn't ask for it.
There truly are a million ways for LLMs to leak their system prompt.
I didn't save the conversation, but one thing that stood out was a long list of bullets saying that Grok doesn't know anything about x/AI pricing or product details and should tell the user to go to the x/AI website rather than making things up. That section seemed to be longer than the section that defines what Grok is.