Why not? If it was trained where some subset of the input tokens are always instructions and another subset are always language data wouldn't it have a clear separation?
Because that isn't how it's trained. The model ingests and tokenized documents. They're not labeled. The content is just the content. (This is why it can't tell instructions from other content, nor facts from untruths.)
These kind of models get better when a human leans on them by rewarding some kinds of outputs and punishing some others, giving them higher or lower weights. But you have to have the outputs to make those judgements. You have to see the thing fail to tell it to "stop doing that." It's not inherent in the original content.