That's what I was thinking, too. It would do some good for the people implementing this stuff to read about in-band signaling and blue boxes, for example.
They are well aware of it, which is why there's a distinction between "system" and "user" messages, for example.
The problem is that, at the end of the day, it's still a single NN processing everything. You can train it to make this distinction, but by their very nature the outcome is still probabilistic.
This is similar to how you as a human cannot avoid being influenced (one way or another, however subtly) by any text that you encounter, simply by virtue of having read it.