I wonder if this will survive distillation. I vaguely recall that most open models answer “I am ChatGPT” when asked who they are, since they’re heavily trained on OpenAI outputs. If the version of ChatGPT used to generate the training data had a watermark, a sufficiently powerful function approximator would just learn the watermark.
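
A toy sketch of what I mean, under loud assumptions: the "watermark" here is a fixed green-list bias over a 50-token vocabulary (real schemes are more involved and usually context-dependent), the "teacher" just samples tokens with that bias, and the "student" is a plain unigram frequency model fit on the teacher's output. All the names (`GREEN`, `teacher_sample`, `fit_student`, and so on) are made up for illustration and are not anyone's actual watermarking method.

```python
import random
from collections import Counter

VOCAB = list(range(50))
GREEN = set(range(25))  # hypothetical fixed "green list": half the vocabulary


def teacher_sample(rng, n, bias=0.9):
    """Watermarked teacher: picks a green token with probability `bias`."""
    green = [t for t in VOCAB if t in GREEN]
    red = [t for t in VOCAB if t not in GREEN]
    return [rng.choice(green if rng.random() < bias else red) for _ in range(n)]


def fit_student(tokens):
    """Student: a unigram model that copies the teacher's token frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: counts[t] / total for t in VOCAB}


def student_sample(rng, probs, n):
    """Sample n tokens from the student's learned distribution."""
    toks = list(probs)
    return rng.choices(toks, weights=[probs[t] for t in toks], k=n)


def green_fraction(tokens):
    """Detector statistic: share of tokens that fall in the green list."""
    return sum(t in GREEN for t in tokens) / len(tokens)


if __name__ == "__main__":
    rng = random.Random(0)
    teacher_text = teacher_sample(rng, 5000)
    student = fit_student(teacher_text)
    student_text = student_sample(rng, student, 5000)
    # An unwatermarked uniform sampler would sit near 0.5 here.
    print("teacher green fraction:", green_fraction(teacher_text))
    print("student green fraction:", green_fraction(student_text))
```

The student never sees the green list, yet its samples end up with a green fraction well above the 0.5 you'd expect without a watermark, simply because it fit the teacher's distribution. Whether this carries over to a real LLM and a real watermark is exactly the open question.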