
Then watermark the output and have the provider attest whether they wrote it. Between a binary classifier in the age of adversarial training and any level of watermarking, you'd be able to tell which minor version printed it.



I don't think it's possible to watermark AI-generated text in a way that can't easily be removed by someone who simply switches a word around or adds a typo.
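To see why small edits are so effective, here's a minimal sketch of a hash-based "green list" watermark of the kind that has been proposed for LLM output: a sampler that only emits tokens hashed as "green" given the previous token, and a detector that measures the green fraction. Everything here (`is_green`, the `w0, w1, …` stand-in vocabulary) is a toy assumption for illustration, not any vendor's actual scheme; it just shows that a single word swap immediately dents the detection statistic.

```python
import hashlib
import itertools

def is_green(prev: str, word: str) -> bool:
    # Toy rule: hash the bigram; the word is "green" if the
    # first hash byte is even (roughly half the vocabulary).
    h = hashlib.sha256(f"{prev}|{word}".encode()).digest()
    return h[0] % 2 == 0

def green_fraction(words: list[str]) -> float:
    # Detector: fraction of bigrams that land on the green list.
    pairs = list(zip(words, words[1:]))
    return sum(is_green(a, b) for a, b in pairs) / len(pairs)

def pick(prev: str, want_green: bool = True) -> str:
    # Stand-in vocabulary w0, w1, ...: scan until we find a word
    # with the desired green/non-green status after `prev`.
    for i in itertools.count():
        w = f"w{i}"
        if is_green(prev, w) == want_green:
            return w

# A "watermarked" text: the sampler always chooses a green word.
wm = ["the"]
for _ in range(11):
    wm.append(pick(wm[-1]))
assert green_fraction(wm) == 1.0  # detector fires cleanly

# The attack from the comment above: swap one word for a
# "synonym" that happens not to be green after its predecessor.
attacked = wm[:]
attacked[5] = pick(attacked[4], want_green=False)
assert green_fraction(attacked) < 1.0  # statistic already degraded
```

With only a handful of swaps (or a light paraphrase) the green fraction drifts toward the ~50% baseline of unwatermarked text, and the detector can no longer distinguish the two without an unacceptable false-positive rate.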


That spot-check only catches the people who can beat OpenAI at non-trivial steganography, and sophisticated actors aren't who this is about catching. They're going to get away with some level of abuse no matter what. APTs? They can afford their own LLM programs just fine; some of them have credible quantum computing programs.

But a lot of propaganda is going to take place at the grassroots level by actors who can’t beat OpenAI, even one in decline, at breaking both watermarks and an adversarial model.

But the grand finale, of course: at this point, how has OpenAI behaved like anything other than an APT itself? It's the friendly, plucky underdog charity that's now manipulating the process of making things illegal without involving Congress.

That’s exactly how advanced actors operate: look at the xz backdoor.



