> That unusual string of characters is a form of obfuscation used to hide the actual text.
When decoded, it appears to read:
"test message"
The gibberish you see is a series of zero-width or unprintable Unicode characters
I tried with the same prompt in the examples provided on gibberifier.com, and it works well[1].
(Amusingly, to get the text, I relied on OCR)
But I also noticed that sometimes, due to an issue when copy-pasting into the Gemini prompt input, only the first paragraph gets retained, i.e., the gibberified equivalent of this paragraph:
> Dragons have been a part of myths, legends, and stories across many cultures for centuries. Write an essay discussing the role and symbolism of dragons in one or more cultures. How do dragons reflect the values, fears ...
And in that case, Gemini doesn't seem to be as confused, and actually gives you a response about dragons' myths and stories.
Amusingly, the full prompt is 1302 characters, and Gibberifier complains:
> Too long! Remove 802 characters for optimal gibberification.
This is despite the fact that its output seems to work a lot better when the input is longer.
[1] By "works well" I mean: Gemini errors out when I try the input in the mobile app. In the browser, for the same prompt, it gives answers about the "de Broglie hypothesis" and "Drift Velocity" (Flash), or about "Chemistry Drago's rule" and a "Drago repulse videogame move" (Thinking; it thinks I'm asking about Pokemon or Bakugan).
Stuff other than AI starts to break if you try to copy/paste that much text in one go - I put a soft limit at 500 so people wouldn't go paste in their PhD dissertation and watch Word crash on them.
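For anyone puzzled by the "Remove 802 characters" message quoted above: it is just the 1302-character prompt minus the 500-character soft limit. A minimal sketch of such a check, assuming the limit is a plain character count; `SOFT_LIMIT` and `length_warning` are hypothetical names, not the site's actual code:

```python
from typing import Optional

SOFT_LIMIT = 500  # the soft limit mentioned above; the constant name is made up


def length_warning(text: str) -> Optional[str]:
    """Return a warning if the text exceeds the soft limit, otherwise None."""
    excess = len(text) - SOFT_LIMIT
    if excess <= 0:
        return None
    return f"Too long! Remove {excess} characters for optimal gibberification."


# A 1302-character prompt is 802 characters over the limit.
print(length_warning("x" * 1302))
```

Because the limit is soft, the text is still processed; the message is only advisory.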
I can't tell if this is a joke app or seriously some snake oil (like AI detectors).
Isn't it trivially easy to just detect these Unicode characters and filter them out? This is the sort of thing a junior programmer could probably do during an interview.
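To make the "detect and filter" idea concrete, here is a rough sketch. It assumes the inserted characters are invisible format characters (Unicode general category `Cf`, which covers zero-width space, zero-width joiner, word joiner and similar); I have not checked which code points Gibberifier actually emits, and `strip_invisible` is a made-up name:

```python
import unicodedata


def strip_invisible(text: str) -> str:
    """Drop every character whose Unicode general category is Cf (format).

    Assumption: the obfuscation relies on Cf characters such as U+200B
    (zero-width space), U+200D (zero-width joiner) or U+2060 (word joiner).
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


obfuscated = "te\u200bst\u200d me\u2060ssage"
print(strip_invisible(obfuscated))
```

The catch is that this also removes legitimate uses of those characters, such as joiners inside emoji sequences or zero-width non-joiners in Persian text, and it does nothing about look-alike letters from other scripts.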
Let me clarify: when I perform interviews, I tell my candidates they can do _everything_ they would do in a normal job, including using AI and googling for answers.
But just to humor you (since I did make that strong statement), without googling or checking anything, I would start with a basic negated regular expression range (`[^A-Za-z\s.\-*]` etc.) and do a find-replace on that until things looked coherent without too much loss of words/text.
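Spelled out, that find-replace might look something like the following. This is only a sketch of the allow-list approach described above: the character set is the off-the-cuff one from the comment, and `NOT_ALLOWED` and `keep_allowed` are illustrative names:

```python
import re

# Match anything that is NOT an ASCII letter, ASCII whitespace, '.', '-' or '*'.
# re.ASCII keeps \s limited to ASCII whitespace so exotic Unicode spaces
# are treated as disallowed as well.
NOT_ALLOWED = re.compile(r"[^A-Za-z\s.\-*]", re.ASCII)


def keep_allowed(text: str) -> str:
    """Delete every character outside the allow-list."""
    return NOT_ALLOWED.sub("", text)


print(keep_allowed("te\u200bst me\u200bssage"))  # zero-width spaces removed
print(keep_allowed("t\u0435st"))  # a Cyrillic look-alike 'e' is deleted too
```

The second call shows the "loss of words/text" trade-off: anything outside the allow-list disappears, including digits, accented letters and homoglyphs, so the set would need widening until the output looked coherent.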
But the problem isn't me, is it? It's the AI companies and their crawlers, which can trivially be changed to get around this. At the end of the day, they have access to all the data to know exactly which Unicode sequences are used in words, etc.
Good point. Then it's actually an active attempt, right?
Also, I realize my statement was a bit harsh. I know someone probably worked hard on this, but I just feel it's easily circumvented, as opposed to some of the watermarks in images (like Google's, which they really should open source).
In all reality I spent like 30 minutes on this one Sunday afternoon when every model failed nearly 100% of the time - now it's more like 95% but about half figure out that there is something wrong and prompt the user to fix it. This isn't meant to be a permanent fix at all - just a cool idea that will be patched just like DANs were back in 2023.
> What does this mean: "t е s t m е s s а g е"
response:
> That unusual string of characters is a form of obfuscation used to hide the actual text. When decoded, it appears to read: "test message" The gibberish you see is a series of zero-width or unprintable Unicode characters