
If you can identify text written with mixed glyphs, just ban it outright. Normal users don't use text like this; the mere presence of such "homomorphic" text is probably a better spam signal than whatever your neural net outputs when you run it after normalization.
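
A rough sketch of that "binary presence" check, assuming "normalization" here means Unicode NFKC (the comment doesn't say which form). The helper name `changed_by_nfkc` is made up for illustration; only `unicodedata.normalize` is a real standard-library call.

```python
import unicodedata


def changed_by_nfkc(text: str) -> bool:
    """Return True if NFKC compatibility normalization alters the text.

    Hypothetical helper: styled "font" letters (math alphanumerics) fold
    back to plain letters under NFKC, so a difference is a cheap signal.
    """
    return unicodedata.normalize("NFKC", text) != text


# "Fonts" written in mathematical double-struck letters (escaped for safety)
fancy = "\U0001D53D\U0001D560\U0001D55F\U0001D565\U0001D564"
print(unicodedata.normalize("NFKC", fancy))  # folds to plain ASCII letters
print(changed_by_nfkc(fancy))

# A Cyrillic "а" (U+0430) hiding in a Latin word is NOT changed by NFKC
print(changed_by_nfkc("p\u0430ypal"))
```

Two caveats. NFKC alone does not catch cross-script lookalikes such as the Cyrillic letter above, and it does flag harmless text: the macron in the shrug kaomoji has a compatibility decomposition, so the shrug trips this check too.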


> Normal users don't use text like this

I think that depends on the users. People copying and pasting bits of text that was in English or another common language (think documentation, code, news articles, tweets, etc.) with a different character set could be problematic.

Also, 𝒮ℴ𝓂ℯ 𝒜𝓅𝓅𝓈 marketed as "𝔽𝕠𝕟𝕥𝕤 𝕗𝕠𝕣 𝕤𝕠𝕔𝕚𝕒𝕝 𝕞𝕖𝕕𝕚𝕒" would be ℭ𝔞𝔲𝔤𝔥𝔱 𝔲𝔭 𝔦𝔫 𝔱𝔥𝔦𝔰. (math symbols) A user base with young people getting bounced or shadow banned for trying to express themselves or distinguish themselves from their peers would be like ಠ_ಠ (Kannada letter ttha)

I think targeting the language they're using is a better bet.

¯\_(ツ)_/¯ (Hiragana letter tsu)


I can especially echo the "social media fonts" trend. They're quite popular on certain Discord guilds at least.

Minor nitpick, but ツ is the katakana tsu.


Oh, fact. Not a Japanese speaker. (or reader)


> pasting bits of text that was in English or another common language

If they use many (maybe three? four? or more) character sets in the same post, or different character sets in any single word, then that'd be highly suspicious?

Whilst still letting people copy paste from another language

Special case needed for the shoulder shrug with a Hiragana letter tsu, I mean katakana tsu
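
A minimal sketch of that heuristic. Python's standard library has no Unicode script property, so this approximates a character's "character set" by the first word of its Unicode name, which is crude. The function names, the `_ALIASES` table, and the threshold of four are all assumptions made for illustration.

```python
import unicodedata
from typing import List, Optional, Set

# Assumption: hiragana, katakana and CJK ideographs legitimately share a word.
_ALIASES = {"HIRAGANA": "CJK+KANA", "KATAKANA": "CJK+KANA",
            "KATAKANA-HIRAGANA": "CJK+KANA", "CJK": "CJK+KANA"}


def rough_script(ch: str) -> Optional[str]:
    """Approximate the script of one letter; non-letters return None."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if not name:
        return None
    first = name.split(" ")[0]  # e.g. "LATIN", "CYRILLIC", "KANNADA"
    return _ALIASES.get(first, first)


def scripts_in(text: str) -> Set[str]:
    """Collect the approximate scripts of all letters in the text."""
    return {s for s in map(rough_script, text) if s is not None}


def mixed_script_words(text: str) -> List[str]:
    """Whitespace-separated words whose letters come from 2+ scripts."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]


def looks_suspicious(text: str, post_threshold: int = 4) -> bool:
    """Flag any mixed-script word, or post_threshold+ scripts in one post."""
    return bool(mixed_script_words(text)) or len(scripts_in(text)) >= post_threshold


print(looks_suspicious("p\u0430ypal login"))                       # Cyrillic а inside a Latin word
print(looks_suspicious("hello \u043f\u0440\u0438\u0432\u0435\u0442"))  # two scripts, in separate words
print(looks_suspicious("whatever \u00af\\_(\u30c4)_/\u00af"))      # the shrug
```

With a per-word rule the shrug needs no special case: the katakana tsu is the only letter in that token. The styled "font" letters are a different story, since their names start with words like MATHEMATICAL or SCRIPT and this name-based approximation would treat those as separate scripts.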


I've noticed much more usage of alternative Unicode ranges for numbers/letters in email subjects lately to make marketing messages stand out, too (in addition to emoji of course), though I wouldn't necessarily mind banning that...


Bizarrely, I'm seeing recruiters using this on LinkedIn.


Huh. For any specific purpose? Does it seem like they're avoiding paying for recruiter accounts or something by evading algorithms designed to detect their activity, or is it just for the heck of it?


It's following some business wank advice to stand out from the crowd like printing your resume on A3 card stock.


Ah. Sounds like something one of those pickup artists would say if they blindly pivoted into marketing.


> Normal users don't use text like this

Sounds like you live in a filter bubble.

(╯°□°)╯︵ ┻━┻

(ﾉ◕ヮ◕)ﾉ*:･ﾟ


I know, right? There are so many times when I've wanted to use something like box drawing Unicode characters (cp437) to explain a complicated concept on Hacker News, but alas I couldn't, due to widespread computer fraud and abuse. How are we going to build a more inclusive internet that serves the interests of ALL people around the world, regardless of native language, if the bad guys are forcing administrators to ban Unicode? (╯°□°)╯︵ ̲┻̲━̲┻


> Normal users don't use text like this

They kinda do. Check out the shrug "emoji", table flip, and so forth. Then there's the meme of adding text above and below by abusing Unicode's "super" and "sub" modifications.

You could restrict input to ASCII only, but then you've knocked out the ability to expand internationally.



