
т is a homoglyph of T—because one could mistake one for the other. They're in a visual equivalence-class. That doesn't mean that you should normalize т into T, though. Those are separate considerations.

If you were granting e.g. domain names, or usernames, you'd be able to map each character in the test string to its homoglyph equivalence-class, and then ask whether anyone has previously registered a name using that sequence of equivalence-class values. So someone's registration of "тhe" would preclude registering "the", and vice-versa; but when you normalized "тhe", you'd still get "mhe".
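A minimal sketch of that registration check, in Python. The one-entry `CLASS_REPRESENTATIVE` table, the `skeleton` function, and the `Registry` class are all hypothetical names made up for illustration; a real table would cover far more characters, and case-insensitive names are assumed.

```python
# Hypothetical table: each confusable character maps to a representative of
# its visual equivalence class. Only the one pair from this thread is listed.
CLASS_REPRESENTATIVE = {
    "т": "t",  # Cyrillic small te, grouped with Latin t (assumption)
}


def skeleton(name):
    """Map every character to its class representative (case-insensitive)."""
    return "".join(CLASS_REPRESENTATIVE.get(ch, ch) for ch in name.casefold())


class Registry:
    def __init__(self):
        # skeleton -> the name exactly as the first registrant typed it
        self._by_skeleton = {}

    def register(self, name):
        """Return True if the name was free, False if a lookalike exists."""
        key = skeleton(name)
        if key in self._by_skeleton:
            return False
        self._by_skeleton[key] = name  # keep the original, not the skeleton
        return True

    def holder_of(self, name):
        """Return the originally registered spelling for this class, if any."""
        return self._by_skeleton.get(skeleton(name))


registry = Registry()
first = registry.register("тhe")   # Cyrillic т: accepted
second = registry.register("the")  # Latin t: rejected, same class sequence
third = registry.register("mhe")   # a different name, accepted
```

The skeleton is only used as a lookup key for the collision check; the stored value stays the spelling the registrant actually typed.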

Of course, to use such a system properly, you'd have to keep the original registered variant of the name around and use it in URL slugs and the like (even if that means resorting to Punycode), rather than trying to "canonicalize" the person's provided name through a normalization algorithm. Because they have "[the equivalence class of т]he", not "mhe"; someone else has "mhe".
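As a rough illustration of keeping the original name and only encoding it for URLs, here is a sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules and case-folds its input, so lowercase names are assumed). The function names `url_slug` and `original_from_slug` are hypothetical.

```python
def url_slug(registered_name):
    """Encode the registered name as an ASCII label without changing which
    characters it contains. Non-ASCII names come out in "xn--" form."""
    return registered_name.encode("idna").decode("ascii")


def original_from_slug(slug):
    """Recover the registered spelling from the ASCII slug."""
    return slug.encode("ascii").decode("idna")


cyrillic_slug = url_slug("тhe")  # Cyrillic т: becomes an "xn--" label
latin_slug = url_slug("mhe")     # plain ASCII: passes through unchanged
```

The encoding is reversible, so the two registrants keep distinct slugs and neither name is rewritten into the other.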




> т is a homoglyph of T—because one could mistake one for the other.

I believe gp is talking about the font. In some fonts (especially italic/cursive), the letter "т" looks like "m", and nothing like "T" -- so it's really hard to say with which one it's "visually equivalent".


Look at the image on https://en.wiktionary.org/wiki/%D1%82 and you will see that it is also a homoglyph of m.



