
You should post this to Show HN. Also you have a typo on your README ("characgters")


The one built into Python will get you most of the way there:

    >>> import unicodedata
    >>> unicodedata.normalize('NFKD', '𝓗℮𝐥1೦𝗵𝗲𝗹𝗹𝗼')
    'H℮l1೦hello'
Obviously it isn't going to remap leetspeak characters like 1 -> l, but it covers a lot of cases.
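
If you did want the leetspeak step too, something like this might be a starting point. It's only a sketch: the `LEET` table and the `fold` name are made up for illustration, not taken from the library being discussed, and a real table would need far more entries.

```python
import unicodedata

# Hypothetical, deliberately tiny leetspeak table (an assumption for this sketch).
LEET = str.maketrans({"1": "l", "0": "o", "3": "e"})

def fold(text):
    """Apply NFKD compatibility decomposition, then the leetspeak remap."""
    return unicodedata.normalize("NFKD", text).translate(LEET)

print(fold("𝓗℮𝐥1೦𝗵𝗲𝗹𝗹𝗼"))
```

The ASCII 1 becomes l, but ℮ and ೦ pass through untouched, since NFKD has no decomposition for them and they aren't in the table.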


Obviously you're saying it doesn't cover everything, but beyond leetspeak-type substitutions, a big thing it won't catch is the kind of thing you (used to) see in internationalized domain name spoofing: legitimate non-Latin-script letters that look the same or nearly the same as Latin ones.

NFKC/NFKD will handle "this is another form of the Latin letter A" type stuff but not "Cyrillic A looks like Latin A."
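
A quick way to see the difference, using only the standard library (the variable names and the `nfkc` helper are just for illustration):

```python
import unicodedata

latin_a = "A"            # U+0041 LATIN CAPITAL LETTER A
fullwidth_a = "\uFF21"   # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"    # U+0410 CYRILLIC CAPITAL LETTER A

def nfkc(s):
    return unicodedata.normalize("NFKC", s)

# A compatibility variant of Latin A folds to plain "A"...
print(nfkc(fullwidth_a) == latin_a)        # True
# ...but Cyrillic A is a distinct letter and is left alone.
print(nfkc(cyrillic_a) == latin_a)         # False
print(unicodedata.name(nfkc(cyrillic_a)))  # CYRILLIC CAPITAL LETTER A
```

Catching the second case needs a separate lookalike mapping on top of normalization.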


Thanks, I've fixed the typo! It was such a simple project that it hardly seems worthy of a "Show HN".


I've seen crazier things get to #1


Test for the library: would it catch that that typo still refers to characters?



