Yeah, then someone has to create or find that whole table and make. The initial ...

jffry · on Sept 28, 2021

The good news is that the Unicode consortium has a report on this issue, and the tables already exist for normalization and mapping of confusables to their ASCII lookalikes: https://www.unicode.org/reports/tr39/
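A minimal sketch of how those tables might be used, assuming the "source ; target ; type # comment" line layout of TR39's confusables.txt. The three sample lines, `parse_confusables`, and `fold` are illustrative names and data, not part of any library. The report's own skeleton algorithm is based on NFD; NFKC is swapped in here only to also fold styled letters such as mathematical bold.

```python
import unicodedata

# Illustrative sample in the assumed confusables.txt layout:
# hex code point(s) ; hex code point(s) ; type # comment
SAMPLE = """\
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
0435 ; 0065 ; MA # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
"""

def parse_confusables(text):
    """Build a {source: target} dict from confusables-style lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table

def fold(text, table):
    """Normalize with NFKC, then map each remaining lookalike character."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(table.get(ch, ch) for ch in text)

table = parse_confusables(SAMPLE)
print(fold("p\u0430yp\u0430l", table))  # Cyrillic "а" folded to Latin "a"
```

With the real file you would read it from disk instead of using `SAMPLE`, and strip the byte order mark at the start if present.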

Alir3z4 · on Sept 28, 2021

Oh, that's nice.

I guess I can use that the next time I work on the data cleaning for that model.

Thanks.

wanderingstan · on Sept 28, 2021

As I commented above:

I built a Python library for finding strings obfuscated this way. It was critical when moderating our Telegram channel before an ICO. https://github.com/wanderingstan/Confusables E.g. "𝓗℮𝐥1೦" would match "Hello"
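To make the matching idea concrete, here is a rough sketch that expands each letter of a target word into a regex character class of lookalikes. It does not use the library's API. `LOOKALIKES` is a tiny hand-picked table and `lookalike_regex` is a hypothetical helper, both assumptions for illustration; a real table would be generated from the TR39 data.

```python
import re

# Hypothetical, hand-picked lookalike table for illustration only.
LOOKALIKES = {
    "h": "hH𝓗",
    "e": "eE℮",
    "l": "lL1𝐥",
    "o": "oO0೦",
}

def lookalike_regex(word):
    """Compile a pattern where each letter also matches its lookalikes."""
    parts = []
    for ch in word:
        # Fall back to the character itself when the table has no entry.
        variants = LOOKALIKES.get(ch.lower(), ch)
        parts.append("[" + "".join(re.escape(v) for v in variants) + "]")
    return re.compile("".join(parts))

pattern = lookalike_regex("Hello")
print(bool(pattern.search("𝓗℮𝐥1೦")))
```

Because every letter becomes a character class, the pattern also matches the plain spelling, so one regex covers both the obfuscated and the unobfuscated form.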