
Except the fix seems to be simply to show the punycode URL.

That's not a fix, that's a workaround.

EDIT: This led me to read up on how various browsers handle non-ASCII letters, which in turn helped me discover that apparently no browser supports the German sharp-s ("ß"): it gets auto-expanded to "ss", even though domains containing the sharp-s can be registered separately from "ss" domains. That effectively allows people to register domains that can't be accessed in any browser without explicitly using the unreadable punycode representation.

EDIT2: It seems the fix is more fine-tuned than just showing punycode for everything. So it's still a workaround (punycode URLs are not fit for human consumption so this still actively punishes confusing domains even if they're not intentionally malicious) but it affects fewer domains than I initially feared.
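For anyone who wants to see the sharp-s behaviour from the first EDIT, here is a minimal Python sketch. It assumes Python's built-in `idna` codec (which follows the older IDNA 2003 mapping rules) and the built-in `punycode` codec. The helper names are made up for illustration, and what a given browser does may differ from what this shows.

```python
# Sketch: how "ß" can end up either as "ss" or as a punycode label.
# The stdlib "idna" codec applies the IDNA 2003 mapping, which folds the
# sharp-s to "ss"; the raw "punycode" codec keeps the letter as-is.

def idna2003(domain: str) -> str:
    """Encode a domain with the stdlib IDNA 2003 codec (hypothetical helper)."""
    return domain.encode("idna").decode("ascii")


def punycode_label(label: str) -> str:
    """Encode a single label as an xn-- label without any mapping step."""
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    print(idna2003("stra\u00dfe.de"))      # sharp-s folded to "ss"
    print(punycode_label("stra\u00dfe"))   # sharp-s preserved in punycode
```

The two outputs name different domains, which is the registration/access mismatch described above.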




It was already fixed back when domain names had to be plain ASCII.

It was West-centric, yes, but it allowed for unique and legible ASCII identifiers. It also encouraged non-ASCII languages to create a unique (or mostly unique) Latin representation of their scripts, which is, in general, a good thing. It encouraged unification, using ASCII as the common ground.

Allowing Unicode characters opened a Pandora's box, creating a situation that is unsolvable: either we keep the new names, making almost every string of characters potentially ambiguous, or we return to the state where ASCII-only names are the only ones usable.

Also, differentiating between ASCII and non-ASCII names doesn't solve the problem. Imagine that the legitimate address is already in a non-ASCII script.


In what universe is ASCII "common ground"? And in what universe is a few scammers here and there "pandora's box"?

Some people in this thread seem almost eager to throw out any attempt at respecting cultures other than their own using the earliest convenient excuse.


> In what universe is ASCII "common ground"?

Excluding EBCDIC, which has the same characters, can you name a major character set that doesn't start with a carbon copy of ASCII? Shift JIS starts with ASCII. Big5 starts with ASCII. Every code page starts with ASCII. Unicode, of course, starts with ASCII. Look at just about any (physical) keyboard for any language and it will support ASCII.
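The "starts with ASCII" claim can be spot-checked with a rough Python sketch. It assumes the codecs shipped with CPython are a fair stand-in for the character sets in question; `starts_with_ascii` is a made-up helper name. Note that strict JIS X 0201 assigns 0x5C to the yen sign and 0x7E to the overline, so the result for Shift JIS depends on the mapping a particular codec uses.

```python
# Sketch: check whether a codec decodes bytes 0x00-0x7F exactly like ASCII.

def starts_with_ascii(codec: str) -> bool:
    """True if the first 128 byte values decode to the ASCII code points."""
    raw = bytes(range(128))
    try:
        return raw.decode(codec) == raw.decode("ascii")
    except UnicodeDecodeError:
        # Some byte in the ASCII range is not valid on its own in this codec.
        return False


if __name__ == "__main__":
    # cp037 is an EBCDIC code page, included as the expected exception.
    for codec in ("utf-8", "latin-1", "cp1252", "koi8_r",
                  "big5", "shift_jis", "cp037"):
        print(codec, starts_with_ascii(codec))
```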


I can't think of a real fix though; you'd have to disambiguate the Unicode itself for that.


What would you consider a proper fix?


Unless they did something ridiculously clever, they just made IDN domains unusable. That means legitimate IDN domains are as affected as malicious ones, punishing non-ASCII languages.

A proper fix would keep the domain name human-readable but differentiate between the ASCII and homoglyph versions.

How? Not my job to figure that out. If you want a random idea: the homoglyphs could be rendered differently (i.e. make the font disambiguate them). That's probably not a perfect solution but I'm not getting paid to do this.


The fix is https://chromium.googlesource.com/chromium/src/+/08cb718ba7c... :

> Block a label made entirely of Latin-look-alike Cyrillic letters when the TLD is not an IDN (i.e. this check is ON only for TLDs like 'com', 'net', 'uk', but not applied for IDN TLDs like рф.

That's neither "ridiculously clever", nor will it make (non-nefarious) IDN domains unusable.
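A rough Python sketch of the rule as quoted, not Chromium's actual code. The look-alike set below is a small illustrative subset I picked, and `should_show_punycode` is a hypothetical name; the real list and logic live in the linked commit.

```python
# Sketch of the quoted rule: flag a label made entirely of Latin-look-alike
# Cyrillic letters, but only when the TLD is plain ASCII (not an IDN TLD).

# ASSUMPTION: illustrative subset only; Chromium's real list may differ.
CYRILLIC_LOOKALIKES = frozenset(
    "\u0430\u0441\u0435\u043e\u0440\u0445\u0443\u0455\u0456\u0458"
)  # Cyrillic letters resembling Latin a c e o p x y s i j


def should_show_punycode(domain: str) -> bool:
    """True if some non-TLD label consists only of look-alike letters."""
    labels = domain.lower().split(".")
    tld = labels[-1]
    if not tld.isascii():
        # IDN TLD such as the Russian one: the check is off.
        return False
    return any(
        label and all(ch in CYRILLIC_LOOKALIKES for ch in label)
        for label in labels[:-1]
    )


if __name__ == "__main__":
    spoof = "\u0441\u043e\u0440\u0435.com"              # looks like "cope.com"
    honest = "\u043c\u043e\u0441\u043a\u0432\u0430.com"  # has non-look-alike letters
    print(should_show_punycode(spoof), should_show_punycode(honest))
```

Under this reading, a Cyrillic label that contains at least one letter with no Latin twin is left alone, as is anything under an IDN TLD.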


Except that this assumes there are no legitimate IDN domains on non-IDN TLDs. Considering how few IDN TLDs there are, I would wager that most IDN domains don't live on these TLDs.

However, it seems they don't flat-out block all IDN domains, only those containing the homoglyphs. IIUC they also don't block domains containing Cyrillic homoglyphs alongside other Cyrillic characters.

This seems somewhat reasonable. I still think rendering Cyrillic in a way that makes alphabet mismatches more obvious would be a better and more future-proof solution.
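As a toy version of "make the mismatch obvious", here is a Python sketch that wraps every run of non-ASCII characters in square brackets. The brackets are only a stand-in for whatever a browser might do visually (a different font or colour), and `mark_non_ascii` is a made-up helper, not anything a browser ships.

```python
# Sketch: visually separate non-ASCII runs from ASCII runs in a label.
from itertools import groupby


def mark_non_ascii(label: str) -> str:
    """Wrap each run of non-ASCII characters in square brackets."""
    parts = []
    for is_ascii, run in groupby(label, key=str.isascii):
        text = "".join(run)
        parts.append(text if is_ascii else f"[{text}]")
    return "".join(parts)


if __name__ == "__main__":
    # Second letter is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
    print(mark_non_ascii("p\u0430ypal"))
```

A label written entirely in one non-ASCII script would come out as a single bracketed run, so a real design would need a rule for when the marking is worth showing at all.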


That fix is an improvement, but in general I think it is better to whitelist stuff instead. Unicode is huge and complicated.



