Hostnames in DNS are generally restricted to a subset of the ASCII character set. This isn't a strict limitation of DNS itself (the protocol allows arbitrary bytes in labels), but enough DNS servers, clients, and applications break on non-ASCII sequences that it's a de facto standard.

Instead of storing Unicode in DNS directly, hostnames containing characters outside the RFC 952 set (case-insensitive ASCII letters and digits, plus '-' and '.') are encoded into RFC 952-compatible form using Punycode, and stored that way in DNS. Each encoded label gets the "xn--" prefix, which is why these names all start with xn--.
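As a minimal sketch of the two rules above, the Python below checks whether a hostname fits the ASCII letters/digits/hyphen set (using the slightly relaxed RFC 1123 form, which allows a leading digit, plus the 63-character label limit) and encodes non-ASCII labels with Python's built-in `punycode` codec and the `xn--` prefix. The function names and the `bücher.example` hostname are hypothetical, and real IDNA processing also normalizes and validates each label before encoding, which this sketch skips.

```python
import re

# One label under the RFC 952 / RFC 1123 "letters, digits, hyphen" rule:
# no leading or trailing hyphen, at most 63 characters. ASCII ranges only.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

ACE_PREFIX = "xn--"  # prefix marking a Punycode-encoded label


def is_ldh_hostname(host: str) -> bool:
    """True if every dot-separated label uses only ASCII letters, digits and '-'."""
    return all(_LDH_LABEL.fullmatch(label) for label in host.split("."))


def to_ace(host: str) -> str:
    """Punycode-encode each non-ASCII label and prefix it with 'xn--'.

    Simplified: skips the normalization/validation step that real IDNA
    (nameprep or UTS #46) performs before encoding.
    """
    labels = []
    for label in host.split("."):
        if label.isascii():
            labels.append(label)  # already representable as-is
        else:
            labels.append(ACE_PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


if __name__ == "__main__":
    ace = to_ace("bücher.example")
    print(ace)                                # xn--bcher-kva.example
    print(is_ldh_hostname(ace))               # True
    print(is_ldh_hostname("bücher.example"))  # False
```

The ASCII check matters because Python's `punycode` codec appends a trailing delimiter to all-ASCII input, so such labels are passed through untouched instead.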

Here in HN comments, Unicode is just embedded as actual UTF-8; no special DNS encoding is needed. The hostname, however, is actually xn--gckvb8fzb.com, which is why it's displayed that way.

(Your browser automatically converts a Unicode hostname to Punycode where appropriate, so a link like https://xn--gckvb8fzb.com should work correctly, but the hostname lookup actually performed on the wire is for xn--gckvb8fzb.com.)
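To illustrate the two directions a browser handles, here is a sketch: `wire_name` produces the ASCII name that is actually looked up, using Python's built-in `idna` codec, and `to_unicode` reverses the Punycode step for display. The built-in codec implements the older IDNA 2003 rules (RFC 3490), whereas the WHATWG URL Standard that browsers follow uses UTS #46 processing, so edge cases can differ. Both helper names and the example hostnames are illustrative assumptions.

```python
ACE_PREFIX = "xn--"


def wire_name(host: str) -> str:
    """ASCII name a resolver would query, via Python's built-in 'idna'
    codec (IDNA 2003). ASCII hostnames pass through unchanged."""
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """Decode 'xn--' labels back to Unicode for display; other labels pass through."""
    labels = []
    for label in host.split("."):
        if label.lower().startswith(ACE_PREFIX):
            # Drop the prefix, then decode the remaining Punycode digits.
            labels.append(label[len(ACE_PREFIX):].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


if __name__ == "__main__":
    print(wire_name("bücher.example"))          # xn--bcher-kva.example
    print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```

The same `to_unicode` call can be pointed at the xn--gckvb8fzb.com hostname from this thread to see the Unicode form it stands for.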

Edit: Well, HN also converts my Unicode hyperlink into its Punycode equivalent. Interesting. It preserves the original form when I go back to edit the comment, but displays the Punycode form everywhere else. This lends credence to the idea that this is intentional, to avoid homoglyph attacks, as csnover says.
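The homoglyph concern can be illustrated with a hypothetical display policy like the one observed here: show the Punycode form whenever a hostname contains non-ASCII characters. This is a sketch using Python's built-in `idna` codec, not HN's actual implementation; `display_host` and the spoofed `apple.com` look-alike (whose first letter is Cyrillic) are illustrative.

```python
def display_host(host: str) -> str:
    """Return the hostname to show users: unchanged if pure ASCII,
    otherwise its Punycode (ACE) form, so a Unicode look-alike can't
    pass for an ASCII name. Hypothetical policy, not HN's code."""
    if host.isascii():
        return host
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    genuine = "apple.com"
    spoof = "\u0430pple.com"  # first letter is CYRILLIC SMALL LETTER A
    print(spoof == genuine)       # False, though they may render identically
    print(display_host(genuine))  # apple.com
    print(display_host(spoof))    # an xn--... name that visibly differs
```

Browsers generally apply more nuanced rules, such as showing Unicode unless a label mixes scripts in suspicious ways; always showing the ACE form is the bluntest option.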


