
A proper description is already linked in a neighbouring comment.

TLDR: in 2024, with PHP 8, you still need the mbstring extension, and you should be careful around UTF-8 if you do any text processing. In almost all other modern programming languages it just works.



Anyone claiming such things doesn't understand Unicode at all.

The whole concept of having special Unicode-aware strlen or substr is nonsense.


In Node.js, for example, don't you have to use Buffers and special decoders to deal with UTF-8 strings? I.e. it's a pain there too.


I don't think that's a pain. It's making explicit what should be explicit, and the decoded string doesn't have an encoding attached (like in Ruby). It can't be in an unexpected format; it's always UTF-16. One can argue about whether UTF-16 is the best choice, but at least it's always that and always Unicode. No surprises.


No, JS strings are UTF-8:

    > '蛋糕'.substr(0,1)
    '蛋'
    > '蛋糕'.length
    2
    > Buffer.byteLength('蛋糕')
    6
You do have to be careful when working with binary data (e.g. streams) but this is expected.
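
To illustrate the point about streams, here is a minimal sketch, assuming Node.js and a stream that happens to split the data in the middle of a multi-byte character. The chunk boundary is made up for the example; `StringDecoder` is Node's built-in `string_decoder` module.

```javascript
const { StringDecoder } = require('node:string_decoder');

// '蛋糕' is 6 bytes in UTF-8 (3 bytes per character).
const bytes = Buffer.from('蛋糕', 'utf8');

// Assumed chunk boundary: split in the middle of the first character.
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)];

// Naive approach: decode every chunk on its own.
// The split character turns into U+FFFD replacement characters.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// StringDecoder holds back incomplete byte sequences until the rest arrives.
const decoder = new StringDecoder('utf8');
const decoded =
  chunks.map((chunk) => decoder.write(chunk)).join('') + decoder.end();

console.log(naive === '蛋糕');   // false
console.log(decoded === '蛋糕'); // true
```

Once the bytes are decoded into a string, none of this matters any more; the care is only needed at the byte boundary.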


They're UTF-16, and substr(), length, etc. work at the code unit level. Hence, the above isn't actually valid for all strings: any character represented by a code point between U+10000 and U+10FFFF requires 2 code units [1]. For example, U+10429 Deseret Small Letter Long E [2]:

  > '𐐩'.substr(0, 1)
  '\ud801'
  > '𐐩'.length
  2
[1] https://en.wikipedia.org/wiki/UTF-16#Description

[2] https://codepoints.net/U+10429
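
For anyone wondering where '\ud801' comes from, the surrogate pair arithmetic from [1] can be sketched like this. `toSurrogatePair` is a hypothetical helper written for this comment, not a built-in.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its
// UTF-16 surrogate pair, following the description in [1].
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('not a supplementary code point');
  }
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
  return [high, low];
}

const deseret = '𐐩'; // U+10429
const [high, low] = toSurrogatePair(deseret.codePointAt(0));

console.log(high.toString(16), low.toString(16)); // d801 dc29
// charCodeAt reads single code units, so it sees the two halves.
console.log(deseret.charCodeAt(0) === high, deseret.charCodeAt(1) === low); // true true
```

So substr(0, 1) returns just the high surrogate, which is not a valid character on its own.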


TIL, thanks :) Interestingly, "for of" iteration works on the whole character, so there must be some magic going on under the hood.
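
The "magic" is the string iterator: `String.prototype[Symbol.iterator]` steps through code points rather than code units, and `for...of`, spread and `Array.from` all use it. A small sketch; `firstCodePoints` is a hypothetical helper, not a built-in.

```javascript
const text = 'a𐐩';

console.log(text.length);      // 3 (code units)
console.log([...text].length); // 2 (code points)

// Hypothetical helper: like substr(0, n), but counting code points,
// so a surrogate pair is never cut in half.
function firstCodePoints(str, n) {
  return Array.from(str).slice(0, n).join('');
}

console.log(firstCodePoints('𐐩bc', 1) === '𐐩'); // true

// Code points are still not "characters" as a user sees them:
// 'e' followed by a combining acute accent is two code points.
console.log([...'e\u0301'].length); // 2
```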


And with that you're completely wrong, since strings in JavaScript are UTF-16.

It just so happens that each of the two characters in your example fits in a single UTF-16 code unit.

(Node.js' Buffer uses UTF-8 by default).


One source of confusion here might be that JavaScript defines strings as UTF-16, while JSON exchanged between systems is required to be encoded as UTF-8.


The 蛋糕 is a lie!



