PEP-393 is a stupid compromise. They couldn't choose between UCS-2 and UCS-4, so they are using both. They are wasting tons of CPU cycles converting between them and single character outside of range doubles the size of string.
I don't fully understand the use case for extracting codepoints from strings, but they could have just added Java-like: codePoints and keep returning code units from old methods. This is CPU and memory efficient and 100% backwards compatible.
I think the problem is the same could have been done in Python 2 (with UTF-8) that would mean less reasons for Python 3.
I don't fully understand the use case for extracting codepoints from strings, but they could have just added Java-like: codePoints and keep returning code units from old methods. This is CPU and memory efficient and 100% backwards compatible.
I think the problem is the same could have been done in Python 2 (with UTF-8) that would mean less reasons for Python 3.