> Non-canonical is not a big deal on top of that. Unless you want to actually *d...

Dylan16807 · on Jan 1, 2024

If you're going beyond decoding, then you're beyond the stage where canonical and non-canonical versions exist any more.

Non-canonical encodings make it difficult to do things without decoding, but you have bigger problems to deal with in that situation, and the non-canonical encodings don't make it much worse. Don't get into that situation!

Specifically, even with only canonical encodings, the encodings of one- and two-byte characters can appear inside the encodings of two- and three-byte characters. Unlike with UTF-8, you can't do anything byte-wise at all. But you already said "If you're working a byte at a time you're doing it wrong" so I hope that's not too big of an issue?
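To make that concrete, here's a minimal sketch. The `toy_encode` scheme below is a hypothetical stand-in (a varint-style layout with 7 payload bits per byte and the high bit set on every byte except the last), not necessarily the encoding under discussion. It only illustrates how a shorter character's bytes can show up inside a longer character's encoding, which can't happen in UTF-8.

```python
def toy_encode(cp: int) -> bytes:
    """Hypothetical varint-style encoding (an assumption for illustration):
    7 payload bits per byte, most significant group first, high bit set
    on every byte except the last."""
    groups = [cp & 0x7F]
    cp >>= 7
    while cp:
        groups.append(cp & 0x7F)
        cp >>= 7
    groups.reverse()
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def toy_encode_str(s: str) -> bytes:
    return b"".join(toy_encode(ord(c)) for c in s)


# One-byte character inside a two-byte character:
# U+00E9 (é) encodes as 0x81 0x69, and 0x69 is the encoding of "i".
print(toy_encode_str("i") in toy_encode_str("é"))      # True: false match

# Two-byte character inside a three-byte character:
# U+0E2D encodes as 0x9C 0x2D, U+4E2D encodes as 0x81 0x9C 0x2D.
print(toy_encode(0x0E2D) in toy_encode(0x4E2D))        # True: false match

# UTF-8 keeps lead and continuation bytes in disjoint ranges, so the
# same byte-wise searches do not produce false matches.
print("i".encode("utf-8") in "é".encode("utf-8"))                 # False
print(chr(0x0E2D).encode("utf-8") in chr(0x4E2D).encode("utf-8")) # False
```

A byte-wise substring search on the toy encoding reports characters that aren't in the text, so you have to decode (or at least find character boundaries) before searching.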