I originally laughed at "in five minutes", but even though I do not think the article reads in five minutes, it does a surprisingly good job of covering the basics: so good job!
I do wonder whether it is clear to people who are unfamiliar with Unicode. Can anyone who is mostly unfamiliar with the details the article covers say how comprehensible it is?
I would also add a mention of the standard Unicode collation table (DUCET), which does a passable job for many languages at the same time. The Unicode Collation Algorithm, for which this table is the default, is mentioned, but I think it's worth highlighting this property of most UCA implementations.
As for the article's gotchas, multilingual text gets even more complex once you go past five minutes, even for "simple" European scripts. E.g., in Bosnian/Croatian/Serbian written in the Roman/Latin alphabet, "nj" is capitalized as "Nj" or "NJ" depending on the rest of the word: "Njegoš" or "NJEGOŠ". Confusingly, Unicode also includes digraph code points for both capitalization forms (the eternal tension in Unicode between encoding letters, glyphs, or characters), even though they are linguistically equivalent. In practice they are never used, which makes their inclusion even more perplexing: the digraphs are always spelled out using two characters, and there was no historical reason to encode them, since none of the 8-bit encodings had them! "nj" will also sometimes be two distinct letters, especially in loanwords like "konjugovan", which makes things harder when you need to collate text, since the proper order would be "konjugovan", "kontakt", "konj".
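For anyone who wants to poke at those digraph code points, here is a minimal Python sketch using only the standard library. The names `DIGRAPHS` and `spell_out` are mine, purely for illustration, and the exact case mappings depend on the Unicode data bundled with your Python version.

```python
import unicodedata

# The three single-code-point forms of "nj": NJ, Nj, nj.
DIGRAPHS = ["\u01CA", "\u01CB", "\u01CC"]

for ch in DIGRAPHS:
    # Print the code point, its official name and its general category.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# Case mappings move between the three forms of the single-code-point digraph.
lower = "\u01CC"
upper = lower.upper()   # full uppercase form
title = lower.title()   # titlecase form, distinct from uppercase

# The ordinary two-character spelling needs no special handling.
word = "njego\u0161"
capitalized = word.capitalize()
shouted = word.upper()

def spell_out(text: str) -> str:
    """Replace compatibility characters (including the digraphs) with
    their ordinary spelling via NFKC normalization. Note that NFKC also
    rewrites other compatibility characters, not just these digraphs."""
    return unicodedata.normalize("NFKC", text)

print(upper, title, capitalized, shouted, spell_out("\u01CBego\u0161"))
```

If you ever do receive text containing the single-code-point forms, normalizing to NFKC before comparing or searching is one way to make it match the usual two-character spelling.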
All of this is why I like to joke that the Cyrillic script is technically much better for all of these languages, even though it is basically in official use only for Serbian. In Cyrillic there is no conundrum in either of the above examples, since nj=њ (or нј), Nj/NJ=Њ, and the order is clear: конјугован, контакт, коњ.
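To make the collation point concrete, here is a rough Python sketch comparing the two scripts. It is not a real UCA implementation: `cyrillic_key` and `naive_latin_key` are hypothetical helpers that simply rank letters by their position in the alphabet, and the Latin one deliberately assumes every "nj" is the digraph letter, which is exactly the assumption that breaks on loanwords.

```python
# Serbian Cyrillic alphabet in its standard order (30 letters).
CYRILLIC = "абвгдђежзијклљмнњопрстћуфхцчџш"
CYRILLIC_RANK = {ch: i for i, ch in enumerate(CYRILLIC)}

# Latin alphabet for the same languages; dž, lj, nj count as single letters.
LATIN = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i",
         "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t",
         "u", "v", "z", "ž"]
LATIN_RANK = {letter: i for i, letter in enumerate(LATIN)}


def cyrillic_key(word):
    """One character is one letter, so ranking is a simple lookup.
    Unknown characters sort after the alphabet (an arbitrary choice)."""
    return [CYRILLIC_RANK.get(ch, len(CYRILLIC) + ord(ch))
            for ch in word.lower()]


def naive_latin_key(word):
    """Greedy tokenizer: always reads dž/lj/nj as one letter.
    It cannot know that "nj" in "konjugovan" is n followed by j."""
    word = word.lower()
    key, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in LATIN_RANK:
            key.append(LATIN_RANK[pair])
            i += 2
        else:
            key.append(LATIN_RANK.get(word[i], len(LATIN) + ord(word[i])))
            i += 1
    return key


cyrillic_sorted = sorted(["коњ", "контакт", "конјугован"], key=cyrillic_key)
latin_sorted = sorted(["konj", "kontakt", "konjugovan"], key=naive_latin_key)

print(cyrillic_sorted)  # matches the order given above
print(latin_sorted)     # "konjugovan" ends up after "konj", which is wrong
```

Fixing the Latin version needs a dictionary of exceptions (or some other way of knowing where the letter boundary falls), whereas the Cyrillic key needs nothing beyond the alphabet itself. Note that an explicit alphabet is still required: plain code point order would put контакт before конјугован, because ј sits after т in the Cyrillic block.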
> I originally laughed at "in five minutes", but even though I do not think the article reads in five minutes, it does a surprisingly good job of covering the basics: so good job!
Slightly off topic, but just to riff on this a bit: maybe books and articles called "$THING in $NUMBER_OF $TIME_PERIODS" or "Learn $THING in $NUMBER_OF $TIME_PERIODS" should be retitled "$NUMBER_OF $TIME_PERIODS with $THING." It would be more accurate, not imply any sort of mastery, and, on top of that, sound a little more dignified. But, maybe it wouldn't sell as many books, so... ¯\_(ツ)_/¯.