> how you can't "alphabetize" language names Not so, this sort order has been st...

adastra22 · 2025-05-19T15:06:28 1747667188

Sort rules are different in different locales.

vikingerik · 2025-05-19T15:47:57 1747669677

It's a circular dependency: how do you sort and list the locales or languages for someone to pick one, when by definition you don't know their locale yet?

You have to either make some best-guess approximation (IP geo, browser headers, etc) or use a locale-invariant sort, both of which will be wrong in some cases.
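To make the "browser headers" option concrete, here is a minimal sketch of a best-guess step based on the `Accept-Language` request header. The function name `parse_accept_language` is made up for this illustration, and the parsing is deliberately simplified (it only looks at a `q=` parameter and ignores everything else). When the header is missing or empty, the caller would still have to fall back to some locale-invariant order.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, best guess first.

    Hypothetical helper for illustration: simplified parsing, not a full
    implementation of the HTTP grammar.
    """
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without q= counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # unparseable weight: treat the tag as unwanted
        prefs.append((tag.strip(), quality))
    # sorted() is stable, so tags with equal weight keep their header order.
    ranked = sorted(prefs, key=lambda pref: -pref[1])
    return [tag for tag, quality in ranked if quality > 0]


guess = parse_accept_language("sv-SE,sv;q=0.9,en;q=0.8")
print(guess)  # ['sv-SE', 'sv', 'en']
```

The first tag is only a guess: a traveller on a borrowed machine sends someone else's preferences, which is one of the cases where this approach is wrong.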

notpushkin · 2025-05-19T16:04:19 1747670659

We can find a sorting order with the minimal total distance between where we place a language entry and where this entry would be in that language. If there’s no pair of languages A and Ä such that A > Ä in one and Ä > A in the other, then (I guess???) this total distance will be zero.
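One way to read "minimal total distance" is as a sum of index differences, which can be brute-forced for a tiny list. This is a sketch under assumptions: `expected_index` is hypothetical data standing in for "where each entry would land under its own language's rules", and trying every permutation is only feasible for a handful of entries.

```python
from itertools import permutations


def total_displacement(order, expected_index):
    """Sum of |actual position - position the entry's own language expects|."""
    return sum(abs(i - expected_index[name]) for i, name in enumerate(order))


def best_order(expected_index):
    """Try every ordering and keep one with the smallest total displacement."""
    return min(
        permutations(expected_index),
        key=lambda order: total_displacement(order, expected_index),
    )


# Hypothetical toy data: A's and B's rules give "A, Ä, B", while Ä's own
# rules give "A, B, Ä". So B expects index 2 and Ä also expects index 2.
expected_index = {"A": 0, "B": 2, "Ä": 2}

best = best_order(expected_index)
best_cost = total_displacement(best, expected_index)
print(best, best_cost)
```

In this toy case two entries want the same slot, so the total cannot reach zero and more than one ordering ties for the minimum.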

baobun · 2025-05-20T04:02:00 1747713720

> A and Ä

Coincidentally, the expected position of "Ä" can vary wildly. Is it an umlauted A, normalized as AE, or a distinct letter coming after Z?
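The three readings can be illustrated with hand-written sort keys. This is a toy sketch, not a real collation: the key functions are invented for the example, they only handle a precomposed "ä", and the `{` substitution is a hack that works because `{` is the code point right after `z`.

```python
def key_as_base_letter(word):
    # Reading 1: "ä" is just an umlauted "a".
    return word.casefold().replace("ä", "a")


def key_as_ae(word):
    # Reading 2: "ä" is normalized to "ae".
    return word.casefold().replace("ä", "ae")


def key_after_z(word):
    # Reading 3: "ä" is a distinct letter that comes after "z".
    return word.casefold().replace("ä", "{")


words = ["Zebra", "Ähre", "Afrika", "Adler"]

as_base = sorted(words, key=key_as_base_letter)
as_ae = sorted(words, key=key_as_ae)
after_z = sorted(words, key=key_after_z)

print(as_base)  # ['Adler', 'Afrika', 'Ähre', 'Zebra']
print(as_ae)    # ['Adler', 'Ähre', 'Afrika', 'Zebra']
print(after_z)  # ['Adler', 'Afrika', 'Zebra', 'Ähre']
```

The same four words come out in three different orders, with "Ähre" moving from third place to second to last.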

notpushkin · 2025-05-20T04:22:15 1747714935

That’s also part of the reason I’ve chosen it for a placeholder / variable name! The actual placing is not important as long as it’s where speakers of the Ä language expect to find it.

Or suppose there are languages Ä₁ and Ä₂, where in Ä₁ the ‘ä’ is the umlauted ‘a’ and in Ä₂ it’s a distinct letter. The language list would be displayed as:

A Ä₁ B C Z Ä₂

The only problem / corner case would be a language Ä₀ that sorts, say, ‘ä’ before ‘a’. I would still put it after, since that’s where most other readers would expect to find it.
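A simple way to get the list from the Ä₁ / Ä₂ example is to sort every entry by the key its own language would give it. The sketch below assumes toy rules (the three key functions are invented here) that all share the same base Latin order, which is the only reason keys from different rule sets can be compared at all. It is one possible reading of the idea, not the minimal-distance optimization itself.

```python
def plain(name):
    # Languages A, B, C, Z: no special handling in this toy model.
    return name.casefold()


def umlaut_as_variant(name):
    # Language Ä₁: "ä" is an umlauted "a", so it sorts with "a".
    return name.casefold().replace("ä", "a")


def umlaut_after_z(name):
    # Language Ä₂: "ä" is a distinct letter after "z" ("{" follows "z").
    return name.casefold().replace("ä", "{")


# Each entry carries the (hypothetical) rule of the language it names.
entries = [
    ("Ä₂", umlaut_after_z),
    ("C", plain),
    ("Ä₁", umlaut_as_variant),
    ("Z", plain),
    ("A", plain),
    ("B", plain),
]

# Place every entry where its own speakers would look for it.
ordered = [name for name, rule in sorted(entries, key=lambda e: e[1](e[0]))]
print(" ".join(ordered))  # A Ä₁ B C Z Ä₂
```

A language like Ä₀ would need an explicit override in this scheme, since its own key would put it before A.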

numpad0 · 2025-05-20T13:47:31 1747748851

> "Ä"

OT, but this looks like an adorably blushing hen to me

numpad0 · 2025-05-19T18:43:20 1747680200

Can't you just sort everything as integers, i.e. by code point? The codepages usually come roughly sorted, and while no one knows which of 檎 or 橙 comes first, I don't think most people would find it particularly offensive whichever way a random app ordered them.
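For reference, sorting "as int" is what Python's default string comparison already does: it compares code points one by one. A small sketch, with an arbitrary sample of language names chosen for illustration:

```python
names = ["Svenska", "日本語", "Čeština", "English", "Ελληνικά", "Deutsch"]

# str comparison in Python is by Unicode code point, so no key is needed.
by_code_point = sorted(names)

for name in by_code_point:
    # Show the first character's code point to make the order visible.
    print(f"U+{ord(name[0]):04X}  {name}")
```

The result is stable and needs no locale, but "Čeština" ends up after "Svenska" because U+010C is larger than any ASCII letter, which is exactly the kind of case where this fails some definition of correctness.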

vikingerik · 2025-05-19T19:43:54 1747683834

That would be one locale-invariant sort as I said. Sure, you can pick some way of doing it that's least-bad. The codepages are roughly sorted, but what we're debating is the cases where that fails some definition of correctness. The point is there can be no universally correct answer for sorting locales before the user picks one, because that can depend on already knowing the locale itself.

account42 · 2025-05-26T07:51:26 1748245886

There is no such thing as standard codepage numbers.

mananaysiempre · 2025-05-19T14:03:28 1747663408

Yes, the DUCET is bound to disappoint everybody (especially users of the Latin script with diacritics, as none of them agree on the sort order and everyone’s preferences are tied to the specific subset of diacritics they need), but at least it disappoints everybody more or less equally.

(Do yourself a favor, though, and use the CLDR root collation instead of the raw DUCET—they are basically the same, except, and I’m quoting the standard here[1], “the DUCET is not entirely well-formed”.)

[1] https://www.unicode.org/reports/tr10/#Well_Formed_DUCET