Not accepting Accept-Language is one of my major pet peeves. What makes it worse...

magicalhippo · 2025-05-19T15:15:37 1747667737

A related issue that has me fuming is when, after arriving at a page of interest from a search engine, a modal popup forces me to select the country I'm from, and then promptly redirects me to the homepage of the regional website.

Some have an X button to close said dialog, but many don't, which is really aggravating.

wudu · 2025-05-19T16:28:30 1747672110

Google does this. I want to check out the new device they just released - "sorry, this product is not available in your country". I just wanted to read the specs, not buy.

jiggawatts · 2025-05-19T18:00:51 1747677651

Products don’t get to be informed about the factory in which they are made, or the shop in which they are to be sold.

numpad0 · 2025-05-19T18:38:48 1747679928

https://xkcd.com/869/

uselesswords · 2025-05-20T12:13:36 1747743216

https://bbc.co.uk sends me to https://bbc.com in the states and it is just infuriating

CGamesPlay · 2025-05-19T12:52:57 1747659177

> in alphabetical order

Well, it's in an order, but I don't know about alphabetical. I clicked on today's English featured article and looked at the languages: "中文", "Italiano" are "suggested", then the remainder are grouped by geographic region, and aren't particularly alphabetical. They appear to be in groups which are still not alphabetical. Europe seems to have a Cyrillic group but "Қазақша" is shown after "Українська" which isn't accurate in Kazakh and probably also unexpected for anybody who isn't familiar with the letter Қ (Қ isn't a letter in Russian, this is probably why this happens). The Chinese languages don't seem to be in stroke order (no expert here), although Korean is below them (because of course, K for Korean alphabetizes after C for Chinese).

Anyways, no hate for Wikipedia; they do a great job of localizing. Just a bit of nuance/pedantry about how you can't "alphabetize" language names in their own language.

bmn__ · 2025-05-19T13:40:00 1747662000

> how you can't "alphabetize" language names

Not so, this sort order has been standardised as part of Unicode for at least 28 years. To see it in action, pipe the list of languages as a text file through a conforming tool like `ucsort`. When Қ is falsely sorted after Ч, then the wrong algorithm or no algorithm at all has been used.

> because of course, K for Korean alphabetizes after C for Chinese

That's not how it works.

adastra22 · 2025-05-19T15:06:28 1747667188

Sort rules are different in different locales.

vikingerik · 2025-05-19T15:47:57 1747669677

It's a circular dependency: how do you sort and list the locales or languages for someone to pick one, when by definition you don't know their locale yet?

You have to either make some best-guess approximation (IP geo, browser headers, etc) or use a locale-invariant sort, both of which will be wrong in some cases.

notpushkin · 2025-05-19T16:04:19 1747670659

We can find a sorting order with the minimal total distance between where we place a language entry and where this entry would be in that language. If there’s no pair of languages A and Ä such that A > Ä in one and Ä > A in the other, then (I guess???) this total distance will be zero.

baobun · 2025-05-20T04:02:00 1747713720

> A and Ä

Coincidentally, the expected position of "Ä" can vary wildly. Is it an umlauted A, normalized as AE, or a distinct letter coming after Z?

notpushkin · 2025-05-20T04:22:15 1747714935

That’s also part of the reason I’ve chosen it for a placeholder / variable name! The actual placing is not important as long as it’s where speakers of the Ä language expect to find it.

Or suppose there’s languages Ä₁ and Ä₂, where in Ä₁ the ‘ä’ is the umlauted ‘a’ and in Ä₂ it’s a distinct letter. The language list would be displayed as:

A Ä₁ B C Z Ä₂

The only problem / corner case would be such a language Ä₀ that would e.g. sort ‘ä’ before ‘a’. I would still put it after, since it’s where most other readers would expect to find it.

numpad0 · 2025-05-20T13:47:31 1747748851

> "Ä"

OT, but this looks like an adorably blushing hen to me

numpad0 · 2025-05-19T18:43:20 1747680200

can't you just sort all as int? the codepages usually come roughly sorted, and while no one knows which of 檎 or 橙 comes first, I don't think it'll be particularly offensive whichever way a random app did, to most.

vikingerik · 2025-05-19T19:43:54 1747683834

That would be one locale-invariant sort as I said. Sure, you can pick some way of doing it that's least-bad. The codepages are roughly sorted, but what we're debating is the cases where that fails some definition of correctness. The point is there can be no universally correct answer for sorting locales before the user picks one, because that can depend on already knowing the locale itself.

account42 · 2025-05-26T07:51:26 1748245886

There is no such thing as standard codepage numbers.

mananaysiempre · 2025-05-19T14:03:28 1747663408

Yes, the DUCET is bound to disappoint everybody (especially users of the Latin script with diacritics, as none of them agree on the sort order and everyone’s preferences are tied to the specific subset of diacritics they need), but at least it disappoints everybody more or less equally.

(Do yourself a favor, though, and use the CLDR root collation instead of the raw DUCET—they are basically the same, except, and I’m quoting the standard here[1], “the DUCET is not entirely well-formed”.)

[1] https://www.unicode.org/reports/tr10/#Well_Formed_DUCET

euazOn · 2025-05-19T13:15:12 1747660512

Yes, that’s confusing, and it’s probably hard to find a good balance. Someone speaking Greek or Czech may expect to find their language around E (Ελληνικά) or C (Čeština), but nope, on Wiki it’s all the way after Z.

kazinator · 2025-05-19T20:39:24 1747687164

The problem may be that you need to set the locale in order to get a certain alphabetization, but setting the locale won't happen until after the language is chosen.

A reasonable approach might be to sort the list of names by using, as the sort keys, the strings projected through a Unicode normalization function, followed by folding to upper case. Then Čeština gets mapped to CESTINA and at least appears among the C's.
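A minimal Python sketch of that idea, using only the standard library. One assumption beyond the comment: normalization on its own (NFD) only splits Č into C plus a combining caron, so the sketch also drops combining marks before upper-casing. The function name `fold_key` and the sample list are made up for illustration.

```python
import unicodedata

def fold_key(name: str) -> str:
    """Decompose, drop combining marks, then fold to upper case."""
    decomposed = unicodedata.normalize("NFD", name)
    # Assumption: accents are discarded entirely, not used as tie-breakers.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()

names = ["Deutsch", "Čeština", "Dansk", "Български", "Català"]

print(fold_key("Čeština"))          # CESTINA
print(sorted(names, key=fold_key))  # Latin names interleave; Cyrillic still lands last
```

This only helps for scripts whose letters decompose to a Latin base. Names written in Greek, Cyrillic, or CJK keep their own code points and still end up after Z.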

nis251413 · 2025-05-19T13:22:43 1747660963

Don’t special characters always go after the Latin alphabet? I think this is pretty common, and fairly expected behaviour. Of course nothing is perfect but I feel like the way Wikipedia handles it is consistent.

e-topy · 2025-05-19T13:35:30 1747661730

Not in the Czech alphabet: a, á, b, c, č, d, ď, e, é, ě, ...

Also, we regard 'Ch' as its own letter. So yeah, try sorting alphabetically. I'll wait.

bmn__ · 2025-05-19T13:50:53 1747662653

    perl -MUnicode::Collate::Locale -E 'say for Unicode::Collate::Locale->new(locale => "cs")->sort(… … …)'

works. Test cases at https://prirucka.ujc.cas.cz/?action=view&id=900

mulmen · 2025-05-19T14:20:54 1747664454

I love some perl on Monday morning but how does this work when you don’t know the locale?

bmn__ · 2025-05-19T14:51:04 1747666264

Then a system should fall back to DUCET which produces more or less sensible results across all locales.

bawolff · 2025-05-19T15:35:37 1747668937

Digraphs like Ch are common in a lot of languages. Wikipedia supports that fine on category pages. E.g. https://cs.wikipedia.org/wiki/Kategorie:CHKO_%C5%A0umava

If you want to see bizarre sort rules, look up how French sorts accented characters.

thaumasiotes · 2025-05-19T17:38:43 1747676323

> If you want to see bizarre sort rules, look up how french sorts accent characters.

I tried to do this, but there do not appear to be any sources addressing this question.

I did find a French Stack Exchange question asking for this exact information, and complaining that there are no sources (other than an uncited wikipedia page) that address it. There is no answer posted, but there is a comment from a French guy suggesting that there are no official rules.

https://french.stackexchange.com/questions/54217/french-dict...

How were you imagining I would look this up?

bawolff · 2025-05-19T18:11:52 1747678312

Here is a blog post talking about it https://archives.miloush.net/michkap/archive/2004/12/31/3447...

Or a more technical version at https://www.unicode.org/reports/tr10/#Backward

Another case that is kind of weird is Thai https://www.unicode.org/reports/tr10/#Rearrangement

thaumasiotes · 2025-05-19T20:33:46 1747686826

> Here is a blog post talking about it

I notice that post suggests that Académie française specifies that accents should be sorted in reverse, and includes a link over the words "Académie française", and yet that link doesn't go to a supporting document.

A while ago I complained on this forum that Amazon's hyphenation for Kindle ebooks is abysmally bad. (Which is still true.) Someone responded to say that the hyphenation algorithm for English requires this. I pointed out that the hyphenation algorithm for English is a lookup table; each word has its hyphenation defined in the table, and when you need to hyphenate a word, you look up the hyphenation points.

Another response linked me to a paper describing how this table can be stored as a set of rules that provide hyphenation points in arbitrary letter sequences rather than dictionary words. That paper is very clear about its goals; it is an advance in data compression, proposing a method of storing a lookup table that takes less space than the table does. It carefully goes over how to produce the ruleset from the table.

But somewhere along the line, people confused the data compression algorithm (of storing the lookup table as a ruleset) for the hyphenation algorithm. They will now tell you with a straight face that a single ruleset that seems to have gone around represents the hyphenation algorithm for English, even if the word you want to hyphenate wasn't in the table that that ruleset was prepared from. And this is false.

It looks to me like something similar has happened in English speakers' understanding of French sorting order. It's very easy to explain why the example quadruplet has the sorting order it does:

    cote
    côte
    coté
    côté

(Note that the Stack Exchange question from 2024 and the blog post from 2004 use exactly the same example.)

These four words have two pronunciations, and the pronunciations are grouped with each other. After that, "cote" comes first by virtue of bearing no accents, and "o" comes before "ô" for the same reason.

What's happening here is that although French generally pretends that "e" and "é" are the same letter, they aren't, which forces -e (not pronounced) to come before -é (pronounced!). "o" and "ô" actually are the same letter, and can be ordered flexibly.

The rule "sort the accents in reverse" arises as a coincidence; it happens to be the case that this distinction is most significant at the end of French words. But French speakers would reject this ordering:

    cetot
    cétot
    cetôt
    cétôt

This doesn't come up because those words don't exist.
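For reference, the mechanical "accents in reverse" rule being disputed here can be sketched in a few lines of Python. This is a simplification, not the real UCA implementation: the base letters are compared first, and accents only break ties, read either left to right (forward) or right to left (backward). Using the combining mark's code point as its weight is an assumption made for the sketch, and the function names are invented.

```python
import unicodedata

def split_letters(word):
    """Return (base letters, per-letter accent weights) for a word."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            # Attach the mark to the preceding base letter.
            # Assumption: the mark's code point stands in for its weight.
            accents[-1] = ord(ch)
        else:
            bases.append(ch)
            accents.append(0)  # 0 means "no accent"
    return bases, accents

def forward_key(word):
    bases, accents = split_letters(word)
    return (bases, accents)

def backward_key(word):
    bases, accents = split_letters(word)
    # Accent differences are compared starting from the end of the word.
    return (bases, accents[::-1])

words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=forward_key))   # cote, coté, côte, côté
print(sorted(words, key=backward_key))  # cote, côte, coté, côté

invented = ["cétôt", "cetôt", "cétot", "cetot"]
print(sorted(invented, key=backward_key))  # cetot, cétot, cetôt, cétôt
```

The backward key reproduces the cote/côte/coté/côté ordering, and applied to the invented words it yields exactly the ordering listed above, which is the point of contention: the rule is purely positional and knows nothing about pronunciation.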

makapuf · 2025-05-19T13:32:30 1747661550

Well, in my language "é" is absolutely not special, and should definitely be placed near "e" (to the point that uppercase é is often written E instead of É).

jurip · 2025-05-19T13:36:02 1747661762

It depends on the language. Unicode defines rules for it: https://www.unicode.org/reports/tr10/

psychoslave · 2025-05-19T15:17:46 1747667866

If I recall correctly, the default first proposes a short list of the items guessed most likely to be what the user expects, then a more complete list, and in any case lets you filter by typing. I think it can also change how it behaves if you are logged in and have tweaked the relevant preferences for your account.

bawolff · 2025-05-19T15:33:52 1747668832

Wikipedia uses UCA sort order in categories (depending on which lang wikipedia you are reading). Most other lists just sort using unicode codepoint order (in NFC). So it depends, but yes, for generated lists other than categories ascii characters usually come first.

philistine · 2025-05-19T13:52:11 1747662731

That’s English hegemony. Languages have their own sorting that they expect. You can’t impose rules on other languages.

Of course at some point Unicode needs to be ordered, but you don’t get to impose technical details on people around the world because it matches how English does it.

That’s where geo-ip guessing becomes relevant. Show a list with the most likely languages at the top.

swiftcoder · 2025-05-19T14:07:17 1747663637

Or use the Accept-Language. Since we already know the User understands that one, it's probably a reasonable choice for which sort order they expect too.

adastra22 · 2025-05-19T15:08:17 1747667297

That’s not English sort order either.

paulddraper · 2025-05-19T13:36:52 1747661812

Sorting by character codes, yes.

But in the language native locale, no.

af78 · 2025-05-19T13:39:04 1747661944

I guess the default (when no language is specified) is Unicode order:

U+005A LATIN CAPITAL LETTER Z

U+010C LATIN CAPITAL LETTER C WITH CARON

U+0395 GREEK CAPITAL LETTER EPSILON
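That guess is easy to check in Python, whose default string comparison is plain code point order. The sample names are picked for illustration, and the sketch normalizes to NFC first so that Č is a single code point (U+010C) and not C followed by a combining caron.

```python
import unicodedata

# Illustrative sample of language names, normalized to NFC.
names = [unicodedata.normalize("NFC", n)
         for n in ["Zazaki", "Čeština", "Ελληνικά", "English"]]

# sorted() on str compares code points, with no locale involved.
for name in sorted(names):
    print(f"U+{ord(name[0]):04X}  {name}")
```

English and Zazaki come out first, then Čeština, then Ελληνικά, matching the code point values listed above.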

soulofmischief · 2025-05-19T15:12:26 1747667546

When serving that many languages, a search bar is paramount.

thaumasiotes · 2025-05-19T16:12:34 1747671154

> The Chinese languages don't seem to be in stroke order (no expert here)

They are for me. In the Asia section, 中文 ["Chinese"] is listed first, followed by 吴语 ["Wu"] and then 粤语 ["Cantonese"]. Stroke order is first by stroke count and then by an obscure criterion that I don't know (and that, in my experience, Chinese people living in China also don't know), but stroke count is unambiguous and these are in order: 中 4, 吴 7, 粤 12.

Note that they aren't in alphabetical order: 中 Z, 吴 W, 粤 Y.

Japanese appears between Wu and Cantonese for unclear reasons.

matvore · 2025-05-19T16:43:54 1747673034

It is sorted FIRST by radical and SECOND by stroke order. This is roughly equivalent to the Unicode codepoint sort if you stay in the Basic Multilingual Plane. The order also puts Literary Chinese after Wu Chinese, which breaks with a pure stroke-count sort:

中文 - 中 = 丨 + 3 strokes

吴语 - 吴 = 口 + 4 strokes

文言 - 文 = 文 + 0 strokes

日本語 - 日 = 日 + 0 strokes

粵語 - 粵 = 米 + 7 strokes
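A quick Python sketch to check the "roughly equivalent to codepoint sort" claim on these five names. The residual stroke counts are the ones listed above. The Kangxi radical numbers (丨 2, 口 30, 文 67, 日 72, 米 119) are added here as an assumption for the sketch and are not part of the original comment.

```python
# (name, Kangxi radical number of the first character, residual strokes)
entries = [
    ("粵語", 119, 7),   # 米 + 7
    ("日本語", 72, 0),  # 日 + 0
    ("中文", 2, 3),     # 丨 + 3
    ("文言", 67, 0),    # 文 + 0
    ("吴语", 30, 4),    # 口 + 4
]

# Radical first, residual stroke count second.
by_radical = [name for name, _, _ in sorted(entries, key=lambda e: (e[1], e[2]))]

# Plain code point order, which is what sorted() does for str.
by_codepoint = sorted(name for name, _, _ in entries)

print(by_radical)
print(by_radical == by_codepoint)
```

For this small sample the two orders agree. That says nothing about which ordering Wikipedia actually implements, only that the two are hard to tell apart from the outside.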

thaumasiotes · 2025-05-19T17:47:34 1747676854

Dictionary lookup is done first by radical and second by stroke count. Collation is not. Stroke count is first.

For example, I have a book of 成语 stories that gives its table of contents in non-alphabetical order. (Since nobody understands the traditional ordering, I also have several such books that put their table of contents in alphabetical order.)

Here is the collation order in the book:

一七八入九人口千小三亡大不专天井见毛月文风为心水四 ...

Note that 三's radical is 一, the first Kangxi radical, and that 一 is listed first. Your theory is wrong. 三 isn't even first among the 3-stroke characters, which start (among these) with 口.

Why did you make up a false answer to this question?

matvore · 2025-05-19T18:16:46 1747678606

The Wikipedia sort for the languages is as I stated above, with Literary Chinese and Japanese between Wu Chinese and Yue Chinese. I explained why it was sorted that way, because radical is considered first. You could not explain why Japanese appeared between Wu and Yue because you insisted and continue to insist that radicals are not used.

I didn't say sorting is never done by stroke count alone. But I have seen radical+residual stroke count much more often than stroke count alone. Probably a result of the content I'm accessing. It's mostly Japanese and not intended for children.

The dictionary and non-dictionary sorting distinction that you make doesn't sound like a real thing. The audience, the country, and the number of items sorted are bigger factors. But you're not wrong in that stroke count is sometimes used alone.

thaumasiotes · 2025-05-19T23:14:08 1747696448

> You could not explain why Japanese appeared between Wu and Yue because you insisted and continue to insist that radicals are not used.

I can't explain that because it's part of a different logical group, with its name written in a different script.† This puts it parallel to the Chinese options and to Korean.

> The Wikipedia sort for the languages is as I stated above

I took you to be describing the sort order for characters, not for wikipedia. Wikipedia doesn't obey that order either. You can check the page for Jiangsu, where all of the languages mentioned so far appear before the "Latin alphabet" style languages, but 閩南語 and 閩東語 appear after them.

† I also can't explain why wikipedia seems to have chosen 吴语 but 粵語, 客家語, and 贛語. Jiangsu is on the mainland... and so are Jiangxi and Guangdong.

matvore · 2025-05-21T01:27:10 1747790830

  > all of the languages mentioned so far appear before the "Latin alphabet" 
  > style languages, but 閩南語 and 閩東語 appear after them.

Could it have something to do with the Minnan and Mindong Chinese articles being written in a Latin script (despite the language name showing in both Chinese characters and Latin letters)?

thaumasiotes · 2025-05-21T08:35:01 1747816501

As far as I know, sure, it could.

bawolff · 2025-05-19T15:30:33 1747668633

> Well, it's in an order, but I don't know about alphabetical. I clicked on today's English featured article and looked at the languages

This depends on whether you are viewing desktop site or mobile site. It also depends on if you have a non-default skin set in your preferences.

Seems like desktop (vector-2023) does the region thing.

Mobile does alphabetical by language name (I imagine codepoint order, but I didn't check).

Some other skins are alphabetical by BCP 47 code.

e-topy · 2025-05-19T11:35:43 1747654543

And it even remembers what you chose last time and pushes it to the top. That is UX. Being actually helpful and not fucking annoying.

whstl · 2025-05-19T13:31:24 1747661484

Oh nice! I never noticed that "Suggested languages" shows languages I previously selected.

But additionally, I like how it's not simply "pushing to the top": it does show a previously selected language at the top, but it still keeps a duplicate in the list below, in case the user is going by muscle memory.

To me this is the best way.

Either make it VERY OBVIOUS that you're removing the item from the bottom of the list (which wouldn't be possible here), or don't remove it at all.

If I had a cent for each time a SaaS made my life harder by trying to "help me" I would be CEO of every SaaS I use.

TheJoeMan · 2025-05-19T13:33:33 1747661613

I can give a perfect bad example: the Youtube app on my iPhone somehow determined to change to Amharic. This is the Google support article: https://support.google.com/youtube/answer/87604 telling me the buttons to press in English. Also, I don't know/speak Amharic, and so at the time had no idea what language it was, and the iPhone translate doesn't even recognize this Ethiopian language. Bit of a pickle that could have easily been mitigated by a universal multilanguage icon.

JumpCrisscross · 2025-05-19T13:35:04 1747661704

> the Youtube app on my iPhone somehow determined to change to Amharic

Was this about 6 weeks ago?

TheJoeMan · 2025-05-19T14:04:28 1747663468

Yes, actually. Is there an article about it? After updating to iOS 18.4, Amharic was appended to the end of the list of preferred languages in the iPhone Settings app. However, what's interesting is it was ranked below English, and apps are supposed to use the languages list in order, but perhaps Youtube is alphabetically sorting the list?

JumpCrisscross · 2025-05-19T14:52:25 1747666345

> Yes, actually. Is there an article about it?

Not that I know of. It just happened to me, too, around then. I thought it had to do with my pet fascination with the Ethiopian civil war and GERD.

distances · 2025-05-19T17:25:39 1747675539

Or the ChatGPT app which can be baffling. My phone language is English, I've set ChatGPT app specifically to English in the app settings, I ask my questions in English, and every now and then it still decides to answer in German.

hombre_fatal · 2025-05-19T14:56:48 1747666608

You can set the language individually per app in the iOS settings. But I thought it defaulted to your global setting.

tlb · 2025-05-19T13:57:05 1747663025

There are two levels of this. If I get some other European language, I can generally figure out which is their word for English and it's not a big deal to switch. But if it gives me a script I don't know, like Bengali or something, it's a problem.

Perhaps every "choose language" menu should include English and Chinese in non-localized form, as an escape hatch, since almost every web user can recognize enough of them to navigate a menu to find their actual language.

adastra22 · 2025-05-19T15:11:06 1747667466

Just include languages in their own script only. Why would a user need to select a language from that menu for which they DON’T know the target script? Showing Bengali in Bengali script is exactly what you want.

bawolff · 2025-05-19T15:36:43 1747669003

True, but how often do you want to select a language you can't read?

derf_ · 2025-05-19T13:45:17 1747662317

My favorite is the sites that do parse Accept-Language, but then pick the last one in the list instead of the first. I have mine in rough order of my competence in them, which gets me my least-competent language on some sites even when my most-competent is supported.

I get a kick out of it when I see it, because you can understand how it happens. "Well, at least you tried."

bmn__ · 2025-05-19T13:59:24 1747663164

It is wrong to blame the server here. For better results in content-negotiation, a user-agent should allow you to assign numeric weights instead of just a list (implying the same level of preference). Example:

    Accept-Language: en;q=0.7,pt;q=0.3

If you already send something similar to this, and the server gets it wrong, then this is an outright bug, the software or its operator is out of compliance with HTTP.
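A rough sketch of what a server could do with those weights, in Python with no dependencies. It is deliberately simplified: it matches only on the primary subtag, ignores the `*` wildcard, and treats an unparsable q value as 0. The function names and the `supported`/`default` parameters are made up for the example.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language-range, q) pairs, highest q first."""
    ranked = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # a missing q parameter means full weight
        for param in params:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # assumption: treat garbage as "not acceptable"
        ranked.append((tag, q))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def pick_language(header: str, supported: set[str], default: str = "en") -> str:
    """Pick the highest-weighted supported language (primary subtag only)."""
    for tag, q in parse_accept_language(header):
        primary = tag.split("-")[0].lower()
        if q > 0 and primary in supported:
            return primary
    return default


print(parse_accept_language("en;q=0.7,pt;q=0.3"))
print(pick_language("pt;q=0.3,en;q=0.7", {"pt", "en"}))  # en
```

Because the result is sorted by weight and equal weights keep header order, neither "take the first entry" nor "take the last entry" is needed: the server just walks the ranked list until it hits something it supports.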

drtgh · 2025-05-19T16:24:27 1747671867

This parameter, at first glance, appears to be used as numeric weights for automatic translations served by default, which makes browsing very uncomfortable (wrong translations, distorted texts).

E.g. Google, Youtube, Reddit.

Automatic translations should never be served by default, but only be loaded if the user requests it. The classical "do you want to translate?".

bmn__ · 2025-05-19T17:29:08 1747675748

It was and still is used for manually translated text, among other things. Does this help you get the full picture?

drtgh · 2025-05-19T23:49:01 1747698541

I don't need help, I'm criticising automatic translations being served by default, and the fact that the weights of this parameter are being used to do it.

The full picture? The weights seem to be more useful for fingerprinting and perhaps for server SEO than to help the users. Users who in the end will have to give the same weight to all the languages, or rewrite the outgoing headers, in order to be able to browse the Internet.

gus_massa · 2025-05-19T13:45:51 1747662351

99% agree, but there is a problem on mobile: to switch from Spanish to English, when I click the magnifying glass to search for alternatives, I have to type "Ing" (the first letters of "Inglés") even though it shows "English" in the list of matches. It would be better if I could type either "Ing" or "Eng".
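Something like the following Python sketch would cover both spellings: match the query against the autonym and against the name in the current UI language. The table and the `search` function are hypothetical, and the real selector may well work differently.

```python
# (code, autonym, name in the current UI language, here Spanish)
LANGUAGES = [
    ("en", "English", "Inglés"),
    ("es", "Español", "Español"),
    ("de", "Deutsch", "Alemán"),
    ("it", "Italiano", "Italiano"),
]

def search(query: str) -> list[str]:
    """Return codes whose autonym or localized name starts with the query."""
    needle = query.casefold()
    return [
        code
        for code, autonym, localized in LANGUAGES
        if autonym.casefold().startswith(needle)
        or localized.casefold().startswith(needle)
    ]

print(search("Ing"))  # ['en']
print(search("Eng"))  # ['en']
```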

netsharc · 2025-05-19T15:15:30 1747667730

It's even more amusing when the displayed list looks like it's sorted randomly, but in reality it's sorted alphabetically in a different language to the display language.

e.g.

    Nederlands
    English
    Français
    Deutsch
    Español

(but if sorted in English: Dutch, English, French, German, Spanish)

elric · 2025-05-19T17:44:12 1747676652

I don't understand why you would want to select Inglés instead of English? You want to select English, or Español, or Nederlands, or Deutsch, or whatever language. It makes no sense for it to be translated.

gus_massa · 2025-05-20T01:18:19 1747703899

Only in a very few weird corner cases. If it's an article about a city in Germany, I may like to see the article in German and use autotranslation to read it in English or Spanish.

Sometimes the article in the local language has more info. I had that problem in comments about places or events in Argentina. Sometimes the English article has less info than the Spanish article, so I made a link to the autotranslation.

lxgr · 2025-05-19T16:27:53 1747672073

Ironically, given TFA, it seems to be primarily using the user's IP:

> How does Universal Language Selector determine which languages I may understand

> ULS queries a service that determines your originating country based on your IP address. This is inaccurate in some cases. Based on the country code, most often spoken languages are suggested for you.

(from https://www.mediawiki.org/wiki/Universal_Language_Selector/F...)

Nemo_bis · 2025-05-19T19:22:46 1747682566

Indeed. Wikimedia wikis' language selection feature relies on Unicode CLDR language-territory data. This is very complex to maintain (and there are still many mistakes to fix), because reliable data is expensive to collect.

https://www.mediawiki.org/wiki/Universal_Language_Selector/F...

dalmo3 · 2025-05-19T14:32:41 1747665161

Funny. The Wikipedia home page has a "Language" button. Like that, in plain English. And it is translated to the language you switch to.