How much is poor text input eroding the use of non-Latin scripts worldwide? I.e. how many people just give up instead of struggling with poor input?
As an example: I use a Swedish keyboard. If I'm in a situation where ö is missing from the input (because the app is stuck in an en-US keyboard layout, say, so I have to hit a modifier ¨+o to type an ö), then there is a very good chance my communication with my colleagues would just naturally happen in English instead. I'd sigh and give up. Rather than substituting an o for my ö, I'd simply write in English.
Is this a thing in e.g. Asia, Israel, or the Arab world? Do kids who speak English communicate more in English in cases where the input doesn't let them communicate easily using their preferred script? Are there new "hybrid" languages popping up in electronic communication, where languages that use a non-Latin script are written in Latin in e.g. text messages? (You could argue that emoji is just that but the other way around, I suppose.)
It's the same for me when writing Swiss-style German. Not only is it cumbersome to type special characters like ö on a cell phone, spell-check is also a pain in the ass. For example, spell-checkers are not aware that in Switzerland we do not use the ugly German ß and just write ss instead, so I get false corrections all the time.

Furthermore, spell-checkers usually seem to be designed for languages like English with only a limited number of inflections. English nouns only have singular and plural, but in German the ending of a word can be bent in many more ways, making it rather unlikely that the spell-checker suggests the form you want. I suspect that most spell-checkers do not know that "car" and "cars" are the same word in a different form; instead, they add each form to the dictionary individually. This works well for English, but for many other languages some awareness of this would be helpful.

Another detail is that expressions that are short in one language can be long in another and vice versa, so many direct translations of English words are much longer in German, taking more space on buttons and other UI elements and thereby screwing up the layout.
So yes, even when "localized", hardware and software are usually not fully adapted to the language. And for German, which is relatively close to English, the problems are probably comparatively small. I cannot even fathom how big all these issues must be for more distant languages.
And to answer your questions: yes, these pain points lead me to sometimes prefer English and to avoid special characters. I even avoid spaces in file names, as too many programs I've worked with in the past struggle with them.
> Furthermore, spell-checkers usually seem to be designed for languages like English with only a limited number of inflections. English nouns only have singular and plural, but in German the ending of a word can be bent in many more ways
Oh, and I always say German still works great. In Finnish, words can have many hundreds of different endings (several suffixes concatenated; combinatorics explains how this is possible), and predictive input just doesn't work. It could probably be improved with massive semantic support, but I'm not an expert in that field.
For Hungarian, people stuck with an English layout usually just leave off the diacritical marks (áéíóöőúüű -> aeiooouuu). While this theoretically leaves some of the meaning ambiguous, and pedants can craft examples that may be ambiguous even with context, it works well enough in practice.
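Mechanically, stripping them is trivial, for what it's worth. A rough Python sketch (Unicode decomposition plus dropping the combining marks, which also catches the double-acute ő and ű):

    # Rough sketch: drop Hungarian diacritics by decomposing to NFD and
    # removing the combining marks (also handles the double-acute ő and ű).
    import unicodedata

    def strip_diacritics(text: str) -> str:
        decomposed = unicodedata.normalize("NFD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    print(strip_diacritics("áéíóöőúüű"))  # aeiooouuu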
Switching to English seems like a way overblown reaction. Two Hungarians chatting in English (unless there are non-Hungarian speakers involved) seems extremely weird to me. It may be partly because English is really foreign to us, while it's linguistically pretty close to Swedish, both being Germanic languages.
Hungarian is really foreign to EVERYONE :)
Me switching to English is in the context of work, not chatting with peers in my free time. At work in tech, everyone (with few exceptions) communicates in English (emails, bug reports, etc.), even between Swedish colleagues, so it's very natural.
I wouldn't send a text message to my wife in English if I happened to struggle with the diacritics on the device I'm on.
> Switching to English seems like a way overblown reaction. Two Hungarians chatting in English (unless there are non-Hungarian speakers involved) seems extremely weird to me. It may be partly because English is really foreign to us, while it's linguistically pretty close to Swedish, both being Germanic languages.
For what it's worth, I have a Swedish friend who almost always posts in English, even when he's talking to his family and other people he knows in real life. He'll switch to Swedish occasionally (and Facebook's autotranslation is very understandable), but 90% of the time he uses English.
Yeah, in Swedish leaving them off doesn't work. The marks aren't accents; letters like ö are distinct letters in their own right. An ö is as different from o as u is from e.
Same in Hungarian. One funny example is "főkábel" (fő+kábel, main cable) vs. "fókabél" (fóka+bél, seal intestine). Still, the intended meaning is almost always easy to guess.
Hungarian is also redundant enough that you can even replace all vowels with a single one and still be understandable; most of the meaning is carried by the consonants. E.g. "Szia, én vagyok Péter, hogy vagy?" ("Hi, I'm Péter, how are you?") -> "Szii, in vigyik Pitir, higy vigy?" (sounds obviously wrong, but is very understandable). Retaining the vowels but collapsing all consonants into one would be far more destructive to the meaning.
Uh, what do you mean? Leaving them off and letting the reader guess the intended word works pretty well in practice, and is what many organisations in Sweden resorted to when getting a domain name (e.g. Riksgälden, Åhléns, Företagarna).
I can only speak about CJK, with an example: the web version of Twitter has frequently dropped the last, not-yet-committed character of CJK messages, and people seem to complain a lot but have somehow adapted. I think this is partly why it doesn't get fixed promptly---it is frustrating but not a blocker. [1]
[1] And when the affected user base is small enough, even larger problems can take a long time to fix. I use a third-party IME that is relatively well known among programmers, and early this year Google Chrome began to crash seemingly at random when typing Hangul. I tracked down the root cause and I believe it is ultimately Chrome's fault, but I was too busy to file an issue at the time, and the IME was adapted the next month to avoid triggering the crash. I believe Chrome still crashes when older versions of the IME are in use.
> I use a third-party IME that is relatively well known among programmers, and early this year Google Chrome began to crash seemingly at random when typing Hangul
I'm aware that third-party Chinese and Japanese IMEs are a thing, but what user-facing difference does the third-party IME you mention provide in the Hangul context?
- Support for less popular or customized keyboard layouts. From time to time, popular OSes have also shipped incorrect versions of supported layouts.
- Deterministic autocorrection. For example, normally an initial jamo (e.g. ㄱ) should be followed by a medial jamo (e.g. ㅏ) and not vice versa, but many IMEs offer a feature that automatically swaps them. With a carefully designed layout this can catch lots of transposition typos (rough sketch below).
- Custom candidates for special characters: Korean IMEs show special characters as candidates when a jamo is "converted" to Hanja (a behavior popularized by the MS IME).
The IME in question [1] also provides an extremely customizable input system, to the extent that it forms a sort of wholesale DSL for Hangul IMEs.
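To make the jamo swap concrete, here is a rough Python sketch (using compatibility jamo and the standard Unicode syllable-composition formula; a real IME also handles final consonants, compound jamo, backspace, and much more):

    # Toy version of the "deterministic autocorrection" described above: if a
    # medial (vowel) jamo arrives before an initial (consonant) jamo, swap them
    # and compose the syllable instead of rejecting the input.
    LEADS  = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"    # initial consonants, Unicode order
    VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # medial vowels, Unicode order

    def compose(lead: str, vowel: str) -> str:
        """Compose an LV syllable: U+AC00 + (lead_index * 21 + vowel_index) * 28."""
        return chr(0xAC00 + (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28)

    def autocorrect_pair(first: str, second: str) -> str:
        if first in LEADS and second in VOWELS:   # correct order: initial, then medial
            return compose(first, second)
        if first in VOWELS and second in LEADS:   # transposed: swap instead of failing
            return compose(second, first)
        raise ValueError("not an initial/medial jamo pair")

    print(autocorrect_pair("ㄱ", "ㅏ"))  # 가
    print(autocorrect_pair("ㅏ", "ㄱ"))  # also 가, thanks to the swap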
In Israel I've found that Hebrew speakers exclusively write in Hebrew. Aside from the letters, nothing else is needed. One CAN add vowel marks (niqqud), but only children's books have them, so we're used to reading without them.
I can speak a little on Chinese and Japanese. In neither of these countries is English (or sometimes even Latin-script) literacy high enough to replace the domestic language. (And anyway, maybe it's just me, but reading romanized Chinese or Japanese is exhausting.)
With China, the market's so big that there's an entire ecosystem of homegrown (or Chinese-modified) software and websites, everything from office suites to web browsers, and I have to imagine those support Chinese input just fine.
With Japanese, generally the same software and sites are used, but Japanese support is also pretty good across common software.
Some niche or open-source software struggles with IMEs in general, but in those cases the solution is simply not to use that software.
Not that this addresses your main question, but for most Latin-script languages you'll be fine always using the altgr-intl variant of the en-US layout.
As for mobile, you can generally find the language-specific characters by pressing and holding a related character on a normal QWERTY keyboard.
For Scandinavian:
å: altgr + w
ä: altgr + q
ö: altgr + p
æ: altgr + z
ø: altgr + l
Should be available on all OSes. I never have to switch layouts when switching between languages. I'm still struggling to find an actually nice IME setup for Japanese and Chinese, though, and I've spent some time...
On Linux: setxkbmap -layout us -variant altgr-intl
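If you script your environment, a small Python helper along these lines (a sketch assuming an X11 session with setxkbmap on PATH; the function names are just illustrative) can check the active layout and set it if needed:

    # Query the current X11 layout via `setxkbmap -query` and switch to
    # us/altgr-intl if something else is active.
    import subprocess

    def current_layout() -> dict:
        out = subprocess.run(["setxkbmap", "-query"],
                             capture_output=True, text=True, check=True)
        pairs = (line.split(":", 1) for line in out.stdout.splitlines() if ":" in line)
        return {key.strip(): value.strip() for key, value in pairs}

    def ensure_altgr_intl() -> None:
        q = current_layout()
        if (q.get("layout"), q.get("variant")) != ("us", "altgr-intl"):
            subprocess.run(["setxkbmap", "-layout", "us", "-variant", "altgr-intl"],
                           check=True)

    ensure_altgr_intl()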
Regardless, I really think input methods should be kept out of scope for all apps. There's nothing I hate more than app developers trying to "solve" it for me. There are already people who work full-time on this and users who have their systems set up as they want. There's no way an app developer is going to help more users than they break things for.
The only exception I've seen is Google's web CJK input for Translate, but they fall under the "huge amount of resources" category.
Context: I write in Swedish and Japanese on a daily basis, spent some time learning Chinese, and sometimes have to retype things from other languages. Using vanilla iOS and Android on mobile, Linux on desktop. CJK has been the only struggle so far. There are enough tech-literate native speakers of those languages that there are definitely ways to get a good configuration for any OS; for Linux you just have to figure out how, which can be tricky if you don't read the language.
I don't even have an AltGr button on my keyboard (!). It would be very cumbersome to write Swedish with AltGr anyway, as those characters are quite common.
Not even a right Alt key? It should be the same if you set the layout... That's a really minimal keyboard; all 60% keyboards I've seen, and even some 40% keyboards, have that. I'd actually be curious to see your bastard keyboard!
YMMV, but I got used to it pretty quickly and I type Swedish all the time; it's way less annoying than doing the Alt+Shift dance to switch layouts whenever I go from coding to chatting, at least ;)
I always have an AltGr key, because I tend to program on Finnish keyboards. (Identical layout to the Swedish one, but because nobody came up with a term grouping the two together, there always seem to be two options to choose between that make no difference...) What I really hate about AltGr / this layout is the curly braces and square brackets. Needing AltGr for each of them makes programming so much harder.
That said, in Linux/X11 you can configure everything to your liking. I must admit I prefer complaining...
Yeah, this is exactly the reason I stopped using the Swedish/Finnish layout altogether. It must be hundreds or even thousands of times that I accidentally typed :( instead of :) in IMs, because someone at some point decided the parentheses had to be shifted one step...
Right-alting for åäö is just way less cumbersome than doing the same for @${}[]\~| or having to switch layouts when switching contexts.
What are the cases when your keyboard is stuck in English?
In Ukraine, no, we don't. Kids generally don't know English well enough to want to speak it with peers, and there's simply no reason to do so.
“Translit” (writing Ukrainian/Russian with English characters) was a thing in SMS days, when typing in Cyrillic would cut your SMS length in half (because that's how SMS encoding worked, I guess), but now practically nobody uses it anymore.
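(A rough sketch of the mechanism, as I understand it: a message that fits the GSM 7-bit default alphabet gets 160 characters per SMS, while anything outside it, including Cyrillic, falls back to UCS-2 and gets only 70. The Python below is simplified; the real GSM 03.38 table also has an extension set, and multi-part messages have slightly smaller segments.)

    # Which per-SMS limit applies? GSM-7 if every character is in the basic
    # table, otherwise UCS-2. (Simplified: extension characters ignored.)
    GSM7_BASIC = set(
        "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
        "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
    )

    def chars_per_sms(text: str) -> int:
        return 160 if all(ch in GSM7_BASIC for ch in text) else 70

    print(chars_per_sms("Pryvit, yak spravy?"))  # 160 -- Latin translit fits GSM-7
    print(chars_per_sms("Привіт, як справи?"))   # 70  -- Cyrillic forces UCS-2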
> “Translit” (writing Ukrainian/Russian with English characters) was a thing in SMS days, when typing in Cyrillic would cut your SMS length in half (because that's how SMS encoding worked, I guess), but now practically nobody uses it anymore.
OK, that's exactly the type of thing I was thinking about. Funny that it was born from the SMS length limit rather than from actual input difficulty, though.
A dude still sends me messages in translit over the web every time he's abroad, even though adding a language to the OS's layout-switching set takes only a few clicks these days.
But back in the 90s and early 2000s, translit was pretty big on the internets due to the dozen different Cyrillic codepages and some software supporting only 7-bit characters (specifically Fidonet exchange software and newsgroup nodes).
Just the fact that to this day I still run into mis-encoded diacritics in movie subtitles shows how ‘US-first’ ASCII caused a lot of headaches for many years.
(BTW: you probably have been told by now, but your nickname is a common slang word in Russia.)
In India, it was definitely born out of input difficulty. It's such a big thing that it has given rise to Hinglish (mixing Hindi and English), and translit is the only way 99% of people type here.
As a Swede I use a US keyboard layout with a compose key, as you've described: AltGr followed by o and ", for example. Writing Swedish fluently was cumbersome for a few days, but after that it felt natural. A nice bonus is that I now also have no problem typing a bunch of characters common in other Latin-based scripts, like ç, ø or ß.
That said, I type mostly in English anyway. My reason for using a US layout is that it is more ergonomic for bracket/brace/semicolon-heavy programming languages. Braces in particular are an annoying chord with the Swedish keyboard layout, especially on OS X, which for whatever reason uses three-part chords (Alt+Shift+8 or 9).
> Are there new "hybrid" languages popping up in electronic communication, where languages that use a non-Latin script are written in Latin in e.g. text messages?
Yeah, transliterating Arabic into Latin characters is a thing:
> With smartphones, the relationship between informal, chatty Arabic and formal written forms has become more complicated, giving rise to a hybrid known as “Arabeezy” – or Arabic written with Latin characters and numbers to represent letters that have no English equivalent.
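The digit conventions vary by region and writer, but a few are close to universal. A rough illustrative sketch in Python (this is only a subset, not an authoritative table):

    # Common Arabizi digit-for-letter conventions (illustrative subset only).
    ARABIZI_DIGITS = {
        "2": "ء",  # hamza (glottal stop)
        "3": "ع",  # ayn
        "5": "خ",  # kha
        "7": "ح",  # ha (the "hard" h)
    }

    def expand(text: str) -> str:
        return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in text)

    print(expand("3arabi"))  # عarabi -- the 3 stands in for ع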
In Sweden and Finland people typically prefer to omit the diaeresis without adding a trailing e. Because Germany got its way, though, the machine-readable part of passports uses the German-style conversion to ASCII.
> Do kids who speak English communicate more in English in cases where the input doesn't let them communicate easily using their preferred script?
As the input methods for CJK languages are significantly different from English input (rather than a relatively small difference, as in the Swedish case), every app just defers to the locale's input method.
Nobody tries to communicate with only the English keyboard, because that is outright impossible (while in Swedish it's possible in the worst case).
As I said, I wouldn't use Swedish without a Swedish keyboard, even though it's quite possible (just 3 missing characters, basically). What I'd do is switch to English instead.
> What I'd do is switch to English instead.
Hmm, I've never encountered a situation where only an English keyboard is available for communication (except when someone is setting up a new Linux machine).
But even in such a situation, I don't think that would happen. I'm not sure how similar Swedish is to English, but English is a cognitive load here. Even with people who are proficient in English, I cannot imagine myself communicating in English.
I do remember resorting to online keyboards, though (while setting up Linux).
In industry in Sweden, especially in tech, English is the language used. If you have a team of 10 Swedish developers working on an app, they will almost certainly write all user stories/bug reports/specs/code comments etc. in 100% English. The 11th person to join might not speak Swedish, so you can't afford to have your backlog or code in Swedish. The step to using English for email/chat is then very small.
> Do kids who speak English communicate more in English in cases where the input doesn't let them communicate easily using their preferred script? Are there new "hybrid" languages popping up in electronic communication, where languages that use a non-Latin script are written in Latin in e.g. text messages?
Both, in my part of India. The latter used to be much more common in the feature-phone era, writing in Tanglish (Tamil words in English script), but seems to have become much less popular now. Not sure why; perhaps Swype-style keyboards have made English so much easier to type.
The former certainly still happens a lot - I know a bunch of people (including myself) who would prefer to communicate in Tamil, but there's enough friction (not having a Tamil keyboard, having to get used to new keyboard layouts, the Tamil keyboard being buggy often enough, ...) to make it more appealing to go with English for normal conversation.
AFAICT, the Swedish keyboard has a dedicated key for ö. You just press it. I don't understand how an app can get stuck on a keyboard layout if you set the system keyboard layout to Swedish. Could you give an example?
Or are you actually using a US keyboard with a Swedish layout, so there is no ö key on it?
There are multiple things in play here: the physical keyboard, which is slightly different for Swedish (https://en.wikipedia.org/wiki/QWERTY#Swedish), and of course the keyboard layout set in software, which maps the pressed key to a character. You can mix and match those.
Lacking a printed Ö key doesn't matter so long as pressing the key where the Ö should be actually produces an Ö (which is the case e.g. with a physical US keyboard but the Swedish layout chosen in Windows).
> Could you give an example?
It was a hypothetical, but in Windows the layout is per-app (you can enable multiple keyboard layouts, and an icon in the systray lets you choose the active one), and it often switches spuriously, so suddenly you hit the Ö key and it produces a ';'.
A better example is perhaps when I'm travelling and composing an email from someone else's computer at my office in Britain back to my office in Sweden. I'm not going to bother switching the Windows settings just to write that one email on that machine.
Or if I'm logging on to a VM with only an English layout, so hitting my Ö key produces a ';'.
If I needed to type anything inside that VM desktop for whatever reason, I wouldn't bother trying to write Swedish even if the person I was typing to was Swedish.
> Or are you actually using a US keyboard with a Swedish layout, so there is no ö key on it?
Personally, I actually do use a US keyboard with a Swedish layout, which works fine. If I hit the key labelled ';', it shows an Ö on screen. The reason is that I wanted a specific type of keyboard that didn't exist with a Swedish physical layout, and the physical layouts are almost identical. It's basically just one key, between left Shift and Z, that doesn't exist on the US keyboard.
I can second that particular grief in Windows. I use English input for most things but Spanish for a few; it will randomly switch. This is doubly aggravating when I am in, say, vim and it messes everything up before I figure out why things aren't responding. Or I go to type a `[`, get nothing, press again and get `''`, or press a key and get `á`. Never mind trying to type programming-related characters in my Spanish keymap (taking notes in markup syntax, etc.); I then have to remember the shifted positions of keys (parentheses are the worst). Dealing with essentially all non-character keys getting hijacked is very confusing; just writing this, I have made several mistakes.
Honestly, English is uniquely suited to computer input. A narrowly defined charset maps well to a situation where you can only fit so many keys on a keyboard or on a screen. I also sympathize with the difficulties of implementing languages that have ligatures, right-to-left reading, or other significant differences. Non-discrete characters just don't map well to a world of ones and zeroes, because representing every combination is the simplest option but blows up combinatorially, something like O(n!) in such languages (assuming you can combine everything, which is probably not possible, but you get the point). I have a great deal of sympathy for those maintaining such complexity for what can be a very small part of the user base (that is, users who can communicate only in that one foreign language).
I assume this is about having to use other people’s devices, which aren’t configured how you’d configure your own.
I personally have muscle-memory for my own programming-optimised adaptation of Type 2 Dvorak, regular German layout, and UK/US/International variants of English keyboards, each in Mac and PC variants. And, I suppose, in mobile phone and tablet screen-keyboard variants.
It always takes a few seconds to adapt to whatever I’m sitting in front of, and I have a lot of practice with this compared to the average person.
At least in German, there are standard substitutions for when umlauts and sharp S aren’t available for technical reasons:
ä -> ae
ö -> oe
ü -> ue
ß -> ss (historically "sz", which you still occasionally see, particularly where there is Hungarian influence; Swiss German tends to forgo ß altogether)
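If you ever need to apply these automatically, a naive pass looks roughly like the Python sketch below (all-caps text and the newer capital ẞ would need more care than this shows):

    # Minimal ASCII fallback for German umlauts and sharp S.
    SUBSTITUTIONS = {
        "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    }

    def asciify_german(text: str) -> str:
        for src, dst in SUBSTITUTIONS.items():
            text = text.replace(src, dst)
        return text

    print(asciify_german("Größe: schön und süß"))  # Groesse: schoen und suess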
It happens to my wife's German USB keyboard in Windows 10. One day it's working fine, the next day pressing Z outputs a Y and Ä outputs a semicolon. I've had tons of problems with Japanese input in Linux and Mac (but not in Windows).
Multilanguage support is a function of how many customers demand it, and aside from English and Japanese, there hasn't been enough economic incentive to take it seriously.
The keyboard sends a scancode, and something, either the text-input layer of the OS or the app itself, translates that into an ö. Second, it is possible that the system just doesn't load the correct keyboard layout and defaults to US, so you lose your familiar layout precisely when you already have other problems.
Parts of the system hardware might also do scancode-to-text transcoding. When PS/2-to-USB keyboard converter cables were common, I saw quite a few that had trouble producing the correct keys for non-ASCII characters, especially when using AltGr.
There isn't any text transcoding in the hardware. The problem stems from the Microsoft-centric design of USB modifiers: there is no true AltGr code, so the convention (at the OS level) is that left Alt is always Alt and right Alt is sometimes AltGr. A few early converters didn't grasp that the left and right versions of the same modifier key could have different meanings, because nobody in their right mind would do that.
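To illustrate the left/right distinction: the USB HID boot-keyboard report carries the modifiers as a single bitmask byte with separate bits for left and right Alt, and the OS treats right Alt as AltGr on layouts that need it. The sketch below is Python, and the "sloppy" converter is my guess at what those early adapters effectively did, not a description of any specific product:

    # Bit positions per the USB HID boot-keyboard modifier byte.
    LEFT_ALT  = 0x04  # bit 2
    RIGHT_ALT = 0x40  # bit 6

    def sloppy_convert(modifiers: int) -> int:
        """Fold right Alt into left Alt -- AltGr is silently lost."""
        if modifiers & RIGHT_ALT:
            modifiers = (modifiers & ~RIGHT_ALT) | LEFT_ALT
        return modifiers

    def faithful_convert(modifiers: int) -> int:
        """Pass the bits through unchanged so the OS can still see AltGr."""
        return modifiers

    print(hex(sloppy_convert(RIGHT_ALT)))    # 0x4  -- treated as plain Alt
    print(hex(faithful_convert(RIGHT_ALT)))  # 0x40 -- AltGr preserved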
I know that in Arabic, people sometimes write in a transliteration style originally intended for telegraphy when they're on phones and don't want to, or can't, switch IMEs (the main thing I remember seeing is 3 being used as a letter).