Help! Is This Arabic? (isthisarabic.com)
635 points by davikr on Feb 26, 2023 | 298 comments



Tangentially, for those doing web development, CSS has a bunch of "logical" properties [1] that adapt to the writing mode and text direction of the document. For example, you can replace `margin-left` and `margin-bottom` in your CSS with `margin-inline-start` and `margin-block-end` respectively. Similarly, `text-align` accepts start/end instead of left/right. Even if you're not targeting right-to-left or top-to-bottom locales, it's easy enough to switch to logical properties and get most of the work out of the way if you change your mind in the future.

[1]: https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Logical...
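As a rough illustration of the mapping (not how any browser implements it), here is a Python sketch that resolves a logical property to the physical property it would affect. It assumes only the three common `writing-mode` values (`horizontal-tb`, `vertical-rl`, `vertical-lr`) and `direction` values `ltr`/`rtl`, and ignores `text-orientation` and the `sideways-*` modes; `physical_side` and `to_physical_property` are hypothetical helper names.

```python
# Physical edge where the block axis starts, for each supported writing mode.
BLOCK_START = {"horizontal-tb": "top", "vertical-rl": "right", "vertical-lr": "left"}
OPPOSITE = {"top": "bottom", "bottom": "top", "left": "right", "right": "left"}


def physical_side(logical: str, writing_mode: str = "horizontal-tb", direction: str = "ltr") -> str:
    """Map e.g. 'inline-start' or 'block-end' to 'left', 'top', ..."""
    axis, edge = logical.split("-")
    if axis == "block":
        start = BLOCK_START[writing_mode]
    elif axis == "inline":
        # Lines run horizontally in horizontal-tb and vertically in the vertical modes.
        start = "left" if writing_mode == "horizontal-tb" else "top"
        if direction == "rtl":
            start = OPPOSITE[start]
    else:
        raise ValueError(f"unknown axis: {axis!r}")
    if edge not in ("start", "end"):
        raise ValueError(f"unknown edge: {edge!r}")
    return start if edge == "start" else OPPOSITE[start]


def to_physical_property(prop: str, writing_mode: str = "horizontal-tb", direction: str = "ltr") -> str:
    """Resolve names shaped like '<prefix>-<axis>-<edge>', e.g. 'margin-inline-start'."""
    prefix, axis, edge = prop.rsplit("-", 2)
    return f"{prefix}-{physical_side(f'{axis}-{edge}', writing_mode, direction)}"


print(to_physical_property("margin-inline-start"))                   # margin-left
print(to_physical_property("margin-block-end"))                      # margin-bottom
print(to_physical_property("margin-inline-start", direction="rtl"))  # margin-right
print(to_physical_property("padding-block-start", "vertical-rl"))    # padding-right
```

In the default horizontal, left-to-right case the logical names resolve to exactly the familiar physical ones, which is why switching costs nothing up front.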


Do "top-to-bottom locales" exist? Has anyone built a site with users that get utility out of these?

I've usually found that the arguments for these properties revolve around, "well, what if I decide to use traditional Mongolian in the future?," which seems like the biggest case of YAGNI I can think of. I suspect their popularity is owed to tutorial authors showing off their CSS prowess.

(I'm also not convinced of the need to flip sizing values for RTL/LTR, but that's at least useful)


Here's a guide to styling such text: https://www.w3.org/International/articles/vertical-text/

Chinese, Korean, Japanese and Vietnamese were originally written vertically, but computers have made horizontal writing more common on the internet. Modern Vietnamese is also written in a variation of the Latin script, of course.

Mongolian still uses the vertical Mongolian script, though the Cyrillic script was also introduced in Mongolia during Soviet times and seems to be the common one inside Mongolia. However, the Mongolian government seems to be moving back to the original Mongolian script. Furthermore, the Mongolian people in China never took up the Cyrillic writing system.

Websites can and do use the vertical script: https://president.mn/mng/ http://khumuunbichig.montsame.mn/index.php?home&readnews=572

Google uses the Mongolian Cyrillic alphabet (https://www.google.com/?hl=mn) so even companies that seem to have a page in every single language don't seem to bother with supporting the original script. This is probably because the Mongolian speakers inside China can't make much use of Google anyway.

Funnily enough, Mongolian is one of the few known scripts that is written vertically with its columns running left to right, unlike other Asian vertical scripts, whose columns run right to left.


Yes! The website of the president of Mongolia has three versions, including one with the traditional top-to-bottom script. It's quite well-designed! You can see web-design features that make sense for this script: the nav-bar being on the left, left-to-right scrolling, and pagination at the right end.

https://president.mn/mng/


That site is well made. This is a bit pedantic, but if the default for that site is horizontal text, I'm not sure I would call this a "top-to-bottom locale": https://president.mn/


I’m somewhat surprised to see that some of the characters are not displayed correctly in iOS Safari. I expected symbols used on an official page of a head of state to be common enough to be included in the Unicode standard. Or is it a font issue?


They should make people turn their monitors to a vertical orientation. Are monitors in MN usually in "portrait", or do people live with the mismatch of vertical text in a horizontal layout?


I think it's OK to have horizontal monitors even for "top-to-bottom" scripts, simply because they're less taxing for us to use. Plus, mouse scroll wheels work just as well, so it's not a problem to navigate.


Go to any random bookstore in Taiwan and pick up a book in traditional Chinese. It's top-to-bottom, and right-to-left. The spine of the book is flipped compared with western books: it's on the right side when the cover is facing up.


Japanese is actually top to bottom, but they largely gave up on that idea. Probably too much hassle.


But there are still texts that are read top to bottom. I only know that because my daughter has a Japanese friend who has shown me some of her books and how they are read top to bottom.


Same with Mongolian, but JP goes TBRL (Top to Bottom, Right to Left), Mongolian goes TBLR. Wikipedia has an overview: https://en.wikipedia.org/wiki/Writing_system#Directionality


I'd say it's YAGNI if it required going out of your way to add something on top of your project. CSS logical properties mirror traditional properties one-for-one. Once you know about them, why would you go back to the older layout properties? The need to support different locales is really a question of what product you're building or which audiences you're targeting. E-commerce, news sites, social networks... they seem like good use cases for making the switch and future-proofing a little.
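To make the "one-for-one" point concrete, here is a naive Python sketch of a codemod that rewrites physical `margin-*`, `padding-*` and `text-align` declarations into their logical equivalents. It assumes the stylesheet was written for horizontal, left-to-right text (the only case where this mapping is exact) and uses regular expressions rather than a real CSS parser, so it would also rewrite matching text inside comments or URLs; `to_logical` is a hypothetical name.

```python
import re

# Exact only for horizontal-tb, ltr content.
SIDE_TO_LOGICAL = {
    "left": "inline-start",
    "right": "inline-end",
    "top": "block-start",
    "bottom": "block-end",
}
ALIGN_TO_LOGICAL = {"left": "start", "right": "end"}

# Longhands like "margin-left" or "padding-bottom" (shorthands are left untouched).
SIDE_PROPERTY = re.compile(r"\b(margin|padding)-(left|right|top|bottom)\b")
# Declarations like "text-align: right".
TEXT_ALIGN = re.compile(r"\b(text-align\s*:\s*)(left|right)\b")


def to_logical(css: str) -> str:
    css = SIDE_PROPERTY.sub(lambda m: f"{m.group(1)}-{SIDE_TO_LOGICAL[m.group(2)]}", css)
    return TEXT_ALIGN.sub(lambda m: m.group(1) + ALIGN_TO_LOGICAL[m.group(2)], css)


print(to_logical("p { margin-left: 1em; padding-bottom: 2px; text-align: right; }"))
# p { margin-inline-start: 1em; padding-block-end: 2px; text-align: end; }
```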


I love supporting different locales, and have shipped sites for multiple continents. I don't know whether the locale in question exists.

Forget my question about your users getting utility out of this. Have you ever seen a site, any site, that supports switching to top-to-bottom writing? What future are we proofing against: a new language arising?


The site of the Office of the President of Mongolia is available in top-to-bottom traditional Mongolian script: https://president.mn/mng/

Other examples (e.g. vertical Japanese) aren't too hard to find: https://nishinokensetsukogyo.co.jp https://ok-maru.jp etc.


We're looking for vertical to horizontal switching. Not just examples of vertical writing.

Almost all extant vertically-written languages are more commonly seen horizontally on the web. Vertical Japanese on the web (from my American understanding) is a design choice, and is notably absent from mainstays like https://www.yahoo.co.jp/

That is to say, it is unlikely that a site's language switcher would opt for top-to-bottom writing.

Edit: the Mongolian president's site does have an English version! ...but it's a completely different site :( https://president.mn/en/. Still, this is the closest to a use case for logical properties I've seen, so kudos.


It's unlikely (I'd guess) that a site would want to offer the exact same page in both horizontal and vertical modes, yes. But it's plausible they might want to share a lot of design elements between horizontal and vertical versions of the content.

Using logical properties would make it easier to have a common base of CSS that controls spacing, sizes, etc., that can be used by both language versions, rather than maintaining them completely separately.


To close the loop, I'll quote my original comment:

> the arguments for these properties revolve around, "well, what if I decide to use traditional Mongolian in the future?," which seems like the biggest case of YAGNI I can think of

...And even this one close example doesn't use them!


One of my favorite observations about forms that are in Arabic and English (or any left-to-right and right-to-left language pair) is that you don't need to pick one language as the primary language. The trick is to put the English on the left, a blank field in the middle, and the Arabic on the right. This way an English speaker will naturally read the form as an English form with the Arabic translation off to the right. But an Arabic reader will naturally read the form as an Arabic form with an English translation off to the left. Neither language is dominant.

Contrast this with a form in English and Spanish, where you need to put the text all on the left and decide which language goes on top.
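A hypothetical sketch of that layout as an HTML table generated from Python (`bilingual_row` and `bilingual_form` are illustrative names, not from any library). The input uses `dir="auto"` so it follows whatever the person types. The table itself is pinned to `dir="ltr"`, an assumption to check against your own page: in a right-to-left document a table lays out its first column on the right, which would swap the two labels.

```python
from html import escape


def bilingual_row(english: str, arabic: str, field_name: str) -> str:
    """One form row: English label on the left, input in the middle, Arabic label on the right."""
    return (
        "<tr>"
        f'<td lang="en" dir="ltr">{escape(english)}</td>'
        f'<td><input name="{escape(field_name)}" dir="auto"></td>'
        f'<td lang="ar" dir="rtl">{escape(arabic)}</td>'
        "</tr>"
    )


def bilingual_form(rows: list[tuple[str, str, str]]) -> str:
    # dir="ltr" keeps the column order fixed even if the surrounding page is RTL.
    body = "".join(bilingual_row(en, ar, name) for en, ar, name in rows)
    return f'<table dir="ltr">{body}</table>'


print(bilingual_form([("Name", "الاسم", "full_name")]))
```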


Your text inputs still need to flow either left-to-right or right-to-left though? In which use case would this be useful?


If we're talking about an HTML text input field, you can use `dir=auto`. Other tools should have similar features. This approach is also very common on paper, which naturally flows in the correct direction.

https://www.w3schools.com/tags/att_global_dir.asp

https://caniuse.com/mdn-html_global_attributes_dir

Edit: oh use case? Imagine a PDF form you'd like someone to fill out, or a page that may get printed. You could create two forms and let the person filling out the form pick their language, but the person processing the form may prefer to read the form in the other language.
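Roughly, `dir=auto` takes its direction from the first character with strong directionality: a left-to-right letter makes the field `ltr`, an Arabic or Hebrew letter makes it `rtl`, and text with no strong characters falls back to `ltr`. A minimal Python approximation using the standard `unicodedata` module (the full HTML rules have more cases, for example which descendant elements are skipped, and this ignores them; `auto_direction` is a hypothetical name):

```python
import unicodedata


def auto_direction(text: str) -> str:
    """Approximate dir=auto: direction of the first strong character, else 'ltr'."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":          # strong left-to-right (Latin, CJK, ...)
            return "ltr"
        if bidi_class in ("R", "AL"):  # strong right-to-left (Hebrew, Arabic letters)
            return "rtl"
    return "ltr"  # digits, spaces and punctuation alone are not strong


print(auto_direction("مرحبا"))        # rtl
print(auto_direction("Hello مرحبا"))  # ltr
print(auto_direction("2023 مرحبا"))   # rtl
print(auto_direction("123"))          # ltr
```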


> Your text inputs still need to flow either left-to-right or right-to-left though? In which use case would this be useful?

Anything that doesn't require input - particularly printed ones where you can't just swap with a button. Signs, menus, etc.


Layout on an airline magazine where the left side of the page is English and the right side is an RTL language like Hebrew or Arabic.


This website is well-meaning, but will be difficult to parse if you don't have elementary Arabic and can't tell apart ال from ل ا.

What it really needs is a simple reference input string, examples of how it gets broken, and what to do to fix them. The middle case, where the sentence is correctly rendered RTL but the individual words are LTR (breaking the ligatures), is particularly common and insidious because it looks plausible to non-Arabic speakers.


I actually thought that part was perfectly clear.

> In general, the letter combination ال should be common.

So if your text has more than a few words, you should be able to look through your text and see it somewhere.

I can't read Arabic but I can recognize that pattern. I went to https://www.bbc.com/arabic and could find numerous occurrences.

It's a bit like saying "if you have a paragraph in 'English' and it has no e's in it, it's probably not English."
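The site's ال check is visual. A rough programmatic cousin, assuming the bug you're hunting is text that was reversed character by character before reaching a renderer: correctly stored Arabic has many words starting with ال (alef + lam), while reversed text has words ending in lam + alef instead. Since لا is also a real word and some words legitimately end that way, this only compares counts; `looks_reversed` is a hypothetical helper, not a reliable detector.

```python
AL = "\u0627\u0644"     # alef + lam: the definite article, in logical (stored) order
REVERSED_AL = AL[::-1]  # lam + alef: what AL becomes if the string is reversed


def looks_reversed(text: str) -> bool:
    """Heuristic: more words end in lam+alef than start with alef+lam."""
    words = text.split()
    starts = sum(word.startswith(AL) for word in words)
    ends = sum(word.endswith(REVERSED_AL) for word in words)
    return ends > starts


sentence = "ذهب الولد إلى المدرسة"  # "The boy went to the school"
print(looks_reversed(sentence))        # False
print(looks_reversed(sentence[::-1]))  # True
```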


> It's a bit like saying "if you have a paragraph in 'English' and it has no e's in it, it's probably not English."

This is true. You should also be able to see the word "the" in several places in English text, and if it is rendered as "ehT" or something like that, the ligature code may have a bug -- similar with reversing the ال (read as "al") pattern.

> I can't read Arabic but I can recognize that pattern. I went to https://www.bbc.com/arabic and could find numerous occurrences.

Really? I only ever learned enough Arabic to read parts of the Qur'an and am by no means fluent, but I couldn't see any mistakes on that website myself.


The comment was that they found numerous cases of ال, not ل ا.


> In general, the letter combination ال should be common.

It was a really smart way of saying you don't need to know anything at all to pattern-match on this and spot a very common problem.

Also, if you haven't checked out the YouTube video on that page [1], I highly recommend it. It gives a great concise summary of the issue, and it's impressive how much I learned about visually parsing Arabic script only a couple of minutes into the video.

[1] - https://youtu.be/X1ynZm1wI18?t=65


A group of flying kookaburras flap about a body of liquid. In this group, individuals distinguish from distinct sorts. Small, big, colorful or plain, all flap wings as a way to maintain afloat. It's a sight to watch as this group zips, zooms, and turns all around. It's a natural habitat for this flying squad, and you can find it in many spots around our world. This group adds to our natural world's charm and, through its distinct traits, brings a lot of joy to many.


Ah yes, the good old "your heuristic doesn't work on my carefully constructed pathological case" observation, wouldn't be the Internet without that.

GP also said "probably". That's how heuristics work.


You would almost think what you described was our profession :-)


Wait, you get paid for Internet snark?


> your heuristic doesn't work on my carefully constructed pathological case

This is a bit like writing unit tests, debugging code, and so on.


It's a heuristic (it only claims to be probably helpful). You might as well complain that comparing lengths first when checking strings for equality is obviously dumb because you can construct same-length strings at will, even though the strings you actually compare demonstrably have a very wide distribution of lengths. Breaking rules of thumb is the easiest thing in the world.


Great example, because if that text is on your corporate website, then someone upstream is not doing their job.

Edit: obviously it is English but it's definitely not correct for a company website.


Hey, you now want corporate website standard English with no Es? Talk about scope creep!


But it may be totally correct for a website about kookaburras.

There's more than just corporate websites, and frankly, if a company of any meaningful size offers content in Arabic, I'd expect them to hire someone for that. Even part-time or freelance.


> But it may be totally correct for a website about kookaburras.

It isn't; "to maintain afloat" is not grammatical.

You could replace that with "stay afloat" and it would fix the grammatical error without introducing an E.

There's a similar unforced error in referring to a "group" of kookaburras rather than a "flock".

"Individuals distinguish from distinct sorts" is gibberish. I cannot tell what it's supposed to mean.

"All flap wings as a way to [stay] afloat" is, at best, very awkward; fluent English would require "flap their wings", but that would introduce an E.

"Flapping about a body of liquid" is a very odd thing to say unless the body of liquid happens to be suspended in midair, since midair is the only location where you can find birds flapping.


Now, find anothEr 20% of commEnts hErE that follow the samE rulE as yours.


The 'e' is in your username. :)


> It's a bit like saying "if you have a paragraph in 'English' and it has no e's in it, it's probably not English."

For the exception, see https://en.wikipedia.org/wiki/Gadsby_(novel)


Wow, having read the example prose, it is quite an interesting sounding book.


https://en.wikipedia.org/wiki/Ella_Minnow_Pea is a riot of a read. Starts with a full complement of letters and drops one letter chapter by chapter.


The article even links to a site where you can practice: https://notarabic.com/

Thanks to this training, I can now identify that numerous Arabic strings on that site are backwards. He wasn't joking, IJ is everywhere.


> ال from ل ا.

Who can’t tell these two character sequences from each other? Genuine question.

I think the main point the author is making is, if you are including Arabic script somewhere, take a bit of time to either do it right or hire someone to do it right for you.


I believe the difference is simply that the two sequences are in reverse order of each other.


Not only that: the reverse of ال is لا, because Arabic letters change shape based on their position in the word (initial, medial, final). So ل ا is never going to be a word, since the letters aren't joined: a joined ل looks like لـ, and ل followed by ا forms the distinct ligature لا.
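Modern text stores only the base letter (lam is U+0644) and leaves picking the initial, medial or final shape to the font and shaping engine. Unicode still carries legacy precomposed shapes in its Arabic Presentation Forms-B block, so Python's standard `unicodedata` module can show them, along with the fact that the لا ligature decomposes under NFKC into lam followed by alef, i.e. ال in reverse storage order. Purely illustrative; text shouldn't be stored in these forms.

```python
import unicodedata

LAM, ALEF = "\u0644", "\u0627"

# Legacy presentation-form code points for each positional shape of lam.
lam_forms = {
    position: unicodedata.lookup(f"ARABIC LETTER LAM {position} FORM")
    for position in ("ISOLATED", "FINAL", "INITIAL", "MEDIAL")
}
for position, ch in lam_forms.items():
    print(position, f"U+{ord(ch):04X}", ch)

# The isolated lam-alef ligature folds back to two letters in logical order.
ligature = unicodedata.lookup("ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM")
print(unicodedata.normalize("NFKC", ligature) == LAM + ALEF)  # True
print((ALEF + LAM)[::-1] == LAM + ALEF)                       # True: reversed ال is لا
```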


> What it really needs is a simple reference input string, examples of how it gets broken, and what to do to fix them

I don't think it does. Just hire someone.

I guess really, the point Rami tries to make is that not a single person who reads Arabic was involved in the video game/advertising/website/etc. The errors are often so basic that a child could point them out.

It would be like if someone wrote English without any spaces between words. It's so painfully obviously wrong.


It's one thing for a small personal project to get this stuff wrong, and there's somewhat of a baseline you can try yourself, but nobody is really going to care that much. It's like those funny pictures of restaurants in China with an English name of TRANSLATION SERVER ERROR.

What the site is about and where hiring someone makes sense is anything "big budget", especially if your target market includes Arabic-speaking or Arabic-adjacent countries.


> a simple reference input string, examples of how it gets broken, and what to do to fix them.

I really like this idea. Just have a standard set of strings covering all the edge cases (even the sprawling labyrinth that is bidi), with a visual reference showing what the correct rendering of each string looks like. Each entry would also have a description of the problem and suggested solutions.

Unlike the solutions in OP, this one is pragmatic and actually actionable for the vast majority of developers. I'm kinda surprised that something like this doesn't already exist, given the substantial amount of material and visual examples already available covering the bidi algorithm.

- https://www.w3.org/International/articles/inline-bidi-markup...
- https://www.w3.org/International/articles/inline-bidi-markup...
- https://www.w3.org/International/articles/inline-bidi-markup...


I'm not sure I understand the idea. It sounds impractical to me: there are probably trillions of combinations, and it would be unrealistic to expect to cover every specific example of incorrect text. I also thought the website was pretty clear and provided good examples.


As I understand it, the idea is just to make an Arabic test suite with enough examples (maybe a few dozen to a few hundred) such that if your program correctly renders all those examples, it’ll probably work fine with most Arabic text found in the wild. It sounds like there’s a lot of very broken software out there. Testing any Arabic input would be a big improvement for a lot of software.


> and can't tell apart ال from ل ا.

Who can't tell these apart? I know literally no Arabic; these characters look very much like the Latin I and J, and it's just an order thing.

It seems like an excellent quick test to me to see if there's ordering problems.


Exactly if you aren't comfortable with the idea that glyphs are distinct and order can be assumed to matter.

This article can't possibly be scoped for someone like that.

There's a side note here, that dyslexia can become apparent in very different written languages.[0]

In which case, don't try to handle multilingual text yourself; pay someone else, even if that's on Fiverr.

[0] https://blogs.scientificamerican.com/observations/its-all-ch...


> Exactly if you aren't comfortable with the idea that glyphs are distinct and order can be assumed to matter.

But isn't that exactly what it's just telling you? The order is important and if you see this very simple pattern it's wrong.

If you read any Latin-script language, then surely you must be OK with the idea that glyphs can be distinct? Are there many people who exclusively read languages where the order of symbols is not important?

> There's a side note here, that dyslexia can become apparent in very different written languages.[0]

If the point is that dyslexia means some people can't see the difference then that's fair, I'd not come at it from that angle. I don't see any surprising pre-assumed knowledge in this IJ/JI distinction however.


Yea we are in agreement.


Sorry, I'd not read the usernames and was trying to tie your response to the original comment. Makes a lot more sense now.


I don't think the intent is to fix every issue, but rather to tell you if you are using busted translation tools. The thing about left-to-right/right-to-left is something that a non-native Arabic speaker may not even know to look out for, even though it is a core aspect of the way the language is written.


The author was mostly making the point that one order occurs commonly and the other should never occur, because the other order would require the letters to connect. The stick character (alef) does not connect to the letter on its left, but it does connect on its right if the preceding letter connects to the left.


I think the author is aiming for "You're making a prop for a film with Arabic text, or making a multilingual sign, are you doing it right?" rather than "You're writing a text editor, are you doing it right?"


If you can’t tell the difference between a stylized lJ and Jl you probably didn’t end up on that page to begin with.

I don’t think that is an unreasonable assumption on the reader


I think that's the point, though? It's trying to teach us to pay attention to that difference as we scan the block of text.


> This website is well-meaning, but will be difficult to parse if you don't have elementary Arabic and can't tell apart ال from ل ا.

That's a really weird comment. Just Ctrl+F ل ا in your supposedly Arabic text?


I agree. It doesn't take long to get to the author's thoughts on the most common cause: lack of subject matter expertise. That's cold comfort for those who are working against a tight deadline, though.


The whole 'two billion people can read the Arabic alphabet to some degree' is a weird claim. Maybe it's possible, but there aren't two billion people with functional Arabic literacy.

Most Arabic is written without short vowel marks (harakat). You could have script that is justified right-to-left, with letters correctly connected, and it could still be gibberish. And many of those 'two billion people' would be none the wiser.


>'two billion people can read the Arabic alphabet to some degree' is a weird claim.

Not really, because

>there aren't two billion people with functional Arabic literacy.

isn’t something that the author claimed. “To some degree” is a phrase that explicitly states that the author isn’t talking about full functional literacy.

It would be a super weird claim if “some degree” and “full functional literacy” meant the same thing, but they don’t! You would almost have to intentionally ignore the meaning of the words the author used and invent a nonexistent overlap of meaning to become confused on this point!


> It would be a super weird claim if “some degree” and “full functional literacy” meant the same thing, but they don’t!

They should at least be close if the author isn't trying to pump up the numbers in a misleading way. I definitely assumed that number would be close to the literate number, and not including people who can recognize a tiny fraction. Hell, I can read Arabic "to some degree" if we're being completely literal, but I think including me in a persuasive claim about people that use Arabic would not be appropriate at all.

> You would almost have to

No need to be rude.


This doesn't make any sense. Why would you want the meaning of "some degree of knowledge" and "full functional literacy" to be close? What's misleading about describing exactly what you're talking about? Who cares how many people are literate in Arabic in this context? What you care about is how many people know enough about Arabic to know that you don't know anything about Arabic.

You seem to be demanding that "people who know enough medical terminology and/or Latin to see through your fictional doctor" be nearly the same class as "people who are doctors," and what's more, implying that's some sort of deception.

> No need to be rude.


>They should at least be close if the author isn't trying to pump up the numbers in a misleading way.

I’m genuinely confused here. The author was pretty much crystal clear about how he defined the size of the group that he was talking about, I do not understand how a person could be confused let alone feel the need to accuse him of being intentionally misleading.

What exactly is the nefarious goal that the author was trying to sneak past you with his clever trick of speaking in plain English?


Suppose I said, "It's important to protect endangered species. There are 130 billion mammals on Earth."

It's plain English, but I picked the wrong measure to support my claim.


Your analogy is backwards. It’s more like the author said “This is good for all mammals” and you are the one that inserted “but there exists a smaller subset of mammals”, which is entirely orthogonal to the point.


The author's thesis is that Arabic text is important because two billion people recognize its alphabet. This fact is irrelevant because it's a proper superset of the group that matters: people who can read Arabic.

Let me try a different analogy. "It's important for caterers in the US to provide a gluten-free meal choice. After all, the population is 332 million!" Without knowing the incidence of gluten sensitivity, it's a borderline-misleading statistic.


I am glad that we agree that this article that referred to people with varying degrees of knowledge of Arabic script is not, in fact, about people that are fluent.

I also agree that the existence of a subset that wasn’t referred to at all in the article is completely irrelevant to the topic at hand!


The article strongly implies that the "some degree" group is the group that matters. But it's not. The group that matters is somewhere in between "some degree" and "fluent".

And the parent comment was not talking about fluent people when writing "can read".


Literally what do you think the words “some degree” mean to you personally?

Posters have been able to pinpoint the number of fluent speakers; can you give me a ballpark of how many people matter and how many don’t?

This article about rendering text appropriately has taken such a fun turn into sorting folks into groups that “matter” and “don’t matter”?

If being rigid about these numbers is so important, how many people that don’t matter today might matter next year? How many people are learning arabic script? How many might want to look up something written in arabic without it being rendered in absolute nonsense?


> Literally what do you think the words “some degree” mean to you personally?

I answered that in my first post! I can read a tiny tiny bit of Arabic. That puts me into the literal "some degree" group, but I am also definitely in the not-mattering group, because rendering mistakes with Arabic will not cause me any problems with reading.

> Posters have been able to pinpoint the number of fluent speakers, can you give me a ballpark to how many people matter and how many don’t matter?

Have they? But I don't have numbers, I'm just saying that "fluent" is too small and "some degree" is too big.

> This article about rendering text appropriately has taken such a fun turn into sorting folks into groups that “matter” and “don’t matter”?

Are you offended that I classify myself as not mattering in this very specific context? You don't have to make it sound like I'm saying people don't matter in general, jeez.

> If being rigid about these numbers is so important

If a number is worth busting out to make a point, it's worth being correct.

> how many people that don’t matter today might matter next year? How many people are learning arabic script?

What's your point? If the number changes, then use the new number. Don't use a wrong number because it might change later. Or if you have an expected future number, label it as such.

> How many might want to look up something written in arabic without it being rendered in absolute nonsense?

A lot of those people aren't even inside the "some degree" group, so now you're making a different argument. I'd rather not start any new tangents at this point, if you don't mind.


> What's your point?

My point is that you’re trying to use some sort of odd, pedantic math trick to shift the conversation from your experience of “There is a group that I personally don’t care about” to “Math dictates that this is not actually a problem worth addressing.”

Your position that the important takeaway here is actually the importance of scrutinizing pointless minutiae rather than text rendering being fundamentally broken isn’t empirically based. Your entire argument is “look at how clever I am!”, which is fundamentally off-topic when talking about rendering text properly.

Like lol, how are people supposed to learn the script if their examples are all messed up? As a maths genius, surely you can see the issue with how “impacted people” is somewhere between “fluent people” and “fluent people plus an unknown number of others.” What hard number did you land on when adding unknown variable x to the number of fluent speakers you googled?


> Your entire argument is “look at how clever I am!”, which is fundamentally off-topic when talking about rendering text properly.

I think this is a really uncharitable read of this conversation. This thread has been about the veracity and the relevance of the author's claim that "two billion people can read Arabic to some degree".

I don't think anyone is trying to refute the author's conclusion that Arabic text rendering is important. I also don't think anyone is trying to show off how clever they are.

Personally, I agree with the author's conclusion, and I thought the post was really neat! But I also think the 2 billion statistic weakened their argument -- it's better to omit a statistic than include the wrong one.


> This thread has been about the veracity and the relevance of the author's claim that "two billion people can read Arabic to some degree".

This is not really true. You tried to center your conclusion that your math was better than the author’s math while distracting from the topic of rendering text properly.

This thread has been about you insisting that people listen to your math and not discuss rendering text properly. lol this thread has been about how clever you are, _not_ rendering text properly.


It's not about the math at all, just "this seems like the wrong group to use as an example". It's a simple point, nobody is trying to show off.

And in these comments I'm assuming that the author has exactly the right number for the group they cited. Because it's really not about math. I have done no calculations and trust the number given. I just think they're citing the wrong statistic. That's why I'm also uninterested in the factors you mentioned that might influence the number up or down. The actual number doesn't matter for this criticism: even if the number in the article happens to match the right statistic, they're still citing the wrong statistic.


> pump up the numbers in a misleading way

Is it misleading? You don't have to be anything like fluent to realise when text rendering is broken. The quantity that is actually relevant to the discussion is the number of people who, when they look at your UI, will know that the Arabic text rendering is broken, not the number of people who are fluent in Arabic.


The only reason for me to care that it's broken is on behalf of the people that can read it.

So even though I know a single digit number of words, and you can count me in that two billion, nobody should care about getting it right on my behalf.


>The only reason for me to care that it's broken is on behalf of the people that can read it.

The people who can't read it, but who can see that it's broken, will form a lower opinion of your product. It's as if I went to a Polish website and the text was all right-aligned and in all caps. I can't read Polish at all, but I'd still form an opinion about the quality of the site.


I'm not sure if you were trying to disagree with me, but I agree. They will form a lower opinion, but they form that opinion almost entirely because of the people that actually should be able to read it.

If you screw up a language that has 0 readers, it matters far less.

The point of saying how many speakers there are was to increase the strength of that effect. Because of that, it's misleading if you pump up the number. For pumping it up to not matter, the number would have to not matter, and there wouldn't have been a reason to mention it in the first place.


There is a difference between literacy and fluency.

Literacy is ability to understand a writing system. I am literate in the Latin alphabet.

Fluency is ability to understand a language. I am fluent in English.

Neither implies the other. You can be fluent but illiterate (the default until modern universal education), and literate but not fluent (I am literate in the Latin alphabet but I am not fluent in Italian).

The claim that there are ~2 billion people who are literate in Arabic script, and who will laugh at you if you get it wrong, is more or less true. It's of course referring to the large number of people who can read the Quran in Arabic without understanding all the words.


2B still seems like a stretch. Can 2B people read (even with low comprehension) a basic paragraph in Arabic? My SWAG would be 1B people.


A quick Google search seems to indicate that there are 1.9 billion Muslims in the world. It makes sense to assume they know enough Arabic to recognise at the very least common religious set phrases like the Bismillah or the Shahada. In my book this counts as "some degree of Arabic script literacy".


You'd have to exclude everybody under 7 or so, everybody who is illiterate (in this very expansive sense), and those that are Muslim but do not read Arabic and do everything in translation.

That might be a small group though and probably outweighed by all the non-Muslim Arabic readers (for instance I work with 2 Egyptians, one is Coptic and one is ex-Muslim and both can easily read Arabic).


If it's Quranic Arabic then the orthography includes vowel markings so vocalising it is considerably easier.

The number of people who can vocalise Arabic without the vowel marks (e.g. a newspaper) is considerably smaller. But you don't need to be able to do that to notice any of the errors in the article.


Yeah, I think we're talking here about the number of people who will know you messed up rather than the number of people who can actually read the text in your app.


Exactly. That would be just a subtle exaggeration based on a pseudo-claim, to some degree.


So, there are degrees of literacy involved, and what we're talking about here is one step up from the bottom layer, a recognition of the writing system, not of a specific language.

I can muddle along somewhat in French, especially written French, but I have essentially no Polish despite working with more Polish people than French. Nevertheless, Polish is written in a Latin script (for about a thousand years) and so I "can read Polish to some degree". If you show me a Polish street address, and then some street signs, I can spot when the sign matches the address, because I understand that symbols which are slightly different just mean the same thing. If you give me Polish mirror writing, I know it's wrong - it's backwards, even though I don't understand it.

If I attempt this in China it won't work, because I don't understand the Han script, so I am not sure whether a symbol I'm seeing is the same symbol written more or less ornately or an entirely different symbol which just looks somewhat similar. Are the symbols backwards? Or maybe they're different symbols which just look backwards.

The layer beneath this, by the way, is recognition that intent exists. If you show me Chinese text, not only can I not read it, I'm not sure which symbols are "the same" and which are not; however, I can immediately tell this is writing. The writer intended to convey meaning with these shapes, so perhaps I can find somebody else to translate them for me. Whereas, say, the pattern on my duvet is just a pretty pattern; it doesn't mean anything (yes, I have thought about this; no, it isn't a secret message) and so I can't get that "translated".


> If I attempt this in China it won't work, because I don't understand the Han script, so I am not sure whether a symbol I'm seeing is the same symbol written more or less ornately or an entirely different symbol which just looks somewhat similar.

Just for fun:

Same symbol: 龙 龍

Completely different: 已 己


> nevertheless, Polish is written in a Latin script (for about a thousand years) and so I "can read Polish to some degree"

No, reading and being able to decipher a script are different. I can read Portuguese, Spanish or Italian (even Romanian) to some degree because the languages are close enough to each other, but I can't read Polish, Basque, Finnish or Magyar despite them using the same Latin script.


I don't understand much Arabic, but I know the alphabet almost as well as the Latin alphabet. If you see something in Latin script along the lines of "svciwbc oaoøoaaö" that claims to be Gaelic or whatever, you'll know that something is off - the consonants and vowels just don't work that way.

It won't fool the people who can speak the language, but I think the website is just designed to educate people so that it doesn't look like complete nonsense.

Facebook is, surprisingly, an offender here. It's common to mix Arabic and Latin, say, beginning with بسم الله and then writing the text, sometimes in English, and it'll throw off the alignment of the text completely. You get the Latin words right-aligned or the Arabic ones left-aligned.

Edit: I'm actually quite surprised how well HN handles this.


HN - thankfully - sends text (mostly) unmolested to browsers, which generally tend to handle Arabic substrings correctly (assuming the page encoding supports it).

HN /could/ make it better by setting automatic directionality (`dir="auto"`) on all <p>s, but that would be antithetical to its goals as an English-language forum.
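For the curious, here is a minimal Python sketch of roughly what `dir="auto"` does: the direction comes from the first character with a strong bidi class (L for left-to-right; R or AL for right-to-left), looked up here with the standard-library `unicodedata.bidirectional`. The function name `auto_direction` and the LTR fallback parameter are my own assumptions, and the real HTML algorithm also skips things like `<bdi>`, `<script>` and descendants that set their own `dir`.

```python
import unicodedata

def auto_direction(text: str, default: str = "ltr") -> str:
    """Roughly mimic dir="auto": the first strong character decides."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":              # strong left-to-right (Latin, etc.)
            return "ltr"
        if bidi in ("R", "AL"):      # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    # Only weak/neutral characters (spaces, digits, punctuation): fall back.
    return default

print(auto_direction("بسم الله, then some English"))  # rtl
print(auto_direction("Some English, then بسم الله"))  # ltr
```

Under this rule, a paragraph that opens with بسم الله and continues in English is laid out as RTL as a whole, which is one plausible explanation for the Facebook alignment problem above. As I understand it, `unicode-bidi: plaintext` is the CSS-side counterpart that applies a similar first-strong rule per paragraph.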


It's an approximation of how many Muslims there are in the world, making the generous but not entirely incorrect assumption that they will be familiar enough with Arabic to at least be able to tell if you've completely borked your rendering.


I would be careful linking "Muslims" and "Arabic readers/speakers".

I know Muslims who would be challenged to speak more than 3 words of Arabic, as well as Arabic speakers who are Christians or atheists.


I think every Muslim can speak more than 3 words; every Muslim knows what Al-Fatiha is. That's 25 words.


I'm completely ignorant of Muslim culture, but there was a time when Catholics recited prayers and liturgy in Latin and plenty of practitioners recited the sounds without really understanding them.

Source: my grand aunts saying "arapreme" where "ora pro nobis" would go (the former kinda sounds like an Italian word). Also probably related, "hocus pocus" is the "magic" expression "hoc est corpus" ("this is the body [of Christ]").


Learning how to read and pronounce Arabic properly is part of the religion. All prayers are in Arabic, you are supposed to recite them in Arabic rather than in translation, and wrong pronunciation can change the meaning, so they are strict about teaching proper pronunciation. Though I agree with you about the 2 billion number being incorrect, it was being used as a turn of phrase, not an accurate number.


Not everywhere. I live in an Islamic region (around 40%). There are religious quotes in every other home, but maybe 2-3% of people know at least some Arabic letters, not to mention words. You learn it only if you study at a madrasah or are a hardcore Muslim (not as in extremism, but as in taking it seriously). After you learn it, you have to listen to popular interpreters, because the Quran is still hard to get right on your own.


I grew up in a Muslim environment and I know Al-Fatiha and more. But I don't understand a single word of it, and I can't tell whether what I recite is a word or a whole sentence. I believe 99% of the population in my country are the same. Also, we can't read Arabic script.


What country did you grow up in? None of us speak Arabic, but we can all fluently read the script.


Turkey


In my surroundings I would say roughly 90% of people reciting al fatiha do so based on sounds learnt by heart, they would hardly be able to explain the meaning of these individual sounds.


The only assumption is that Muslims may, on average, be able to "read the Arabic alphabet to some degree". Obviously 'some degree' makes it very variable, does not imply being able to really read/write/speak.

I don't know if it's true but that does not seem unreasonable.


> I don't know if it's true

> that does not seem unreasonable.

My point is that what seems reasonable to someone that doesn't know anything about the subject means nothing, and could even end up being somewhat offensive.

Try to reverse the logic and realize how absurd it would look to you: say a guy in the Middle East wanted to roughly estimate the number of Latin speakers and assumed it should be roughly the same as the number of Catholics.

Wikipedia says there are ~2 billion Muslims and 400 million Arabic-dialect speakers (native and non-native). So, at best, on average 20% of Muslims are able to understand basic Arabic, and I expect that not even half of those would be able to read and understand Classical Arabic such as that written in the Quran.


Again, the assumption was not that Muslims are Arabic speakers...


If all Catholics were taught the Bible in Latin, you could make the assumption (true or false, but reasonable) that all Catholics would be able to read Latin to some degree.


The Quran is traditionally taught in Arabic even in non-Arab countries. I have no idea how well people in these countries can read or decipher Arabic in general, but I can guess where these numbers come from.


Funnily enough, there are many Muslims who can read Arabic but have almost no idea what they are reading, as they don't speak it. The Quran is interpreted into many languages, but prayer is read in Arabic no matter what one's native language is. Reading the Quran is also preferably done in Arabic, as translation is open to the interpretation of the translator, so the only truly immutable word is read in Arabic.


Interpretation of Arabic itself is the least problem they have. It takes a well-educated interpreter to make correct sense out of most statements.

With few exceptions, Islamic revelations do not state which Quranic verses or hadith have been abrogated, and Muslim exegetes and jurists have disagreed over which and how many hadith and verses of the Quran are recognized as abrogated,[6][7] with estimates varying from less than ten to over 500.[8][9]


> The whole 'two billion people can read the Arabic alphabet to some degree' is a weird claim

Not a weird claim at all. Even without harakat, it is obvious when Arabic script is messed up (via alignment or letter reversal) to anyone who can read the script. It just makes whatever you are reading look janky and unprofessional.


It's weird, because the number is off by an order of magnitude. "If you count all of the varieties of today’s Arabic together, you can safely estimate that there are about 313 million Arabic speakers in the whole world, making it the fifth most-spoken language globally behind Mandarin, Spanish, English and Hindi." [1]

[1] https://www.babbel.com/en/magazine/how-many-people-speak-ara...


Arabic is both a (group of) language(s) and an alphabet/script. The page isn't claiming that 2B people speak some dialect of the Arabic language, it is claiming some 2B people have a degree of literacy in the alphabet/script, so a quarter of the world's population will potentially know if you've screwed up your website/app/game/whatever. You don't need to speak Latin to know that "anunmtMh mucjmydDDDDD" is just the result of me mashing random keys on my keyboard, because you can read Latin letters. Likewise no Mongolian speaker would believe "прнтзх йоэЁюи" is proper Russian, even though those two languages are wildly different.


Also, I cannot read Arabic, yet I know enough to see that something is wrong if it's LTR.

Now I know:

> 3. In general, the letter combination ال should be common. The combination ل ا cannot occur in Arabic script, as those characters should be connected.

I still cannot read a jot of Arabic, yet I know enough to identify some cases where the dev/writer has gotten it wrong.
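That rule can be turned into a rough automated check. The Python sketch below assumes text that should be in logical (stored) order and flags it as probably reversed character by character, as happens with the workarounds people use for renderers without RTL support, when more words end in "لا" than start with the article "ال". The names and the simple majority threshold are made up for illustration, and it will misfire on short or article-free text (the ال rule doesn't really apply to Persian or Urdu, for example).

```python
ARTICLE = "\u0627\u0644"    # "ال": alef + lam, in logical order
REVERSED = "\u0644\u0627"   # "لا": what the article becomes if a word is reversed

def looks_reversed(text: str) -> bool:
    """Heuristic: does this Arabic text look reversed character by character?"""
    # Skip very short words, e.g. the genuine word "لا" ("no").
    words = [w for w in text.split() if len(w) > 2]
    starts = sum(w.startswith(ARTICLE) for w in words)
    ends = sum(w.endswith(REVERSED) for w in words)
    return ends > starts

greeting = "السلام عليكم"
print(looks_reversed(greeting))        # False
print(looks_reversed(greeting[::-1]))  # True
```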


The writer most likely counted the Muslim population. Most Muslims don't speak Arabic, but they recite the Quran (written in Arabic) daily. Most of them will notice if ال is written as ل ا.


Egypt has ~100 million people, the vast majority of whom are going to be native Arabic speakers. Algeria, Sudan, and Iraq also have ~40 million people each, again, most of whom are native Arabic speakers (of different varieties). Morocco, Saudi Arabia, and Yemen are no slouches either. In short 300 million native speakers of various Arabic languages is extremely plausible.

FWIW, the usual estimate for worldwide adherents to Islam is ~1 billion.



Those estimates are not for Arabic readers or speakers. They are for Muslims or followers of Islam.


FWIW, I am not a muslim nor a native Arabic speaker but I "can read the Arabic alphabet to some degree".

Though if your numbers are correct, 2 billion sounds way too high.


Most Muslims are taught how to read Arabic script, even if they can't understand or speak Arabic. That is because reading the Qur'an in its Arabic form, even without understanding, is a sacred act above reading the translation. Additionally, ritual prayers require reciting verses in Arabic, (again without necessarily understanding). Therefore it's not weird to assume the number of people familiar with Arabic script include the population of non-Arabic speaking Muslims.


> You could have script that is justified right-to-left and with letters correctly connected and it still be gibberish.

That just means you can't read Arabic. You absolutely do not need the vowels to be able to read Arabic: if you know the language, you know the vocabulary and know how a word should be pronounced. The vowels are there to aid clarity. In some cases a single vowel can turn a word into its own antonym. And right now, 99% of Arabic text is written without vowels, apart from the Quran and literary works.
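Since most modern text omits the vowel marks, a common normalization when searching or comparing Arabic strings is to drop them, so that a vocalized and an unvocalized spelling match. A minimal Python sketch, assuming only the core harakat range U+064B to U+0652 (tanween, fatha, damma, kasra, shadda, sukun) needs removing; `strip_harakat` is a hypothetical name, and other marks such as the superscript alef (U+0670) or Quranic annotation signs are left untouched.

```python
# Core Arabic harakat, U+064B..U+0652: fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun.
HARAKAT = frozenset(chr(cp) for cp in range(0x064B, 0x0653))

def strip_harakat(text: str) -> str:
    """Drop vowel and related marks, keeping the consonantal skeleton."""
    return "".join(ch for ch in text if ch not in HARAKAT)

vocalized = "\u0643\u064e\u062a\u064e\u0628\u064e"  # كَتَبَ (kataba, "he wrote")
print(strip_harakat(vocalized) == "\u0643\u062a\u0628")  # True (كتب)
```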


Let's assume there's a correct "brand" for Coca-Cola in Arabic (I have no idea if there is, but I'm sure there's some large brand that was localized). There will certainly be a very large number of people who recognize that brand name even if they cannot read, and who will notice if you break it, just like someone with no literacy at all can recognize something is broken if they see a Coke can labelled LOCA CACO.


This may be referring to people who can "read" the Koran, where "read" means say the text out loud. Understanding what they're saying is a different question, and most of those people probably do not understand the Koran's language (which is a more than thousand year old version of Arabic).

Alternatively, it may be referring to people who read a language written in Arabic script, but not necessarily the Arabic language. Languages that use some variety of Arabic script include Persian (Farsi and Dari), Urdu, Pashto, Western Punjabi, Uyghur and some other languages. Some of those languages use vowel diacritics, particularly Uyghur.


>most of those people probably do not understand the Koran's language (which is a more than thousand year old version of Arabic).

My understanding, as a non-Muslim who doesn't speak Arabic, is that the standard Arabic is pretty conservative such that MSA, as would be commonly heard on TV or read in the news or other more formal settings, is not very far off the Arabic used in the Quran. So I think most native Arabic speakers would understand the Quran well, but may not be able to fluently speak or write it.

And that many Arabic learners learn MSA along with a dialect, so many second language speakers would probably be able to productively read the Quran as well.


Most native Arabic speakers will be able to understand large chunks of the Qur'an. There are some words / phrases in the Qur'an that wouldn't be used in modern Arabic that might prove challenging for those who don't engage with the Qur'an on a regular basis, but the Arabic of the Qur'an is very close to that of MSA.

To most Arabic speakers, both Classical Arabic and MSA are simply referred to as Fusha.

There are also many non-speakers of Arabic who understand many verses and can pick up on the gist of many verses in the Qur'an.


MSA is Modern Standard Arabic to save others a google!


> Photoshop breaks Arabic. We even have a website that will break our Arabic so that Photoshop breaks it back to normal. Yes, Adobe is aware. No, they do not care.

Photoshop has had a Middle East (ME) edition since time immemorial, which supported RTL and was also used to curb piracy.

I don’t know how it works with their new pseudo-SaaS, but I think they integrated it into their main product.


Photoshop relatively recently consolidated its text layout into a single layout engine (based on HarfBuzz), and Arabic support is no longer opt-in (which was the source of the trouble: you had to know you needed to opt in to Arabic support before installing the application).

https://helpx.adobe.com/lv/photoshop/using/unified-text-engi...


Oh man. I remember, from 15 years ago or so, when I used to struggle with Arabic script; I would often resort to writing in other software and exporting it to PNG or EPS in order to use it in PS. Luckily, there was a 3rd-party add-on that fixed that issue.


I remember the student union at the University of Edinburgh having massive welcome signs printed on a wall in all languages, and the Arabic one very clearly had the mistakes in points 2 and 3. Printed on a wall. It was very clear to me even though I don't speak a word of Arabic.


Mistranslations are always a source of amusement

http://news.bbc.co.uk/1/hi/7702913.stm

> When officials asked for the Welsh translation of a road sign, they thought the reply was what they needed.

> Unfortunately, the e-mail response to Swansea council said in Welsh: "I am not in the office at the moment. Send any work to be translated".


Wouldn't it make sense for translators to send messages like that in multiple languages?


At the end of the "It's a small world" ride at Disney, FL, there's a sign in Zulu that translates to "goodbye". But in Zulu, how you say goodbye depends on whether the person you're saying goodbye to is staying or going. The sign at the ride says goodbye as though the rider is staying (implying that the ride is leaving).


The core issue is people's unfamiliarity with how scripts that are non-native (to them) can function. In the English-centric world that means assuming one letter == one visual glyph, and that the visual glyph won't get a different rendering depending on context. Native English speakers don't seem to really acknowledge that the English alphabet does have contextually different glyphs for the same letter: that is what capitalization is (and it applies to other Latin-descended scripts). Of course, single character = single glyph equivalence fails for many Latin-descended scripts: 'ss' vs 'ß', 'ij' vs 'ĳ'. This is entirely ignoring accented letters, which break the Anglo-centric "single code point == single character" equivalence (though this is better understood).
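A few of these mismatches are easy to see with Python's standard `unicodedata` module. The snippet below is just illustrative asserts (variable names are arbitrary): case mapping that changes length, a ligature that is one code point but two letters, an accent that can be one or two code points, and an Arabic contextual form that folds back to the plain letter under NFKC normalization.

```python
import unicodedata

# Case mapping can change length: "ß" uppercases to two letters.
assert "ß".upper() == "SS"

# The Dutch ligature "ĳ" (U+0133) is one code point; NFKC splits it into "ij".
assert len("\u0133") == 1
assert unicodedata.normalize("NFKC", "\u0133") == "ij"

# "é" can be one precomposed code point or "e" + COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
assert len(decomposed) == 2
assert unicodedata.normalize("NFC", decomposed) == "\u00e9"

# Arabic stores one code point per letter; the shaping engine picks the
# initial/medial/final/isolated form. Legacy "presentation form" code points
# such as U+FE91 fold back to the plain letter BEH (U+0628) under NFKC.
assert unicodedata.name("\ufe91") == "ARABIC LETTER BEH INITIAL FORM"
assert unicodedata.normalize("NFKC", "\ufe91") == "\u0628"
```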

People often think text layout is "easy" because they don't consider how anything outside their common experience treats things (the reality is that there still isn't a good vertical text layout story on the web).

This is also demonstrated whenever people go "I'll handle text layout myself", because they think one key press = one character, and immediately break text entry for more than half the world.
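A small illustration of the "one key press = one character" trap, in Python. `naive_backspace` and `backspace` are hypothetical helpers: the first drops the last code point, the second drops the last base character plus any combining marks attached to it. The second is only a rough stand-in for proper extended grapheme clusters (Unicode UAX #29), which also cover emoji sequences, Hangul and more, and it relies on `unicodedata.combining`, so marks with a combining class of 0 are not caught.

```python
import unicodedata

def naive_backspace(text: str) -> str:
    # Assumes one code point == one user-perceived character.
    return text[:-1]

def backspace(text: str) -> str:
    """Delete the last base character plus any combining marks attached to it."""
    end = len(text)
    # Walk back over trailing combining marks (accents, Arabic harakat, ...).
    while end > 0 and unicodedata.combining(text[end - 1]):
        end -= 1
    # Then drop the base character those marks were attached to.
    return text[:max(end - 1, 0)]

cafe = "cafe\u0301"           # "café" typed as "e" + COMBINING ACUTE ACCENT
print(naive_backspace(cafe))  # prints: cafe  (only the accent was removed)
print(backspace(cafe))        # prints: caf
```

Which behaviour is correct depends on the script and on user expectations; arguably, for Arabic, removing just a haraka is sometimes what the user wants. The point is that it should be a deliberate choice, not an accident of indexing code points.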


Another example of a contextually dependent glyph in Latin script is the non-word-final form of 's' that looks sort of like an 'f' in English texts. See for example images of the hand-written US Constitution, where this comes up in the very first word "Congress", which appears to modern readers to be spelled "Congrefs".

This led to an embarrassing mistake when Google OCRed older books that contained the word "suck."
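When the long s is actually encoded as U+017F (LATIN SMALL LETTER LONG S) rather than misread as an "f", Unicode compatibility normalization folds it back to a plain "s". A minimal Python sketch; `modernize` is a hypothetical name, and this cannot undo an OCR engine that already emitted "f".

```python
import unicodedata

def modernize(text: str) -> str:
    # NFKC maps compatibility characters such as U+017F (long s) to "s".
    # It is lossy: it also splits ligatures like "ﬁ" and flattens superscripts.
    return unicodedata.normalize("NFKC", text)

print(modernize("Congre\u017fs"))  # Congress
```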


ah but you see, my conviction on one letter == one visual glyph context be damned is not hypocrisy nor is it cultural chauvinism. i for one am perfectly consistent. capitalization ought be abolished, tradition is not justification. language is about the encoding of information. capitalization is wasted code space. tradition is dead men telling you how to live. and after that, simplify the letters down into combinations of the same handful of strokes and curves. do ve really need letters vith tvice the horizontal space as the others? vhv not do avav vith all the letters that reach under the botton line? oet rid of bointu diaoonal lines too! eueru thino can and should be uniforn. doun uith the excebdions, consisdencu breuails! see hou nuch nicer this loohs. the uau it rebeats the sane feu elenets ouer and ouer uet is still leoible. it flous snooth lihe budder. uho needs all those letters anuvay?


Completely seriously, I'd like to see English get rid of capitals. They're almost entirely useless and confuse learners for no benefit. On top of that, kids' books that use a font that makes I and l look the same annoy me too.


i an conbletelu serious too


For example, in Year 1 that useless letter "c" would be dropped to be replased either by "k" or "s", and likewise "x" would no longer be part of the alphabet.

The only kase in which "c" would be retained would be the "ch" formation, which will be dealt with later.

Year 2 might reform "w" spelling, so that "which" and "one" would take the same konsonant, wile Year 3 might well abolish "y" replasing it with "i" and iear 4 might fiks the "g/j" anomali wonse and for all.

Jenerally, then, the improvement would kontinue iear bai iear with iear 5 doing awai with useless double konsonants, and iears 6-12 or so modifaiing vowlz and the rimeining voist and unvoist konsonants.

Bai iear 15 or sou, it wud fainali bi posibl tu meik ius ov thi ridandant letez "c", "y" and "x" -- bai now jast a memori in the maindz ov ould doderez -- tu riplais "ch", "sh", and "th" rispektivli.

Fainali, xen, aafte sam 20 iers ov orxogrefkl riform, wi wud hev a lojikl, kohirnt speling in ius xrewawt xe Ingliy-spiking werld.

From history; sometimes (likely incorrectly) attributed to Mark Twain.


> The core issue is people without familiarity with how non-native (to them) script can function - in the English centric world it means assuming one letter == one visual glyph, and the visual glyph won't get a different rendering depending on context.

English has cursive and I think most English speakers are at least aware of cursive. We have script that is designed for print as well, though, while Arabic script does not; it is always cursive.


Some have developed non-cursive Arabic script, but it hasn’t taken off.

For instance, the Simplified Arabic Alphabet was devised by Muhammad Shakeel as an alternative way to write Arabic. It is a non-cursive alphabetical script as opposed to the traditional cursive Arabic abjad. The letter shapes are based mostly on the early Arabic Jazm script. It is not connected to or inspired by Nasri Khattar's Unified Arabic script.

https://omniglot.com/conscripts/saa.htm


Interesting. I think it probably hasn't really taken off because there's not really any reason for it to at this point. We have high resolution displays and printers and software that is powerful enough to render and input regular Arabic script just fine.

There were similar issues with typing JP/KR/CN glyphs on computers for a long time which thanks to technological progress has stopped being something that needed solving.


What about Farsi and other languages that use Arabic script but are not Arabic at all? I can write (with effort lol) Indonesian or Javanese in Arabic script https://en.wikipedia.org/wiki/Jawi_script.


Farsi has the letter "p" which Arabic does not. That letter has three dots on top (sometimes drawn like ^). I think there may be another. Arabic only has the "sh" sound with the three dots.

Trying to read Farsi, it feels like I should know what's going on but am left with the feeling that I've forgotten all my Arabic. Then I'll see some of the bonus letters.


There are four letters in Farsi that do not exist in Arabic - (گ ژ چ پ), which make the 'p', 'ch', 'zh' and 'g' sounds, respectively. But the underlying calligraphic system (RTL order, joining forms, harakat diacritics, and so on) is pretty much the same across the Arabic script and its descendants.
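Those four letters make a cheap (and very incomplete) hint for telling Persian text apart from Arabic. A minimal Python sketch using the code points U+067E, U+0686, U+0698 and U+06AF; the table and function names are hypothetical. Urdu and other languages also use some of these letters, and a Persian sentence can easily contain none of them, so treat a hit as a hint, not a language ID.

```python
# The four letters Persian adds to the Arabic alphabet (see the comment above).
PERSIAN_EXTRA_LETTERS = {
    "\u067e": "p",   # پ ARABIC LETTER PEH
    "\u0686": "ch",  # چ ARABIC LETTER TCHEH
    "\u0698": "zh",  # ژ ARABIC LETTER JEH
    "\u06af": "g",   # گ ARABIC LETTER GAF
}

def persian_letters_in(text: str) -> set[str]:
    """Return the Persian-specific letters that occur in text (a hint only)."""
    return {ch for ch in text if ch in PERSIAN_EXTRA_LETTERS}

print(persian_letters_in("پدر"))           # {'پ'}
print(persian_letters_in("السلام عليكم"))  # set()
```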


The same issue also exists in Latin script, where German has the ß, not to mention various umlauts, strikes, circles and cedillas modifying letters of the otherwise standard Latin script.


I'd imagine it's more like seeing Vietnamese as a European: https://en.wikipedia.org/wiki/Vietnamese_alphabet

If you live in Europe and speak a language using a Latin script, you probably have come across most of the extensions other European languages add to the shared base in loanwords or foreign media. But then you look at something like Vietnamese and you are no longer sure how letters work.


> Trying to read Farsi, it feels like I should know what's going on but am left with the feeling that I've forgotten all my Arabic

This is what Dutch sounds like to me as an English speaker - plenty of common sounds with English; it has a similar speed, rhythm and intonation to English. It feels like I’m hearing English but have lost my faculties to parse it.


Yes! It’s kinda like “hearing voices”. It sounds like English, but if you try and ‘tune in’ you can’t!


I know some Spanish and hearing Portuguese does this to me every time; it's like I've almost tuned in but there's nothing there.


I'm Spanish and I get that feeling with Romanian. It sounds somewhat familiar, but I have no idea what is going on.

Portuguese and Italian on the other hand are much closer, and if spoken slowly enough, somewhat understandable.


As word games, I like sentences that sound valid in multiple languages, regardless of whether the meaning changes.

"Goedemorgen, ik hoop dat je bent goed".


Always fun are sentences that sound very similar and are entirely correct in both languages but mean something completely different, like:

He was in the war -- Hij was in de war (he was confused)

A stiff in the brook -- Een stijve in de broek (a boner in the pants)

Those are the two most famous examples I'm familiar with, but I'm sure there are a lot more.


Oh wow, I've encountered a lot of words in English/Czech that are "false friends", but the languages are too far apart grammatically to construct similar-sounding entire sentences. That's brilliant you can do it with English and Dutch :)

Also I wonder if there's a connection between trousers being "broek" in Dutch and "breeks" in Scots.

edit: wow ok I should've just gone to Wikipedia: https://en.wikipedia.org/wiki/Breeks


At a NATO spy convention, an English spy asks their Estonian, French, Spanish, German, and Bulgarian counterparts if they can see them in the new camouflage they are testing.

"Jah" "Oui" "Sí" "Ja" "Da"


An ex had the smaller example of incorrectly asking for "un préservatif" when she meant "un préserve".

One of her friends was half of a multi-nationality couple, I think it was French and Irish, and the punchline was their kid, at a beach, yelling, in a strong Irish accent "Look mummy! Phoques!"


If by "un préserve" they meant a preserve/marmalade, then in French that's "une confiture" (or "marmelade", but that's somewhat rarer). I don't think "un préserve" is a word...


Although I may be misremembering the exact word as I have a GCSE grade D from 23 years ago, her French is much better than mine as she lived in Paris for a few years.

It was certainly something close to what I wrote.


> Trying to read Farsi

Wait till you discover Urdu, which confuses Farsi speakers even more. For extra fun try the Nastaliq script.


"p" has three dots beneath it (پ), three dots on top is "s" (ث).


Thanks, I thought it was 3 dots on the Arabic "b". It really doesn't take long until Farsi sneaks in some extra letters that let you figure out it's not Arabic.

Looking back on it, I remember feeling like I can't remember Arabic, but part of it is that this also happens during that time when I'm getting used to the script. There is always an adjustment period with every new font/handwriting that takes a sentence or two to sort out the style before I truly start reading.


Point 3 mentioned that:

3. The text is in the wrong Arabic-script language, for example Farsi in Egypt (Egyptians primarily speak an Egyptian Dialect of Arabic), or Modern Standard Arabic in Afghanistan (Afghans speak Dari, an Afghan Persian language). As with the Latin alphabet, you can write different languages with the Arabic alphabet.


That's actually what I expected the website to do: Is this Arabic vs. Farsi, etc?


this applies to those languages as well


Mostly yes, and I think the OP is well-written.

But note that in some cases, eg https://en.wikipedia.org/wiki/Tajik_alphabet#Samples point number 2 won't be accurate. I'm not the best reader of Tajik written in (Perso-)Arabic script, but I don't believe "ال" appears in the samples there. (It does appear in the word "ALphabet" in the opening paragraph!)


And also Aramaic dialects like Syriac

Syriac script looks completely different to Arabic but has the same issues when rendered


The first two correct examples are "Hello world". Third one is "language".

The stick character is either a short "a" or "i", which you would know from the little tick marks above/below that you see in decorative script, but which are generally left out in print, where you get it from context.

The J looking character is an "L". So they make "al" like al-jabaar => the strong. It's 95% like "el" in Spanish or "il" in Italian, though it is genderless.

The o with dots attached or not to the word on the right indicates feminine gender.


Incidentally you can find the "al" in many words of Arabic origin like alchemy, algebra, algorithm or alcohol.


The question should be more precise. Is this Arabic script?

There's a half dozen languages at least that are not Arabic but use Arabic script, such as Persian or Urdu. The typographic rules mentioned still apply though.


Urdu (and IIRC Western Punjabi, i.e. Punjabi as spoken/written in Pakistan) prefer to use the Nasta'liq variety of Arabic script, and historically Persian (Farsi) was written in Nasta'liq. And it is not the case that Nasta'liq is written on a base line--in fact, characters in a word tend to slant down to the left.


One of the three rules on the page does not apply to either Persian or Urdu. The article “Al” isn’t used in them, so the third rule doesn’t apply.


I also thought it might've been about the differences between them, given that Pashto has its own characters unique to its script and Arabic has diacritic marks not used by Farsi or Pashto. This is a different subject, though, that has little to do with what the post is complaining about. Their grievances apply to all scripts based on Arabic.


I once had to implement Arabic support in an existing editor. It also had to support mixed RTL and LTR input. It was interesting to see how the cursor walks differently through the text when going forward and backward, and how text selection works. Most document editors (MS Word, LibreOffice with CTL, for example) and browsers implement this correctly as well. So you can check whether text is displayed correctly by comparing how it looks in a document editor or a text box in an HTML file.
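Part of what makes cursor movement interesting is that mixed text splits into directional runs, and the visual order inside an RTL run is the reverse of the stored order. Here is a heavily simplified Python sketch of that splitting (it is not the Unicode Bidirectional Algorithm, UAX #9); `strong_runs` is a hypothetical helper that attaches neutral and weak characters such as spaces, digits and punctuation to the run they follow, and leading ones to the first run.

```python
import unicodedata

def strong_runs(text: str, default: str = "ltr") -> list[tuple[str, str]]:
    """Split text into (direction, substring) runs by strong bidi class."""
    runs: list[tuple[str, str]] = []
    current = None              # direction of the run being built
    buf: list[str] = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            d = "ltr"
        elif bidi in ("R", "AL"):
            d = "rtl"
        else:
            d = current         # neutrals/weak characters join the current run
        if current is not None and d != current:
            runs.append((current, "".join(buf)))
            buf = []
        if d is not None:
            current = d
        buf.append(ch)
    if buf:
        runs.append((current or default, "".join(buf)))
    return runs

print(strong_runs("Hello مرحبا!"))  # [('ltr', 'Hello '), ('rtl', 'مرحبا!')]
```

The real algorithm, in an LTR paragraph, would resolve that trailing "!" to LTR rather than keep it in the Arabic run, which is exactly the kind of detail behind punctuation ending up on the "wrong" side.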


You've never had to worry about copy-pasting destroying your text between two programs

Actually, yes. Yes I do.

Try having to work in documents that trade text in multiple languages between Adobe Acrobat, and Microsoft Office. Add in some opinions from iOS, and I end up pasting text into a blank ASCII file just to get it back to basics so I can send it to the next program because neither Microsoft nor Adobe can reliably handle the macOS standard Command-Shift-V to paste text unformatted.

I can't imagine the disaster that would await me if I also had to do it in Arabic.


This reminds me of an Arabic client who wanted both English and Arabic version of the app. I sent him a spreadsheet with 2 cols, with English on one column, and another empty column where I expected him to fill the Arabic translation.

He replied with a plain .txt file with a couple of Arabic lines in it, and a message saying "I translated some of the text from your Excel sheet, please go ahead and use this text, I will send the rest by tomorrow". Nobody could tell which line was the translation of which.


Also add "use the correct Arabic for your target audience". "Arabic" is not monolithic and using the wrong dialect can go over as well as using Scots on a web page intended for Jamaicans.


As an Arabic speaker, baby steps!

For spoken Arabic, a decent placeholder is either MSA (Modern Standard Arabic) or a generally understood dialect (Egyptian or Levantine).

The latter is what a lot of the recent Arabic game dubs have been doing (e.g., Ubisoft). Another example: Pixar and Disney use Egyptian Arabic for their dubs.

Edit: Oh, and for written Arabic, you should almost always be using MSA. That is, you shouldn’t worry about the dialect of your target audience.


I pity the poor dev who has a Moroccan co-worker that helpfully offers to translate their application into darija thinking it’s Arabic


Tangent: I hate how Moroccans monopolized the word darija to mean "Moroccan Arabic". Every country in North Africa calls their own language darija!


Do you know why Levantine Arabic is used more for dubbing?


My understanding is that it’s simply because they were among the first to set up dedicated dubbing studios.


>Bonus 'I hate it' points if it's that Arabic-looking font that's just English text that looks kind of Arabic.

There's a faux-katakana font someone used for the titles on an otherwise amazing album of some FM synth Touhou remixes[0] that hardcore fucks with my brain.

[0] https://www.youtube.com/watch?v=24x3AL7yMX0


There's a whole subculture of using Cyrillic to make "Russian-sounding" things but it's just English with reversed letters. Hilariously amusing to anyone who knows any Cyrillic language.


Probably a good place to ask: I want to add an Arabic translation to a webpage I operate. My text editor is a bit confused by the Unicode characters. Obviously the text editor displays left-to-right no matter what, as it's made for programming.

What's the right way to handle this?


This might provide some insight for setting up your editor [1]. W3C has guidelines for localization [2] and for using RTL script online [3].

[1] http://andreasmhallberg.github.io/typing-arabic-in-vim/

[2] https://www.w3.org/International/questions/qa-html-language-...

[3] https://www.w3.org/International/questions/qa-html-dir


If the editor only has trouble displaying Arabic and does not somehow mess with the Unicode representation, then you should be fine. I.e., you could paste an Arabic string into a JSON file; it might look like shit, but it should still render correctly in your app or website (assuming you set the text direction or language properties right).

And you can always double-check in a different editor to be sure nothing gets messed up. I use VS Code, which, likely because it's browser-based, seems to get it right as well.
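A related option for the JSON case is to store the Arabic as `\uXXXX` escapes, so the file is pure ASCII and a left-to-right code editor has nothing to mis-display, while the parsed string is unchanged. A minimal sketch with Python's standard `json` module; the `greeting` key is a hypothetical example.

```python
import json

# Hypothetical translation table; "مرحبا" means "hello".
translations = {"greeting": "مرحبا"}

# ensure_ascii=True (the default) writes each non-ASCII character as a \uXXXX escape.
escaped = json.dumps(translations, ensure_ascii=True)
print(escaped)  # {"greeting": "\u0645\u0631\u062d\u0628\u0627"}

# Parsing the escapes yields exactly the same code points, in logical order.
assert json.loads(escaped) == translations

# ensure_ascii=False keeps the raw Arabic: smaller, but you rely on the editor not mangling it.
raw = json.dumps(translations, ensure_ascii=False)
```

Either way, the page still needs its direction and language set (for example `dir="rtl"` and `lang="ar"` on the containing element), as the W3C articles linked earlier describe.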


I recommend using gedit for this use case (available for Linux and Mac): it has the best support for Arabic, right-to-left script, and bidi (that's mixing RTL and LTR in the same sentence) I've ever seen.


Kate (KDE's editor) is also fine.


Also, I would be happy to help you support Arabic on your webpage. Send me an email: slim@pirate.tn


The main thing here is knowledge of the language. Given how widespread Arabic is (though perhaps not so much in the gaming ecosystem which is Rami's area), it makes sense to have someone on the team knowledgeable about the language while doing translations.

I encountered a (sort of) similar problem while rendering Devanagari text and wouldn't have realised if I couldn't read the text https://stackoverflow.com/questions/44254171/devanagari-text...


> it makes sense to have someone on the team knowledgeable about the language while doing translations

Speaking about translations, I had a laughable experience when I was experimenting with the Google Translate API a few years ago. I picked some random text about the solar system from Wikipedia that read something like “Mercury is a planet of the solar system”; when translated to Arabic, it read “الزئبق هو أحد كواكب المجموعة الشمسية”. Apparently, it was not able to infer the meaning of “Mercury” from the context.


When I translate that with safari it comes back as ‘mercury is one of the planets in the solar system’?


In Arabic, Mercury (planet) and mercury (element) are two different words (عطارد and زئبق), here the machine translation is using the wrong mercury for the context.


This is the problem with machine translation, often a round trip gets you back something that looks reasonable.

Some of them let you click on each word and see which it picked, but that doesn't always work.


"Quicksilver is one of the planets in the solar system."


Although I’ve never tried it, I can’t imagine it would be that difficult to find an editor for the specific language to take a look at it. I doubt they need that many hours of work from the editor for it to be a significant expense.


I hope someone creates something similar for distinguishing Southeast Asian scripts. Would be very useful for GeoGuessr


A good start is https://en.wikipedia.org/wiki/Wikipedia:Language_recognition... although it's pretty weak on Southeast Asia in particular.


That is interesting. I happened to notice that some of the ligatures described under Dutch aren't appearing for me for some reason. I've been thinking about font rendering in different scripts lately, since I like the DejaVu fonts, but unfortunately the project was abandoned a while ago and coverage is incomplete in many areas. With other, more complete fonts available these days, it seems likely to be better to use a more restricted version of the fonts to allow fallback to more complete fonts for those scripts, although I'm not sure how well the Unicode blocks match actual usage. It is easier than ever to mix languages but still hard to know if you are messing them up. I found out my terminal was rendering Japanese wrong when someone posted this link recently:

https://heistak.github.io/your-code-displays-japanese-wrong/

Now it displays Chinese wrong.


Do you mean a script like Lontara? That would be interesting indeed.

I am interested in Lontara and Bugis resources in general, if someone has something they would like to share.


I'm not sure how popular these are, so I guess if the explanation included that fact it'd be even better!


I wrote a note in Gmail the other day where I needed a Latin phrase. For accuracy I pasted it from the web page where I found the original. It screwed up the “x-height” of the text that followed, and offended my sense of correct typesetting. I had to do the paste via third app maneuver to get my email right. So this is just a QoI (quality of implementation) problem between two European-biased applications using two European languages that share the same basic writing system. So I am disappointed by poor Arabic handling but not surprised.


Hold shift while pasting (chrome) to paste the text only, no formatting.


Or just copypasta through Notepad (or your favorite Linux equivalent) for a universal formatting-stripping solution.


I don't know about "just" for a solution that more than doubles the number of steps.


It's a well-known, time-honored solution that many people have down to muscle memory now, and has the benefit of not relying on the destination app supporting "paste without formatting" functionality.

That is, the best way to deal with apps being too clever about rich text, is to never give them rich text in the first place.


The gp is quibbling about the overuse of the adverbs "just" or the more common "simply" to qualify multiple actions, which are often not simple.

See

Avoid "Just" and "Simply" in Documentation http://jackkelly.name/blog/archives/2019/09/20/avoid_just_an...

Don’t say “simply” in your documentation by Jim Fisher https://www.knowledgeowl.com/blog/posts/dont-say-simply-jim-...


For a faster universal solution in Windows, press Win+R, paste, and copy from the Run dialog; for a faster universal solution in Linux, make a script that uses xclip and bind it to a custom shortcut.
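The Linux variant might look something like the following sketch. It assumes `xclip` is installed and you are running X11; the function name is hypothetical. Reading the clipboard with the `UTF8_STRING` target asks the source app for plain text only, and writing it back with the same target leaves xclip owning the clipboard and offering just that text, so the formatting is gone.

```python
import subprocess
from typing import Callable

# Commands for the CLIPBOARD selection (the one Ctrl+C / Ctrl+V use).
READ_PLAIN = ["xclip", "-selection", "clipboard", "-o", "-t", "UTF8_STRING"]
WRITE_PLAIN = ["xclip", "-selection", "clipboard", "-i", "-t", "UTF8_STRING"]


def flatten_clipboard(run: Callable = subprocess.run) -> str:
    """Replace the clipboard contents with a plain-text copy of themselves.

    `run` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without an X server.
    """
    result = run(READ_PLAIN, capture_output=True, check=True)
    text = result.stdout.decode("utf-8")
    # stdout is deliberately not captured here: xclip forks to keep serving
    # the selection, and a captured pipe would keep this call waiting on that child.
    run(WRITE_PLAIN, input=text.encode("utf-8"), check=True)
    return text


if __name__ == "__main__":
    flatten_clipboard()
```

Bind something like `python3 /path/to/this_script.py` to a shortcut, press it after copying, then paste normally.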


Win+R is for single-line text, though if you have a browser focused, you can use the address bar for this too: press CTRL+L to focus it, then the usual CTRL+V CTRL+A CTRL+X.

Notepad is for multi-line text.

EDIT: While the address bar trick may be slightly more convenient than Win+R (and the easiest thing to do on Linux), keep in mind that whatever you paste there will likely end up recorded on someone else's servers. I bet this was a major driver for integrating previously separate address bar and search bar, and adding "search suggestions" function...


I have notepad pinned to taskbar, so I can do "Ctrl-C, Win-Shift-6, Ctrl-V Ctrl-A Ctrl-C Ctrl-W n Alt-Tab Ctrl-V"

Simple!


That’s what I meant by “the paste via third app maneuver”.


That should be the default. I think it's incredibly strange to copy text including the formatting, as that is far more context and application dependent. I almost never want to copy formatting.


It's particularly silly that it pastes-with-formatting into a gmail compose box even when it's set to "send as plain text"...


Ctrl+Shift+V to paste without formatting works almost everywhere, not just Chrome


Except Microsoft Word, which for some godforsaken reason demands you do Ctrl+V, Ctrl, T.


Nice to have some smoke tests, but stumbling over minor details in the very first items always undermines credibility for me quickly:

> The text is rendered left-to-right, instead of right-to-left. siht ekil daer ot gnivah ekil s'tI!

But the Arabic text isn't rendered in a different direction between the bad and good versions? So this is just alignment? I can understand that this is still very annoying and triggering, but surely it's not as bad as having to read in the other direction?!


The image at the top only talks about alignment; the point you quoted is separate from that and talks about when the text is actually rendered reversed, which does happen.


What text are you talking about?


Note that:

1. There are several languages which (sometimes) use the Arabic script, but are not Arabic, like Kurdish, Turkish up until a century ago, Javanese etc.

2. Farsi sort of uses the Arabic script, with some glyphs unique to it

... so things can be more complex than just deciding "Is it Arabic?" . But I like the website!

----

Right-to-left languages with non-Arabic scripts are also often similarly mis-rendered. I'd have given the more popular scripts as examples, like N'Ko and Adlam, but almost nobody in the West bothers to print those, plus I'm not familiar enough with them; so, Hebrew:

In the disconnected-form script, here's "Hello world":

שלום עולם

(sounds like: shalom olam)

Do you see the squarish glyph at the end of each of these words? That's the final-form Mem consonant. But at the beginning or in the middle of a word, its glyph is מ. So if you see:

םולש

that means someone printed it in left-to-right order of glyphs instead of right-to-left.
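That final-form check can be automated as a rough heuristic: in correctly ordered Hebrew, the five final forms (ך ם ן ף ץ) normally appear only at the end of a word, and their regular counterparts (כ מ נ פ צ) normally don't. A sketch with a hypothetical function name, assuming unpointed text (no niqqud); loanwords, acronyms and slang occasionally break the rule, so treat a hit as a reason to look closer, not as proof.

```python
import re

# Final (word-end) forms: ך ם ן ף ץ
FINAL_FORMS = "\u05DA\u05DD\u05DF\u05E3\u05E5"
# Their regular (initial/medial) counterparts: כ מ נ פ צ
REGULAR_FORMS = "\u05DB\u05DE\u05E0\u05E4\u05E6"

# Runs of Hebrew base letters (alef..tav). Vowel points are not included,
# which is why the heuristic assumes unpointed text.
HEBREW_WORD = re.compile(r"[\u05D0-\u05EA]+")


def looks_glyph_reversed(text: str) -> bool:
    """Heuristic: True if some Hebrew word starts with a final form,
    or ends with a regular form that has a final counterpart."""
    for word in HEBREW_WORD.findall(text):
        if len(word) > 1 and (word[0] in FINAL_FORMS or word[-1] in REGULAR_FORMS):
            return True
    return False


print(looks_glyph_reversed("שלום עולם"))  # False
print(looks_glyph_reversed("םולש"))       # True
```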

-----

These issues come up a lot in our LibreOffice RTL languages user group on Telegram:

https://t.me/+GY3UwBnlDN9mY2M8

and if you're particularly interested in Arabic, there's the Arabic-issues-specific group:

https://t.me/LibreOffice_Arabic


Crazy how there’s “two billion people” who can read Arabic and seemingly 0 of them are making serious contributions to Arabic text rendering except this one specific guy.

Weird.


Two billion people can read it to some degree because of Islam. A minority of them actually use the script to write things daily.

And there are plenty of people other than this guy making contributions to it, you’ve only just read a singular post by a single person outlining his most common gripes with companies that do not consult native speakers for their translations. It’s not an exhaustive list of people working in the sphere. Imagine if you read a blog post by Linus Torvalds, and thought: ‘wow, it’s crazy that no one else is making contributions to this cool open source kernel?’ - that’s exactly what you’ve done in your comment.


>A minority of them actually use the script to write things daily.

How do I interpret that? Something like "hundreds of millions of Arabs constitute a minority of 2 billion"? Or are you saying that vast majorities in the Arab countries don't use Arabic script to write things daily?


There is a 99% chance the software you are reading text on right now uses HarfBuzz to lay out the text, a library written mainly by a Persian person.

https://en.wikipedia.org/wiki/HarfBuzz


I believe that’s the specific guy that was referenced.


That is not the point I’m making; rather, a single person is responsible for how most software renders text in most of the world's writing systems, and no one is wondering how 6 billion people on this planet depend so much on the work of “this one guy”. It is because text layout is something 99% of software developers can’t do right if their lives depended on it (same for anything that involves the irregularities of human nature, software developers are not cut for this).


> It is because text layout is something 99% of software developers can’t do right if their lives depended on it (same for anything that involves the irregularities of human nature, software developers are not cut for this).

Many can't even get the spaces before and after punctuation right, even though the correct way involves no irregularities.


This is not at all weird. The world doesn't need 2 billion text layout libraries. It needs a very small number (probably not 1 but there isn't just 1 as is) that actually work, are easy for devs to use etc.

If 1 person does the job very well others don't have to. That's the way we make progress in software.


Not too uncommon. This is (or used to be?) the case for timezones [1]. Even xkcd has joked about it [2].

I think in software once there's one great, free implementation of a hard problem there's a tendency to stick with it. If that project is maintained by one person, then so be it.

1. https://onezero.medium.com/the-largely-untold-story-of-how-o...

2. https://xkcd.com/2347/


> 3. In general, the letter combination ال should be common. The combination ل ا cannot occur in Arabic script, as those characters should be connected.

Funny enough, Rami is Egyptian/Dutch and ل ا looks a bit like IJ/ij, which is a digraph in Dutch [0] that even got its own code point at some point.

A digraph that could already be written perfectly well with two existing ASCII letters got its own code point, and I'd argue that, even historically, almost no Dutch person has ever used it, because keyboards with it barely existed. The only place you might find an IJ key is on ancient electronic typewriters, and the only reason I know is that I had touch-typing lessons on one thirty years ago and the teacher made us use it, after which I had to unlearn that because no computer keyboard featured it.

And meanwhile Windows apparently can't even be bothered to copy/paste Arabic correctly (at least not in 2015 when the embedded video was recorded).

[0] https://en.wikipedia.org/wiki/IJ_(digraph)
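For what it's worth, the IJ code points are still in Unicode today (U+0132 and U+0133), but as compatibility characters: NFKC normalization, available in Python's standard `unicodedata` module, folds each of them back into the two plain letters. A quick check:

```python
import unicodedata

for ch in "\u0132\u0133":  # Ĳ and ĳ
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {folded!r}")
# U+0132 LATIN CAPITAL LIGATURE IJ -> 'IJ'
# U+0133 LATIN SMALL LIGATURE IJ -> 'ij'
```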


>that would rather insult a tenth of the human population than actually pay to have someone who knows the language look at it.

I am unsurprised that a website by Rami Ismail uses collective offense-taking as its frame, and uses it to try shaming companies into giving money to non-westerners or adjacent.

It's not automatically insulting to misrender Arabic, any more than it is insulting to read hilariously mis/autotranslated "chinglish" in China, or see comically offensive English T-shirt slogans in Japan. Depending on context it could even be an attempt at being _nice_ that just misfired.

A simpler explanation and frame is that text rendering is tough enough as it is, and a solution that works perfectly fine for one market won't scale to everywhere. Arabic in particular is very hard, not just because of ligatures, but because right-to-left text is actually bidirectional in practice.
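To see why "right-to-left" text is bidirectional in practice, look at the Unicode bidirectional class of each character in a short Arabic phrase containing numbers. A small sketch using Python's `unicodedata.bidirectional`; the sample phrase is just an illustration.

```python
import unicodedata

# "عام ٢٠٢٣ (2023)": "year" followed by the same number in Arabic-Indic and ASCII digits.
phrase = "\u0639\u0627\u0645 \u0662\u0660\u0662\u0663 (2023)"

for ch in phrase:
    print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch):>3}  {unicodedata.name(ch)}")
```

The letters come out as `AL` (strong right-to-left), the Arabic-Indic digits as `AN`, the ASCII digits as `EN`, the parentheses as `ON` and the spaces as `WS`. A renderer has to resolve all of these into runs, and the numbers themselves are laid out left to right even inside Arabic. Persian-style digits (U+06F0 to U+06F9) are classed `EN` rather than `AN`, one more thing that can differ between Arabic-script languages.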

They might've even asked an Arabic speaker to review, but it may have looked okay right until the very last step in the production pipeline, when it got mangled.


  > for example Farsi in Egypt (Egyptians primarily speak an Egyptian Dialect of Arabic)
Nitpicking here maybe, but Farsi (Persian) is not a dialect of Arabic (it's not even in the same language family), nor do Egyptians typically speak it, although they do speak a dialect of Arabic.

Update: My mistake, as pointed out here, I misunderstood the point initially.


Yup, that's the exact point they're making. Non-Arabic-speaker sees Arabic script and assumes it's the Arabic language, so presents it to users that don't understand it, such as presenting Farsi text to Egyptian Arabic speakers. Or, to take the Latin script example, presenting French text to English speakers.


Well, as the Persian and Arabic alphabets are nearly identical, Unicode doesn't appear to store two distinct copies, instead just storing one plus the additional characters for Persian.
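That is easy to confirm: the Persian-specific letters sit in the same Arabic block (U+0600 to U+06FF) as the Arabic ones. It also gives a crude way to spot text that is not standard Arabic, by looking for letters Arabic doesn't use. A sketch with a hypothetical `has_persian_letters` helper; Urdu and other languages use these letters too, so a hit means "not standard Arabic" rather than "definitely Persian".

```python
import unicodedata

# Letters used in Persian but not in standard Arabic: پ چ ژ گ
PERSIAN_ONLY = "\u067E\u0686\u0698\u06AF"


def has_persian_letters(text: str) -> bool:
    """True if the text contains a letter Persian uses but standard Arabic does not."""
    return any(ch in PERSIAN_ONLY for ch in text)


for ch in PERSIAN_ONLY:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+067E ARABIC LETTER PEH
# U+0686 ARABIC LETTER TCHEH
# U+0698 ARABIC LETTER JEH
# U+06AF ARABIC LETTER GAF
```

Persian keyboards also typically produce different code points for yeh and kaf (U+06CC, U+06A9) than Arabic keyboards do (U+064A, U+0643), which matters for search and matching.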

Oddly though, it stores several copies of identical chinese characters. Traditional, simplified, and the Japanese loaned characters kanji.


> Oddly though, it stores several copies of identical chinese characters. Traditional, simplified, and the Japanese loaned characters kanji.

Actually, all of those characters are usually unified into one code point. This is controversial enough on its own that there's a Wikipedia page on it: https://en.wikipedia.org/wiki/Han_unification


> Traditional, simplified, and the Japanese loaned characters kanji.

Traditional, simplified, Japanese, Korean. Some are identical, some different. I wonder why these rendering details were not left to fonts; as it is, you can type both 雨 and ⾬, for example. Maybe the stronger presence of those countries in whatever SDOs, and in tech integration, has something to do with it.
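In Unicode terms, the two characters in that example are not font variants of each other: 雨 is the unified ideograph U+96E8, and ⾬ appears to be U+2FAC KANGXI RADICAL RAIN, a compatibility character for the radical itself (those code points are my reading of the comment's characters). NFKC normalization maps the radical back onto the ideograph, as a quick `unicodedata` check shows:

```python
import unicodedata

ideograph = "\u96E8"  # 雨, the CJK unified ideograph "rain"
radical = "\u2FAC"    # ⾬, the Kangxi radical form

print(unicodedata.name(ideograph))  # CJK UNIFIED IDEOGRAPH-96E8
print(unicodedata.name(radical))    # KANGXI RADICAL RAIN
print(ideograph == radical)         # False: two distinct code points
print(unicodedata.normalize("NFKC", radical) == ideograph)  # True
```

Han unification is the opposite case: the Chinese, Japanese and Korean forms of 雨 all share U+96E8, and the regional glyph differences are left to fonts and language tagging (for example `lang="ja"`).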


I believe that was the exact point the article was making.


I'm surprised that this is still such a big issue. I'd expect that, with Unicode standards, this would be handled well out of the box.

Ages ago, in 2005 or thereabouts, I was working on a website for a Jewish organisation. I don't know Hebrew, but Hebrew has many of these same issues, including the right-to-left direction, and I was quite surprised to see text from our own CMS, based on Java, with tons of XML pipelines (Cocoon! XSLT!) generating HTML to be viewed in a browser, just handled this correctly without any problems.

At least the text was right-to-left, which was the only thing I knew, but the customer presumably knew more and they were happy.

Though selecting text in a mixed left-to-right and right-to-left piece of text looks really weird. Not sure what we did with alignment there, but I vaguely recall that everything may have been centered. An ugly compromise perhaps, but centering text was still popular at the time.


For the most part, these problems only pop up when you try to render RTL text visually, and then copy it back out from UI elements. In internal pipelines, however, it's just another string of opaque bytes that you perform a few very well-specified standard library operations on.

(Arabic also has the connection challenge, which Hebrew does not, but usually renderers that do RTL right also handle the ligatures.)
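Those connected shapes are normally produced by the renderer at display time; the stored text should contain only the base letters. Unicode also encodes pre-shaped "presentation forms" for compatibility, and tools that work around shaping-unaware software sometimes emit them, so finding them in source text is a warning sign. A sketch using Python's `unicodedata`; the helper name is hypothetical, and NFKC only restores the base letters, not a character order that such a tool may also have reversed.

```python
import unicodedata

# Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFC).
# U+FEFF at the very end of block B is the byte order mark, so it is excluded.
PRESENTATION_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFC)]


def has_presentation_forms(text: str) -> bool:
    """True if the text contains code points from the Arabic presentation-form ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in PRESENTATION_RANGES)


# U+FE91 is BEH in its initial (joining) shape; U+FEFB is the isolated lam-alef ligature.
shaped = "\uFE91\uFEFB"
print(has_presentation_forms(shaped))        # True

# NFKC maps presentation forms back to the base letters: BEH, LAM, ALEF.
logical = unicodedata.normalize("NFKC", shaped)
print([f"U+{ord(c):04X}" for c in logical])  # ['U+0628', 'U+0644', 'U+0627']
```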


One thing missing is to read the docs of your UI framework. I handled translations for a Windows tool 15 years ago. Windows docs were great at explaining what to do. I found an Arabic speaker to check my work and we had it right on the first attempt. I wonder how hard it would be on other more modern platforms.


Do Arabic speakers seek through videos from right to left too? If not, would they want to?


My understanding, preferably, yes. https://m2.material.io/design/usability/bidirectionality.htm...

> 7. Progress bars fill in the same direction as content is read

In general, if it represents "start to finish", or it has "forwards or backwards", then it should be mirrored. The site above has good examples of how a bicycle icon should be mirrored (since facing left means going "forwards"), whereas a magnifying glass should not be.
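Unicode applies the same idea at the character level: characters with the Bidi_Mirrored property, such as brackets and guillemets, are drawn as their mirror image in right-to-left text, while most other symbols are not. Python's `unicodedata.mirrored` exposes the property; the sample characters are arbitrary.

```python
import unicodedata

# Bidi_Mirrored characters get the mirror-image glyph when they appear in RTL text.
for ch in "()[]<>\u00AB\u00BB?+":
    print(repr(ch), "mirrored" if unicodedata.mirrored(ch) else "not mirrored")
```

Which icons to flip (the bicycle vs. the magnifying glass) remains a design decision; the property only covers characters.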


It's a dilemma: either teaching English to everyone, or recreating the UI style guides from scratch.


They experience time backwards too.


I like the URL of the tool they linked to. http://www.arabic-keyboard.org/photoshop-arabic/

When reading the URL, I imagine an Arabic language called “Photoshop Arabic”, which came into being at the Photoshop team in Adobe XD

- Do you speak Arabic?

- Arabic is not just one language.

- Yeah, but do you speak an Arabic language?

- Yeah, I speak Photoshop Arabic.


Also note that much like Latin script, many different languages use variations of Arabic script. The most notable one would be Persian.


I’d like to add Urdu to the list of languages that are written in Arabic letters.


> Bonus 'I hate it' points if it's that Arabic-looking font that's just English text that looks kind of Arabic.

How is a tacky font on some text that was never intended to be Arabic even approaching the badness of these other issues, let alone worse?

Or is the worry you've found a person that doesn't understand different languages exist at all?


I think it's extra insulting when someone uses a font that's just supposed to look Arabic

I have definitely seen attempts at a "global-looking" interface where fake languages were put on screen using plausible-looking fonts.

Like, imagine a welcome page with "hello!" in lots of different languages, but the Arabic/Urdu/Armenian/Thai characters are all just literal gibberish that looks kind of like the alphabets in question.


We're talking about style at that point.

If someone is using those fonts to pass text off as being in a different language, as in the example you mention, then sure, that's egregious.

If, however, the font is used to render some text in the style of another language, then I don't see a problem with it. We've done the same with Greek, Russian, Japanese, and many other languages that have a distinctive script. Sometimes it's tacky, but I wouldn't call it insulting.


Is the context just "places that are clearly supposed to be Arabic"? Sure, I can understand that being a problem. But if the font is being used as a font I don't think it should be a big deal.


I wonder if there is something like "Help! Is this Canadian French?" on the Arabic Internet?


Naive question. Is there an arabic equiv of a 7 segment display? It looks like a writing system that is very amenable to that sort of digitization. Always a center line, and then up to two strokes going {up, down, diagonal, curved} intersecting at the {left, middle, right} of the main line, followed by some dots to further specify a sub-category. Seems like a small finite set of graphical components being composed together. Which in fairness describes any writing system, but this one seems to have a very simple and predictable stroke sub-alphabet (I mean that as high praise, not as a slight; simplicity is good). Am I wrong? What does a simple Arabic digital display look like? I've never seen one.



Numbers will typically be rendered in Arabic numerals and not in Arabic script if a 7 segment display is being used. Latin characters are already pretty difficult to read on a 7 segment... Arabic would be even more difficult.


That's not entirely what I meant by my question. Let me try rephrasing. Is there a thing that works on the same principle as a 7-segment display, possibly with more than 7 segments but still a manageable number, which is designed to display Arabic characters with a similar amount of legibility?


I'm not an expert but I have gone down both the Arabic font creation road and the custom segmented display roads - not at the same time, of course.

My understanding of Arabic is that this would be _exceedingly_ hard to generalize into individual, discrete glyphs that could not only be broken out into their own segments but then repeated in a way that makes sense.

The only way I could personally think of is with matrix displays, which act more like normal displays.

This is because even though it is an alphabetic script, the blending between characters is non-trivial and highly dependent on the sequence of glyphs, which causes an explosion of possible combinations depending on what's being written.

That said, you can always design segmented displays that show _any_ shape or script, assuming you don't need to change them.


What exactly are the character blending rules? When you say non-trivial, are we talking normal non-trivial or oops-I-embedded-a-Turing-machine-in-my-written-language non-trivial?

I'm going to guess the various shifts you allude to are simple transformations for a pen-plotter, perhaps accumulating some "context" offsets to the initial position or flipping some glyph etc. Not sure how to translate that to segments, unless you can electro-mechanically adjust the segments...?...


TrueType fonts are more like polygonal vector graphics, less like lines/thicknesses. You generally combine two or more characters into a single glyph via a ligature.

But it doesn't end there. Fonts have a crazy number of different lookup tables that can affect how glyphs are drawn. While not Turing complete they're certainly very close.

In Arabic, the way the baseline blends into and affects the base shape of each glyph, plus lots of kerning rules, etc., would make Arabic very difficult to replicate on a segmented display.


I've never seen anything like that. But I have seen dot matrix displays render Arabic script


> Is there an arabic equiv of a 7 segment display?

A 7 segment display would have nowhere close to enough resolution to properly render Arabic script. Digital signage would at least be using a low-resolution LED matrix but more likely you'd see LED/LCD displays in places that are affluent and plain analog signage where they are not.


I could understand if this were a non-alphabet system like Chinese where the information coding is that dense per character. But for an alphabet system, why the mismatch between the bits of information per character and the bits needed to visually represent that character? The Latin alphabet mostly fits on a 7 seg display, and 26 letters requires a theoretical minimum of 5 bits to encode, so the graphical efficiency is close to optimal. What is it about the way other alphabets are encoding information that makes them hard or impossible to reduce down to a small number of segments?


Because the information coding is dense. Arabic script has a lot of tiny shapes that indicate what letter it is and it also has diacritic marks, which on their own are similarly intricate and small. You can't represent one line, two lines, a dot, a circle, a hamza, etc. above, below, to the side of particular characters, without a lot of resolution.


Another challenge for representing Arabic a display made of line segments is all the curves. If it weren't for the dots, I imagine you could write consonant-only Arabic on something close to a seven-segment display, although it would look bizarre because all the letters would be isolated and there would be so many more straight line segments than in normal Arabic script. (An absolute majority of the isolated forms of Arabic letters are made up only of curves, whereas an absolute majority of Latin capital letters are made up only of straight line segments!)

I wonder how Arabic ended up with characters that differ only by dot positioning (like ب ت ث, and also ن whose combining form is especially similar to the combining forms of those). By contrast, the most similar Latin characters might be EF / CG / IJ / MN / UV which I think are clearly more different than the Arabic characters that differ only by dot quantity and positioning.

(The Il1| are also famously confusable in some Latin fonts, but one could say having to worry about these comes late in the history of Latin script writing. Although the scribal minim https://en.wikipedia.org/wiki/Minim_(palaeography) used at some points to write u, i, m, and n is every bit as confusing as any Arabic character.)


I don't think curves are a technological problem for a segmented display. You can easily make curved segments and dots. It's all just an LED behind a window painted over with the negative image of the curve. In principle you could make a display that is optimized for OCGDU rather than EIFHTL.

Much like Latin characters, at some point a segmented display is just going to have to compromise on the tradeoff between correctness and the number of parts. We see it work just fine casting the curvy Latin letters to squares. So the real question, I reckon, is how much can you mangle the diacritics / curves / other parts and still read it? In other words, what is the bandwidth redundancy of this writing system?


Yes, I wasn't thinking flexibly enough there (no pun intended). Interesting questions!


Hold up. Representing a line, two lines, a dot, a circle... in one of several fixed positions relative to the character... that's exactly what a segmented display is good at. I think you may not have got what I meant by "dense". I'm looking at it from a purely information-theoretic view. There are something like 5000 Chinese characters overall. Even if I could find a system of on/off segments to represent them all, I would still have to use at least 13 segments, because it takes at least 13 bits to encode 5000 possibilities. Chinese is dense in the sense that a whole 13 bits have to fit in the space of a single character. Granted, diacritics and other markers do add one or two new bits, but adding permutations isn't the same as adding complexity. Every time you learn one diacritic, you've doubled the number of letters you know how to write. Whereas in Chinese, when you learn a new character, you've learned a new character.
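The bit counts in this argument are ceil(log2 n) for n distinct symbols, which Python can compute exactly with integer `bit_length` (avoiding floating-point `log2`). The Arabic line is only my rough upper bound, assuming 28 letters with at most four positional forms each and ignoring vowel marks and ligatures.

```python
def min_bits(n_symbols: int) -> int:
    """Smallest b with 2**b >= n_symbols, i.e. ceil(log2(n)), using integer math."""
    if n_symbols < 1:
        raise ValueError("need at least one symbol")
    return (n_symbols - 1).bit_length()


print(min_bits(26))      # 5  -> Latin letters, one case
print(min_bits(5000))    # 13 -> the ~5000 Chinese characters estimate
print(min_bits(28 * 4))  # 7  -> 28 Arabic letters x up to 4 positional forms
print(2 ** 7)            # 128 on/off patterns available on 7 segments
```

Having enough patterns is necessary but not sufficient: as the replies point out, the lit segments also have to look like the letters they stand for.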

Other commenter nailed the real technical hurdle. Context dependent spacing system. How do monotypes do it (or do those not exist)?


There's certainly a finite set of combinations, but in order for you to create a segmented display that isn't just a dot matrix, you need to have few enough variations that it can 1) actually encode all of the characters and their context-specific variations, 2) have segments that at least somewhat resemble the characters they are supposed to represent, and 3) if the segments overlap, render all of the components of every letter distinctly. So if you have a line that you also want to be two or three dots and a hamza or some other diacritic, you need to break each segment down to the lowest common denominators.

A 7-segment display works by turning on and off each of the segments in order to render a glyph that is supposed to represent English alphanumeric characters. It mostly accomplishes this, though just barely and honestly a lot of characters do not render well at all (like the lowercase "i"). How would you shape the segments in such a way that could accomplish at least that level of clarity in Arabic?

I personally think that what you'd end up with would very closely resemble a dot matrix.


never seen one. my guess is it does not exist. it would make a great artistic project


Based on the timestamp, you spoke almost exactly too soon. See alphabet9000's reply above yours.


Geez I get it, two billion people, 28 percent of the population.

(I have no idea where this number comes from but it is vastly inflated.)

All that being said, this is a decent primer on common Arabic rendering mistakes.

Never hurts to support popular languages properly, such as Arabic.


> Today adherents of Islam constitute the world's second-largest religious group. An estimated 1.8 billion or more than 24% of the world population identify themselves as Muslims.

Add in non-Muslim Arabs and you've hit 2 billion right quickly. The assumption is that a Muslim can recognize the Quran's written language somewhat.


This is a poor assumption.

I am deeply involved in the Muslim community, and can safely say that reciting holy works in Arabic offers zero useful knowledge of the modern language.


I was aware of this problem because I follow Rami Ismail for his game development and this site reminded me of him. Then I get to the bottom and it turns out it was made by him. Glad to see he is still fighting the good fight.


I watched the video at the bottom of the page, and it was all fun and I did laugh. But you lost me at the difference between ق and ك (q and k respectively). They are different sounds and do not have the same pronunciation.


When in doubt, use Google Translate instead of randomly hitting the keyboard.


According to the professional translators I work with, the results would be the same.


We just had a hilarious/sad case in German public media where it seemed like Google Translate was used for "plant shaped C4 charges", which resulted in a translation of "plant-shaped" instead of "planting shaped" [0], and they then asked an expert how realistic it was that C4 had been disguised as plants…

[0]: https://translate.google.com/?sl=en&tl=de&text=plant%20shape...


I think the parent meant that, to professional eyes, MT output is only as good as randomly hitting the keyboard...

But besides, I doubt "plant shaped C4" contains enough information to guarantee a correct answer in that case. Running partial sentences and/or individual words through (human or machine) translation doesn't go well.


Yeah, you are right, for some reason I thought I was replying to grandparent. Removed the "maybe not" as I meant to agree with parent.

The phrase "plant shaped" was not a fragment but the whole text; this has a screenshot [0]. It was about fact-checking Seymour Hersh's Nord Stream 2 article.

[0] https://nitter.librenode.org/Samy_t42/status/162879312196521...


because when you copy/paste into an editor that does not support Arabic, the text gets scrambled.

one trick is to check if the text looks the same as in google translate after you pasted it
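The visual comparison above can be complemented by a rough automated check. This is a minimal sketch, not the commenter's method, and it assumes the scrambling came from a tool that exported already-shaped glyphs: such text often contains characters from the Arabic Presentation Forms blocks (U+FB50 to U+FDFF and U+FE70 to U+FEFC) instead of the ordinary letters, and Unicode NFKC normalization maps most of them back. It does not detect or repair text that was reversed into visual order. The function names are hypothetical.

```python
import unicodedata

# Arabic Presentation Forms-A and -B: precomposed contextual glyphs
# (initial/medial/final/isolated forms and ligatures). Normal, editable
# Arabic text uses the base letters in U+0600-U+06FF instead.
# U+FEFF is excluded on purpose: it is the byte-order mark, not an Arabic form.
PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFC))


def has_presentation_forms(text: str) -> bool:
    """Heuristic: True if any character lies in a presentation-form block."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in PRESENTATION_RANGES)


def to_base_letters(text: str) -> str:
    """Map presentation forms back to base letters via NFKC.

    This only undoes pre-shaping; it does not fix reversed (visual-order) text.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # "marhaban" spelled with presentation forms: meem initial, reh final,
    # hah initial, beh medial, alef final.
    shaped = "\uFEE3\uFEAE\uFEA3\uFE92\uFE8E"
    print(has_presentation_forms(shaped))
    print(to_base_letters(shaped) == "\u0645\u0631\u062D\u0628\u0627")
```

A positive result is only a hint that the text passed through a shaping step before you received it; it is still worth eyeballing the result against a renderer that handles Arabic properly.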


great post, really informative. especially the other links mentioned


It's possible in Photoshop, if you have some settings wrong, to copy/paste Hebrew (and probably other right-to-left languages like Arabic) and have it show up reversed.

This has led to some unfortunate tattoos: https://bleacherreport.com/articles/2431265-mario-mandzukics...


Help! Is this horizontal scrolling?


م‎ر‎ح‎ب‎اً‎!


> to almost 2 billion people.

There are probably 350 million Arabic speakers.


He's not talking about Arabic speakers, he's talking about people who can read (to some extent) the Arabic script. This would include speakers of Urdu, Western Punjabi, Pashto, Dari, Farsi and many other (smaller) languages. He may also be talking about Muslims who can sound out the Arabic text of the Koran, even if they don't understand it--a skill which, from what I understand, is valued among Muslims.


But the Arabic language is not the only one using the Arabic script. Not sure if you can get to 2 billion with Persian et al., but it would definitely increase your count.


I was under the impression that Muslims were meant to be able to read the Qur’an phonetically, or at least the first chapter.


Many (most?) Muslims will have memorised the first chapter, and probably at least the last three chapters of the Qur'an. Reading isn't a requirement however, many children (Arabs and non-Arabs) will memorise those chapters verbally well before they learn how to read.


Not sure why this is being downvoted - a quick Google search gives me numbers ranging from 250-450 million, and 350 million sits comfortably within that range.

It's possible that the article refers to the number of readers of Arabic script in any language, which I imagine would be a bigger number.


> Not sure why this is being downvoted

Because it's comparing the wrong numbers. The article specifies what the 2 billion refers to:

"Here's a 18 minute video of I talk I gave at XOXO in 2015 that'll teach you more than enough Arabic to not embarrass yourself in front of everyone who can read the Arabic alphabet to some degree. That's about 2 billion people, or 28% of the human population, and they will absolutely notice that you don't know what you're doing and laugh at you in languages you cannot even read."


"Why not just..." train a neural net to detect this? :)


> I hope this will you help in case you cannot afford to hire anyone at all with a elementary school level of Arabic script reading.

Oops.


That is probably in the top 3 most common English typos, made millions or billions of times a day by native English speakers, up to and including people with PhDs. The issue the author is complaining about is completely different.


I'm confused, are you implying a simple mistake made by natives all the time means the author should be outright discredited or something?



