Help! Is This Arabic?

disintegrator · on Feb 27, 2023

Tangentially, for those doing web development, CSS has a bunch of "logical" properties [1] that adapt to the writing direction and writing mode of the content. For example, you can swap out `margin-left` and `margin-bottom` in your CSS with `margin-inline-start` and `margin-block-end` respectively. Similarly, `text-align` accepts start/end instead of left/right. Even if you're not targeting right-to-left or top-to-bottom locales, it's easy enough to switch to logical properties and get most of the work out of the way if you change your mind in the future.

[1]: https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Logical...
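
A rough sketch of the swap described above, written in Python only so it can be run. The table lists physical-to-logical equivalents that hold for the default horizontal, left-to-right writing mode; `to_logical` is a hypothetical helper, not part of any CSS tool, and it handles one declaration at a time.

```python
# Physical -> logical equivalents, valid for horizontal, left-to-right text.
PROPERTY_MAP = {
    "margin-left": "margin-inline-start",
    "margin-right": "margin-inline-end",
    "margin-top": "margin-block-start",
    "margin-bottom": "margin-block-end",
    "padding-left": "padding-inline-start",
    "padding-right": "padding-inline-end",
    "padding-top": "padding-block-start",
    "padding-bottom": "padding-block-end",
}

# `text-align` keeps its name; only the keyword changes.
TEXT_ALIGN_MAP = {"left": "start", "right": "end"}


def to_logical(declaration: str) -> str:
    """Rewrite one `property: value` declaration to its logical form."""
    prop, _, value = declaration.partition(":")
    prop = prop.strip().lower()
    value = value.strip().rstrip(";").strip()
    if prop == "text-align":
        value = TEXT_ALIGN_MAP.get(value, value)
    else:
        # Unknown properties pass through unchanged.
        prop = PROPERTY_MAP.get(prop, prop)
    return f"{prop}: {value};"
```

So `to_logical("margin-left: 1rem;")` gives back `margin-inline-start: 1rem;`, and anything without a logical counterpart in the table is left alone.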

lelandfe · on Feb 27, 2023

Do "top-to-bottom locales" exist? Has anyone built a site with users that get utility out of these?

I've usually found that the arguments for these properties revolve around, "well, what if I decide to use traditional Mongolian in the future?," which seems like the biggest case of YAGNI I can think of. I suspect their popularity is owed to tutorial authors showing off their CSS prowess.

(I'm also not convinced of the need to flip sizing values for RTL/LTR, but that's at least useful)

jeroenhd · on Feb 27, 2023

Here's a guide to styling such text: https://www.w3.org/International/articles/vertical-text/

Chinese/Korean/Japanese/Vietnamese originally, but computers have made horizontal writing more common on the internet. Modern Vietnamese is also written in a variation of the Latin script, of course.

Mongolian still uses the vertical Mongolian script, though the Cyrillic script was also introduced in Mongolia during Soviet times and seems to be common inside Mongolia. However, the Mongolian government seems to be moving back to using the original Mongolian script. Furthermore, the Mongolian people inside China never took up the Cyrillic writing system.

Websites can and do use the vertical script: https://president.mn/mng/ http://khumuunbichig.montsame.mn/index.php?home&readnews=572

Google uses the Mongolian Cyrillic alphabet (https://www.google.com/?hl=mn) so even companies that seem to have a page in every single language don't seem to bother with supporting the original script. This is probably because the Mongolian speakers inside China can't make much use of Google anyway.

Funnily enough, Mongolian is one of the few known scripts not only written vertically but also left to right, unlike other Asian vertical scripts (which go right to left).

pradn · on Feb 27, 2023

Yes! The website of the president of Mongolia has three versions, including one with the traditional top-to-bottom script. It's quite well-designed! You can see web-design features that make sense for this script: the nav-bar being on the left, left-to-right scrolling, and pagination at the right end.

https://president.mn/mng/

lelandfe · on Feb 27, 2023

That site is well made. This is a bit pedantic, but if the default for that site is horizontal text, I'm not sure I would call this a "top-to-bottom locale": https://president.mn/

shantara · on Feb 27, 2023

I’m somewhat surprised to see that some of the characters are not being correctly displayed in iOS Safari. I was expecting the symbols common enough to be used on an official page of a head of state to be common enough for inclusion in the Unicode standard. Or is it a font issue?

mc32 · on Feb 27, 2023

They should make people turn their monitors vertically oriented --are monitors in MN usually in "portrait" or do they maintain a discrepancy of vertical text but horizontal layout?

pradn · on Feb 27, 2023

I think it's ok to have horizontal monitors even for "top-to-bottom" scripts. That's simply because it's less taxing for us to use. Plus, mouse scrollwheels work just as well so it's not a problem to navigate.

kccqzy · on Feb 27, 2023

Go to any random bookstore in Taiwan and pick up a book in traditional Chinese. It's top-to-bottom, and right-to-left. The spine of the book is flipped compared with western books: it's on the right side when the cover is facing up.

H8crilA · on Feb 27, 2023

Japanese is actually top to bottom, but they largely gave up on that idea. Probably too much hassle.

irrational · on Feb 27, 2023

But, there are still texts that are read top to bottom. I only know that because my daughter has a Japanese friend and she has shown me some of her books and how they are read top to bottom.

tmtvl · on Feb 27, 2023

Same with Mongolian, but JP goes TBRL (Top to Bottom, Right to Left), Mongolian goes TBLR. Wikipedia has an overview: https://en.wikipedia.org/wiki/Writing_system#Directionality

disintegrator · on Feb 27, 2023

I'd say it's YAGNI if it required going out of your way to add something on top of your project. CSS logical properties mirror traditional properties one-for-one. Once you know about them, why would you go back to older layout properties? The need to support different locales is really a question of what product you're building or which audiences you're targeting. E-commerce, news sites, social networks... they seem like good use cases for making the switch and future-proofing a little.

lelandfe · on Feb 27, 2023

I love supporting different locales, and have shipped sites for multiple continents. I don't know if your locale in question exists.

Forget my question on your users getting utility out of this. Have you ever seen a site, any site, that supports switching to top-to-bottom writing? Against what future are we proofing – a new language arising?

jfk13 · on Feb 27, 2023

The site of the Office of the President of Mongolia is available in top-to-bottom traditional Mongolian script: https://president.mn/mng/

Other examples (e.g. vertical Japanese) aren't too hard to find: https://nishinokensetsukogyo.co.jp https://ok-maru.jp etc.

lelandfe · on Feb 27, 2023

We're looking for vertical to horizontal switching. Not just examples of vertical writing.

Almost all extant vertically-written languages are more commonly seen horizontally on the web. Vertical Japanese on the web (from my American understanding) is a design choice, and is notably absent from mainstays like https://www.yahoo.co.jp/

That is to say, it is unlikely that a site's language switcher would opt for top-to-bottom writing.

Edit: the Mongolian president's site does have an English version! ...but it's a completely different site :( https://president.mn/en/. Still, this is the closest to a use case for logical properties I've seen, so kudos.

jfk13 · on Feb 27, 2023

It's unlikely (I'd guess) that a site would want to offer the exact same page in both horizontal and vertical modes, yes. But it's plausible they might want to share a lot of design elements between horizontal and vertical versions of the content.

Using logical properties would make it easier to have a common base of CSS that controls spacing, sizes, etc., that can be used by both language versions, rather than maintaining them completely separately.

lelandfe · on Feb 27, 2023

For finality, I will quote my original comment:

> the arguments for these properties revolve around, "well, what if I decide to use traditional Mongolian in the future?," which seems like the biggest case of YAGNI I can think of

...And even this one close example doesn't use them!

8organicbits · on Feb 27, 2023

One of my favorite observations about forms that are in Arabic and English (or any left-to-right and right-to-left language pair) is that you don't need to pick one language as the primary language. The trick is to put the English on the left, a blank field in the middle, and the Arabic on the right. This way an English speaker will naturally read the form as an English form with the Arabic translation off to the right. But an Arabic reader will naturally read the form as an Arabic form with an English translation off to the left. Neither language is dominant.

Contrast this with a form in English and Spanish, where you need to put the text all on the left and decide which language goes on top.

NhanH · on Feb 27, 2023

Your text inputs still need to flow either left-to-right or right-to-left though? In which use case would this be useful?

8organicbits · on Feb 27, 2023

If we're talking about an HTML text input field, you can use `dir=auto`. Other tools should have similar features. This approach is also very common on paper, which naturally flows in the correct direction.

https://www.w3schools.com/tags/att_global_dir.asp

https://caniuse.com/mdn-html_global_attributes_dir
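
For the curious, `dir=auto` roughly works by looking for the first character with a strong direction. Here is a simplified Python sketch of that idea using the standard `unicodedata` module. `guess_direction` is a made-up name, the `"ltr"` fallback is my assumption, and the real HTML algorithm has extra rules that this ignores.

```python
import unicodedata


def guess_direction(text: str, default: str = "ltr") -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-style or Arabic-style letters
            return "rtl"
        if bidi_class == "L":  # Latin, Cyrillic, CJK, ...
            return "ltr"
    # Digits, spaces and punctuation are not strong; fall back to the default.
    return default
```

Leading digits or punctuation don't decide anything here, which is why a field that starts with a number and then continues in Arabic or Hebrew would still come out as right-to-left.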

Edit: oh use case? Imagine a PDF form you'd like someone to fill out, or a page that may get printed. You could create two forms and let the person filling out the form pick their language, but the person processing the form may prefer to read the form in the other language.

IanCal · on Feb 27, 2023

> Your text inputs still need to flow either left-to-right or right-to-left though? In which use case would this be useful?

Anything that doesn't require input - particularly printed ones where you can't just swap with a button. Signs, menus, etc.

AdamN · on Feb 27, 2023

Layout on an airline magazine where the left side of the page is English and the right side is an RTL language like Hebrew or Arabic.

rippercushions · on Feb 27, 2023

This website is well-meaning, but will be difficult to parse if you don't have elementary Arabic and can't tell apart ال from ل ا.

What it really needs is a simple reference input string, examples of how it gets broken, and what to do to fix them. The middle case, where the sentence is correctly rendered RTL but the individual words are LTR (breaking the ligatures), is particularly common and insidious because it looks plausible to non-Arabic speakers.

SamBam · on Feb 27, 2023

I actually thought that part was perfectly clear.

> In general, the letter combination ال should be common.

So if your text has more than a few words, you should be able to look through your text and see that somewhere.

I can't read Arabic but I can recognize that pattern. I went to https://www.bbc.com/arabic and could find numerous occurrences.

It's a bit like saying "if you have a paragraph in 'English' and it has no e's in it, it's probably not English."
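
If you'd rather not eyeball it, the same rule of thumb can be roughed out in code. This is only a sketch under a big assumption: the string is supposed to be stored in logical (reading) order, so healthy text has many words that start with alef then lam, while text that got reversed somewhere upstream has words that end with lam then alef instead. The function name and the comparison are mine, not from the article, and it will be fooled by short or unusual input.

```python
ALEF = "\u0627"  # the "stick" letter
LAM = "\u0644"   # the "J"-shaped letter


def looks_reversed(text: str) -> bool:
    """Heuristic: True if words ending in lam+alef outnumber words starting with alef+lam."""
    words = text.split()
    starts_with_al = sum(1 for w in words if w.startswith(ALEF + LAM))
    # Skip the two-letter word lam+alef itself, which is a real and common word.
    ends_with_la = sum(1 for w in words if len(w) > 2 and w.endswith(LAM + ALEF))
    return ends_with_la > starts_with_al
```

It says nothing about whether the renderer joins the letters properly; it only catches the case where the characters themselves arrive in the wrong order.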

ramin_hal9001 · on Feb 27, 2023

> It's a bit like saying "if you have a paragraph in 'English' and it has no e's in it, it's probably not English."

This is true. You should also be able to see the word "the" in several places in English text, and if it is rendered as "ehT" or something like that, the ligature code may have a bug -- similar with reversing the ال (read as "al") pattern.

> I can't read Arabic but I can recognize that pattern. I went to https://www.bbc.com/arabic and could find numerous occurrences.

Really? I only ever learned to read just enough Arabic to read parts of the Qur'an and am by no means fluent, but I couldn't see any mistakes on that website myself.

Nition · on Feb 27, 2023

The comment was that they found numerous cases of ال, not ل ا.

BoiledCabbage · on Feb 27, 2023

> In general, the letter combination ال should be common.

It was a really smart way of saying you don't need to know anything at all to pattern match on this and spot a very common problem.

Also, if you haven't checked out the youtube video on that page I highly recommend it. It gives a great concise summary of the issue, and it's impressive how much I was able to learn to visually parse Arabic script with only a couple of mins into the video.

[1] - https://youtu.be/X1ynZm1wI18?t=65

quickthrower2 · on Feb 27, 2023

A group of flying kookaburras flap about a body of liquid. In this group, individuals distinguish from distinct sorts. Small, big, colorful or plain, all flap wings as a way to maintain afloat. It's a sight to watch as this group zips, zooms, and turns all around. It's a natural habitat for this flying squad, and you can find it in many spots around our world. This group adds to our natural world's charm and, through its distinct traits, brings a lot of joy to many.

Oxidation · on Feb 27, 2023

Ah yes, the good old "your heuristic doesn't work on my carefully constructed pathological case" observation, wouldn't be the Internet without that.

GP also said "probably". That's how heuristics work.

quickthrower2 · on Feb 27, 2023

You would almost think what you described was our profession :-)

Oxidation · on Feb 27, 2023

Wait, you get paid for Internet snark?

quickthrower2 · on Feb 27, 2023

> your heuristic doesn't work on my carefully constructed pathological case

This is a bit like writing unit tests, debugging code, and so on.

Oxidation · on Feb 27, 2023

It's a heuristic (claims to be probably helpful). You might as well complain that checking string size first as part of an equality check of strings that demonstrably have a very large distribution of sizes is obviously dumb because obviously you can construct same-length strings at will. Breaking rules of thumb is the easiest thing in the world.

adaml_623 · on Feb 27, 2023

Great example because if that text is on your corporate website then someone upstream is not doing their job

Edit: obviously it is English but it's definitely not correct for a company website.

quickthrower2 · on Feb 27, 2023

Hey, you now want corporate website standard English with no Es? Talk about scope creep!

mcv · on Feb 27, 2023

But it may be totally correct for a website about kookaburras.

There's more than just corporate websites, and frankly, if a company of any meaningful size offers content in Arabic, I'd expect them to hire someone for that. Even part-time or freelance.

thaumasiotes · on Feb 27, 2023

> But it may be totally correct for a website about kookaburras.

It isn't; "to maintain afloat" is not grammatical.

You could replace that with "stay afloat" and it would fix the grammatical error without introducing an E.

There's a similar unforced error in referring to a "group" of kookaburras rather than a "flock".

"Individuals distinguish from distinct sorts" is gibberish. I cannot tell what it's supposed to mean.

"All flap wings as a way to [stay] afloat" is, at best, very awkward; fluent English would require "flap their wings", but that would introduce an E.

"Flapping about a body of liquid" is a very odd thing to say unless the body of liquid happens to be suspended in midair, since midair is the only location where you can find birds flapping.

darkwater · on Feb 27, 2023

Now, find anothEr 20% of commEnts hErE that follow the samE rulE as yours.

inkcapmushroom · on Feb 27, 2023

The 'e' is in your username. :)

fortran77 · on Feb 27, 2023

> It's a bit like saying "if you have a paragraph in 'English' and it has no e's in it, it's probably not English."

For the exception, see

https://en.wikipedia.org/wiki/Gadsby_(novel)

quickthrower2 · on Feb 27, 2023

Wow, having read the example prose, it is quite an interesting sounding book.

ehecatl42 · on Feb 28, 2023

https://en.wikipedia.org/wiki/Ella_Minnow_Pea is a riot of a read. Starts with a full complement of letters and drops one letter chapter by chapter.

roenxi · on Feb 27, 2023

The article even links to a site where you can practice: https://notarabic.com/

Thanks to this training, I can now identify that numerous Arabic strings on that site are backwards. He wasn't joking, IJ is everywhere.

Cyph0n · on Feb 27, 2023

> ال from ل ا.

Who can’t tell these two character sequences from each other? Genuine question.

I think the main point the author is making is, if you are including Arabic script somewhere, take a bit of time to either do it right or hire someone to do it right for you.

slowwriter · on Feb 27, 2023

I believe the difference is simply that the two sequences are in reverse order of each other.

erenyeager · on Feb 27, 2023

Not only this but the reverse of ال is لا because Arabic letters change based on position of the letter (initial, medial, final). So ل ا is never going to be a word since you lack joining, since the ل will look like لـ if joined and لا is a distinct ligature
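
Unicode makes the positional forms visible if you want to poke at them. The ordinary letters (U+0644 lam, U+0627 alef) carry no shape and the renderer picks one; the "presentation forms" blocks have each shape as its own code point, including the lam-alef ligature. A small Python sketch with the standard `unicodedata` module follows. The helper names are just illustrative, and the idea that presentation forms in your data mean some tool pre-shaped the text is a rule of thumb, not a guarantee.

```python
import unicodedata

LAM_INITIAL = "\ufedf"        # lam as it looks when joined to a following letter
LAM_ALEF_LIGATURE = "\ufefb"  # the distinct lam-alef shape

print(unicodedata.name(LAM_INITIAL))        # ARABIC LETTER LAM INITIAL FORM
print(unicodedata.name(LAM_ALEF_LIGATURE))  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM


def has_presentation_forms(text: str) -> bool:
    """True if text contains code points from Arabic Presentation Forms-A or -B."""
    return any(
        "\ufb50" <= ch <= "\ufdff" or "\ufe70" <= ch <= "\ufefc" for ch in text
    )


def to_plain_letters(text: str) -> str:
    """Fold presentation forms back to ordinary letters via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)
```

Running `to_plain_letters` on the ligature code point gives back plain lam followed by plain alef, which is the form a shaping engine expects to receive.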

madeofpalk · on Feb 27, 2023

> What it really needs is a simple reference input string, examples of how it gets broken, and what to do to fix them

I don't think it does. Just hire someone.

I guess really, the point Rami tries to make is that not a single person who reads Arabic was involved in the video game/advertising/website/etc. The errors are often so basic that a child could point them out.

It would be like if someone wrote English without any spaces between words. It's so painfully obviously wrong.

bombcar · on Feb 27, 2023

It's one thing for a small personal project to get this stuff wrong, and there's somewhat of a baseline you can try yourself, but nobody is really going to care that much. It's like those funny pictures of restaurants in China with an English name of TRANSLATION SERVER ERROR.

What the site is about and where hiring someone makes sense is anything "big budget", especially if your target market includes Arabic-speaking or Arabic-adjacent countries.

Zuiii · on Feb 27, 2023

> a simple reference input string, examples of how it gets broken, and what to do to fix them.

I really like this idea. Just have a standard set of strings covering all edge cases (even the sprawling labyrinth that is bidi) with a visual reference that shows what the correct rendering of each string would look like. Each entry would also have a description of the problem and suggested solutions.

Unlike the solutions in OP, this one is pragmatic and is actually actionable for the vast majority of developers. I'm kinda surprised that something like this doesn't already exist given the substantial amount of material and visual examples already available that cover the bidi algorithm.

- https://www.w3.org/International/articles/inline-bidi-markup... - https://www.w3.org/International/articles/inline-bidi-markup... - https://www.w3.org/International/articles/inline-bidi-markup...

epgui · on Feb 27, 2023

I'm not sure I understand the idea. It sounds insane to me because I feel like there's probably trillions of combinations (and it would be insane to expect to be able to cover every specific example of incorrect text), and I thought the website was pretty clear and provided good examples.

josephg · on Feb 27, 2023

As I understand it, the idea is just to make an Arabic test suite with enough examples (maybe a few dozen to a few hundred) such that if your program correctly renders all those examples, it’ll probably work fine with most Arabic text found in the wild. It sounds like there’s a lot of very broken software out there. Testing any Arabic input would be a big improvement for a lot of software.
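
To make that a bit more concrete, here's a very small sketch of what such a suite could look like for the non-visual half of the problem: checking that a text-processing step hands Arabic back with the same code points in the same logical order. The `pipeline` argument is a hypothetical stand-in for whatever your program does to strings before rendering, and the cases and names are invented for illustration. It cannot replace looking at the rendered output or asking someone who reads Arabic.

```python
from typing import Callable, List, Tuple

# (description, input string); a real suite would have dozens of these.
REFERENCE_CASES: List[Tuple[str, str]] = [
    ("plain word", "\u0645\u0631\u062d\u0628\u0627"),
    ("definite article", "\u0627\u0644\u0633\u0644\u0627\u0645"),
    ("mixed with Latin and digits", "\u0627\u0644\u0644\u0647 abc 123"),
]


def run_suite(pipeline: Callable[[str], str]) -> List[str]:
    """Return the descriptions of the cases that the pipeline altered."""
    failures = []
    for description, text in REFERENCE_CASES:
        if pipeline(text) != text:
            failures.append(description)
    return failures
```

A pipeline that leaves strings alone returns an empty list, while one that reverses them (a classic "fix" for RTL) fails every case.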

IanCal · on Feb 27, 2023

> and can't tell apart ال from ل ا.

Who can't tell these apart? I know literally no Arabic - these characters look very much like latin ones I and J, and it's just an order thing.

It seems like an excellent quick test to me to see if there's ordering problems.

psychphysic · on Feb 27, 2023

Exactly if you aren't comfortable with the idea that glyphs are distinct and order can be assumed to matter.

This article can't possibly be scoped for someone like that.

There's a side note here, that dyslexia can become apparent in very different written languages.[0]

In which case, don't try to handle multilingual text yourself; pay someone else, even if that's on Fiverr.

[0] https://blogs.scientificamerican.com/observations/its-all-ch...

IanCal · on Feb 27, 2023

> Exactly if you aren't comfortable with the idea that glyphs are distinct and order can be assumed to matter.

But isn't that exactly what it's just telling you? The order is important and if you see this very simple pattern it's wrong.

If you read any latin character based language then you surely must be OK with the idea that glyphs can be distinct? Are there many people who exclusively read languages where the order of symbols is not important?

> There's a side note here, that dyslexia can become apparent in very different written languages.[0]

If the point is that dyslexia means some people can't see the difference then that's fair, I'd not come at it from that angle. I don't see any surprising pre-assumed knowledge in this IJ/JI distinction however.

psychphysic · on Feb 27, 2023

Yea we are in agreement.

IanCal · on Feb 27, 2023

Sorry, I'd not read the usernames and was trying to tie your response to the original comment. Makes a lot more sense now.

Kapura · on Feb 27, 2023

I don't think the intent is to fix every issue, but rather to tell you if you are using busted translation tools. The thing about left-to-right/right-to-left is something that a non-native Arabic speaker may not even know to look out for, even though it is a core aspect of the way the language is written.

LanceH · on Feb 27, 2023

Author was more making the point that one occurs commonly and the other order should never occur as the other order would require connecting the letters. The stick character does not connect to the left, but will on the right if that letter connects to the left.

michaelt · on Feb 27, 2023

I think the author is aiming for "You're making a prop for a film with Arabic text, or making a multilingual sign, are you doing it right?" rather than "You're writing a text editor, are you doing it right?"

wodenokoto · on Feb 27, 2023

If you can’t tell the difference between a stylized lJ and Jl you probably didn’t end up on that page to begin with.

I don’t think that is an unreasonable assumption on the reader

irjustin · on Feb 27, 2023

I think that's the point though? It's trying to educate us that we should pay attention to that difference as we scan the block of text.

thiht · on Feb 27, 2023

> This website is well-meaning, but will be difficult to parse if you don't have elementary Arabic and can't tell apart ال from ل ا.

That's a really weird comment. Just Ctrl+F ل ا in your supposedly Arabic text?

_ix · on Feb 27, 2023

I agree. It doesn't take long to get to the author's thoughts on the most common cause: lack of subject matter expertise. That's cold comfort for those who are working against a tight deadline, though.

whoknew1122 · on Feb 27, 2023

The whole 'two billion people can read the Arabic alphabet to some degree' is a weird claim. Maybe it's possible, but there aren't two billion people with functional Arabic literacy.

Most Arabic is written without small vowels (harakat). You could have script that is justified right-to-left and with letters correctly connected and it could still be gibberish. And many of those 'two billion people' would be none the wiser.

braingenious · on Feb 27, 2023

>'two billion people can read the Arabic alphabet to some degree' is a weird claim.

Not really, because

>there aren't two billion people with functional Arabic literacy.

isn’t something that the author claimed. “To some degree” is a phrase that explicitly states that the author isn’t talking about full functional literacy.

It would be a super weird claim if “some degree” and “full functional literacy” meant the same thing, but they don’t! You would almost have to intentionally ignore the meaning of the words the author used and invent a nonexistent overlap of meaning to become confused on this point!

Dylan16807 · on Feb 27, 2023

> It would be a super weird claim if “some degree” and “full functional literacy” meant the same thing, but they don’t!

They should at least be close if the author isn't trying to pump up the numbers in a misleading way. I definitely assumed that number would be close to the literate number, and not including people who can recognize a tiny fraction. Hell, I can read Arabic "to some degree" if we're being completely literal, but I think including me in a persuasive claim about people that use Arabic would not be appropriate at all.

> You would almost have to

No need to be rude.

pessimizer · on Feb 27, 2023

This doesn't make any sense. Why would you want the meaning of "some degree of knowledge" and "full functional literacy" to be close? What's misleading about describing exactly what you're talking about? Who cares how many people are literate in Arabic in this context? What you care about is how many people know enough about Arabic to know that you don't know anything about Arabic.

You seem to be demanding that "people who know enough medical terminology and/or Latin to see through your fictional doctor" be nearly the same class as "people who are doctors," and what's more, implying that's some sort of deception.

> No need to be rude.

braingenious · on Feb 27, 2023

>They should at least be close if the author isn't trying to pump up the numbers in a misleading way.

I’m genuinely confused here. The author was pretty much crystal clear about how he defined the size of the group that he was talking about, I do not understand how a person could be confused let alone feel the need to accuse him of being intentionally misleading.

What exactly is the nefarious goal that the author was trying to sneak past you with his clever trick of speaking in plain english?

bqmjjx0kac · on Feb 27, 2023

Suppose I said, "It's important to protect endangered species. There are 130 billion mammals on Earth."

It's plain English, but I picked the wrong measure to support my claim.

braingenious · on Feb 27, 2023

Your analogy is backwards. It’s more like the author said “This is good for all mammals” and you are the one that inserted “but there exists a smaller subset of mammals”, which is entirely orthogonal to the point.

bqmjjx0kac · on Feb 27, 2023

The author's thesis is that Arabic text is important because two billion people recognize its alphabet. This fact is irrelevant because it's a proper superset of the group that matters: people who can read Arabic.

Let me try a different analogy. "It's important for caterers in the US to provide a gluten-free meal choice. After all, the population is 332 million!" Without knowing the incidence of gluten sensitivity, it's a borderline-misleading statistic.

braingenious · on Feb 27, 2023

I am glad that we agree that this article that referred to people with varying degrees of knowledge of Arabic script is not, in fact, about people that are fluent.

I also agree that the existence of a subset that wasn’t referred to at all in the article is completely irrelevant to the topic at hand!

Dylan16807 · on Feb 28, 2023

The article strongly implies that the "some degree" group is the group that matters. But it's not. The group that matters is somewhere in between "some degree" and "fluent".

And the parent comment was not talking about fluent people when writing "can read".

braingenious · on Feb 28, 2023

Literally what do you think the words “some degree” mean to you personally?

Posters have been able to pinpoint the number of fluent speakers, can you give me a ballpark to how many people matter and how many don’t matter?

This article about rendering text appropriately has taken such a fun turn into sorting folks into groups that “matter” and “don’t matter”?

If being rigid about these numbers is so important, how many people that don’t matter today might matter next year? How many people are learning arabic script? How many might want to look up something written in arabic without it being rendered in absolute nonsense?

Dylan16807 · on Feb 28, 2023

> Literally what do you think the words “some degree” mean to you personally?

I answered that in my first post! I can read a tiny tiny bit of Arabic. That puts me into the literal "some degree" group, but I am also definitely in the not-mattering group, because rendering mistakes with Arabic will not cause me any problems with reading.

> Posters have been able to pinpoint the number of fluent speakers, can you give me a ballpark to how many people matter and how many don’t matter?

Have they? But I don't have numbers, I'm just saying that "fluent" is too small and "some degree" is too big.

> This article about rendering text appropriately has taken such a fun turn into sorting folks into groups that “matter” and “don’t matter”?

Are you offended that I classify myself as not mattering in this very specific context? You don't have to make it sound like I'm saying people don't matter in general, jeez.

> If being rigid about these numbers is so important

If a number is worth busting out to make a point, it's worth being correct.

> how many people that don’t matter today might matter next year? How many people are learning arabic script?

What's your point? If the number changes, then use the new number. Don't use a wrong number because it might change later. Or if you have an expected future number, label it as such.

> How many might want to look up something written in arabic without it being rendered in absolute nonsense?

A lot of those people aren't even inside the "some degree" group, so now you're making a different argument. I'd rather not start any new tangents at this point, if you don't mind.

braingenious · on Feb 28, 2023

> What's your point?

My point is that you’re trying to use some sort of odd pedantic math trick to shift the conversation from your experience of “There is a group that I personally don’t care about” to “Math dictates that this is not actually a problem worth addressing.”

Your position that the important takeaway here is actually the importance of scrutinizing pointless minutiae rather than text rendering being fundamentally broken isn’t empirically based. Your entire argument is “look at how clever I am!”, which is fundamentally off-topic when talking about rendering text properly.

Like lol, how are people supposed to learn the script if their examples are all messed up? As a maths genius surely you could see the issue with how “impacted people” is somewhere between “fluent people” and “fluent people plus an unknown number of others.” What hard number did you land at when adding unknown variable x to the number of fluent speakers you googled?

bqmjjx0kac · on Feb 28, 2023

> Your entire argument is “look at how clever I am!”, which is fundamentally off-topic when talking about rendering text properly.

I think this is a really uncharitable read of this conversation. This thread has been about the veracity and the relevance of the author's claim that "two billion people can read Arabic to some degree".

I don't think anyone is trying to refute the author's conclusion that Arabic text rendering is important. I also don't think anyone is trying to show off how clever they are.

Personally, I agree with the author's conclusion, and I thought the post was really neat! But I also think the 2 billion statistic weakened their argument -- it's better to omit a statistic than include the wrong one.

braingenious · on March 3, 2023

> This thread has been about the veracity and the relevance of the author's claim that "two billion people can read Arabic to some degree".

This is not really true. You tried to center your conclusion that your math was better than the author’s math while distracting from the topic of rendering text properly.

This thread has been about you insisting that people listen to your math and not discuss rendering text properly. lol this thread has been about how clever you are, _not_ rendering text properly.

Dylan16807 · on March 6, 2023

It's not about the math at all, just "this seems like the wrong group to use as an example". It's a simple point, nobody is trying to show off.

And in these comments I'm assuming that the author has exactly the right number for the group they cited. Because it's really not about math. I have done no calculations and trust the number given. I just think they're citing the wrong statistic. That's why I'm also uninterested in the factors you mentioned that might influence the number up or down. The actual number doesn't matter for this criticism: even if the number in the article happens to match the right statistic, they're still citing the wrong statistic.

tsukikage · on Feb 27, 2023

> pump up the numbers in a misleading way

Is it misleading? You don't have to be anything like fluent to realise when text rendering is broken. The quantity that is actually relevant to the discussion is the number of people who, when they look at your UI, will know that the arabic text rendering is broken; not the number of people who are fluent in arabic.

Dylan16807 · on Feb 27, 2023

The only reason for me to care that it's broken is on behalf of the people that can read it.

So even though I know a single digit number of words, and you can count me in that two billion, nobody should care about getting it right on my behalf.

foldr · on Feb 27, 2023

>The only reason for me to care that it's broken is on behalf of the people that can read it.

The people who can't read it, but who can see that it's broken, will form a lower opinion of your product. It's as if I went to a Polish website and the text was all right-aligned and in all caps. I can't read Polish at all, but I'd still form an opinion about the quality of the site.

Dylan16807 · on Feb 27, 2023

I'm not sure if you were trying to disagree with me, but I agree. They will form a lower opinion, but they form that opinion almost entirely because of the people that actually should be able to read it.

If you screw up a language that has 0 readers, it matters far less.

The point of saying how many speakers there are was to increase the strength of that effect. Because of that, it's misleading if you pump up the number. For pumping it up to not matter, the number would have to not matter, and there wouldn't have been a reason to mention it in the first place.

zarzavat · on Feb 27, 2023

There is a difference between literacy and fluency.

Literacy is ability to understand a writing system. I am literate in the Latin alphabet.

Fluency is ability to understand a language. I am fluent in English.

Neither implies the other. You can be fluent but illiterate (the default until modern universal education), and literate but not fluent (I am literate in the Latin alphabet but I am not fluent in Italian).

The claim that there's ~2 billion people who are literate in Arabic script and will laugh at you if you get it wrong, is more or less true. It's of course referring to the large number of people who can read from the Quran in Arabic but without understanding all the words.

AdamN · on Feb 27, 2023

2B still seems like a stretch. Can 2B people read (even with low comprehension) a basic paragraph in Arabic? My SWAG would be 1B people.

jmopp · on Feb 27, 2023

A quick Google search seems to indicate that there are 1.9 billion Muslims in the world. It makes sense to assume they know enough Arabic to recognise, at the very least, common religious set-phrases like the Bismillah or the Shahada. In my book this counts as "some degree of Arabic script literacy".

AdamN · on Feb 27, 2023

You'd have to exclude everybody under 7 or so, everybody who is illiterate (in this very expansive sense), and those that are Muslim but do not read Arabic and do everything in translation.

That might be a small group though and probably outweighed by all the non-Muslim Arabic readers (for instance I work with 2 Egyptians, one is Coptic and one is ex-Muslim and both can easily read Arabic).

zarzavat · on Feb 27, 2023

If it's Quranic Arabic then the orthography includes vowel markings so vocalising it is considerably easier.

The number of people who can vocalise Arabic without the vowel marks (e.g. a newspaper) is considerably less. But you don't need to be able to do that to notice any of the errors in the article.

kybernetikos · on Feb 27, 2023

Yeah, I think we're talking here about the number of people who will know you messed up rather than the number of people who can actually read the text in your app.

wruza · on Feb 27, 2023

Exactly. That would be just a subtle exaggeration based on a pseudo-claim, to some degree.

tialaramex · on Feb 27, 2023

So, there are degrees of literacy involved, and what we're talking about here is one step up from the bottom layer, a recognition of the writing system, not of a specific language.

I can muddle along somewhat in French, especially written French, but I have essentially no Polish despite working with more Polish people than French. Nevertheless, Polish is written in a Latin script (for about a thousand years) and so I "can read Polish to some degree". If you show me a Polish street address, and then some street signs, I can spot when the sign matches the address, because I understand that symbols which are slightly different just mean the same thing. If you give me Polish mirror writing, I know it's wrong - it's backwards, even though I don't understand it.

If I attempt this in China it won't work, because I don't understand the Han script, so I am not sure whether a symbol I'm seeing is the same symbol written more or less ornately or an entirely different symbol which just looks somewhat similar. Are the symbols backwards? Or maybe they're different symbols which just look backwards.

The layer beneath this, by the way, is recognition that intent exists. If you show me Chinese text, not only can I not read it, I'm not sure which symbols are "the same" and which are not; however, I can immediately tell this is writing. The writer intended to convey meaning with these shapes, and perhaps I can find somebody else to translate them for me. Whereas, say, the pattern on my duvet is just a pretty pattern; it doesn't mean anything (yes, I have thought about this, no it isn't a secret message) and so I can't get that "translated".

thaumasiotes · on March 2, 2023

> If I attempt this in China it won't work, because I don't understand the Han script, so I am not sure whether a symbol I'm seeing is the same symbol written more or less ornately or an entirely different symbol which just looks somewhat similar.

Just for fun:

Same symbol: 龙龍

Completely different: 已己

poulpy123 · on Feb 27, 2023

> nevertheless, Polish is written in a Latin script (for about a thousand years) and so I "can read Polish to some degree"

No, reading and being able to decipher a script are different. I can read Portuguese, Spanish or Italian (even Romanian) to some degree because the languages are close enough to each other, but I can't read Polish, Basque, Finnish or Magyar despite them using the same Latin script.

muzani · on Feb 27, 2023

I don't understand much Arabic, but I know the alphabet almost as well as the Latin alphabet. If you see something in Latin script along the lines of "svciwbc oaoøoaaö" that claims to be Gaelic or whatever, you'll know that something is off - the consonants and vowels just don't work that way.

It won't fool the people who can speak the language, but I think the website is just designed to educate people so that it doesn't look like complete nonsense.

Facebook is surprisingly an offender here. It's common to mix both Arabic and Latin, say, begin with بسم الله and then write the text, sometimes in english, and it'll throw off the alignment of the text completely. You get the Latin words right aligned or the Arabic one left aligned.

Edit: I'm actually quite surprised how well HN handles this.

Zuiii · on Feb 27, 2023

HN - thankfully - sends text (mostly) unmolested to browsers which generally tend to handle arabic substrings correctly (assuming the page encoding supports it).

HN /could/ make it better by setting CSS auto directionality for all <p>s, but that would be antithetical to its goals as an English-language forum.
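
For anyone wondering what "auto directionality" boils down to, here is a minimal Python sketch of the first-strong-character heuristic that HTML's dir="auto" is based on. The function name is made up for illustration, and browsers handle more cases than this (nested elements, isolates and so on), so treat it as an approximation rather than what any browser actually runs.

```python
import unicodedata

def first_strong_direction(text, default="ltr"):
    """Guess paragraph direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (Latin, Cyrillic, ...)
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic, ...)
            return "rtl"
    # Only digits, spaces, punctuation, etc.: fall back to the caller's default.
    return default

print(first_strong_direction("بسم الله and then some English"))  # rtl
print(first_strong_direction("English first, then بسم الله"))    # ltr
```

A comment that opens with بسم الله would be laid out right-to-left under this rule even if the rest is English, which is probably why a site might not want it applied blindly.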

rippercushions · on Feb 27, 2023

It's an approximation of how many Muslims there are in the world, making the generous but not entirely incorrect assumption that they will be familiar enough with Arabic to at least be able to tell if you've completely borked your rendering.

Galanwe · on Feb 27, 2023

I would be careful linking "Muslims" and "Arabic readers/speakers".

I know Muslims who would be challenged to speak more than 3 words of Arabic, as well as Arabic speakers who are Christians or atheists.

randommar · on Feb 27, 2023

I think every Muslim can speak more than 3 words; every Muslim knows what al-Fatiha is. That's 25 words.

riffraff · on Feb 27, 2023

I'm completely ignorant of Muslim culture but there was a time where Catholics recited prayers and liturgy in latin and plenty of practitioners recited the sounds without really understanding them.

Source: my grand aunts saying "arapreme" where "ora pro nobis" would go (the former kinda sounds like an Italian word). Also probably related, "hocus pocus" is the "magic" expression "hoc est corpus" ("this is the body [of Christ]").

xbmcuser · on Feb 27, 2023

Learning how to read and pronounce Arabic properly is part of the religion. All prayers etc. are in Arabic, you are supposed to recite them in Arabic rather than in translation, and a wrong pronunciation can change the meaning, so they are strict about teaching proper pronunciation. I agree with you that the 2 billion number is incorrect, but it was being used as a turn of phrase, not an accurate number.

wruza · on Feb 27, 2023

Not everywhere. I live in an islamic region (around 40%). Religious quotes in every other home, but maybe 2-3% of people know at least some arabic letters, not to mention words. You learn it only if you learn at madrasah or are hardcore muslim (not as in extremism, but as in taking it seriously). After you learn it, you have to listen to popular interpreters, because Quran is still hard to get right on your own.

neoberg · on Feb 27, 2023

I grew up in a Muslim environment and I know al-Fatiha and more. But I don't understand a single word of it, and I can't tell whether what I recited is a word or a whole sentence. I believe 99% of the population in my country are the same. Also, we can't read Arabic script.

throwaway2203 · on Feb 27, 2023

What country did you grow up in? We all don't speak arabic but can fluently read the script.

neoberg · on Feb 28, 2023

Turkey

Galanwe · on Feb 27, 2023

In my surroundings I would say roughly 90% of people reciting al fatiha do so based on sounds learnt by heart, they would hardly be able to explain the meaning of these individual sounds.

mytailorisrich · on Feb 27, 2023

The only assumption is that Muslims may, on average, be able to "read the Arabic alphabet to some degree". Obviously 'some degree' makes it very variable, does not imply being able to really read/write/speak.

I don't know if it's true but that does not seem unreasonable.

Galanwe · on Feb 27, 2023

> I don't know if it's true

> that does not seem unreasonable.

My point is that what seems reasonable to someone that doesn't know anything about the subject means nothing, and could even end up being somewhat offensive.

Try to reverse the logic and realize how absurd it would look to you: say a guy in the Middle East wanted to roughly estimate the number of Latin speakers and made the assumption that it should be roughly the same as the number of Catholics.

Wikipedia says there are ~2 billion Muslims and 400 million Arabic-dialect speakers (native and non-native). So on average 20% of Muslims are able to understand basic Arabic, and I expect that not even half of those would be able to read and understand classical Arabic as written in the Quran.

mytailorisrich · on Feb 27, 2023

Again, the assumption was not that Muslims are Arabic speakers...

poulpy123 · on Feb 27, 2023

if all catholics were taught the bible in latin, you could make the assumption (true or false but reasonable) that all catholics would be able to read latin to some degree.

poulpy123 · on Feb 27, 2023

The Quran is traditionally taught in Arabic even in non-Arab countries. I have no idea how well people in these countries can read or decipher Arabic in general, but I can guess where this number comes from.

throwntoday · on Feb 27, 2023

Funnily enough, there are many Muslims who can read Arabic but have almost no idea what they are reading, as they don't speak it. The Quran is interpreted into many languages, but prayer is read in Arabic no matter what their native language is. Reading the Quran is also preferably done in Arabic, as translation is open to the interpretation of the translator, so the only truly immutable word is read in Arabic.

wruza · on Feb 27, 2023

Interpretation of Arabic itself is the least problem they have. It takes a well-educated interpreter to make correct sense out of most statements.

With few exceptions, Islamic revelations do not state which Quranic verses or hadith have been abrogated, and Muslim exegetes and jurists have disagreed over which and how many hadith and verses of the Quran are recognized as abrogated, with estimates varying from less than ten to over 500.

gkfasdfasdf · on Feb 27, 2023

> The whole 'two billion people can read the Arabic alphabet to some degree' is a weird claim

Not a weird claim at all. Even without harakat, it is obvious when Arabic script is messed up (via alignment or letter reversal) to anyone who can read the script. It just makes whatever you are reading look janky and unprofessional.

IncRnd · on Feb 27, 2023

It's weird, because the number is off by an order of magnitude. "If you count all of the varieties of today’s Arabic together, you can safely estimate that there are about 313 million Arabic speakers in the whole world, making it the fifth most-spoken language globally behind Mandarin, Spanish, English and Hindi." [1]

[1] https://www.babbel.com/en/magazine/how-many-people-speak-ara...

raisin_churn · on Feb 27, 2023

Arabic is both a (group of) language(s) and an alphabet/script. The page isn't claiming that 2B people speak some dialect of the Arabic language, it is claiming some 2B people have a degree of literacy in the alphabet/script, so a quarter of the world's population will potentially know if you've screwed up your website/app/game/whatever. You don't need to speak Latin to know that "anunmtMh mucjmydDDDDD" is just the result of me mashing random keys on my keyboard, because you can read Latin letters. Likewise no Mongolian speaker would believe "прнтзх йоэЁюи" is proper Russian, even though those two languages are wildly different.

IanCal · on Feb 27, 2023

Also I cannot read Arabic and know enough to see that something is wrong if it's LTR.

Now I know:

> 3. In general, the letter combination ال should be common. The combination ل ا cannot occur in Arabic script, as those characters should be connected.

I still cannot read a jot of Arabic and know enough to identify some cases where the dev/writer has gotten it wrong.

zakki · on Feb 27, 2023

The writer most likely counted the Muslim population. Most Muslims don't speak Arabic, but they recite the Quran (written in Arabic) daily. Most of them will notice if ال is written as ل ا

jcranmer · on Feb 27, 2023

Egypt has ~100 million people, the vast majority of whom are going to be native Arabic speakers. Algeria, Sudan, and Iraq also have ~40 million people each, again, most of whom are native Arabic speakers (of different varieties). Morocco, Saudi Arabia, and Yemen are no slouches either. In short 300 million native speakers of various Arabic languages is extremely plausible.

FWIW, the usual estimate for worldwide adherents to Islam is ~1 billion.

zakki · on Feb 27, 2023

The Pew Research and Wikipedia estimates are 1.9B.

[1] https://www.pewresearch.org/religion/2011/01/27/the-future-o...

[2] https://en.wikipedia.org/wiki/Islam_by_country

IncRnd · on Feb 27, 2023

Those estimates are not for Arabic readers or speakers. They are for Muslims or followers of Islam.

ask_b123 · on Feb 27, 2023

FWIW, I am not a muslim nor a native Arabic speaker but I "can read the Arabic alphabet to some degree".

Though if your numbers are correct, 2 billion sounds way too high

gkfasdfasdf · on Feb 27, 2023

Most Muslims are taught how to read Arabic script, even if they can't understand or speak Arabic. That is because reading the Qur'an in its Arabic form, even without understanding, is a sacred act above reading the translation. Additionally, ritual prayers require reciting verses in Arabic, (again without necessarily understanding). Therefore it's not weird to assume the number of people familiar with Arabic script include the population of non-Arabic speaking Muslims.

chakintosh · on Feb 27, 2023

> You could have script that is justified right-to-left and with letters correctly connected and it still be gibberish.

That simply means you can't read Arabic. You absolutely do not need the vowels to be able to read Arabic; if you know the language, you know the vocabulary and know how a word should be pronounced. The vowels are there to aid clarity. In most cases one single vowel could outright turn a word into its own antonym. And right now, 99% of Arabic text is written without vowels, apart from the Quran and literary works.

bombcar · on Feb 27, 2023

Let's assume there's a correct "brand" for Coca Cola in Arabic (I have no idea if there is, but I'm sure there's some large brand that was localized) - there will certainly be a very large number of people who recognize that brand name, even if they cannot read, and will recognize it if you break it, just like someone with no literacy at all can recognize something is broken if they see a coke can labelled LOCA CACO.

mcswell · on Feb 27, 2023

This may be referring to people who can "read" the Koran, where "read" means say the text out loud. Understanding what they're saying is a different question, and most of those people probably do not understand the Koran's language (which is a more than thousand year old version of Arabic).

Alternatively, it may be referring to people who read a language written in Arabic script, but not necessarily the Arabic language. Languages that use some variety of Arabic script include Persian (Farsi and Dari), Urdu, Pashto, Western Punjabi, Uyghur and some other languages. Some of those languages use vowel diacritics, particularly Uyghur.

cool_dude85 · on Feb 27, 2023

>most of those people probably do not understand the Koran's language (which is a more than thousand year old version of Arabic).

My understanding, as a non-Muslim who doesn't speak Arabic, is that the standard Arabic is pretty conservative such that MSA, as would be commonly heard on TV or read in the news or other more formal settings, is not very far off the Arabic used in the Quran. So I think most native Arabic speakers would understand the Quran well, but may not be able to fluently speak or write it.

And that many Arabic learners learn MSA along with a dialect, so many second language speakers would probably be able to productively read the Quran as well.

nraf · on Feb 27, 2023

Most native Arabic speakers will be able to understand large chunks of the Qur'an. There are some words / phrases in the Qur'an that wouldn't be used in modern Arabic that might prove challenging for those who don't engage with the Qur'an on a regular basis, but the Arabic of the Qur'an is very close to that of MSA.

To most Arabic speakers, both Classical Arabic and MSA are simply referred to as Fusha.

There are also many non-speakers of Arabic who understand many verses and can pick up on the gist of many verses in the Qur'an.

jamiek88 · on Feb 27, 2023

MSA is Modern Standard Arabic to save others a google!

marmee · on Feb 27, 2023

> Photoshop breaks Arabic. We even have a website that will break our Arabic so that Photoshop breaks it back to normal. Yes, Adobe is aware. No, they do not care.

Photoshop has, since time immemorial, had a Middle East edition (ME), which supported RTL and was used to curb piracy.

I don’t know how it is with their new pseudo-SaaS, but I think they integrated it into their main product.

khaled · on Feb 27, 2023

Photoshop relatively recently consolidated its text layout into a single layout engine (based on HarfBuzz), and Arabic support is no longer opt-in (which was the source of the trouble: you had to know you needed to opt in to Arabic support before installing the application).

https://helpx.adobe.com/lv/photoshop/using/unified-text-engi...

chakintosh · on Feb 27, 2023

Oh man. I remember from 15 years ago or so when I used to struggle with arabic script, would often resort to writing in another software and exporting it to PNG or EPS in order to use it in PS. Luckily, there was a 3rd party add-on that fixed that issue.

martopix · on Feb 27, 2023

I remember the student union at the University of Edinburgh having massive welcome signs printed on a wall in all languages, and the Arabic one very clearly had the mistakes from points 2 and 3. Printed on a wall. It was very clear to me even though I don't speak a word of Arabic.

ta1243 · on Feb 27, 2023

Mistranslations are always a source of amusement

http://news.bbc.co.uk/1/hi/7702913.stm

> When officials asked for the Welsh translation of a road sign, they thought the reply was what they needed.

> Unfortunately, the e-mail response to Swansea council said in Welsh: "I am not in the office at the moment. Send any work to be translated".

zirgs · on Feb 27, 2023

Wouldn't it make sense for translators to send messages like that in multiple languages?

jcuenod · on Feb 27, 2023

At the end of the "It's a small world" ride at Disney, FL, there's a sign in Zulu that translates to "goodbye". But in Zulu, how you say goodbye depends on whether the person you're saying goodbye to is staying or going. The sign at the ride says goodbye as though the rider is staying (implying that the ride is leaving).

olliej · on Feb 27, 2023

The core issue is people without familiarity with how non-native (to them) script can function - in the English centric world it means assuming one letter == one visual glyph, and the visual glyph won't get a different rendering depending on context. Native English speakers don't seem to really acknowledge that the English alphabet does have contextually different glyphs for the same letter: that is what capitalization is (and applies to other latin descended scripts). Of course single character = single glyph equivalence fails for many latin descended scripts 'ss' vs 'ß', 'ij' vs 'ÿ'. This is entirely ignoring accented letters which break anglo-centric "single code point == single character" equivalence (though this is more understood).

People often think text layout is "easy" because they don't consider how anything other than their common experience treats things (the reality is there still isn't a good vertical text layout story on the web).

This is also demonstrated whenever people go "I'll handle text layout myself", because they think one key press = one character, and immediately break text entry for more than half the world.
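
A few of these mismatches are easy to see from a Python prompt. This is only an illustrative sketch using the standard unicodedata module; the helper name code_points is made up, and the Arabic line relies on the legacy "presentation form" code point U+FEFB, which a shaping engine would normally produce as a glyph rather than something you would store in text.

```python
import unicodedata

def code_points(text):
    """List the code points in a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

# One visible letter, one or two code points depending on normalization.
e_composed = "\u00e9"                                   # é as a single code point
e_decomposed = unicodedata.normalize("NFD", e_composed)  # e + combining acute
print(code_points(e_composed), code_points(e_decomposed))
# ['U+00E9'] ['U+0065', 'U+0301']

# One letter becomes two when uppercased.
print("ß".upper())  # SS

# One lam-alef ligature glyph stands for two letters, lam (U+0644) and alef (U+0627).
lam_alef_ligature = "\ufefb"
print(code_points(unicodedata.normalize("NFKC", lam_alef_ligature)))
# ['U+0644', 'U+0627']
```

So "length of the string" can mean code points, letters or glyphs, and the three disagree even for European languages.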

mcswell · on Feb 27, 2023

Another example of a contextually dependent glyph in Latin script is the non-word-final form of 's' that looks sort of like an 'f' in English texts. See for example images of the hand-written US Constitution, where this comes up in the very first word "Congress", which appears to modern readers to be spelled "Congrefs".

This led to an embarrassing mistake when Google OCRed older books that contained the word "suck."
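
Unicode does have a separate code point for that long s (U+017F), and it folds back to a plain "s" under compatibility normalization. A small Python sketch, with a made-up helper name; note that NFKC also rewrites other compatibility characters, so it is a blunt tool if you only want to touch the long s.

```python
import unicodedata

LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S

def modernize_spelling(text):
    """Fold compatibility characters such as the long s into their modern forms."""
    return unicodedata.normalize("NFKC", text)

old_spelling = "Congre" + LONG_S + "s"
print(old_spelling, "->", modernize_spelling(old_spelling))
print(unicodedata.name(LONG_S))
```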

IIAOPSW · on Feb 27, 2023

ah but you see, my conviction on one letter == one visual glyph context be damned is not hypocrisy nor is it cultural chauvinism. i for one am perfectly consistent. capitalization ought be abolished, tradition is not justification. language is about the encoding of information. capitalization is wasted code space. tradition is dead men telling you how to live. and after that, simplify the letters down into combinations of the same handful of strokes and curves. do ve really need letters vith tvice the horizontal space as the others? vhv not do avav vith all the letters that reach under the botton line? oet rid of bointu diaoonal lines too! eueru thino can and should be uniforn. doun uith the excebdions, consisdencu breuails! see hou nuch nicer this loohs. the uau it rebeats the sane feu elenets ouer and ouer uet is still leoible. it flous snooth lihe budder. uho needs all those letters anuvay?

kybernetikos · on Feb 27, 2023

Completely seriously I'd like to see English get rid of capitals. They're almost entirely useless and confuse learners for no benefit. On top of that kids books that use a font that makes I and l look the same annoy me too.

IIAOPSW · on Feb 27, 2023

i an conbletelu serious too

bombcar · on Feb 27, 2023

For example, in Year 1 that useless letter "c" would be dropped to be replased either by "k" or "s", and likewise "x" would no longer be part of the alphabet.

The only kase in which "c" would be retained would be the "ch" formation, which will be dealt with later.

Year 2 might reform "w" spelling, so that "which" and "one" would take the same konsonant, wile Year 3 might well abolish "y" replasing it with "i" and iear 4 might fiks the "g/j" anomali wonse and for all.

Jenerally, then, the improvement would kontinue iear bai iear with iear 5 doing awai with useless double konsonants, and iears 6-12 or so modifaiing vowlz and the rimeining voist and unvoist konsonants.

Bai iear 15 or sou, it wud fainali bi posibl tu meik ius ov thi ridandant letez "c", "y" and "x" -- bai now jast a memori in the maindz ov ould doderez -- tu riplais "ch", "sh", and "th" rispektivli.

Fainali, xen, aafte sam 20 iers ov orxogrefkl riform, wi wud hev a lojikl, kohirnt speling in ius xrewawt xe Ingliy-spiking werld.

from history, sometimes attributed likely incorrectly to Mark Twain

least · on Feb 27, 2023

> The core issue is people without familiarity with how non-native (to them) script can function - in the English centric world it means assuming one letter == one visual glyph, and the visual glyph won't get a different rendering depending on context.

English has cursive and I think most English speakers are at least aware of cursive. We have script that is designed for print as well, though, while Arabic script does not; it is always cursive.

andsoitis · on Feb 27, 2023

Some have developed non-cursive Arabic script, but it hasn’t taken off.

For instance, the Simplified Arabic Alphabet was devised by Muhammad Shakeel as an alternative way to write Arabic. It is a non-cursive alphabetical script as opposed to the traditional cursive Arabic abjad. The letter shapes are based mostly on the early Arabic Jazm script. It is not connected to or inspired by Nasri Khattar's Unified Arabic script.

https://omniglot.com/conscripts/saa.htm

least · on Feb 27, 2023

Interesting. I think it probably hasn't really taken off because there's not really any reason for it to at this point. We have high resolution displays and printers and software that is powerful enough to render and input regular Arabic script just fine.

There were similar issues with typing JP/KR/CN glyphs on computers for a long time which thanks to technological progress has stopped being something that needed solving.

techaqua · on Feb 27, 2023

What about farsi and other languages that use arabic script but not arabic at all? I can write (with effort lol) indonesian or javanese in arabic script https://en.wikipedia.org/wiki/Jawi_script.

LanceH · on Feb 27, 2023

Farsi has the letter "p", which Arabic does not. That letter has three dots on top (sometimes drawn like ^). I think there may be another. Arabic only has the "sh" sound with the three dots.

Trying to read Farsi, it feels like I should know what's going on but am left with the feeling that I've forgotten all my Arabic. Then I'll see some of the bonus letters.

10000truths · on Feb 27, 2023

There are four letters in Farsi that do not exist in Arabic - (گ ژ چ پ), which make the 'p', 'ch', 'zh' and 'g' sounds, respectively. But the underlying calligraphic system (RTL order, joining forms, harakat diacritics, and so on) is pretty much the same across the Arabic script and its descendants.
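
As a rough sketch, spotting those four letters in a string is a simple lookup in Python. The names below are made up for illustration, and this is only a hint, not a language detector: other Arabic-script languages such as Urdu use some of these letters too, and plenty of Persian words contain none of them.

```python
# The four letters mentioned above, keyed by code point, with their sounds.
PERSIAN_EXTRA_LETTERS = {
    "\u067e": "p",   # پ
    "\u0686": "ch",  # چ
    "\u0698": "zh",  # ژ
    "\u06af": "g",   # گ
}

def extra_letter_sounds(text):
    """Return the sounds of any of the four extra letters found, in text order."""
    return [PERSIAN_EXTRA_LETTERS[ch] for ch in text if ch in PERSIAN_EXTRA_LETTERS]

print(extra_letter_sounds("\u067e\u062f\u0631"))  # پدر ("father" in Persian)
print(extra_letter_sounds("\u0634\u0627\u064a"))  # شاي ("tea" in Arabic)
```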

mcv · on Feb 27, 2023

The same issue also exists in Latin script, where German has the ß, not to mention various umlauts, strikes, circles and cedillas modifying letters of the otherwise standard Latin script.

hnbad · on Feb 27, 2023

I'd imagine it's more like seeing Vietnamese as a European: https://en.wikipedia.org/wiki/Vietnamese_alphabet

If you live in Europe and speak a language using a Latin script, you probably have come across most of the extensions other European languages add to the shared base in loanwords or foreign media. But then you look at something like Vietnamese and you are no longer sure how letters work.

smcl · on Feb 27, 2023

> Trying to read Farsi, it feels like I should know what's going on but am left with the feeling that I've forgotten all my Arabic

This is what Dutch sounds like to me as an English speaker - plenty of common sounds with English; it has a similar speed, rhythm, intonation to English. It feels like I’m hearing English but have lost my faculties to parse it

jamiek88 · on Feb 27, 2023

Yes! It’s kinda like ‘hearing voices’. It sounds like English, but if you try and ‘tune in’ you can’t!

bombcar · on Feb 27, 2023

I know some Spanish and hearing Portuguese does this to me every time; it's like I've almost tuned in but there's nothing there.

pezezin · on Feb 28, 2023

I'm Spanish and I get that feeling with Romanian. It sounds somewhat familiar, but I have no idea what is going on.

Portuguese and Italian on the other hand are much closer, and if spoken slowly enough, somewhat understandable.

ben_w · on Feb 27, 2023

As word games, I like sentences which sound valid in multiple languages, regardless of if the meaning is changed.

"Goedemorgen, ik hoop dat je bent goed".

mcv · on Feb 27, 2023

Always fun are sentences that sound very similar and are entirely correct in both languages but mean something completely different, like:

He was in the war -- Hij was in de war (he was confused)

A stiff in the brook -- Een stijve in de broek (a boner in the pants)

Those are the two most famous examples I'm familiar with, but I'm sure there are a lot more.

smcl · on Feb 27, 2023

Oh wow, I've encountered a lot of words in English/Czech that are "false friends" but it's too far away grammatically to construct similar sounding entire sentences. That's brilliant you can do it with English and Dutch :)

Also I wonder if there's a connection between trousers being "broek" in Dutch and "breeks" in Scots.

edit: wow ok I should've just gone to wikipedia: https://en.wikipedia.org/wiki/Breeks

ben_w · on Feb 27, 2023

At a NATO spy convention, an English spy asks their Estonian, French, Spanish, German, and Bulgarian counterparts if they can see them in the new camouflage they are testing.

"Jah" "Oui" "Sí" "Ja" "Da"

ben_w · on Feb 27, 2023

An ex had the smaller example of incorrectly asking for "un préservatif" when she meant "un préserve".

One of her friends was half of a multi-nationality couple, I think it was French and Irish, and the punchline was their kid, at a beach, yelling, in a strong Irish accent "Look mummy! Phoques!"

mytailorisrich · on Feb 27, 2023

If by "un préserve" they meant a preserve/marmalade then in French that's "une confiture" (or "marmelade" but it's somewhat rare). I don't think "un préserve" is a word...

ben_w · on Feb 27, 2023

Although I may be misremembering the exact word as I have a GCSE grade D from 23 years ago, her French is much better than mine as she lived in Paris for a few years.

It was certainly something close to what I wrote.

FlyingSnake · on Feb 27, 2023

> Trying to read Farsi

Wait till you discover Urdu, which confuses Farsi speakers even more. For extra fun try the Nastaliq script.

amir · on Feb 27, 2023

"p" has three dots beneath it (پ), three dots on top is "s" (ث).

LanceH · on Feb 27, 2023

Thanks, I thought it was 3 dots on the Arabic "b". It really doesn't take long, once Farsi tries to sneak in some extra letters, to figure out it's not Arabic.

Looking back on it, I remember feeling like I can't remember Arabic, but part of it is that this also happens during that time when I'm getting used to the script. There is always an adjustment period with every new font/handwriting that takes a sentence or two to sort out the style before I truly start reading.

bombcar · on Feb 27, 2023

Point 3 mentioned that:

3. The text is in the wrong Arabic-script language, for example Farsi in Egypt (Egyptians primarily speak an Egyptian Dialect of Arabic), or Modern Standard Arabic in Afghanistan (Afghans speak Dari, an Afghan Persian language). Like the Latin alphabet, you can write different languages with the Arabic alphabet.

mytailorisrich · on Feb 27, 2023

That's actually what I expected the website to do: Is this Arabic vs. Farsi, etc?

slim · on Feb 27, 2023

this applies to those languages as well

areyousure · on Feb 27, 2023

Mostly yes, and I think the OP is well-written.

But note that in some cases, eg https://en.wikipedia.org/wiki/Tajik_alphabet#Samples point number 2 won't be accurate. I'm not the best reader of Tajik written in (Perso-)Arabic script, but I don't believe "ال" appears in the samples there. (It does appear in the word "ALphabet" in the opening paragraph!)

friendlyHornet · on Feb 27, 2023

And also Aramaic dialects like Syriac

Syriac script looks completely different to Arabic but has the same issues when rendered

LanceH · on Feb 27, 2023

The first two correct examples are "Hello world". Third one is "language".

The stick character is either a short "a" or "i", which you would know from the little tick marks above/below that you see in decorative script, but are generally left out in print where you get it from context.

The J looking character is an "L". So they make "al" like al-jabaar => the strong. It's 95% like "el" in Spanish or "il" in Italian, though it is genderless.

The o with dots attached or not to the word on the right indicates feminine gender.
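
To make the point about the little tick marks concrete, here is a minimal Python sketch, assuming the marks in question are stored as Unicode combining characters (true for the Arabic short-vowel signs such as fatha). The helper name `strip_marks` is made up for illustration. It drops the marks and leaves the bare letters, roughly what you see in ordinary print:

```python
import unicodedata

def strip_marks(text):
    """Drop combining marks (such as Arabic short-vowel signs) from text."""
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for marks like fatha (U+064E).
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)

# kaf + fatha, teh + fatha, beh + fatha: a fully vocalized three-letter word
vocalized = "\u0643\u064e\u062a\u064e\u0628\u064e"
print(strip_marks(vocalized))
```

Note this removes every combining mark, not only vowel signs, so it is a blunt tool rather than a proper normalizer.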

pezezin · on Feb 28, 2023

Incidentally you can find the "al" in many words of Arabic origin like alchemy, algebra, algorithm or alcohol.

anonu · on Feb 27, 2023

The question should be more precise. Is this Arabic script?

There's a half dozen languages at least that are not Arabic but use Arabic script, such as Persian or Urdu. The typographic rules mentioned still apply though.
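
A rough Python sketch of the "is this Arabic script?" question, as opposed to "is this the Arabic language?". It assumes that a character whose Unicode name starts with "ARABIC" counts as Arabic script, and it flags the four letters Persian adds to the Arabic alphabet. The function and constant names are hypothetical, and since Urdu uses these letters too, a hit only tells you the text is probably not Arabic, not which language it is:

```python
import unicodedata

# The four letters Persian adds to the Arabic alphabet: peh, tcheh, jeh, gaf.
PERSIAN_EXTRA_LETTERS = {"\u067e", "\u0686", "\u0698", "\u06af"}

def uses_arabic_script(text):
    """True if any character's Unicode name starts with 'ARABIC'."""
    return any(unicodedata.name(ch, "").startswith("ARABIC") for ch in text)

def persian_extras(text):
    """Return the Persian-added letters that occur in text, if any."""
    return {ch for ch in text if ch in PERSIAN_EXTRA_LETTERS}

word = "\u067e\u062f\u0631"  # peh, dal, reh
print(uses_arabic_script(word), persian_extras(word))
```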

mcswell · on Feb 27, 2023

Urdu (and IIRC Western Punjabi, i.e. Punjabi as spoken/written in Pakistan) prefer to use the Nasta'liq variety of Arabic script, and historically Persian (Farsi) was written in Nasta'liq. And it is not the case that Nasta'liq is written on a base line--in fact, characters in a word tend to slant down to the left.

rahimiali · on Feb 27, 2023

One of the three rules on the page does not apply to either Persian or Urdu. The article “Al” isn’t used in them, so the third rule doesn’t apply.

least · on Feb 27, 2023

I also thought it might've been about the differences between them, given that Pashto has its own characters unique to its script and Arabic has diacritic marks not used by Farsi or Pashto. This is a different subject, though, that has little to do with what the post is complaining about. Their grievances apply to all scripts based on Arabic.

fjfaase · on Feb 27, 2023

I once had to implement Arabic support in an existing editor. It also had to support mixed RTL and LTR input. Interesting to see how the cursor walks differently through the text when going forward and backward, and how text selection works. Most document editors (MS Word, Libre Office with CTL, for example) and browsers implement this correctly as well. So you can check whether a text is displayed correctly by comparing it with how it looks in a document editor or a text box in an HTML file.
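
For a feel of why the cursor behaves oddly in mixed text, here is a small Python sketch that splits a string into runs by the Unicode bidirectional class of each character. This is only the raw per-character classification, not the full Unicode Bidirectional Algorithm that editors and browsers implement, and the names `direction` and `direction_runs` are invented for the example:

```python
import unicodedata
from itertools import groupby

def direction(ch):
    """Classify one character as 'rtl', 'ltr' or 'neutral'."""
    cls = unicodedata.bidirectional(ch)
    if cls in ("R", "AL"):  # Hebrew-style and Arabic-style strong RTL
        return "rtl"
    if cls == "L":          # strong LTR, e.g. Latin letters
        return "ltr"
    return "neutral"        # spaces, digits, punctuation, ...

def direction_runs(text):
    """Group consecutive characters that share a direction."""
    return [(d, "".join(chars)) for d, chars in groupby(text, key=direction)]

# "abc" followed by a space and an Arabic word (stored in logical order)
mixed = "abc \u0645\u0631\u062d\u0628\u0627"
for run in direction_runs(mixed):
    print(run)
```

The string is stored in typing order; the display order of each run is decided later by the renderer, which is where the cursor movement and selection quirks come from.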

reaperducer · on Feb 27, 2023

> You've never had to worry about copy-pasting destroying your text between two programs

Actually, yes. Yes I do.

Try having to work in documents that trade text in multiple languages between Adobe Acrobat and Microsoft Office. Add in some opinions from iOS, and I end up pasting text into a blank ASCII file just to get it back to basics so I can send it to the next program, because neither Microsoft nor Adobe can reliably handle the macOS standard Command-Shift-V to paste text unformatted.

I can't imagine the disaster that would await me if I also had to do it in Arabic.

vishnuharidas · on Feb 27, 2023

This reminds me of an Arabic client who wanted both English and Arabic versions of the app. I sent him a spreadsheet with 2 columns, with English in one column and another empty column where I expected him to fill in the Arabic translation.

He replied with a plain .txt file with a couple of Arabic lines in it, with a message saying "I translated some of those text from your excel sheet, please go ahead and use this text, I will send the rest by tomorrow". Nobody understood which translated to which.

Causality1 · on Feb 27, 2023

Also add "use the correct Arabic for your target audience". "Arabic" is not monolithic and using the wrong dialect can go over as well as using Scots on a web page intended for Jamaicans.

Cyph0n · on Feb 27, 2023

As an Arabic speaker, baby steps!

For spoken Arabic, a decent placeholder is either MSA (standard Arabic) or a generally understood dialect (Egyptian or Levantine).

The latter is what a lot of the recent Arabic game dubs have been doing (e.g., Ubisoft). Another example: Pixar and Disney use Egyptian Arabic for their dubs.

Edit: Oh, and for written Arabic, you should almost always be using MSA. That is, you shouldn’t worry about the dialect of your target audience.

james-redwood · on Feb 27, 2023

I pity the poor dev who has a Moroccan co-worker that helpfully offers to translate their application into darija thinking it’s Arabic

curiousgal · on Feb 27, 2023

Tangent: I hate how Moroccans monopolized the word darija to mean "Moroccan Arabic". Every country in North Africa calls their own language darija!

poulpy123 · on Feb 27, 2023

Do you know why Levantine Arabic is more used for dubbing?

Cyph0n · on Feb 27, 2023

My understanding is that it’s simply because they were among the first to set up dedicated dubbing studios.

kmeisthax · on Feb 27, 2023

>Bonus 'I hate it' points if it's that Arabic-looking font that's just English text that looks kind of Arabic.

There's a faux-katakana font someone used for the titles on an otherwise amazing album of some FM synth Touhou remixes[0] that hardcore fucks with my brain.

[0] https://www.youtube.com/watch?v=24x3AL7yMX0

bombcar · on Feb 27, 2023

There's a whole subculture of using Cyrillic to make "Russian-sounding" things but it's just English with reversed letters. Hilariously amusing to anyone who knows any Cyrillic language.

agolio · on Feb 27, 2023

Probably a good place to ask: I want to add an Arabic translation to a webpage I operate. My text editor is a bit confused by the Unicode symbols. Obviously the text editor displays left-to-right no matter what, as it's made for programming.

What's the right way to handle this?

least · on Feb 27, 2023

This might provide some insight for setting up your editor [1]. W3C has guidelines for localization [2] and for using RTL script online [3].

[1] http://andreasmhallberg.github.io/typing-arabic-in-vim/

[2] https://www.w3.org/International/questions/qa-html-language-...

[3] https://www.w3.org/International/questions/qa-html-dir

skrebbel · on Feb 27, 2023

If the editor only has trouble displaying Arabic and does not somehow mess with the Unicode representation then you should be fine. I.e., you could paste an Arabic string into a JSON file; it might look like shit, but it should still render correctly in your app or website (assuming you set the text direction or language properties right).

And you can always double check in a different editor to be sure nothing gets messed up. I use VS Code which, likely because it's browser-based, seems to get it right as well.
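
A quick way to convince yourself of this, sketched in Python with a made-up `greeting` key: the string is stored as a sequence of code points in typing order, and JSON carries it through unchanged whether or not it is escaped. What the editor draws on screen is irrelevant to the bytes on disk.

```python
import json

greeting = "\u0645\u0631\u062d\u0628\u0627"  # five Arabic letters, in typing order

# Default json.dumps escapes non-ASCII, so the file is plain ASCII.
escaped = json.dumps({"greeting": greeting})
# ensure_ascii=False keeps the Arabic letters readable in the file.
readable = json.dumps({"greeting": greeting}, ensure_ascii=False)

def survives_json(text):
    """True if text comes back identical after a JSON round trip."""
    return json.loads(json.dumps(text)) == text

print(escaped)
print(readable)
print(survives_json(greeting))
```

Either form loads back to the same string, so picking one is mostly a question of what you want to see when you open the file.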

slim · on Feb 27, 2023

I recommend using gedit for this use case (available for Linux and Mac). It has the best support for Arabic, right-to-left script and bidi (that's mixing RTL and LTR in the same sentence) I've ever seen.

Symbiote · on Feb 27, 2023

Kate (KDE's editor) is also fine.

slim · on Feb 27, 2023

also I would be happy to help you support arabic in your webpage. send me an email slim@pirate.tn

noufalibrahim · on Feb 27, 2023

The main thing here is knowledge of the language. Given how widespread Arabic is (though perhaps not so much in the gaming ecosystem which is Rami's area), it makes sense to have someone on the team knowledgeable about the language while doing translations.

I encountered a (sort of) similar problem while rendering Devanagari text and wouldn't have realised if I couldn't read the text https://stackoverflow.com/questions/44254171/devanagari-text...

redbell · on Feb 27, 2023

> it makes sense to have someone on the team knowledgeable about the language while doing translations

Speaking about translations, I had a laughable experience when I was experimenting with the Google Translate API a few years ago. I picked some random text about the solar system from Wikipedia that read something like “Mercury is a planet of the solar system”; when translated to Arabic, it read “الزئبق هو أحد كواكب المجموعة الشمسية”. Apparently, it was not able to infer the meaning of “Mercury” from the context.

jamiek88 · on Feb 27, 2023

When I translate that with Safari it comes back as ‘mercury is one of the planets in the solar system’?

khaled · on Feb 27, 2023

In Arabic, Mercury (planet) and mercury (element) are two different words (عطارد and زئبق), here the machine translation is using the wrong mercury for the context.

bombcar · on Feb 27, 2023

This is the problem with machine translation, often a round trip gets you back something that looks reasonable.

Some of them let you click on each word and see which it picked, but that doesn't always work.

Symbiote · on Feb 27, 2023

"Quicksilver is one of the planets in the solar system."

haditab · on Feb 27, 2023

Although I’ve never tried it, I can’t imagine it would be that difficult to find an editor for the specific language to take a look at it. I doubt they need that many hours of work from the editor for it to be a significant expense.

7373737373 · on Feb 27, 2023

I hope someone creates something similar for distinguishing Southeast Asian scripts. Would be very useful for GeoGuessr