This website is well-meaning, but it will be difficult to follow if you don't know elementary Arabic and can't tell ال apart from ل ا.
What it really needs is a simple reference input string, examples of how it gets broken, and what to do to fix them. The middle case, where the sentence is correctly rendered RTL but the individual words are LTR (breaking the ligatures), is particularly common and insidious because it looks plausible to non-Arabic speakers.
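A minimal Python sketch of what such a reference could look like. The reference string (اللغة العربية, "the Arabic language"), the function names, and the idea of simulating each failure as a transformation of the stored text are my own assumptions, not something the site provides. In particular, reversing each word reproduces what the middle case looks like on screen, not necessarily how any given renderer ends up producing it.

```python
# Hypothetical reference string: "اللغة العربية" ("the Arabic language"),
# written with escapes so the stored (logical) order is unambiguous.
REFERENCE = "\u0627\u0644\u0644\u063A\u0629 \u0627\u0644\u0639\u0631\u0628\u064A\u0629"
ALEF_LAM = "\u0627\u0644"  # the article ال: alef, then lam


def reverse_everything(text: str) -> str:
    """Broken variant A: every code point in reverse order.

    One way to get here (not the only one) is text that was pre-reversed
    into visual order and then handed to a renderer that applies bidi itself.
    """
    return text[::-1]


def reverse_each_word(text: str) -> str:
    """Broken variant B, the 'middle case': word order is untouched, so a
    bidi-aware renderer still lays the sentence out right to left, but each
    word's letters come out backwards and join to the wrong neighbors."""
    return " ".join(word[::-1] for word in text.split(" "))


def article_count(text: str) -> int:
    """Number of words that begin with ال."""
    return sum(word.startswith(ALEF_LAM) for word in text.split())


for label, variant in [
    ("correct", REFERENCE),
    ("fully reversed", reverse_everything(REFERENCE)),
    ("words reversed", reverse_each_word(REFERENCE)),
]:
    print(f"{label:<15} {variant}  (words starting with ال: {article_count(variant)})")
```

Showing each variant in a renderer you trust, next to the one under test, gives the "how it gets broken" half. The usual fix in both cases is to keep text in logical order and let a bidi- and shaping-aware renderer lay it out.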
> It's a bit like saying "if you have a paragraph in 'English' and it has no e's in it, it's probably not English."
This is true. You should also see the word "the" in several places in English text, and if it is rendered as "ehT" or something like that, the rendering code may have a bug. The same goes for a reversed ال (read as "al") pattern.
> I can't read Arabic but I can recognize that pattern. I went to https://www.bbc.com/arabic and could find numerous occurrences.
Really? I only ever learned enough Arabic to read parts of the Qur'an and am by no means fluent, but I couldn't see any mistakes on that website myself.
> In general, the letter combination ال should be common.
It was a really smart way of saying you don't need to know anything at all to pattern match on this and spot a very common problem.
Also, if you haven't watched the YouTube video on that page, I highly recommend it. It gives a great concise summary of the issue, and it's impressive how much I learned about visually parsing Arabic script after only a couple of minutes of the video.
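The ال check can also be roughly automated on stored text. This is a sketch under my own assumptions: the site's test is visual, whereas this counts words that start with ال against words that end with لا (what ال turns into when a word's letters are reversed). Plenty of correct words end in لا, so treat the result as a hint, not a verdict, just like the "no e's" heuristic for English.

```python
ALEF_LAM = "\u0627\u0644"  # ال as stored: alef, then lam
LAM_ALEF = "\u0644\u0627"  # the same two letters in the opposite order
PUNCTUATION = "،؛؟.,!?:;\"'()«»"


def article_counts(text: str) -> tuple[int, int]:
    """Return (words starting with ال, words ending with لا).

    Only words longer than two letters count, so the standalone word
    لا ("no") is not mistaken for a reversed article.
    """
    words = [w.strip(PUNCTUATION) for w in text.split()]
    words = [w for w in words if len(w) > 2]
    forward = sum(w.startswith(ALEF_LAM) for w in words)
    backward = sum(w.endswith(LAM_ALEF) for w in words)
    return forward, backward


def looks_reversed(text: str) -> bool:
    """Crude flag (the threshold is my own choice): more reversed articles than normal ones."""
    forward, backward = article_counts(text)
    return backward > forward


# ذهب الولد إلى المدرسة, "The boy went to the school"
sample = "\u0630\u0647\u0628 \u0627\u0644\u0648\u0644\u062F \u0625\u0644\u0649 \u0627\u0644\u0645\u062F\u0631\u0633\u0629"
broken = " ".join(w[::-1] for w in sample.split())  # simulate the middle case
print(article_counts(sample), looks_reversed(sample))  # (2, 0) False
print(article_counts(broken), looks_reversed(broken))  # (0, 2) True
```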
A group of flying kookaburras flap about a body of liquid. In this group, individuals distinguish from distinct sorts. Small, big, colorful or plain, all flap wings as a way to maintain afloat. It's a sight to watch as this group zips, zooms, and turns all around. It's a natural habitat for this flying squad, and you can find it in many spots around our world. This group adds to our natural world's charm and, through its distinct traits, brings a lot of joy to many.
It's a heuristic: it only claims to be helpful most of the time. You might as well complain that checking string length first, as part of an equality check on strings with a demonstrably wide distribution of lengths, is obviously dumb because you can construct same-length strings at will. Breaking rules of thumb is the easiest thing in the world.
But it may be totally correct for a website about kookaburras.
There's more than just corporate websites, and frankly, if a company of any meaningful size offers content in Arabic, I'd expect them to hire someone for that. Even part-time or freelance.
> But it may be totally correct for a website about kookaburras.
It isn't; "to maintain afloat" is not grammatical.
You could replace that with "stay afloat" and it would fix the grammatical error without introducing an E.
There's a similar unforced error in referring to a "group" of kookaburras rather than a "flock".
"Individuals distinguish from distinct sorts" is gibberish. I cannot tell what it's supposed to mean.
"All flap wings as a way to [stay] afloat" is, at best, very awkward; fluent English would require "flap their wings", but that would introduce an E.
"Flapping about a body of liquid" is a very odd thing to say unless the body of liquid happens to be suspended in midair, since midair is the only location where you can find birds flapping.
Who can’t tell these two character sequences from each other? Genuine question.
I think the main point the author is making is, if you are including Arabic script somewhere, take a bit of time to either do it right or hire someone to do it right for you.
Not only that, but the reverse of ال is لا, because Arabic letters change shape based on their position in the word (initial, medial, final). So ل ا is never going to be a word: it lacks joining, since a joined ل would look like لـ, and لا is a distinct ligature.
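Python's standard `unicodedata` module makes this concrete. Text is stored as plain letters in logical order, and the renderer picks the positional forms and the lam-alef ligature. The "presentation form" code points for shaped glyphs exist in Unicode mainly for compatibility, and compatibility normalization maps them back to the plain letters. A small sketch (the helper name is mine):

```python
import unicodedata


def letter_names(text: str) -> list[str]:
    """Unicode names of each code point, in stored (logical) order."""
    return [unicodedata.name(ch) for ch in text]


# ال: alef first, then lam. Alef never joins forward, so these stay two pieces.
print(letter_names("\u0627\u0644"))  # ['ARABIC LETTER ALEF', 'ARABIC LETTER LAM']
# لا: lam first, then alef. A shaping engine draws this pair as one ligature.
print(letter_names("\u0644\u0627"))  # ['ARABIC LETTER LAM', 'ARABIC LETTER ALEF']

LAM_ALEF_LIGATURE = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
LAM_INITIAL = "\uFEDF"        # ARABIC LETTER LAM INITIAL FORM, i.e. the joined لـ
print(unicodedata.normalize("NFKC", LAM_ALEF_LIGATURE) == "\u0644\u0627")  # True
print(unicodedata.normalize("NFKC", LAM_INITIAL) == "\u0644")              # True
```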
> What it really needs is a simple reference input string, examples of how it gets broken, and what to do to fix them
I don't think it does. Just hire someone.
I guess really, the point Rami tries to make is that not a single person who reads Arabic was involved in the video game/advertising/website/etc. The errors are often so basic that a child could point them out.
It would be like if someone wrote English without any spaces between words. It's so painfully obviously wrong.
It's one thing for a small personal project to get this stuff wrong; there's a baseline you can reach by trying it yourself, and nobody is really going to care that much. It's like those funny pictures of restaurants in China with an English name of TRANSLATION SERVER ERROR.
What the site is about and where hiring someone makes sense is anything "big budget", especially if your target market includes Arabic-speaking or Arabic-adjacent countries.
> a simple reference input string, examples of how it gets broken, and what to do to fix them.
I really like this idea. Just have a standard set of strings covering all edge cases (even the sprawling labyrinth that is bidi), with a visual reference showing what the correct rendering of each string looks like. Each entry would also have a description of the problem and suggested solutions.
Unlike the solutions in OP, this one is pragmatic and actually actionable for the vast majority of developers. I'm kinda surprised that something like this doesn't already exist, given the substantial amount of material and visual examples already available that cover the bidi algorithm.
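A rough sketch of the data side of such a suite, in Python. Everything here (`ArabicCase`, `CASES`, `run_suite`, and the wording of each entry) is hypothetical. It only checks that a pipeline hands text back unchanged in logical order, which catches the "someone reversed the string" class of bugs; checking actual rendering would still need the reference images or a reader, which is the part being proposed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ArabicCase:
    text: str     # input, in logical (stored) order
    problem: str  # what usually goes wrong
    fix: str      # suggested fix


# Hypothetical starter entries; a real suite would also ship a reference image per case.
CASES = [
    ArabicCase("اللغة العربية", "Letters or words reversed",
               "Pass logical-order text to a renderer with bidi and shaping support"),
    ArabicCase("لا", "Lam-alef drawn as two separate letters",
               "Use a text shaping engine such as HarfBuzz"),
    ArabicCase("العدد 123", "Digits reversed along with the letters",
               "Let the bidi algorithm order numbers; don't reverse strings by hand"),
]


def run_suite(pipeline: Callable[[str], str]) -> list[str]:
    """Push each case through `pipeline` (anything that should preserve text,
    e.g. a save/load or template round trip) and report cases that come back changed."""
    failures = []
    for case in CASES:
        if pipeline(case.text) != case.text:
            failures.append(f"{case.problem}: try this -> {case.fix}")
    return failures


print(run_suite(lambda s: s))        # []
print(run_suite(lambda s: s[::-1]))  # one message per case
```

A pipeline that reverses the whole string, or each word separately, fails all three cases.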
I'm not sure I understand the idea. It sounds insane to me, because I feel like there are probably trillions of combinations (and it would be insane to expect to cover every specific example of incorrect text), and I thought the website was pretty clear and provided good examples.
As I understand it, the idea is just to make an Arabic test suite with enough examples (maybe a few dozen to a few hundred) such that if your program correctly renders all those examples, it’ll probably work fine with most Arabic text found in the wild. It sounds like there’s a lot of very broken software out there. Testing any Arabic input would be a big improvement for a lot of software.
> Exactly if you aren't comfortable with the idea that glyphs are distinct and order can be assumed to matter.
But isn't that exactly what it's just telling you? The order is important and if you see this very simple pattern it's wrong.
If you read any latin character based language then you surely must be OK with the idea that glyphs can be distinct? Are there many people who exclusively read languages where the order of symbols is not important?
> There's a side note here, that dyslexia can become apparent in very different written languages.[0]
If the point is that dyslexia means some people can't see the difference then that's fair, I'd not come at it from that angle. I don't see any surprising pre-assumed knowledge in this IJ/JI distinction however.
I don't think the intent is to fix every issue, but rather to tell you if you are using busted translation tools. The thing about left-to-right/right-to-left is something that a non-native Arabic speaker may not even know to look out for, even though it is a core aspect of the way the language is written.
The author was more making the point that one order occurs commonly and the other should never occur, because the other order would require connecting the letters. The stick character (alef) does not connect to the letter on its left, but it will connect on its right if the preceding letter connects to the left.
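The joining rule can be sketched in a few lines of Python. The set of letters that don't join forward is a partial, hand-picked list (Unicode's full joining data lives in ArabicShaping.txt), and the function name is mine. It splits a word into the runs a shaper would draw connected, which shows why ال is two separate pieces while لا collapses into one.

```python
# Partial, hand-written list of letters that join only to the preceding letter
# (Unicode joining type R): alef and its hamza/madda variants, dal, thal, reh,
# zain, waw, waw with hamza, and teh marbuta.
RIGHT_JOINING = set("\u0627\u0623\u0625\u0622\u062F\u0630\u0631\u0632\u0648\u0624\u0629")


def joined_pieces(word: str) -> list[str]:
    """Split a word into the runs of letters a shaper would draw connected.

    Assumes every letter not in RIGHT_JOINING joins both ways, which holds for
    most basic Arabic letters but not all (e.g. hamza ء joins neither way).
    """
    pieces, current = [], ""
    for ch in word:
        current += ch
        if ch in RIGHT_JOINING:  # nothing can attach after this letter
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces


print(joined_pieces("\u0627\u0644"))  # ال: two pieces, alef alone, then lam
print(joined_pieces("\u0644\u0627"))  # لا: one piece, lam joins onto alef
# المدرسة splits into four pieces: ا | لمد | ر | سة
print(joined_pieces("\u0627\u0644\u0645\u062F\u0631\u0633\u0629"))
```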
I think the author is aiming for "You're making a prop for a film with Arabic text, or making a multilingual sign, are you doing it right?" rather than "You're writing a text editor, are you doing it right?"
I agree. It doesn't take long to get to the author's thoughts on the most common cause: lack of subject matter expertise. That's cold comfort for those who are working against a tight deadline, though.