Maybe twenty years ago, Google Translate had a fun bug with the word "amistad." In Spanish->English mode, it would correctly translate this to "friendship." If you added an exclamation mark, it would translate with the exclamation mark, so "amistad!" became "friendship!" If you added more, it would add to the translation, so "amistad!!!" became "friendship!!!" Except if you used exactly five exclamation marks, no more and no less, it would translate "amistad!!!!!" to "murder!"
Probably around that time, Google once translated "Berlin" with "Paris" or something like that. IIRC, it was in a sentence about head offices. The training materials consisted of parallel documents, including manuals, where the German document would say "contact our head office in Berlin", and the French one "contact our head office in Paris." So it learned to translate capitals in certain contexts.
Similarly, there was a time when Google translated "Japan" in Japanese to "Korea" in Korean... because documents scraped from the web would say "Japan and Korea" in Japanese and "Korea and Japan" in Korean.
Common with international treaties. The trade agreement known as USMCA in the USA is known as CUSMA in Canada and T-MEC (Treaty between Mexico, USA, and Canada) in Mexico. Countries almost always put themselves first.
> Btw NATO and OTAN being anagrams is a coincidence
Not really, many initialisms are mirrors between French and English because they're very similar languages in these contexts (same names and same etymology for nouns and adjectives means same letters) and their grammatical rules are the opposite of each other (reversed letter order). E.g. UN and NU, EU and UE, EEC and CEE, CCP and PCC…
Also related, and because of the French: UTC. It was chosen as a compromise between the logical English CUT and the French TUC. Also the two are backwards, which I think is not really a coincidence due to word ordering rules in French.
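For anyone who wants to check the "mirror" claim mechanically, here is a minimal Python sketch. The pair list only contains the initialisms mentioned in the comments above; the function names are made up for illustration.

```python
# (English, French) initialisms quoted in the comments above.
PAIRS = [
    ("NATO", "OTAN"),
    ("UN", "NU"),
    ("EU", "UE"),
    ("EEC", "CEE"),
    ("CCP", "PCC"),
    ("CUT", "TUC"),  # the two rejected candidates for what became UTC
]


def is_mirror(english: str, french: str) -> bool:
    """Return True when the French initialism is the English one reversed."""
    return english[::-1] == french


def mirror_report(pairs):
    """Map each 'EN/FR' pair to whether it is an exact letter reversal."""
    return {f"{en}/{fr}": is_mirror(en, fr) for en, fr in pairs}


if __name__ == "__main__":
    for pair, mirrored in mirror_report(PAIRS).items():
        print(pair, "mirror" if mirrored else "not a mirror")
```

Every pair in the list is an exact reversal, while UTC itself is a reversal of neither CUT nor TUC, which fits the compromise story.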
An interesting occurrence I've seen is with flag emojis, for example translating some English text with an England/UK/US flag after it will sometimes turn the flag into that of a different country where the language of the translated text is spoken.
As a native Spanish speaker it took me a moment to understand why "yes" was being translated as "forks" until it clicked and it's not an error.
In Spanish "Ye" is what we call the letter Y, and "La ye" is a word used, at least in the version of Spanish spoken where I come from, to refer to the place where a road forks. Hence the fork in the road is "La ye" and the plural would be "Las yes", or the forks. In this context "forks" refers to where the road forks, not to the eating utensil (which would be "tenedores").
I am also a native Spanish speaker and I have never heard this. I have always called the letter Y "i griega" (Greek i), and so has every other Spanish speaker I have heard.
Not saying that this is wrong; in fact one can check the Spanish Wikipedia to confirm that "ye" is a valid name for Y. But it's definitely not used where I live, neither for the letter nor for a fork in the road.
I was trying to download some software from a Japanese website (this was about 10 yrs ago I think). There was an entire survey you had to fill out first. Since I speak no Japanese, I naturally used Google Translate (to Polish, not even to English).
That survey had a "gender" field, and the options were something like "person" and "woman".
Then there are the automatic (mis)translations of button labels and other similar strings in software, where the translating tool often only has a single and ambiguous word to go on, with no context whatsoever.
The funniest ones I've seen are "branch" (satellite office) versus "branch" (of a tree), "book" (a flight) versus "book" (something you read), "character" (ASCII or Unicode) versus "character" (in a story), "clear" (to remove all) versus "clear" (translated as "cloudless", referring to the sky), "letters" (delivered by a postman) versus "letters" (and not digits), "rate" (how fast, i.e. speech rate) versus "rate" (how much you charge per minute), "prune" (remove all) versus "prune" (dried plum), "manual" (and not automatic) versus "manual" (a user manual), "clubs" (places you go to) versus "clubs" (and not diamonds or spades; "queen of clubs" has a particularly interesting meaning here), "number" (that you call) versus "number" (of things), "at" (@) versus "at" (home), "tab" (key) versus "tab" (in a browser), "close" (to me) versus "close" (something), "back" (button) versus "back" (a body part), as well as all the "party" (adventuring team in an RPG) versus "party" (legal entity, i.e. "third party") versus "party" (as in an event you have fun at) shenanigans.
Mistakes like these often hide in accessibility labels, and are hence far more obvious to screen reader users. In normal UI, they're usually found quickly enough that users never notice, but accessibility labels are often overlooked when testing translations.
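One common way around the "single ambiguous word" problem is to key every string by a context as well as by its source text, similar in spirit to gettext's message contexts. A minimal Python sketch, assuming a hand-written catalog; the context names and the Spanish entries below are purely illustrative, not taken from any real product.

```python
from typing import Dict, Tuple

# A catalog maps (context, source string) to a translation.
Catalog = Dict[Tuple[str, str], str]

# Hypothetical Spanish catalog; contexts and entries are illustrative only.
ES: Catalog = {
    ("button.flight", "Book"): "Reservar",
    ("noun.library", "Book"): "Libro",
    ("button.form", "Clear"): "Borrar",
    ("weather", "Clear"): "Despejado",
}


def translate(catalog: Catalog, context: str, msgid: str) -> str:
    """Look up msgid under context; fall back to the source string if missing."""
    return catalog.get((context, msgid), msgid)


if __name__ == "__main__":
    print(translate(ES, "button.flight", "Book"))
    print(translate(ES, "noun.library", "Book"))
```

The same source word can then carry different translations, and an accessibility label gets its own context instead of inheriting whatever the visible button happened to get.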
There was a fun one going around where the title of the game "Watchdogs" had turned into Norwegian for "look at dogs".
Re: gender, there's all sorts of chaos going between languages which have no grammatical gender (CJK languages, Finnish etc) and those which have mandatory grammatical gender (Latin languages), because the information may simply not be available in the same sentence. So you get obvious misgenderings of "(female name) .. he" or vice versa.
I've had a similar experience where Dutch surnames were translated from English to Dutch by Excel for some reason. Since many Dutchmen have a surname prefixed with the Dutch word "van" (which means "of"), Excel dutifully translated it to "Busje", which meant that many of our clients suddenly were called "Lieke Busje Lexmond" or "Vincent Busje Gogh".
It got a chuckle from our marketing department which caught the error before badges were printed for the very high-profile event we had planned for the next day.
French localisation of OnlyFans used to translate the word 'tip' (quite essential to the OF experience) both as 'bout', 'tip' in the sense of 'end', and as 'astuce', 'tip' in the sense of 'piece of advice'.
I'm wondering where the "forks" translation came from in the first place. Google Translate used to be fairly reliable for simple translations, but I've seen several examples in the last couple of years where it goes batshit crazy, including starting to loop hallucinating sentences on repeat. Is absolutely nobody checking how well it performs before deploying nowadays?
If you have a plan for automatically checking the output of Google Translate against all possible character input strings, be sure to mention it to your recruiter as part of the hiring process.
More seriously: Google Translate's bread-and-butter data source is documents that were human-translated from one language to another with high reliability (such as UN publications). That turns out to work remarkably well for building a neural net that can extrapolate how one sentence should translate to the same sentence in another language. But like most neural networks, it's vulnerable to garbage-in, garbage-out: much like you can get an animal detector to hallucinate "zebra!" if you feed it a noise-pattern as input, if you feed it character sequences that aren't actually words in the input language, it'll try to extrapolate what reality should be between all the corpus it's seen and you'll get garbage on the output side.
Since the tool doesn't actually know what words mean, it has no way, at present, to know "Yes" isn't a Spanish word (and as other commenters have mentioned, it may actually be "a Spanish word" in one weird context in one weird document somewhere in the corpus of all translated documents accessible from the Internet... Or some doc somewhere contains a close-enough typo in the Spanish input document that is over-reflected in the output because no other document contradicts the typo's apparent translation).
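To make the garbage-in point concrete, here is a toy Python sketch of the kind of guard the tool lacks: measure how much of the input appears in a source-language word list before trusting a translation. Everything here is an assumption for illustration (the tiny word list, the 0.5 threshold, the function names); it says nothing about how Google Translate actually works.

```python
import re

# Tiny hypothetical Spanish word list; a real check would need a full lexicon.
SPANISH = {"sí", "no", "y", "es", "verde", "amistad", "la", "las"}


def tokens(text: str):
    """Split text into lowercase runs of letters, ignoring digits and punctuation."""
    return re.findall(r"[^\W\d_]+", text.lower())


def source_coverage(text: str, vocab) -> float:
    """Fraction of tokens found in vocab; 0.0 when there are no tokens."""
    words = tokens(text)
    if not words:
        return 0.0
    return sum(w in vocab for w in words) / len(words)


def looks_like_source(text: str, vocab, threshold: float = 0.5) -> bool:
    """Heuristic: treat text as source-language if enough tokens are known."""
    return source_coverage(text, vocab) >= threshold


if __name__ == "__main__":
    for sample in ["Yes", "Y es verde", "amistad!!!!!"]:
        print(repr(sample), looks_like_source(sample, SPANISH))
```

Note the catch raised elsewhere in the thread: a complete Spanish lexicon could legitimately contain "yes" as the plural of the letter name "ye", in which case a guard like this would wave it through.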
Reminds me of Inodoro Pereyra, a comic strip character in Argentina who was a peon in the countryside and rather ignorant, and he'd sometimes respond affirmatively with "yes como dijo un tal Chespier" ("Yes, as some Shaespier once said").
The comic strip character was acting all knowledgeable by quoting Shakespeare as saying "yes" when the character meant "sí", but misspelling Shakespeare's name as "Chespier", something like misspelling it "Shaespier" in English.
"our vendor now matches the lightbox scripting to the language of the text on the webpage"
"The auto-translate pop-up may still be triggered on occasion, but the HTML in the survey wrapper prevents it from changing the content on the webpage."
I have no idea what either of these sentences mean, and they're both very important to the fix.
My guess (it would be nice if they actually said...) is that they were missing the required lang attribute on their HTML.
<html lang=en>
If not defined it will default to unknown (not to the user's locale) and so this makes Chrome guess. And there wasn't much text in the lightbox (which might be a different page?) for the browser to infer from.
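If the guess about the missing attribute is right, it is easy to test for. A minimal Python sketch using only the standard library's `html.parser`; the class and function names are made up for illustration, and it only inspects the first `<html>` tag, not `lang` attributes on nested elements.

```python
from html.parser import HTMLParser
from typing import Optional


class LangFinder(HTMLParser):
    """Record the lang attribute of the first <html> start tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.seen_html = False
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def declared_lang(markup: str) -> Optional[str]:
    """Return the declared document language, or None if missing or empty."""
    finder = LangFinder()
    finder.feed(markup)
    return finder.lang or None


if __name__ == "__main__":
    print(declared_lang("<html lang=en><body>Yes</body></html>"))
    print(declared_lang("<html><body>Yes</body></html>"))
```

A page (or a separately loaded lightbox document) for which this returns None is one where the browser is left to guess.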
That's probably true, however I'd be really curious to know why Chrome's guess for "yes" is the Spanish word for "Y-junctions" instead of the English word yes.
This was my immediate thought, but it doesn't sound like what they did. They also mention they still get the Google translate pop up - which suggests they didn't.
Though it sounds like they serve many languages, so they'd need to do each survey individually.
#1 is probably to subset the loaded lightbox text-localization files to only the survey language. And #2 is to use the translate=no HTML attribute (or its predecessor) to disable translation of that section.
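For #2, a sketch of what marking a section as untranslatable might look like when the markup is generated server-side, in Python. The `translate="no"` attribute is standard HTML; adding `class="notranslate"` is my assumption about which older convention is meant by "predecessor". The helper name is hypothetical.

```python
from html import escape


def no_translate(text: str) -> str:
    """Wrap text in a span that asks translation tools to leave it alone."""
    # translate="no" is the standard HTML attribute; class="notranslate" is
    # an older convention, included here as an assumption.
    return f'<span translate="no" class="notranslate">{escape(text)}</span>'


if __name__ == "__main__":
    print(no_translate("Yes"))
    print(no_translate("JIBER"))
```

Whether a given browser or extension honors the hint is up to that tool, so this reduces the problem rather than guaranteeing a fix.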
Perhaps the article needs to be read as if prefixed with:
"I was asked to write the following explanation for the public, to put on our website, and talked to the programmers. I have no idea what they were saying. I took some notes, which likewise mean nothing to me, but here goes ..."
> For some respondents, this prompted their browser to believe our survey was written in a language other than English (even though, again, it was in English) and ask if they wanted the page to be translated to English – or, we think, automatically try to translate the page to English.
I goddamn hate this "feature" so much. Especially since it sometimes resets and then I have to find out where the fuck Google moved the disable button to again.
No Google, I speak fluent German and English and can reasonably read Croatian - if I wanted a translation I would explicitly ask for it myself thanks.
They now sometimes dub the audio track. The result is about as horrible as one would imagine. Whoever decided to turn this on by default clearly didn't give a damn.
The best part is YouTube has a bug that would sometimes force a dubbed language; as in, a video originally with an English audio track would get stuck between French and Italian even when you try to manually change the language back to English.
#1 is the crux. People have come to expect high quality videos that require teams of sometimes dozens of people, and creators are held hostage by YouTube's algorithm that keeps on changing and is an effective black box.
I have a standard message for people who use machine translation on their "content":
"I know that machine translation exists. I know where to find it, should I need it. When you push it without asking, half the time I have to translate back from machine-junk to your language, in order to figure out what the hell you were trying to say. You make more work for me, not less. MACHINE TRANSLATION SHOULD ALWAYS BE OPT-IN."
I run an app (Aerolith.org) for studying Scrabble words. For years I would get weird bug reports (https://github.com/domino14/Webolith/issues/331), where at the end of each round, when a user viewed the definitions, the words for definitions would get randomly replaced with other unrelated words. I double and triple checked the code; since I had recently moved the word-related logic to a Go microservice I assumed I had some crazy race condition. I remember trying to replicate it with many simultaneous requests, looking for memory leaks, thinking there was something wrong with the sqlite driver, etc.
Finally, I figured out that it was a Google Chrome auto-translation issue. For example the word JIBER was replaced with "BECAUSE OF" because JIBER means BECAUSE OF in Kurdish. There were many other similar ridiculous cases.
Once, a customer complained that we had corrupted their documents on our platform. Seemingly out of nowhere, words like "Messi" and "Zidane" started appearing countless times in the titles and content of their documents - overnight. It was so bizarre and random. We eventually found out they had a broken browser extension (something similar to Google Translate, I don't remember).
I was only able to find a single instance of such a bug elsewhere, on a Microsoft support forum — the guy was furious that Outlook would insert 'Messi' and 'Zidane' into his emails every time he tried to send them. Something very specific seemed to trigger it.
Last year there was a question in a foreign language on the trivia league LearnedLeague that had to be thrown out because some people's browsers ended up translating the question, making it much easier than anticipated for a subset of users, without them opting in or necessarily even being aware of it.
“Un molcajete y tejolote se usa comúnmente en los restaurantes mexicanos para preparar ¿qué plato popular de aguacate que se suele servir con totopos?”
The idea is that a monolingual English speaker might recognize enough words here to make a guess. Or that people might know Spanish. (I’m somewhere in between, my browser did not translate the question, and I got it right.)
Earlier today, I found a bug in Discord where one of the login forms in Japanese uses 「はい」 ("Yes") as the label for the "Log in" button. The button in question is right beneath a label stating 「パスワードをお忘れですか」("Forgot password?").
X / Twitter is full of funny translations, too. My favorite is 「ポストさんを報告」 (lit. "Report Mr. Post").
As a matter of standard practice I use [forks] and [spoons] in otherwise typical yes/no spaces. When results are tallied I simply retcon my yes/no proxy assignments towards whatever result was my target outcome. Publishable outcomes every time. P hacking is a pain and really, why bother when I know the results are… aspirational… regardless?
Yes also means "Y"s in some dialects of Spanish, like the plural of the letter Y. Maybe since the Y looks like a fork there must be some bizarre neural network connection there?
Spanish youngster wants to learn English, and finds the office of a language school. He enters, finds the office of the principal, and knocks on the door. After a short while, he hears: "If, if, between, between!"
This unfortunately reminds me of the level of English guidance my "teacher" would provide in 5th grade down here in Brazil. At least I learned to use a dictionary.
I gave this a little bit of thought and here is my best hypothesis.
The token "yes", which is not a word in Spanish, might have been tried as "Y es".
That in turn being taken as a sentence fragment, meaning "It's a Y". (Spanish isn't verb-final, but let's go with it).
The letter Y is a symbol of forking.
If something is likened to a Y, that's a way of saying it forks.
Anyway, translation tools sometimes do weird things when the input is a sentence fragment or just a word or two.
UPDATE:
When I type "Y es" into Google translate, tagging it as Spanish, those two tokens still go to "Forks". This adds a little bit of support to my hypothesis.
However, "Y e s" also translates to "Forks".
Inputs like "A es", "B es", "C es" don't do anything; they go to "A is", "B is", ...
Moreover "Y es verde" and "Y e s verde" and "Yes verde" all go to "and it's green".
UPDATE:
It is false that "yes" is not a Spanish word. The letter Y is called "ye", and the plural of that is "yes". Just like when we say that giggle has "three gees", we are using a plural of the letter name "gee". https://es.wikipedia.org/wiki/Y
For a while now, I've had the idea that some malicious person could have a browser extension that detects and modifies news sources to spread propaganda. We've already got joke versions of this, s/{some annoying topic}/{joke} — and for the moment, this is still in the "ha ha what a silly bug" domain.
One day, I expect it won't be silly. It'll be some more subtle transformation that rewrites one party, or one person, as having constantly sinister motives.
Will we even hear about it, if such a thing comes our way?
OpenAI uses machine translation for at least ChatGPT's UI, which became apparent to me thanks to a similar swap.
Thankfully there's no real way to report it as an issue either that I could find, so it shall remain as a fun stain for people to run into and mock I suppose.
Did anyone think to ask Google what’s happening? This can still be easily reproduced in Google Translate. It strongly suggests that some SDE years ago thought it was an unlikely case that someone would ask to translate the word yes from Spanish to English. Boom, fun Easter egg.
Eventually with AI translation, and the proliferation of machine generated text everywhere, the word "yes" will be the way people say "forks" in Spanish! You would say at a restaurant "Necesitamos dos yes, por favor."
So basically Pew is using garbage adware that pops up an intrusive popup that screws up automated language detection, Chrome is way too overeager to auto translate, and Google Translate is starting to go to seed now that everyone uses LLMs.
Google Translate turning a non-word in a source language into an unrelated word in the target language is an old, known misbehavior. I don't think LLMs changed anything there.
Guessing the next step it looks like a fork in the road.
A perfect translator should see English words and not translate them, but if you ask it to translate "Spanish" to English after giving it English input, undefined behavior should not be unexpected.