A glitch in an online survey replaced the word 'yes' with 'forks' (pewresearch.org)
192 points by cpeterso 32 days ago | 99 comments



Maybe twenty years ago, Google Translate had a fun bug with the word "amistad." In Spanish->English mode, it would correctly translate this to "friendship." If you added an exclamation mark, it would translate with the exclamation mark, so "amistad!" became "friendship!" If you added more, it would add to the translation, so "amistad!!!" became "friendship!!!" Except if you used exactly five exclamation marks, no more and no less, it would translate "amistad!!!!!" to "murder!"


Probably around that time, Google once translated "Berlin" as "Paris" or something like that. IIRC, it was in a sentence about head offices. The training materials consisted of parallel documents, including manuals, where the German document would say "contact our head office in Berlin", and the French one "contact our head office in Paris." So it learned to translate capitals in certain contexts.


Similarly, there was a time when Google translated "Japan" in Japanese to "Korea" in Korean... because documents scraped from the web would say "Japan and Korea" in Japanese and "Korea and Japan" in Korean.


Common with international treaties. The trade agreement known as USMCA in the USA is known as CUSMA in Canada and T-MEC (Treaty between Mexico, USA, and Canada) in Mexico. Countries almost always put themselves first.


And NATO is also OTAN, because France insisted on having a French translation of the treaty that would be treated equally with the English version.

Btw NATO and OTAN being anagrams is a coincidence


> Btw NATO and OTAN being anagrams is a coincidence

Not really, many initialisms are mirrors between French and English because they're very similar languages in these contexts (same names and same etymology for nouns and adjectives means same letters) and their grammatical rules are the opposite of each other (reversed letter order). E.g. UN and NU, EU and UE, EEC and CEE, CCP and PCC…


And then, when you can't even agree on dual initialisms, you get UTC.


Also related, and because of the French: UTC. It was chosen as a compromise between the logical English CUT and the French TUC. Also, the two are backwards, which I think is not really a coincidence, given word ordering rules in French.


is backwards even


To be fair, if they don't, no one else will!


https://www.sefaria.org/Pirkei_Avot.1.14?lang=bi

"He used to say, if I am not for myself, who will be for me?"


I really like this one, it basically learned that "first country" is "this country", and "second country" is the "other country".


I've seen something similar, briefly and ages ago now, with it doing the equivalent of* en:"I speak English" -> fr:"Je parle français".

* It might not have been French, I remember the effect not the detail


An interesting occurrence I've seen is with flag emojis, for example translating some English text with an England/UK/US flag after it will sometimes turn the flag into that of a different country where the language of the translated text is spoken.


As Ionesco may have said, the French for London is Paris.



"I will proceed to type the characters _e_ and _gu_"

https://www.youtube.com/watch?v=3-rfBsWmo0M

I never found out what a DECEARING EGG is


As a native Spanish speaker it took me a moment to understand why "yes" was being translated as "forks" until it clicked and it's not an error.

In Spanish "Ye" is how we call the letter Y and "La ye" is a word used, at least in the version of Spanish spoken where I come from, to refer to the place where a road forks. Hence the fork in the road is "La ye" and the plural would be "Las yes" or the forks. In this context forks is referring to where the road forks not to the eating utensil (which would be "tenedores").


I am also a native Spanish speaker and I haven't ever heard this. I have always called the letter Y "i griega" (Greek i), and so has every other Spanish speaker I've heard.

Not saying that this is wrong; in fact, one can check the Spanish Wikipedia to confirm that "ye" is a valid name for Y, but it's definitely not used where I live, either for the letter or for a fork in the road.


Wonderful explanation, thank you!


Amazing!


I think I have a better one.

I was trying to download some software from a Japanese website (this was about 10 yrs ago I think). There was an entire survey you had to fill out first. Since I speak no Japanese, I naturally used Google Translate (to Polish, not even to English).

That survey had a "gender" field, and the options were something like "person" and "woman".

Then there are the automatic (mis)translations of button labels and other similar strings in software, where the translating tool often only has a single and ambiguous word to go on, with no context whatsoever.

The funniest ones I've seen are "branch" (satellite office) versus "branch" (of a tree), "book" (a flight) versus "book" (something you read), "character" (ASCII or Unicode) versus "character" (in a story), "clear" (to remove all) versus "clear" (translated as "cloudless", referring to the sky), "letters" (delivered by a postman) versus "letters" (and not digits), "rate" (how fast, i.e. speech rate) versus "rate" (how much you charge per minute), "prune" (remove all) versus "prune" (dried plum), "manual" (and not automatic) versus "manual" (a user manual), "clubs" (places you go to) versus "clubs" (and not diamonds or spades; "queen of clubs" has a particularly interesting meaning here), "number" (that you call) versus "number" (of things), "at" (@) versus "at" (home), "tab" (key) versus "tab" (in a browser), "close" (to me) versus "close" (something), "back" (button) versus "back" (a body part), as well as all the "party" (adventuring team in an RPG) versus "party" (legal entity, i.e. "third party") versus "party" (as in an event you have fun at) shenanigans.

Mistakes like these often hide in accessibility labels, and are hence far more obvious to screen reader users. In normal UI, they're usually found quickly enough that users never notice, but accessibility labels are often overlooked when testing translations.


There was a fun one going around where the title of the game "Watchdogs" had turned into Norwegian for "look at dogs".

Re: gender, there's all sorts of chaos going between languages which have no grammatical gender (CJK languages, Finnish etc) and those which have mandatory grammatical gender (Latin languages), because the information may simply not be available in the same sentence. So you get obvious misgenderings of "(female name) .. he" or vice versa.


And then there's the "man" (not a female) versus "man" (as in "human", i.e. "first man on the moon" / "all men are created equal").


Hah! I fondly remember some Linux distro that had the Desktop translated to a literal table top.

Translating without context must be really tough. Especially with a language like English that is utterly ambiguous.


In Chinese Windows, Desktop has been officially translated as literal table top since at least Windows 95. I don't see the issue here XD


Now I wonder if English speaking people think of it as a literal table top. I always thought it stands on its own.


I've had a similar experience where Dutch surnames were translated from English to Dutch by Excel for some reason. Since many Dutch people have a surname prefixed with the Dutch word "van" (which means "of"), Excel dutifully translated it to "Busje", which meant that many of our clients suddenly were called "Lieke Busje Lexmond" or "Vincent Busje Gogh".

It got a chuckle from our marketing department which caught the error before badges were printed for the very high-profile event we had planned for the next day.


My favourite part of the Dutch language is how it adds the -je diminutive to other words to make new words. A van? That’s just like a cute baby bus.


This made me wonder what a poffert is, if a poffertje is a cute baby version of them. So I Googled it. Sure enough, it's a cake.


That would be a poffer, as the interstitial t is added to words ending in r when forming the diminutive.

And no, I didn't know what a poffer was either. :)


Looking it up a poffer is a kind of hat, and a poffert is a kind of cake. "Poffert" is definitely an odd word though.


That would be a "puffer" or light, fluffy cake.


French localisation of OnlyFans used to translate the word 'tip' (quite essential to the OF experience) both as 'bout', 'tip' in the sense of 'end', and as 'astuce', 'tip' in the sense of 'piece of advice'.


Must've been high profile indeed if both Vincent van Gogh and Lieke van Lexmond were there


A colleague of mine got a Christmas card from Microsoft addressed to "New Year's Eve"...

Hi Silvester!


I'm wondering where the "forks" translation came from in the first place. Google Translate used to be fairly reliable for simple translations, but I've seen several examples in the last couple of years where it goes batshit crazy, including starting to loop hallucinating sentences on repeat. Is absolutely nobody checking how well it performs before deploying nowadays?


Probably, interpreting "Yes" as the plural of "Y-junction of a path".


If you have a plan for automatically checking the output of Google Translate against all possible character input strings, be sure to mention it to your recruiter as part of the hiring process.

More seriously: Google Translate's bread-and-butter data source is documents that were human-translated from one language to another with high reliability (such as UN publications). That turns out to work remarkably well for building a neural net that can extrapolate how one sentence should translate to the same sentence in another language. But like most neural networks, it's vulnerable to garbage-in, garbage-out: much like you can get an animal detector to hallucinate "zebra!" if you feed it a noise-pattern as input, if you feed it character sequences that aren't actually words in the input language, it'll try to extrapolate what reality should be between all the corpus it's seen and you'll get garbage on the output side.

Since the tool doesn't actually know what words mean, it has no way, at present, to know "Yes" isn't a Spanish word (and as other commenters have mentioned, it may actually be "a Spanish word" in one weird context in one weird document somewhere in the corpus of all translated documents accessible from the Internet... Or some doc somewhere contains a close-enough typo in the Spanish input document that is over-reflected in the output because no other document contradicts the typo's apparent translation).


There is no "yes" in Spanish.

Reminds me of Inodoro Pereyra, a comic strip character in Argentina who was a peon in the countryside and rather ignorant, and he'd sometimes respond affirmatively with "yes como dijo un tal Chespier" ("Yes, as some Shaespier once said").


really? i thought that was a latin problem (sic)

doesn’t spanish have sí? or is it something like portuguese where the verb conjugated to an affirmation is preferred over something like sí?


OP meant that "yes" is not a word in Spanish. The word "sí" is indeed the affirmative and it's used mostly the same as yes in English.


oh, I see now! that makes more sense when I reread it.


The comic strip character was acting all knowledgeable by quoting Shakespeare as saying "yes" when the character meant "sí", but misspelling Shakespeare's name as "Chespier", something like misspelling it "Shaespier" in English.



"our vendor now matches the lightbox scripting to the language of the text on the webpage"

"The auto-translate pop-up may still be triggered on occasion, but the HTML in the survey wrapper prevents it from changing the content on the webpage."

I have no idea what either of these sentences mean, and they're both very important to the fix.


My guess (it would be nice if they actually said...) is that they were missing the required lang attribute on their HTML.

  <html lang=en>
If not defined it will default to unknown (not to the user's locale) and so this makes Chrome guess. And there wasn't much text in the lightbox (which might be a different page?) for the browser to infer from.
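
For anyone who wants to check their own pages for this, here is a minimal Python sketch using only the standard library's `html.parser`. The names `LangFinder` and `declared_lang` are made up for illustration, and it only looks at the root `<html>` tag. It says nothing about what Pew's vendor actually had or how Chrome's detector decides.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def declared_lang(markup):
    """Return the declared document language, or None if it is missing."""
    finder = LangFinder()
    finder.feed(markup)
    return finder.lang
```

A page where `declared_lang(...)` comes back as `None` is one where the browser has to guess the language from the text alone.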


That's probably true, however I'd be really curious to know why Chrome's guess for "yes" is the Spanish word for "Y-junctions" instead of the English word yes.


This was my immediate thought, but it doesn't sound like what they did. They also mention they still get the Google translate pop up - which suggests they didn't.

Though it sounds like they serve many languages, so they'd need to do each survey individually.

Maybe the survey part is an iframe?


#1 is probably to subset the loaded lightbox text-localization files to only the survey language. And #2 is to use the translate=no HTML attribute (or its predecessor) to disable translation of that section.
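
To make #2 concrete, here is a rough Python sketch of a server-side helper that wraps a fragment so translation tools are asked to leave it alone. The function name `wrap_no_translate` is hypothetical, and I'm assuming the "predecessor" meant here is the `notranslate` class that Google's tools have long recognized. Nothing in the article confirms this is what Pew's vendor does, and a browser is not guaranteed to honor it in every case.

```python
from html import escape


def wrap_no_translate(fragment_html, lang="en"):
    """Wrap trusted HTML in a div that declares its language and opts out
    of automatic translation.

    fragment_html is inserted as-is, so it must already be safe markup.
    """
    safe_lang = escape(lang, quote=True)
    return (
        # translate="no" is the standard HTML attribute; class="notranslate"
        # is the older convention, kept here as a belt-and-braces assumption.
        f'<div lang="{safe_lang}" translate="no" class="notranslate">'
        f"{fragment_html}</div>"
    )
```

For example, `wrap_no_translate("<button>Yes</button>")` produces a `div` carrying `lang="en"`, `translate="no"` and the `notranslate` class around the button.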


I also got disappointed when they skipped how they actually detected it with just:

> With a little more experimenting, we were able to identify translation as the root of the issue.

Maybe it’s just me, but I was really curious about what finally pointed to the issue, and all I got was “experiments”.


Perhaps the article needs to be read as if prefixed with:

"I was asked to write the following explanation for the public, to put on our website, and talked to the programmers. I have no idea what they were saying. I took some notes, which likewise mean nothing to me, but here goes ..."


> For some respondents, this prompted their browser to believe our survey was written in a language other than English (even though, again, it was in English) and ask if they wanted the page to be translated to English – or, we think, automatically try to translate the page to English.

I goddamn hate this "feature" so much. Especially since it sometimes resets and then I have to find out where the fuck Google moved the disable button to again.

No Google, I speak fluent German and English and can reasonably read Croatian - if I wanted a translation I would explicitly ask for it myself thanks.


You're going to love YouTube automatically translating video titles to your geo-ip language...

For the next project-manager-promotion-worthy feature, they also do auto-dubbing of the speech now!


They now sometimes dub the audio track. The result is about as horrible as one would imagine. Whoever decided to turn this on by default clearly didn't give a damn.


The best part is YouTube has a bug that would sometimes force a dubbed language; as in, a video originally with an English audio track would get stuck between French and Italian even when you try to manually change the language back to English.


> You're going to love YouTube automatically translating video titles to your geo-ip language...

This is the worst feature of YouTube for me. Any idea how I can turn this off and show the original language title?


It's not automatic, channels opt into it. It's in fact a great way to identify spammy channels, because

1. They know YouTube's algorithm rewards them for it

2. They know it makes their stuff objectively worse

and the only channels who know that and still opt into it, are channels which have no shame and are pure view-bait trash.

So when I see a bork-Norwegian video title like "Du gjetter aldri hva som skjedde neste!" I hit "never recommend this channel".


#1 is the crux. People have come to expect high quality videos that require teams of sometimes dozens of people, and creators are held hostage by YouTube's algorithm that keeps on changing and is an effective black box.


I have a standard message for people who use machine translation on their "content":

"I know that machine translation exists. I know where to find it, should I need it. When you push it without asking, half the time I have to translate back from machine-junk to your language, in order to figure out what the hell you were trying to say. You make more work for me, not less. MACHINE TRANSLATION SHOULD ALWAYS BE OPT-IN."


I run an app (Aerolith.org) for studying Scrabble words. For years I would get weird bug reports (https://github.com/domino14/Webolith/issues/331), where at the end of each round, when a user viewed the definitions, the words for definitions would get randomly replaced with other unrelated words. I double and triple checked the code; since I had recently moved the word-related logic to a Go microservice I assumed I had some crazy race condition. I remember trying to replicate it with many simultaneous requests, looking for memory leaks, thinking there was something wrong with the sqlite driver, etc.

Finally, I figured out that it was a Google Chrome auto-translation issue. For example the word JIBER was replaced with "BECAUSE OF" because JIBER means BECAUSE OF in Kurdish. There were many other similar ridiculous cases.


There is a way to explicitly specify the language of the document in the HTML metadata: `<html lang="en">`.

Google doesn't guess if it has a strong signal about what the language is.


Once, a customer complained that we had corrupted their documents on our platform. Seemingly out of nowhere, words like "Messi" and "Zidane" started appearing countless times in the titles and content of their documents - overnight. It was so bizarre and random. We eventually found out they had a broken browser extension (something similar to Google Translate, I don't remember).

I was only able to find a single instance of such a bug elsewhere, on a Microsoft support forum — the guy was furious that Outlook would insert 'Messi' and 'Zidane' into his emails every time he tried to send them. Something very specific seemed to trigger it.


> Something very specific seemed to trigger it.

Being not a fan of La Liga???


Windows recently had [Compress to postcode] in its context menu for en-GB users.


Oh this is absolutely delightful, both in how complex the bug is and the actual result.


Last year there was a question in a foreign language on the trivia league LearnedLeague that had to be thrown out because some people's browsers ended up translating the question, making it much easier than anticipated for a subset of users, without them opting in or necessarily even being aware of it.


I remember this one! The question was:

“Un molcajete y tejolote se usa comúnmente en los restaurantes mexicanos para preparar ¿qué plato popular de aguacate que se suele servir con totopos?”

The idea is that a monolingual English speaker might recognize enough words here to make a guess. Or that people might know Spanish. (I’m somewhere in between, my browser did not translate the question, and I got it right.)


For years, Chrome would offer to “translate from Danish” if the contents of the page are just “403 Forbidden”.


Earlier today, I found a bug in Discord where one of the login forms in Japanese uses 「はい」 ("Yes") as the label for the "Log in" button. The button in question is right beneath a label stating 「パスワードをお忘れですか」("Forgot password?").

X / Twitter is full of funny translations, too. My favorite is 「ポストさんを報告」 (lit. "Report Mr. Post").


As a matter of standard practice I use [forks] and [spoons] in otherwise typical yes/no spaces. When results are tallied I simply retcon my yes/no proxy assignments towards whatever result was my target outcome. Publishable outcomes every time. P hacking is a pain and really, why bother when I know the results are… aspirational… regardless?


FWIW, "forks" is "tenedores" in Spanish... not that that illuminates the mystery.


Yes also means "Y"s in some dialects of Spanish, like the plural of the letter Y. Maybe since the Y looks like a fork there must be some bizarre neural network connection there?


It's not unheard-of for English speakers to use it that way — "I was driving down the road and came to a Y", meaning a fork or branch.


And then it got turned into a word: wye.


Spanish youngster wants to learn English, and finds the office of a language school. He enters, finds the office of the principal, and knocks on the door. After a short while, he hears: "If, if, between, between!"


This unfortunately reminds me of the level of English guidance my "teacher" would provide in 5th grade down here in Brazil. At least I learned to use a dictionary.


There's a joke in Portuguese where "entre, meu bem" is translated to English as "between, my well" (in context it should be "come in, my dear").


I gave this a little bit of thought and here is my best hypothesis.

The token "yes", which is not a word in Spanish, might have been tried as "Y es".

That in turn being taken as a sentence fragment, meaning "It's a Y". (Spanish isn't verb-final, but let's go with it).

The letter Y is a symbol of forking.

If something is likened to a Y, that's a way of saying it forks.

Anyway, translation tools sometimes do weird things when the input is a sentence fragment or just a word or two.

UPDATE:

When I type "Y es" into Google translate, tagging it as Spanish, those two tokens still go to "Forks". This adds a little bit of support to my hypothesis.

However, "Y e s" also translates to "Forks".

Inputs like "A es", "B es", "C es" don't do anything; they go to "A is", "B is", ...

Moreover "Y es verde" and "Y e s verde" and "Yes verde" all go to "and it's green".

UPDATE:

It is false that "yes" is not a Spanish word. The letter Y is called "ye", and the plural of that is "yes". Just like when we say that giggle has "three gees", we are using a plural of the letter name "gee". https://es.wikipedia.org/wiki/Y


Just to be clear, “yes” does not mean “forks” in Spanish.

Google translate received a huge update a few months ago and it’s been a complete clusterfuck.


It does correctly change “cuatro candelas” to “fork handles”


Did your autocorrect change "four candles"? Or did I miss the joke?



Say the supposedly Google version and your translation out loud.


Si


If you think you understand software, think again. We are building complex machines upon towers of sand!


For a while now, I've had the idea that some malicious person could have a browser extension that detects and modifies news sources to spread propaganda. We've already got joke versions of this, s/{some annoying topic}/{joke} — and for the moment, this is still in the "ha ha what a silly bug" domain.

One day, I expect it won't be silly. It'll be some more subtle transformation that rewrites one party, or one person, as having constantly sinister motives.

Will we even hear about it, if such a thing comes our way?


OpenAI uses machine translation for at least ChatGPT's UI, which became apparent to me thanks to a similar swap.

Thankfully there's no real way to report it as an issue either that I could find, so it shall remain as a fun stain for people to run into and mock I suppose.


Changing text on a page is potentially just as bad as the time Xerox changed the numbers on copied blueprints due to compression.


And on copied financial reports.


Did anyone think to ask Google what’s happening? This can still be easily reproduced in Google Translate. That strongly suggests that some SDE years ago thought it was unlikely anyone would ask to translate the word "yes" from Spanish to English. Boom, fun Easter egg.


Eventually with AI translation, and the proliferation of machine generated text everywhere, the word "yes" will be the way people say "forks" in Spanish! You would say at a restaurant "Necesitamos dos yes, por favor."


So basically Pew is using garbage adware that pops up an intrusive popup that screws up automated language detection, Chrome is way too overeager to auto translate, and Google Translate is starting to go to seed now that everyone uses LLMs.


The article says the popup which caused this issue was used to display survey instructions, not ads.


They're probably using an ad network to emit spot-surveys where the ads would show up, although details unclear from the article.

(... honestly, how else will they gather survey data? It's not like they can harass people in the malls Americans don't go to anymore).


Google Translate turning a non-word in a source language into an unrelated word in the target language is an old, known misbehavior. I don't think LLMs changed anything there.


This has meme potential. All in favour, raise your forks!


Fawk yes!


Translate considers "Yes" to be "Ye" in plural form - https://translate.google.com/details?sl=es&tl=en&text=yes&op...

"Ye" is the "letter Y" in Spanish.

Guessing at the next step: a Y looks like a fork in the road.

A perfect translator should recognize English words and leave them untranslated, but if you ask it to translate "Spanish to English" after giving it English input, undefined behavior should not be unexpected.



