"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat." (Latin)
became:
"Lauren est similis, lycopersiciSusceptibility rutrum, sed ego te nocere me et vultus realis. Si liber facile adducere ad accedere nervi ministerium." (Latin)
Doing a final pass manually to English:
"Lauren is like, tomato makeup, but i'm hurting you and you look so real. Bring a book if you can easily access the nerve service."
Note that the original isn't actual grammatically correct Latin. It's snippets of Cicero that were altered specifically to create meaningless sample text.
The quick brown fox jumps over the lazy dog -> Fox is expected to become Fox's knowledge
The five boxing wizards jump quickly -> Scientists were quickly thrown.
Pack my box with five dozen liquor jugs. -> Put five wine calls in my box and release.
Sphinx of black quartz, judge my vow -> Quartz black crane support, my marriage broke
---
Spanish pangrams (translated to English after the final step):
Young man poisoned with whiskey, what a figure you display! -> The young man who nurtures murder is his fire!
José bought an old panpipe in Peru. Excusing herself, Sofia threw her whiskey down the sidewalk drain. -> Buy a glitter in juice in Peru. Sofia Sophia asked Sufia.
The swift Indian bat was happily eating thistle and kiwi. The stork played the saxophone behind the straw palenque. -> Fast Indian on Wali and Kiwi. After the saxophone, Somka played the balconies.
Benjamin ordered a kiwi and strawberry drink. Noah, shamelessly, the most exquisite champagne on the menu. -> Enlie someone asked to get rid of the events and stress. Noah's knife, Noah, leaving them and led them to leave.
It might be caused by words that are compounded from two words that now have similar meanings, such as Wolfhound or Taxicab? Those words could sometimes get translated to two separate words in another language
For instance, Google Translate from English to Chinese and back to English does wolfhound > 狼狗 > "wolf dog"
“How much wood would a woodchuck chuck” -> “Number of numbers”
I’ll concede it’s not an easy sentence to make sense of in another language, let alone 10 and back…It does still make me question how much damage has been caused (in the most general sense) due to meaning being lost in translation between people.
The movie “Arrival”, which is mainly about establishing communication with an extraterrestrial species that visited Earth, goes fairly in depth into what I consider very realistic challenges and possible methodologies for teaching the aliens our (English) language, and for us learning theirs. They even built a rudimentary translator once they had a solid enough understanding. Would highly recommend watching it if you like this kind of content.
In contrast, I just read Project Hail Mary, by Andy Weir, author of the Martian, and boy is translation between alien languages nice and trivial in that.
Isn't a big part of Arrival that the aliens don't experience time the same way we do? Or something along those lines. When something that fundamental is different, that has to become a major hurdle in communication.
Whereas the aliens in Project Hail Mary are still fundamentally similar organisms. And you just have to come up with a common baseline to understand each other. Humans have been able to learn to translate between different languages for a long time now. It seemed no different than that to me. Especially when you add in machine assistance.
I would actually be quite curious though. If you put two people in a room, and each one only speaks one distinct language, and their only goal is to communicate with each other. How long before they can effectively communicate? Now it wouldn't be a perfect comparison because there is likely shared body language. It would still be interesting.
I'm contractually obligated to mention the novel Dragon's Egg, which followed the interactions between humans and life that evolved on a neutron star. Major time issues in that one. Fun read!
> Isn't a big part of Arrival that the aliens don't experience time the same way we do? Or something along those lines.
Yes, but an important detail in the plot (possibly a spoiler) is that they experience time differently _because_ of the structure of their language, sort of an extreme outcome of the Sapir-Whorf hypothesis. A human learning their language will eventually experience time in the same way. Notably this is the pattern used by most of the stories in the collection that Arrival was derived from (Stories of Your Life and Others). The author takes a phrase, hypothesis, etc. and creates a story out of an extreme and concretized outcome from it. All of the stories in the collection are well worth reading IMO.
> If you put two people in a room, and each one only speaks one distinct language, and their only goal is to communicate with each other. How long before they can effectively communicate?
They can (somewhat) effectively communicate right from the start, and third party observers who speak neither language can mostly follow along too. Demo: https://www.youtube.com/watch?v=V3qqYyQC9ww
> If you put two people in a room, and each one only speaks one distinct language, and their only goal is to communicate with each other. How long before they can effectively communicate?
I would sign up for that even, maybe. I think the main problem with that idea is that willingness to spend time in such an endeavour is highly positively correlated with knowledge of certain well-spread Indo-European languages. ;)
Back in the 80s, there was a small piece in Omni magazine where they talked about early machine translation efforts and the phrase, “Out of sight, out of mind,” translated to Russian and back became “Invisible insanity.”
This process gave me “To see, thoughts.”
My writing group’s website was scraped and translated into Chinese for a gambling spam site (I don’t really understand why). There were some oddly poetic bits in the Google translation results for the page that I looked at, my favorite being that one member's name, Billy Gee, became Billy Wow.
I have a lot of international friends. A majority of them speak fluent English. But sometimes I like to send them short messages in their native languages which I don't speak, just for fun. When I started doing this, ten years ago, machine translation was far worse than it is now, so I would often pass a translation through a few languages and back to make sure that what I was saying held a pretty consistent meaning when translated. If not, I could re-work the original into something simpler or more literal.
While doing that, I often thought it would be fun to play the kids' game "Telephone" with my international friends where, instead of the classic version of just whispering what you heard to the person next to you in the same language and seeing what comes out at the other end, each person would translate what they heard into their native language before passing it on. It's a little tricky to play in real life because you have to line people up according to their language pairs, but doing it with machine translation works, and it's a lot of fun to see what happens with this.
Reminds me of the message the pope of Rome sent to the Mongol khan. It did a bad job of explaining Christianity even in it's original Latin, but since there were no Latin to Mongol speakers it had to be translated through numerous other interpreters. Unsuprisingly the khan didn't understand anything.
Here's the begining translated to English:
> The immense benevolence of God the father, considering with ineffable piety the fall of mankind, which came to ruin by the sin of the first man, and wishing with great love to mercifully revive him whom diabolic envy has made prostrate by deceitful suggestion, has sent his only-begotten son, sharing the same nature with him, from the highest throne of heaven to the lowest dirt of the world, he who, conceived in the womb of the pre-elected virgin by the grace of the Holy Spirit and endowed there with the clothing of human flesh, thenceforth appeared to all, having exited from the enclosed gate of his mother's virginity.
And here it is passed through this 10 times translator, perhaps approximating what the khan heard:
> Their women's father believes that the kidney is the most quantity of waste in the sky. He came into the world in rape of rape and done with the human body. The sword appears to my mom.
Relatedly, Khan Güyük likely already understood the basic tenets of Christianity as Christianity had existed among the Mongols as a substantial minority religion for at least one or two centuries at that point. I think Khan Güyük actually had Christian relatives, though he himself was shamanist. It was even the Mongols who reintroduced Christianity to China, after it had been earlier wiped out.
Europeans were completely oblivious to all of this, however.
"The human is almost extinct in the wild. A captive breeding program has been phenomenally successful. Sometimes humans say they eat nachos like an animal, as though they aren't one." -> "A man disappears in the forest. The prisoner's system was successful. Sometimes people say they are animals."
"Minor modification: Include the date in your update. See my entry for an example.
I’d like to see this daily. Again, it shouldn’t take more than a few seconds. For convenience, paste the link into your browser and Favorite it, makes it easy to find."
Output:
"Tom Replace: Enter the update date. For example, see my comment.
In other words, it does not go over a few seconds. Put the browser and browser and browser and your favorite connection. Easy to find."
Recalling the child's game 'Telephone', which mutilates meaning even when using the same language, compounding with translation means it's (always?) going to be unintelligible. I'd like to know where 'browser and browser and browser' was introduced, for example.
It would be interesting to do each language and back in just one iteration in order to get a sense of which translation yields the most and least fidelity.
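A rough sketch of what that per-language comparison could look like, in case anyone wants to try it. Everything here is an assumption: `translate` is a hypothetical callable standing in for whatever translation service you use, `toy_translate` is a fake that only reproduces the wolfhound example from upthread, and the character-level similarity from Python's `difflib` is a crude stand-in for "fidelity", not a real translation-quality metric.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple

# Assumed shape of a translation call: (text, source_lang, target_lang) -> text.
Translate = Callable[[str, str, str], str]


def round_trip_scores(text: str, languages: Sequence[str],
                      translate: Translate, home: str = "en") -> List[Tuple[str, float]]:
    """Translate home -> lang -> home once per language; rank by similarity."""
    scores = []
    for lang in languages:
        there = translate(text, home, lang)
        back = translate(there, lang, home)
        ratio = SequenceMatcher(None, text.lower(), back.lower()).ratio()
        scores.append((lang, ratio))
    # Highest similarity (least drift) first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def toy_translate(text: str, src: str, dst: str) -> str:
    """Stand-in for a real service: only Chinese splits the compound word."""
    if dst == "zh":
        return text.replace("wolfhound", "狼狗")
    if src == "zh":
        return text.replace("狼狗", "wolf dog")
    return text


ranking = round_trip_scores("my wolfhound", ["fr", "zh"], toy_translate)
print(ranking)  # "fr" comes first with 1.0, "zh" second with a lower score
```

With a real service you would pass its client function as `translate` and a longer list of language codes; the ranking is only as meaningful as the similarity measure.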
Twenty years ago I was in Japan, as an onsite tester while we tried out this new thing called XP. The project wasn't going smoothly because no one had told our Japanese clients we were building the solution iteratively, with their input.
At one stage, we all received an email, in Japanese, from the project stakeholder. I put it through machine translation, which back then was pretty raw. The English text was largely gobbledygook, except for one sentence, still branded in my brain: "It becomes the alligator."
Well putting "It becomes the alligator" through the 10 languages (don't know if any were actually Japanese) I got "Occurs with this technician" which seems like something that you could put in any project input.
I really wish there was a history function with a translation back to English at each stage. I would be curious to see if you could identify at what language change "Alligator" became "Technician"
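For what it's worth, a history function like that is not much code if you have a translation call to plug in. This is only a sketch under assumptions: `translate` is a hypothetical callable (text, source language, target language), `toy_translate` is a fake that pretends one hop swaps the word, and the English shown at each stage is itself a machine back-translation, so it only approximates what the chain "means" at that point.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed shape of a translation call: (text, source_lang, target_lang) -> text.
Translate = Callable[[str, str, str], str]
Step = Tuple[str, str, str]  # (language, text in that language, back-translation)


def translate_chain_with_history(text: str, languages: Sequence[str],
                                 translate: Translate,
                                 home: str = "en") -> Tuple[str, List[Step]]:
    """Walk the chain, recording a back-translation to `home` after each hop."""
    history: List[Step] = []
    current, current_lang = text, home
    for lang in languages:
        current = translate(current, current_lang, lang)
        current_lang = lang
        # Side trip back to the home language, for display only.
        back = translate(current, lang, home)
        history.append((lang, current, back))
    final = history[-1][2] if history else text
    return final, history


def first_step_losing(word: str, history: List[Step]) -> Optional[str]:
    """Return the first language whose back-translation no longer has `word`."""
    for lang, _, back in history:
        if word.lower() not in back.lower():
            return lang
    return None


def toy_translate(text: str, src: str, dst: str) -> str:
    """Fake service: pretend only the hop into "de" confuses the word."""
    return text.replace("alligator", "technician") if dst == "de" else text


final, history = translate_chain_with_history(
    "It becomes the alligator", ["fr", "de", "es"], toy_translate)
print(final)                                    # It becomes the technician
print(first_step_losing("alligator", history))  # de
```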
Going to Japanese and back is probably more effective than passing through 10 Indo-European languages as I think it did for me; all the ones I remember were at least IE. The sentence came back with exactly the same meaning.
One of my favourite ever YouTube channels, that sadly isn’t around anymore, explored this concept with the ‘Fresh Prince of Bel-Air’ theme song. If anyone’s interested check CDZA out!
I used this to create a short poem after I noticed that all of the imperfections of translation accumulate across translations but the output was thematically similar to the input.
The odd-numbered lines are sentences I wrote, and the even-numbered lines are sentences from the model. Then I came up with a sentence that would continue the previous sentence.
input: Nothing can break my spirit;
output: I couldn't do anything
input: Because I was lifeless.
output: Why don't you live?
input: Enter into eternal communion with the human species;
output: Forever with such a person.
input: I am the link between my ancestors and my descendants,
output: I am a good relationship between my visitors and disregard.
Not sure if others had this experience but by a few lines into the text I was hearing these words sung to the tune of Hotel California while reading them.
My conscious mind had zero awareness of what was going on, but my subconscious got it immediately.
That's an interesting claim, but it's plausible. I would have thought "Happy Birthday" or some Beatles song. Is there a list of most recognized songs someplace?
Yep! The middle two are, "Do you remember when I promised to kill you last? I lied," and "It's not a tumor!"
Other catchphrases:
"Get to the chopper!" ended up as, "Get ten people."
"Let off some steam, Bennet," ended up as, "Free steam, release Bennett."
"Hasta la vista, baby," ended up as "Hasta La Vista Kids." The engine correctly identified "hasta la vista" as not being in the native language and left it alone at every step, except that it got transliterated for some languages (but not all!) and ended up capitalized in the end.
I assume in one of the steps, "Hello" was translated into a word that was for both greeting and farewell (such as "ciao" in Italian) and then the next step translated it into the farewell.
“Since the introduction of the special principle of relativity has been justified, every intellect which strives after generalisation must feel the temptation to venture the step towards the general principle of relativity. But a simple and apparently quite reliable consideration seems to suggest that, for the present at any rate, there is little hope of success in such an attempt;”
Excerpt from Relativity: The Special and General Theory, Albert Einstein
Becomes:
If you have a unique ideology, you need to see everything known in a simple light. But for these skills to succeed, you can trust the activity easily and cleanly.
The languages it gets translated to aren't deterministic so I put the same sentence in 10 times. I was curious if a simple statement that has a conceptual equivalent in many cultures would tend to survive relatively unscathed so each time started with "I made a cup of tea".
1. "I made a cup of tea"
2. "I work a tree cup"
3. "I have prepared a cup of tea."
4. "I made a cup"
5. "I make a cup of tea"
6. "I made a cuppa."
7. "I prepared a cup of tea"
8. "Tea"
9. "I made a cup of tea"
10. "I have a cup of tea."
Better than I expected really. I'm kind of impressed that it veered into slang on the sixth try and still retained the original meaning.
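If anyone wants to put numbers on it, here is a quick tally of those ten results. The list is just the outputs above typed in by hand; the normalization (lowercase, strip surrounding punctuation) and the "does it still mention tea" check are my own arbitrary choices, not anything the tool does.

```python
import string
from collections import Counter

original = "I made a cup of tea"

# The ten outputs listed above, in order.
runs = [
    "I made a cup of tea",
    "I work a tree cup",
    "I have prepared a cup of tea.",
    "I made a cup",
    "I make a cup of tea",
    "I made a cuppa.",
    "I prepared a cup of tea",
    "Tea",
    "I made a cup of tea",
    "I have a cup of tea.",
]


def normalize(sentence: str) -> str:
    """Lowercase and drop leading/trailing punctuation and whitespace."""
    return sentence.lower().strip().strip(string.punctuation).strip()


counts = Counter(normalize(r) for r in runs)
exact = counts[normalize(original)]
with_tea = sum("tea" in normalize(r).split() for r in runs)

print(exact)     # 2 runs came back word for word
print(with_tea)  # 7 runs still mention tea
```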
> I want to help make the world a better place for everyone to live in, to thrive in, and to be their best selves for themselves and each other.
Became
> I like the world living for everyone, I still help me and the best place for each other.
Always worries me I'm just one mistranslation away from my intent being completely lost in the medium of the message...
Actually, this has me intensely curious: there have to be people doing ML for automated translation, right? Is one of the parameters of the translation that you get back what you put in originally? I understand there's "language divides", and that only helps to deepen the necessity to actually rely on algorithms that are making sure the intent of the message is carried... if we can't write an algorithm for it right now, maybe we can throw CPUs at it until a net of random weights gives us 99.9% of the same text and message.
(One more edit, this is fun) One more pass resulted in, trailing comma and all:
Given a source language and a destination language, I wonder if a linguist could design a series of translations that exacerbate the loss of meaning which occurs when words are automatically translated from the context of one language to another. When I write something that I intend to automatically translate, or that I know will likely be read in translation, I often use a simple reflexive translation from source through destination and back to source in order to detect any lack of clarity in my words. Seeing a smoother, more automated feedback while writing for translation would help people improve their communications.
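The reflexive check described above can be automated per sentence. A minimal sketch, with assumptions flagged: `translate` is a hypothetical callable for whatever service you use, `toy_translate` is a fake that reproduces the "Invisible insanity" anecdote from upthread, word overlap (Jaccard) is a blunt proxy for clarity, and the 0.6 threshold is arbitrary.

```python
import re
from typing import Callable, List, Set, Tuple

# Assumed shape of a translation call: (text, source_lang, target_lang) -> text.
Translate = Callable[[str, str, str], str]


def words(sentence: str) -> Set[str]:
    return set(re.findall(r"\w+", sentence.lower()))


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two sentences' word sets (1.0 = same words)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def flag_unclear(draft: str, source: str, dest: str, translate: Translate,
                 threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Round-trip each sentence and return those that drift below threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]
    flagged = []
    for sentence in sentences:
        back = translate(translate(sentence, source, dest), dest, source)
        score = word_overlap(sentence, back)
        if score < threshold:
            flagged.append((sentence, back, score))
    return flagged


def toy_translate(text: str, src: str, dst: str) -> str:
    """Fake service: the idiom comes back from Russian mangled."""
    if src == "ru" and dst == "en":
        return text.replace("Out of sight, out of mind", "Invisible insanity")
    return text


draft = "I made a cup of tea. Out of sight, out of mind."
for sentence, back, score in flag_unclear(draft, "en", "ru", toy_translate):
    print(f"{sentence!r} came back as {back!r} (overlap {score:.2f})")
```

An editor plugin could run something like this on each sentence as you type and underline the flagged ones, which is roughly the "smoother feedback" idea.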
I took a Stripe marketing email that just landed in my inbox to test this out:
"How to stay competitive by building ecosystems of connected experiences that reward loyalty
The power of localized payment methods to boost global revenue growth
Whether recent crypto growth is sustainable or even warranted"
Kinda amusing how much shorter the final translation is:
"The environment that surrounds the environment?
The status of international funding development
Don't last"
From a recent presentation I received: "Accelerate delivery of new systems 'at the speed of need'. Paired with rapid capability acquisition approaches. Enable open architectures that promote modularity and adaptability. Pivot to better/different solutions to respond to evolving threats, new tech"
Ended up as
"Please note the new Speed computer. The boy has a quick path. Change open lists, remove true and true. It is best to want better than more than more than more than more than more than more"
I love the repetitive loop it gets stuck in. Not sure how that happened. I also think "Please note the new Speed computer" should be the marketing phrase every time someone releases a new CPU.
Telephone game:
"Translating words through many languages and back is like playing the telephone game." -> "Your words in many languages and return to phone games."
Do NOT use auto-translate for political speeches:
"Ask not what your country can do for you but what you can do for your country" -> "Don't ask what you can do for your country. But what can you do in your country?"
Even when using real translators, there have been issues with political speeches. I don't remember the exact context but at the height of the Cold War there was a meeting at the UN and the USSR representative said an idiom that was meant to be treated as "we (communism) will outlive you (capitalism)", but is more literally translated as "we will bury you", which tends to have a different connotation in English.
A popular YouTuber, Brandon Farris, has a whole playlist of recipes ostensibly run through a similar process and then followed literally. Certainly worth a couple of laughs. For instance, chocolate soufflé. [1]
"We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness." -> "We are forced to produce these rights and all people are included and create our ordinary Rights of Freedom and Happiness."
This is a terrific example of the game of "telephone", but achieves the result through multiple language translations instead of multiple phrase interpretations in the same language.
> "When in the Course of human events, it becomes necessary for one people to dissolve the political bands
which have connected them with another, and to assume among the powers of the earth, the separate
and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the
opinions of mankind requires that they should declare the causes which impel them to the separation."
---
"During folk events, people should publish political groups.
Related to others and others between an independent army
God gave them laws and laws.
They confirm that people will be expressed in the ministry's opinion."
> The sky above the port was the color of television, tuned to a dead channel.
> “It’s not like I’m using,” Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. “It’s like my body’s developed this massive drug deficiency.” It was a Sprawl voice and a Sprawl joke. The Chatsubo was a bar for professional expatriates; you could drink there for a week and never hear two words in Japanese.
became:
> Dead in the sky, BD. Color.
> "When I use it I haven't heard of someone." He was a spray and a good refuge, not asking two words in the week.
Mine took a different route from (I assume) the same initial input: "The brown fox jumped quickly on the lazy dog." Which isn't really that much loss, I think.
'We come in peace' becomes 'We are in the room' reminds me of the movie 'Arrival' about trying to communicate with aliens and the linguistic difficulties.
I manually did it through Google Translate using the first result of top languages in the world and it wasn't too far off.
"Translate a message 10 times into different languages and back."
-> English
-> Mandarin(Chinese Simplified)
-> Hindi
-> Spanish
-> French
-> Arabic
-> Bengali
-> Russian
-> Portuguese
-> Indonesian
-> English
"Translate messages into multiple languages and return 10 times."
I'm really enjoying torturing this service with bizarre phrases..
"I have tarried too long in the garden of plenty, and now my waistcoat is too small to successfully encircle my girthy frame." -> "I have often had the opportunity to spend time with time, and now there is a small hip for success."
This would be more fun if they showed a translation back to English after each step, so you could see how it changed at each step. (Approximately)
This is one of the most fun parts of the game of telephone: going through the circle again, and having each person say out loud what they thought they heard.
I built something last week to do exactly this. I think if I post it here it will get HN hug of death'd because it's on a free Heroku tier and has an API limit. But DM me if you'd like the link.
What's today for lunch? -> What's the food condition now?
That changed completely.
I wish you could just choose the languages (with a random option) on one screen and translate them all at the same time, showing the end result, instead of clicking through.
Is it possible to have a machine translation that is so good that you get back the original message after the 10 times? I suppose that might depend on what languages you are translating it into and how closely they map to the starting language.
This is so bizarre. I just wrote a program to do this last week. Precisely the same thing.
I did it with the Google Translate API first, and you can translate it into 100 different languages and back again without ANY data loss whatsoever. My only guess is that they have some check that everything hashes to the same thing? Like some uber-language to check against.
"I am nothing more than a sentient being
suffering while waiting for the sweet release of death." -> "I won't wait for a living because I don't expect to live from life."
"I am a rich woman" returned "I am to be married." "I am rich" "I am fair" and "I am a rich person." Only after a dozen iterations did it come back correctly.
Context matters. If you don't have context you can't translate faithfully.
Some expressions have multiple meanings.
Humans do a very good job guessing the most likely meaning, but the main metric of what constitutes the most likely meaning is the context. The other metric is which meaning is the most common. But humans can disagree about what they think is most common/reasonable.
The shorter the text, the less internal context, the more opportunities for drift.
For example consider the full sentence:
> The farmer allows walkers to cross the field for free, but the bull charges.
Here it's clear it's a word play on the multiple meanings of the verb "to charge", and a human translator can decide to translate it correctly but lose the word play, to just give up, or to invent an equivalent word play.
But in isolation "the bull charges" is likely to be interpreted as the bull is going to hit you with its horns, rather than it will require you to give it some money.
Now, you just have to find a word that behaves this way on the way back to English, which gets more and more likely the more languages you use in the chain.
In this case, shouldn't a translator give me both options? E.g. it should translate both "the bull attacks" and "the bull requires payment". If it can find a single word with both meanings in the target language, use that. Otherwise, return multiple options — a UI could allow the user to choose, presenting the meaning of each translation alongside it.
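To make the "return multiple options" idea concrete, here is a toy sketch of the data shape I have in mind. The sense table is hand-written and purely hypothetical (a real translator would get candidates from its model), and the "translations" are just the English paraphrases from this comment, so treat it as an illustration of the interface, not of how any actual service works.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    text: str   # candidate rendering of the phrase
    gloss: str  # short explanation a UI could show next to it


# Hypothetical, hand-written sense table for illustration only.
SENSES: Dict[str, List[Candidate]] = {
    "the bull charges": [
        Candidate("the bull attacks", "rushes at you with its horns"),
        Candidate("the bull requires payment", "asks you for money"),
    ],
}


def translate_with_options(phrase: str) -> List[Candidate]:
    """Return every known reading; fall back to the phrase itself."""
    key = phrase.lower().strip().rstrip(".")
    return SENSES.get(key, [Candidate(phrase, "only one reading known")])


def default_choice(phrase: str) -> str:
    """What a UI would show when the user does not pick: the first reading."""
    return translate_with_options(phrase)[0].text


for option in translate_with_options("The bull charges."):
    print(f"{option.text}  ({option.gloss})")
print(default_choice("The bull charges."))  # the bull attacks
```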
Is the bull the one requiring or being required the payment? Or is bull used as a noun or a verb - I mean it's just too ambiguous to do anything with, and some ambiguity is hard to even notice without a context. Though, "the bull charges" is worse; it can't even be determined whether it's first or second person interaction, so "the bull requires" is definitely a clarification.
Most automated translators indeed offer multiple translations. The UI may decide to offer a default choice and to offer a way to explore the alternatives.
TFA is about building a chain of 10 "default translations". Picking "the right translation" manually on each step would defeat the purpose of the demo.
Could something like this be used as a loss function in a neural net? It seems like perfect translations could be translated back and forth many times.
Maybe it'd work for certain languages and phrases but many words/phrases have imperfect translations across languages and the same sentence could map to multiple valid translations.
It's one of the latest really private versions, it doesn't try to connect to tens of various websites under the shitto "your security & privacy is of paramount importance to us" and works with %99 of the websites. The %1 websites aren't essential or worth visiting anyway. So whether its code is Aramaic or Lateinic, it's good.
Whether it's used commonly or not, it's one the latest versions which don't interfere with your privacy and it works with %99+ of the websites. The %1- sites which don't work, aren't essential or worth visiting, so I'm fine with it :)