
If you have a plan for automatically checking the output of Google Translate against all possible input character strings, be sure to mention it to your recruiter as part of the hiring process.

More seriously: Google Translate's bread-and-butter data source is documents that were human-translated from one language to another with high reliability (such as UN publications). That turns out to work remarkably well for building a neural net that can extrapolate how a sentence in one language should translate into another. But like most neural networks, it's vulnerable to garbage-in, garbage-out. Much like you can get an animal detector to hallucinate "zebra!" by feeding it a noise pattern, if you feed the translator character sequences that aren't actually words in the input language, it will try to extrapolate from the corpus it has seen, and you'll get garbage on the output side.
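To make the garbage-in, garbage-out point concrete, here is a minimal sketch. It is not how Google Translate works: a character-trigram nearest-match lookup stands in for the neural net, and the three-sentence corpus and the names `CORPUS`, `overlap`, and `toy_translate` are made up for illustration. The only thing it shares with the real system is that it has no "I don't know" answer, so it returns something even for noise.

```python
from collections import Counter

# Hypothetical toy parallel corpus (English -> Spanish); not real training data.
CORPUS = [
    ("good morning", "buenos días"),
    ("thank you", "gracias"),
    ("where is the station", "dónde está la estación"),
]


def trigrams(text):
    """Count the character trigrams of a padded, lowercased string."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def overlap(a, b):
    """Number of trigrams shared by a and b (multiset intersection size)."""
    return sum((trigrams(a) & trigrams(b)).values())


def toy_translate(source):
    """Return the translation of the closest corpus sentence.

    There is no reject option: max() always picks an entry,
    even when the best overlap is zero.
    """
    best = max(CORPUS, key=lambda pair: overlap(source, pair[0]))
    return best[1]


print(toy_translate("thank you"))   # a sentence the corpus has seen
print(toy_translate("thnk yuo"))    # a typo still lands on its nearest neighbour
print(toy_translate("xqzv jjkw"))   # noise still produces a confident-looking answer
```

The noise input shares no trigrams with anything in the corpus, yet the function still hands back one of the Spanish sentences. Adding a similarity threshold would let this toy refuse, but picking a threshold that rejects nonsense without rejecting rare real words is the hard part.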

Since the tool doesn't actually know what words mean, it has no way, at present, to know "Yes" isn't a Spanish word. As other commenters have mentioned, it may actually be "a Spanish word" in one weird context in one weird document somewhere in the corpus of all translated documents accessible from the Internet. Or some document somewhere contains a close-enough typo in the Spanish input that is over-reflected in the output because no other document contradicts the typo's apparent translation.



