Feedback's much appreciated, and if you encounter any bugs, please report them via email to lars+glg@yencken.org
If there's a language you'd like to see that's missing, consider helping out by finding some good quality language samples or news podcasts in that language and emailing me.
Hi, great game.
A little suggestion. The Serbian, Croatian and Bosnian languages are too similar for non-native speakers to differentiate.
Take a look at https://en.wikipedia.org/wiki/Serbo-Croatian.
Basically, the division is more political than linguistic.
(Disclaimer: I am a native Croatian speaker.) There's a significant difference between the Croatian and Serbian languages: Croatian is composed of three dialects and a standard language, which are not found in Serbian. It was a political movement to merge one of those dialects (Shtokavian) with the Serbian language. So Serbian is somewhat similar to Shtokavian, but not to the other dialects that make up the rest of the Croatian language.
I, as a native Serbian speaker, have a 100% understanding of what is being said on Croatian national TV, and 100% understanding when reading websites in Croatian.
While the difference exists, calling it significant is just... untrue. For non-speakers, it is barely noticeable.
I probably have the same understanding of Serbian. I used to read a lot of science fiction literature translated into it. Not to mention that I learnt Cyrillic script in elementary school in the 1980s, when the language was still officially called "Serbo-Croatian".
But there are significant differences: for example, Serbian people don't understand Kajkavian and refuse to. There are a lot of jokes on the theme in old Yugoslavian TV shows. Kajkavian is part of the Croatian language.
I am not sure what you mean. I tried to explain the situation. "Serbo-Croatian" as such doesn't exist (http://www.britannica.com/EBchecked/topic/535405/Serbo-Croat...). Yeah, you can mix both of them and speak Serbo-Croatian as a mixture of both, but people also mix English with their mother tongues. How many kids today call their language "Croatian-English" or "German-Bosnian", for example? And I have heard both of these in real-life situations.
First, the scores right now aren't very meaningful. The chance of winning each round is highly dependent on being lucky with the multiple choices (e.g. I'll always be able to guess Spanish vs. anything else, but if my options are two African languages, it's a coin toss). I think the distribution of scores over several games will end up being similar to playing a series of random coin tosses. My own scores varied greatly.
Second, it would be great to have more clips of each language, without having the same person speaking twice. The educational element of the game gets lost when you start overtraining on the same clip.
On the first point, the game's randomness is biased towards easier languages earlier, and harder languages later. But I didn't want to bring with me a preconceived notion of what languages people would find hard. If you were to track your maximum score, you'd probably find it trend upwards as you trained. So scores are still useful in that regard.
On the second point -- I completely agree. Validating the samples takes time, but I promise I'll add more as soon as I can.
> Second, it would be great to have more clips of each language, without having the same person speaking twice. The educational element of the game gets lost when you start overtraining on the same clip.
Exactly! I got Swedish vs. some Asian language once, correctly guessed Swedish, and then the next round was the same audio clip with some other languages. The game had already told me that it was Swedish, so it was easy.
Infrastructure is good, but some of the lag could be fixed with code. For example: the page should already know whether a given answer is correct, and give a response without needing to hit the server at all. This would give you instantaneous feedback, which the user can digest while the next audio file is cached. As it is, having to wait 5-10 seconds for my button-presses to register is really killing the enjoyability of what is otherwise a very cool concept.
You don't want the correct answer on the client before the answer is submitted - it would be trivial to cheat by extracting the answer with a little bit of reverse engineering.
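One way to picture the trade-off in the two comments above: the browser only ever holds an opaque round token, and the answer stays on the server until a guess is submitted. The following is a minimal sketch under that assumption, not the game's actual code; `RoundStore`, `start_round` and `submit` are hypothetical names, and a real deployment would keep the answers in a session or database rather than in an in-memory dict.

```python
import secrets


class RoundStore:
    """Keeps correct answers on the server, keyed by an opaque round token.

    Hypothetical helper for illustration; not taken from the game's code.
    """

    def __init__(self):
        self._answers = {}

    def start_round(self, answer):
        # Only this random token is sent to the browser, never the answer.
        token = secrets.token_urlsafe(16)
        self._answers[token] = answer
        return token

    def submit(self, token, guess):
        # pop() makes each token single-use, so a round cannot be replayed
        # after the correct answer has been revealed.
        answer = self._answers.pop(token, None)
        if answer is None:
            raise KeyError("unknown or already answered round")
        return guess == answer, answer
```

This still costs one round trip per guess, so the lag complaint would have to be addressed elsewhere, for example by fetching the next clip while the result is shown.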
Thank you! I think I was lucky, being Bulgarian and knowing what most of the Slavic languages sound like. I can also very easily distinguish between the non-Slavic West European ones (French, German, Italian, Spanish, Portuguese).
My only advice is to avoid having the same language, or at least the same wave file, played twice in a row. (It happened to me a few times, and I've played five times in a row.)
Also, maybe introduce something like the "jokers" (hints) in TV shows, such as removing half of the wrong languages (when there are more than 4 or 5, I guess), etc.
Or maybe let the player take a risk by asking for more languages even on lower levels, in exchange for more points.
It's entertaining :) I was actually wondering how one would be able to answer if listening to this as a game in the car (Google Glass with eye tracking?), or maybe that's just a bad idea for driving.
A cool game worth a couple of plays. I found myself often making educated guesses based on the speaker's accent, not the language itself (i.e. I thought of where I'd think the person was from if the same speaker was speaking English).
A suggestion: I really don't like the design of having the next question be a new browser history entry. It took me about 20 back clicks to get back to HN to post this comment.
Might be cool to weight scores per round based on speed of selection. That will, however, favour even more those who get easy combos or languages. French, Thai, Arabic, Cantonese, Italian, etc are all very easy when compared with differentiation between Yugoslavian languages or guessing some that I either hadn't heard of (Kannada) or I'm much less familiar with (Scandinavian).
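As a rough illustration of the speed-weighting idea, here is a minimal sketch that scales a round's points linearly with how quickly the answer arrives. Everything in it is an assumption made for the example: the function name `round_score`, the `min_fraction` floor, and the idea that the clip length is the time limit are not taken from the game.

```python
def round_score(base_points, seconds_taken, clip_seconds, min_fraction=0.5):
    """Scale base_points from 100% (instant answer) down to min_fraction
    (answer given at or after the end of the clip).

    All parameters are hypothetical; the game's real scoring is not known.
    """
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    # Clamp the elapsed time into [0, clip_seconds].
    elapsed = min(max(seconds_taken, 0.0), clip_seconds)
    fraction = 1.0 - (1.0 - min_fraction) * (elapsed / clip_seconds)
    return round(base_points * fraction)
```

Keeping a floor such as `min_fraction` limits how much fast guesses on easy pairs can dominate, which is the concern raised above, but it does not remove that bias.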
Yeah, the Kannada one tripped me up too. I figured it must've been some Inuit language - only afterwards did I realise it's an Indian language (No, the other kind of Indians).
And good luck distinguishing between Scandinavian languages if you're not native. I'm Danish and I sometimes confuse Swedish with Norwegian.
I don't know how much of an exposure you have to various cultures, but I hope you keep in mind that many Indians find it offensive when you call them "the other kind of indians".
I'll put it in a manner you can understand. I see that you're a Dane. I suppose the US has a significant populace with Danish ancestry. How would you like it if they referred to you as the other kind of Dane? I've also come across anecdotes where Brits visiting the US were told by some rude people to learn to speak English properly.
In an American context, it is understandable to think "Indian" signifies "American Indian". Now that East Indian immigration verges on outnumbering American Indians, that will probably change, but it's pretty silly to get offended by historical geographical misunderstandings. I mean, let's vilify Columbus for the right reasons. Besides, what's wrong with Native Americans? Why would it be a put-down to share a name with them? It's not like someone called you "French". b^)
> ...learn to speak English properly.
It is a truth universally acknowledged among speakers of English who've spoken with many other speakers of English, that the worst English-speakers are the English. I find the Irish to be the most intelligible and melodious, especially those Irish who have lived overseas. "BBC" English, however, is not the way that most English people speak, and besides the UK has no monopoly on that dialect.
> ... by historical geographical misunderstandings
If only it were just historical!
> Besides, what's wrong with Native Americans?
Nothing! That's an excellent term.
> Why would it be a put-down to share a name with them?
I never said anything about it being a put-down. To me, it's just annoying when people write "Indian" and assume that it will be understood to mean "Native American".
My point (with bringing up Danes and Brits too) was that some manners of speaking reek of ignorance. "In an American context" sounds like a weak excuse to perpetuate expressions rooted in ignorance. For example, when I learnt of the appropriate meaning of the word "Caucasian"[1], I stopped using it to refer generally to people of European descent. Now, I get annoyed when Americans use it that way ;-)
> Now that East Indian immigration ...
"East Indian" does not mean to me what it means to you. I'm South Indian. Historically, "East Indian" meant something else entirely [2].
Regarding Caucasian, why would you get annoyed? Words can and often do have multiple appropriate meanings. One of the wonderful things about natural language is our ability to distinguish among different word meanings based on context.
Anyone who insists upon only a single meaning for each word will have a considerably poorer vocabulary because of it.
Furthermore, if you're going to rule out all words with spurious etymologies, I suspect you're going to have to eliminate quite a lot of words from your vocabulary. For example, stop using "turkey" to refer to the bird eaten in certain parts of the world.
> Words can and often do have multiple appropriate meanings.
You seem to greatly overestimate our ability to distinguish among different word meanings based on context. The meanings are not all appropriate when they conflict. I first discovered this conflicting meaning of "Caucasian" when discussing something with a Polish friend. Unlike me, he did not default to the popular American meaning of "Caucasian". That led to us misunderstanding each other for a few minutes. "Indian" has caused me enough headaches online. I'll reiterate: not everyone defaults to popular American (mis-)usages.
Leave aside the matter of popular usage for the time being. Do you also think that if population geneticists were to use all these conflicting meanings, it would cause them no problems at all?
I hope my explanation also disabuses you of the notions that I'm "[insisting] upon only a single meaning for each word" or trying to "rule out all words with spurious etymologies". Don't worry about me having a considerably poorer vocabulary; I do extremely well on most tests. When I write, people may have trouble understanding my point, but it's never been because I used the wrong words. Addendum: if it were to turn out that I have used the wrong words, I'd learn, and correct myself.
Is that also "annoying" to you? After all, some hillbilly somewhere might not appreciate the many nuances of differing cultural practices between Sumatra and Tamil Nadu. Worse yet, she might not even care! Is there any aspect of any European language that doesn't annoy you?
Great! One suggestion: Add a wikipedia link to the language that is displayed when the player gets it wrong, that way you can quickly read up on where it is spoken etc.
Actually, after each try you could have a snippet about each language, e.g.
Kannada
Primarily spoken in: India
# of speakers: 38 million
Language family: Dravidian
Writing sample: ದಯವಿಟ್ಟು ಕನ್ನಡ ವಿಕಿಪೀಡಿಯದಲ್ಲಿ ಕನ್ನಡ ಲಿಪಿಯಲ್ಲಿ ಮಾತ್ರ ಬರೆಯಿರಿ
Is awesome because: Reduplication! You write the word
twice, replacing the start of the second one with 'gi-',
to get "and related things". So if "books" is
'pustaka', "books and related things" is
'pustaka-gistaka'. Awesome.
You get the writing sample from kn.wikipedia.org, and the awesomeness from places like http://wals.info/ , http://www.ethnologue.com/ , http://omniglot.com/ (or just searching Google Scholar for the language name and clicking random articles, always fun)
Oh that's a great idea, very educational and I'd love reading stuff like that while playing this game!
(Especially since you have to wait quite a bit before the next sample loads. Which, as some others have pointed out, shouldn't have to be that slow. How many minutes of audio does this game have in total? I think using a modern codec (say OGG or AAC, not MP3) you could go with a pretty low bitrate (VBR), would it be feasible to just load all the clips in one chunk and play slices of that?)
Another idea for extra information: show on a world/continent map where this language is spoken. I expected to do really badly at this game since I am awful at geography. But it turns out I'm doing pretty well (15/15, still got 3 lives, but I'm taking a break). I guess I'm pretty good with languages: even though I only speak NL/EN/DE and a bit of FR, I know enough about others to make an educated guess. And of course you don't need to know where something is to guess the language :) Seriously, ask me to point to those places on a map and I'd fail miserably, hence showing the location(s) where it's spoken would be welcome training for me :)
.. and then I got 3 wrong in a row :) so, 750 points.
Question to the author: the last one I got wrong was Turkish. I would've definitely considered that if I had seen the option, but I don't remember seeing it. I probably just overlooked it. But it would be nice if it showed all the options, with the wrong one in red and the right one in green or something, just to settle any thoughts of "was that a bug or did I overlook something?"
I LOVE this. This is something I've been doing on my own for most of my life. I'd like to think I'm pretty good at picking out what languages people are speaking but Dinka is really hard. Distinguishing between some of the Balkan languages is harder than I remember.
It'd be great if there were an iOS app of this. You'd easily get my $2. :)
Thanks for the support! I'll definitely have a think about it. I'd have to collect my own sound samples to make a commercial product, but it's an interesting option.
Cool game, and nicely presented! I got a 250, then 400.
It might be good to ask the user where they are from (non-intrusively of course, maybe a simple country selector or IP lookup) to get another dimension on the statistics. Or what languages they already know as well might be interesting when looking at the results.
I'm recording GeoIP country and the Accept-Languages header from their request. Hopefully this generates some interesting data to pore over later. Monthly data snapshots might be useful for researchers.
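For anyone wanting to analyse such a snapshot later, the Accept-Language header is a comma-separated list of language tags with optional `q` weights (default 1.0). Below is a minimal parsing sketch; the function name `parse_accept_language` is made up for this example, it handles only the common `tag;q=value` form, and it is not a claim about how the site stores or processes the header.

```python
def parse_accept_language(header):
    """Return (language_tag, q) pairs sorted by descending preference.

    Simplified sketch: handles "tag" and "tag;q=0.8" entries only.
    """
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # default weight when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                continue  # skip malformed entries
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)
```

For example, a header of `da, en-GB;q=0.8, en;q=0.7` would rank Danish first, which could then be compared against which languages that player recognises.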
One way to make the game harder is to remove clips that have obvious references to proper nouns tied to one country. Many of my correct answers came from hearing the name of a person or place associated with one of the languages.
Thanks for this -- I was (embarrassingly) running in debug mode due to a bad code snippet in my runserver script. Your report helped me fix both the KeyError and the debug mode.
That was fun! I got 300. I only speak English fluently, although I've studied French and German. However, I am very interested in phonetics and have read up on the sound systems of lots of languages and that helped a lot. Thanks, I plan to play it again.
You are directly sending a POST request to the {domain}/play/ URL I presume. I gave the URL to my friend and it gave him a 405 Method Not Allowed error.
Definitely a very cool concept, but the code could be improved a bit. I'd ideally like to see something like this primarily to be run via Javascript. Then once you guess it'd just download a new audio clip + the correct answer... way less lag.
One of the languages is labelled "Bangla", the transliterated form of the native spelling বাংলা. Given that you've used the English names of most languages there (I can't speak for all of them), I'd prefer the consistency of "Bengali" there.
Thanks for the feedback. Many of the languages have multiple aliases, I can't say I've made a principled choice in every case. But, updating to "Bengali" is in the todo list.
850. It would be a bit nicer if the randomly selected other language choices came from different language families: one of the three I got wrong was Bosnian vs Croatian, even though the difference is arguably a political construct (just don't say that to a Bosnian or Croat...).
2nd attempt 1050. Had precisely the same sound sample twice in a row though?
Also, this serves as a pretty handy hint that when you're completely stumped by something you've never, ever heard before, the answer is probably "Dinka": http://greatlanguagegame.com/stats/
Indeed. The Dinka sample is recorded off the radio or phone and is very difficult to hear as well (It's not like I really know what Dinka sounds like, but if it were easier to hear I could at least do a better job of eliminating the other options).
Bosnian was certainly the tricky one for me: is it really the same clip every time? The first time I missed it, I thought I heard "bagus" so I guessed Malay. The second time I missed it I guess I really have no excuse! Still I got 600, and I don't feel bad for not being able to distinguish Gujarati from Punjabi from Urdu. It's been some time since I heard any of those languages spoken.
Nice! I noticed that I tended to identify the languages by their "fingerprint", certain phonemes and accents that characterize the language. It's probably not surprising that all three languages I missed were African languages. I've heard plenty of European and Asian languages, but rarely have I heard an African language.
I've been playing this game on my own now for decades...just trying to guess what language people are speaking eavesdropping on conversations and listening to radio.
I'm easily able to break 1000 points but some of those African languages (Dinka) are really hard!
I'm so thankful for this site. It's really really cool!
For me specifically, shortwave radio. I still have a R-75 hooked up although I don't often use it anymore. I used to have a R-390 and its "mobile" cousin the R-392.
I've found the audio quality has little or nothing to do with the appeal of listening to foreign lands, so listening over the internet has had little if any change in that hobby vs using a radio. Although I do get most of my non-tech news from the BBC now, so I do appreciate the podcast feeds, something you could never do with radio.
I think most HN readers would appreciate BBC Radio 4's "In Our Time" program. Not every episode, especially not the more science/tech oriented, but most episodes are entertaining / mind expanding. I get it as a podcast via an RSS feed.
I would think all the startup opportunities relating to podcasting are long since used up? There seems to be a way to monetize anything people spend time/money/effort on, and at least some people do that with podcasts, so logically there must be a way to make money off them...
> I still have a R-75 hooked up although I don't often use it anymore.
I'm also a shortwave fan, and I also find it less interesting as time passes. But there's a concrete reason -- there are fewer interesting shortwave programs, mostly in response to the rise of the Internet, podcasting and satellite radio.
Many shortwave bands, once teeming with interesting and useful content, have been taken over by niche broadcasters like religious groups, or ordinary broadcasts that have shifted to the shortwave bands from the AM broadcast band in tropical areas to avoid excessive noise. Overall, less interesting shortwave content.
Apropos of nothing, for years I believed that one of my favorite AM radio stations for long-distance reception (KGO near San Francisco, 810 kHz) had its frequency to itself -- was a "clear-channel" broadcaster. I recently discovered that this hasn't been true for decades -- there are now 182 U.S. domestic AM stations on that one frequency. I was shocked.
Same here, scored about 750 just by simplifying: knowing Kurdish was Persian-like, so listening for "Indian" sounds, or picking Slovenian by listening for "Eastern Europe".
The only languages I can actually speak are English, and a little French.
It's amazing how distinguishable accents are. I wonder how much voice recognition software keys off of accent? I have a few South/East Asian friends who never do well with spoken GPS recognition, but if it could know their native tongue was Gujarati or Thai then I imagine it could do a much better job of analyzing their English.
I was pleasantly surprised to make it to 800.
I work in the call centre space which deals with voice recognition IVRS. If there is no voice pack available for your specific country, a number of 'utterances' are collected to train the recogniser. So, yes, accents are used in the voice recognition software that I'm familiar with.
One tricky thing I noticed was that there was one audio clip that was in Urdu, but it was discussing something written in Arabic. Indeed, the clip had more Arabic than Urdu. Fortunately, Arabic wasn't an option, so I got it right.
That's the kind of edge case that might appear in other clips as well, and might be something to watch out for.
One can do surprisingly well by knowing the geographic region a language is spoken, and imagining the accent of people from there, then matching it to the accent of the person speaking.
If you cop Assyrian vs Turkish first up, you might be flipping a coin quite early. I had two situations like that which chewed lives from the start! The other was Macedonian vs Serbian!
In the sound clip for Ukraine, the person actually says Ukraine (or very similar) a few times. Also, the snippets should be normalised. Some are very quiet.
Volume's been normalised using mp3gain. I manually screen all the snippets for English, music or other noise, and obvious giveaways. Looks like a Ukraine case slipped through -- it's in my todo list to remove. Thanks!
There were several times when a fragment of which I had already guessed the language correctly came up again two or three rounds later, which made it a lot easier. I think it would be better to keep track of that, and have more fragments per language, and round-robin between those. Maybe I just got lucky, but I think 1200 for the first play is a good score. :)
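The round-robin idea above could look roughly like the sketch below: each language keeps its own queue of clips that is cycled in order, and a short memory of recently played clips is consulted before picking. The class name `ClipPicker`, the `recent_size` parameter and the clip identifiers are all hypothetical; nothing here reflects how the game actually selects clips.

```python
import random
from collections import deque


class ClipPicker:
    """Round-robin over each language's clips, skipping recently played ones.

    Hypothetical sketch; clip ids are assumed to be unique strings.
    """

    def __init__(self, clips_by_language, recent_size=5, rng=None):
        self._rng = rng or random.Random()
        self._queues = {}
        for language, clips in clips_by_language.items():
            shuffled = list(clips)
            self._rng.shuffle(shuffled)
            self._queues[language] = deque(shuffled)
        self._recent = deque(maxlen=recent_size)

    def next_clip(self):
        # Prefer languages whose next clip has not been played recently.
        candidates = [lang for lang, queue in self._queues.items()
                      if queue and queue[0] not in self._recent]
        if not candidates:
            # Fall back to any language rather than failing the round.
            candidates = [lang for lang, queue in self._queues.items() if queue]
        language = self._rng.choice(candidates)
        queue = self._queues[language]
        clip = queue[0]
        queue.rotate(-1)  # move the played clip to the back of its queue
        self._recent.append(clip)
        return language, clip
```

With only one clip per language the fallback branch will still repeat clips eventually, so this helps only once there are several validated samples per language.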
I'd love to see some stats on how others have done. Also, it would be nice if you didn't have to press back fifty times (or long click select) to go back to the last site you were on.
I really like it. It would be cool if the score were based on how quickly you answer too. Also, it would be very interesting to find a correlation between how many (and which) languages users speak, natively or learned, and how many languages they can recognize. For example, it's funny how knowing just one Slavic language helps you not only to recognize any of the others (or in some easier cases understand a lot) but also to spot the small differences in accents or vocabulary.
I got a 650. Surprising how easy it was to pick out what was Indo-European or Semitic, everything else was hard. Only got Indonesian because of The Act of Killing.
I was up to 800 points, and I got a server error... (edit) I retried, and since that was my last life, I got it wrong. I was lucky though, as I got Bulgarian twice, and I'm from Bulgaria :)
Played a second time, 650, and a third time, 950.
I got lucky again, since Dutch came up twice in a row, as did a few other languages.
It's cool, reminds me a bit of Google's "find this place on earth", where you "drive" around until you see a sign or some well-known place.
I've only been there on holidays but it doesn't sound right to me, perhaps it's just the low quality of the sample but the phonemes seem far more 'Eastern' to me.
I think that the Czech and Slovak languages are incredibly similar to each other. Oftentimes, when reading the instructions on some product in both languages, they differ by one word here and there. I would be willing to guess that no speaker of these languages would be able to differentiate one from the other.
A cool variant on this game could be "guess where this English accent is from."
("I would be willing to guess that no speaker of these languages would be able to differentiate one from the other" – I highly doubt that. Dialects/languages differ more and more the closer they are to your own; I can tell if someone grew up a 15 minute drive away from me, but if the choice is between a place that is e.g. >3 hrs north vs >4hrs north I have a really hard time placing them.)
Got to 1200, but am still surprised how easily Portuguese can be mistaken for a Slavonic language. And the Bosnian/Croatian samples are really not that easy to distinguish. It would really be cool if the scores somehow reflected this, maybe depending on your native language... not sure what a good scoring system would be here.
700. Missed Tagalog, Slovenian, and Amharic. Also, dude, you could do with another Lao clip, or a better clip-selection algo, 'cause I heard the Lao clip twice. Would've been good if it had been a different clip, but giving me the same clip is just a gimme. Great game, though.
I don't know if this was lag because the system was under load, but I had to wait for the recording to finish before I could go to the next question. Would you be able to make it go to the next question once I have answered, instead of having to listen to the entire clip?
I found it rather fun pressing the "next" button, then switching away and just listening, forming an opinion before looking at the options. My guess at the region of the language was often pretty accurate even if I wasn't able to guess the precise language.
Biggest bug for me is that if you click on one of the two languages, it still waits till the entire clip finishes playing before progressing to tell me 'right' or 'wrong'.
UPDATE: never mind, it's not when the audio finishes playing, it's just slow to load.
I was up to 500 before missing two in a row, but then I encountered a bug where the audio track seemed to not download properly, so hitting play caused the progress bar to jump straight to the end without playing anything. Pretty cool idea overall though.
Well, distinguishing standard Serbian, Bosnian and Croatian is impossible even for native speakers, as it is in fact one language.
The division is merely a political one.
One can only distinguish accent subtleties in certain words.
It would improve the fairness of the game and avoid a lot of confusion not to allow these languages to appear in the answers together.
You should either throw Bosnian out or not offer it together with Serbian. It's a dialect of Serbian really, and many people in south Serbia speak that dialect (see for example the Sanjak region: http://en.wikipedia.org/wiki/Sanjak_of_Novi_Pazar). The Bosnian language is just a political decision: it's a sovereign country now and needed to have its own language, as Serbia was an aggressor in the last war, so having the official language named Serbian would offend both Bosniaks and Croats living in the country.
Also, Croatian and Serbian can only be distinguished by words. You can't recognize the intonation because it's the same. If you don't know words specific to these languages, you won't be able to tell the difference.
It would be best to have a filter that prevents them being in the same question.
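A filter like the one suggested could be a small lookup of "too close to call" groups that is consulted when the wrong options are drawn. The sketch below is only an illustration: the groups are the pairs people complained about in this thread (Bosnian/Croatian/Serbian, Czech/Slovak), and the names `CONFUSABLE_GROUPS`, `confusable_with` and `pick_choices` are hypothetical rather than anything from the game's code.

```python
import random

# Illustrative groups taken from complaints in this thread; the real list
# would be an editorial choice by the game's author.
CONFUSABLE_GROUPS = [
    {"Bosnian", "Croatian", "Serbian"},
    {"Czech", "Slovak"},
]


def confusable_with(language):
    """Return the languages that should never be offered next to `language`."""
    result = set()
    for group in CONFUSABLE_GROUPS:
        if language in group:
            result |= group - {language}
    return result


def pick_choices(answer, all_languages, n_choices, rng=random):
    """Pick n_choices options containing `answer` but none of its near-twins."""
    blocked = confusable_with(answer)
    pool = [lang for lang in all_languages
            if lang != answer and lang not in blocked]
    if len(pool) < n_choices - 1:
        raise ValueError("not enough languages to build the question")
    choices = rng.sample(pool, n_choices - 1) + [answer]
    rng.shuffle(choices)
    return choices
```

Two wrong options may still belong to the same group (say Czech and Slovak when the answer is Thai); that seems harmless, since the player only has to rule both out.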
I'm Bulgarian, and was wondering how others perceive our language. But I can always understand when someone speaks Portuguese, although I know very little of it (I've worked with a couple of cool people from Brazil).
1400, but that's only because I'm interested in languages. I speak English, Swedish, German and Romanian, can read the other Romance languages (French/Italian/Portuguese/Spanish), and can read and write Farsi. I have no problem differentiating between Cantonese, Japanese, Mandarin and Korean.
The quiz is most difficult when presenting me with choices between Slavic combinations (Ukrainian/Polish/Russian) and far east weirdness: Tamil / Burmese / Malay.
Could you elaborate a bit on the part where you mentioned "far east weirdness: Tamil / Burmese / Malay".
What's so weird about those languages? All three languages you mentioned are linguistically very different and have long histories (particularly Tamil).
I wouldn't call them far east either. These languages are predominantly spoken in South and South-east Asia.
(I am from Singapore where Tamil and Malay are official languages)
They're "weird" because I have little experience with them, having barely ever heard Malay until this quiz. I correctly guessed it because I heard the word "muslim" or "islam" in the clip.
http://quietlyamused.org/blog/2013/09/01/introducing-the-gre...
Enjoy!