> Was für Leute trifft man auf Hacker News? Sind Hacker nicht gefährlich? ("What kind of people do you meet on Hacker News? Aren't hackers dangerous?")
LeoLM 13b chat:
> Hacker News ist eine Website, auf der Menschen über Technologie, Startups und alles diskutieren, was mit dem Thema Informatik zusammenhängt. Es ist nicht ungewöhnlich, dass Menschen mit verschiedenen Hintergründen und Interessen Hacker News lesen und beitragen. Daher trifft man auf Hacker News Menschen mit einer Vielzahl von Berufsbezeichnungen, Erfahrungsstufen und Interessen. Hacker allein sind keine gefährlichen Personen, sondern Menschen, die eine Faszination für die Arbeit mit Computern haben. Hacking ist einfach eine Möglichkeit, Computerprogramme zu testen und zu manipulieren, um herauszufinden, wie sie funktionieren. Es gibt jedoch einige wenige Menschen, die diese Fähigkeiten nutzen, um anderen zu schaden. Diese Individuen sind jedoch nicht die Mehrheit.
(Translation: "Hacker News is a website where people discuss technology, startups, and everything related to computer science. It is not unusual for people with different backgrounds and interests to read and contribute to Hacker News. So on Hacker News you meet people with a wide variety of job titles, experience levels, and interests. Hackers as such are not dangerous people, but people who are fascinated by working with computers. Hacking is simply a way of testing and manipulating computer programs to find out how they work. There are, however, a few people who use these skills to harm others. Those individuals are not the majority.")
Also I find the description of what hacking is rather limited. At least in my understanding, hacking also applies to hardware and "stuff" in general, not only to "computer programs".
Is it able to transfer 'knowledge' it has in one language to other languages? Or is it strictly bound to the language of the training data that provides the relevant facts?
Cool to see this, but I am always surprised how often LLM output is used to train other LLMs. They used GPT-3.5 Turbo and GPT-4 for multiple tasks. Even simple translation of English benchmarks and writing German poems to train on, in order to create an LLM that works better in German?
AFAIK, this still goes against OpenAI's TOS, and also the basic idea that training on AI output leads to worse results in general. Was there some major shift in this over the years, or has it simply become the default approach because it is easy to do?
Also nice to finally see something from Hessian.AI. As a local, I have heard them talk big more than once but never saw results. I wonder what Aleph Alpha thinks about this, since they want to make "AI made in Europe to challenge OpenAI".
After reading the name, I thought that it was related to https://www.leo.org/ - which initially started out (decades ago) as a project of TUM (Technical University of Munich), but I guess the lion reference is because Hesse also has a lion in its crest, same as Bavaria...
I don't get it - is standard Llama bad at German? Other LLMs which I use regularly (like GPT-3.5 and GPT-4) work just fine in German out of the box, as well as in several other languages.
> It's similar to how high verbal IQ people can easily master multiple languages, but normal people are only good at their main language.
Let me guess: You are American?
That is very much not true. Go to the Netherlands, for example. Pretty much everybody under 50 speaks very good English. You don’t have to have a high verbal IQ, or be somehow abnormal to become good at a non-native language.
I guess the fact that Dutch and English are pretty closely related helps a lot here. There are definitely people who have trouble learning a language that is from a completely different family (it doesn't even have to be Chinese, try Finnish or Hungarian).
There's a lot that goes into verbal intelligence, and we don't really know all of the what and how. It'd be cool if we did, but for now we can only really draw some correlations to physical areas of the brain, which are nevertheless fluid and unique in each individual, and some genetic links. Even then it isn't enough to account for half the stuff we observe humans doing – in Europe, learning 2-4 languages as a child is considered pretty much entirely normal.
Yeah, as far as I understand, NNs do mimic neurons in a simplistic way, but any parallel between them and human brains beyond that is bound to fail because we don't know how human brains work that well. We do not learn by scouring millions of tokens. And regular children can learn a language faster than a high-IQ adult.
I think learning multiple languages is normal for children in most places, especially if different social contexts require different languages. Lots of North Africans speak Arabic, French, and English, for example, and some of them speak Berber languages on top of that.
It's also heavily dictated by cultural factors. In northern European countries it's quite normal to know at least English next to your native language. So much so that in Finland tourists seem to think that everybody knows English, and foreigners who try to learn Finnish tend to complain that Finns switch to English at every opportunity.
Cultural factors are one thing, the other is just that there are only ~5.8 million native Finnish speakers, so a lot of books, movies, music, etc. are not available in the language for purely economic reasons. There are over 90 million German native speakers, and TV shows and movies are dubbed, books translated, etc., so there is less of a need to have high proficiency in English (similar to Spanish-speaking countries).
You'd be surprised, honestly. What matters is also comprehensible input size, so long works like LOTR (which has an excellent Finnish translation by Kersti Juva, for anyone looking for works to read) are a good choice. Similarly, there's a lot of Finnish literature in Finnish which Finnish people spend time reading, some of which is translated into many, many languages. The Egyptian by Mika Waltari is one of those classics.

Not nearly everyone here knows English well. TV shows and movies are usually subtitled as well, save for kids' shows. Finland does however have three mandatory languages in basic schooling: your native language, the other national language, and then English. A fourth one is not uncommon either, so we have a lot of German, Spanish, Russian, Japanese etc. speakers here. Hell, jag kan neljää языки.
> It's similar to how high verbal IQ people can easily master multiple languages, but normal people are only good at their main language.
Normal people can be good at multiple languages; people living in large nations whose own language is also effectively the global lingua franca often don't, because there is little marginal benefit for them to do so, not because of verbal IQ. People of similar "verbal IQ" living in an area where a few hours' drive can put them in places with three or four different primary languages will often be functional in several languages.
Fair enough, though if it's meant that way it's not that meaningful as a comparison(?).
It loosely translates as:
“People who are good with languages are good with languages”.
Historically, and even today on a global scale, monolingualism is the exception. In the absence of large state actors (i.e. through most of human history) languages diversify much more than we are aware of today, and this requires many more people to be familiar with multiple languages.
I've heard (credible-sounding) gossip that most estimates land on a 1.5T-weight, 16-way mixture-of-experts for GPT-4. Is that, like, the consensus these days?
A couple of months ago I got 3.5-turbo to start talking about "the details", but that is about as reliable as anything it says. It was claiming, like, 200Gw (200 billion weights).
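In case "16-way mix" is opaque: it refers to a mixture-of-experts architecture. A toy sketch of such a layer in PyTorch, with all sizes invented (none of this reflects confirmed GPT-4 internals):

    import torch
    import torch.nn as nn

    class MoELayer(nn.Module):
        """Toy 16-expert MoE layer with top-2 routing (hypothetical sizes)."""
        def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
            super().__init__()
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )
            self.router = nn.Linear(d_model, n_experts)  # gating network
            self.top_k = top_k

        def forward(self, x):  # x: (n_tokens, d_model)
            gates = self.router(x).softmax(dim=-1)
            # Route each token to its top_k experts (weights not renormalized;
            # fine for a toy).
            weights, idx = gates.topk(self.top_k, dim=-1)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e  # tokens routed to expert e in slot k
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

Under the rumor, most of the ~1.5T weights would sit in the experts, with only top_k of them active per token.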
I'd be so interested in seeing how different the world views are of a model fine-tuned on English Wikipedia vs one fine-tuned on translated German Wikipedia.
And, in addition to that, which topics either model is more knowledgeable about as there are lots of articles in either language missing a translation.
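If someone wanted to quantify that coverage gap, the MediaWiki action API exposes interlanguage links, so you can check per article whether a German counterpart exists. A small sketch (the endpoint and parameters are the standard action API; the titles are arbitrary examples):

    import requests

    def has_german_version(title: str) -> bool:
        """Check via the MediaWiki action API whether an enwiki article
        carries an interlanguage link to dewiki."""
        resp = requests.get(
            "https://en.wikipedia.org/w/api.php",
            params={
                "action": "query",
                "prop": "langlinks",
                "titles": title,
                "lllang": "de",
                "format": "json",
            },
            timeout=10,
        ).json()
        page = next(iter(resp["query"]["pages"].values()))
        return "langlinks" in page  # key is absent when no German article exists

    for title in ["Hacker News", "County seat"]:
        print(title, "->", has_german_version(title))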
I've been thinking along these lines, and whether it'll become more common to see smaller but more tailored language models. The advantage seems obvious in that they would have significantly lower resource requirements, both for inference and fine-tuning, which matters when companies dealing with sensitive data (or that simply have to comply with strict EU laws) want to run them locally.
It makes sense that this specialization shows up first with language coverage, because only GPT-4 is often considered truly fit for professional multilingual use, and that is an LLM that is expensive and lives in the cloud.
The German seems nearly free of mistakes. The factual content, however, can with high probability be utter nonsense, no different from ChatGPT.
> Steht das Verb in deutschen Sätzen immer am Ende wie Mark Twain behauptet? ("Does the verb in German sentences always come at the end, as Mark Twain claims?")
> Jennifer Lopez' zweites Studioalbum wurde 1998 veröffentlicht und heißt "On the 6". ("Jennifer Lopez's second studio album was released in 1998 and is called 'On the 6'.")
> Diese Antwort war wohl "Thema verfehlt"! ("That answer rather 'missed the topic'!")
> Das stimmt, ich habe Ihre Frage nicht vollständig beantwortet. Ich entschuldige mich dafür. ("That's true, I did not fully answer your question. I apologize for that.")
> Deutsche Sätze müssen nicht immer mit dem Verb am Ende geschrieben werden, aber es ist üblich. Die Reihenfolge Subjekt-Verb-Objekt (SVO) ist die häufigste Wortreihenfolge im Deutschen, obwohl sie nicht immer eingehalten wird. Im Deutschen gibt es mehrere mögliche Satzstrukturen und Subjekt-Verb-Objekt ist nur eine davon. Zum Beispiel können Sätze mit Infinitivkonstruktionen oder Partizipialkonstruktionen auch allein mit einem Nomen und einer Präposition beginnen. Zusätzlich können die Reihenfolge der Wörter innerhalb von Sätzen variieren, solange die grundlegende Satzstruktur erhalten bleibt.
(Translation: "German sentences do not always have to be written with the verb at the end, but it is common. The subject-verb-object (SVO) order is the most frequent word order in German, although it is not always followed. German has several possible sentence structures, and subject-verb-object is only one of them. For example, sentences with infinitive or participle constructions can also begin with just a noun and a preposition. In addition, the order of words within sentences can vary as long as the basic sentence structure is preserved.")
> Beschränken wir uns mal auf Aussagesätze und Hauptsätze. Wie sind die Regeln für die Stellung des Verbs in diesem Fall? ("Let's restrict ourselves to declarative sentences and main clauses. What are the rules for the position of the verb in that case?")
> Im Deutschen steht das Verb im Aussagesatz normalerweise am Satzende (Endstellung), außer bei Fragen und Befehlen, wo das finite Verb (Verbform, die Person und Numerus anzeigt) am Anfang steht. Im Aussagesatz wird die Verbposition am Satzende als Standardverbposition bezeichnet. Diese Standardverbposition gilt jedoch nicht immer und kann auf verschiedene Weise abweichen. Zum Beispiel:
> 1. In Fragesätzen und Befehlsätzen steht das finite Verb normalerweise am Anfang, vor dem Subjekt.
> 2. Im Passiv steht das Verb normalerweise am Satzende, aber die Form des Verbs ändert sich je nach Zeitform, Person und Numerus.
> 3. Die Kopulaverben 'sein', 'bleiben', 'werden', 'heißen' und 'sehen' stehen manchmal am Anfang des Satzes, insbesondere in einfachen Sätzen oder Satzteilen.
> 4. Einige Verben wie 'müssen', 'sollen', 'wollen', 'können' und 'mögen' können am Anfang des Satzes verwendet werden, insbesondere in Befehlsätzen.
> 5. Reflexivpronomen und Partizipien können mit dem Verb auftreten, das normalerweise am Satzende steht, und sich auf dasselbe Nomen im Satz beziehen.
> 6. Im Perfekt wird das Hilfsverb 'haben' / 'sein' normalerweise vor dem Partizip Perfekt platziert, es sei denn, der Satz enthält ein Hilfsverb.
(Translation: "In German, the verb in a declarative sentence normally stands at the end of the sentence (final position), except in questions and commands, where the finite verb (the verb form indicating person and number) comes first. In declarative sentences, the verb position at the end of the sentence is called the standard verb position. However, this standard position does not always apply and can deviate in various ways. For example: 1. In questions and commands, the finite verb normally comes first, before the subject. 2. In the passive voice, the verb normally stands at the end of the sentence, but its form changes depending on tense, person, and number. 3. The copular verbs 'sein', 'bleiben', 'werden', 'heißen', and 'sehen' sometimes stand at the beginning of the sentence, especially in simple sentences or clauses. 4. Some verbs such as 'müssen', 'sollen', 'wollen', 'können', and 'mögen' can be used at the beginning of the sentence, especially in commands. 5. Reflexive pronouns and participles can occur with the verb, which normally stands at the end of the sentence, and refer to the same noun in the sentence. 6. In the perfect tense, the auxiliary 'haben'/'sein' is normally placed before the past participle, unless the sentence contains an auxiliary verb.")
Using LLMs for anything other than writing poems is irresponsible!
Well, it just goes to show that if you already know the answer, you can use an LLM, but then it's also pointless. If you don't know the answer and use an LLM, you may think you learned something, but instead you now know less.
Out of the box it seems to generate very long "Schachtelsätze" (deeply nested clauses). I wonder how English models manage to avoid this (system prompt? different training material mix?).
That said: similar to other large language models, you can get it to generate different styles of text.
For example, "antworte auf Wienerisch" ("answer in Viennese dialect") did change the writing style.
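For anyone wanting to reproduce that: a minimal sketch with transformers, assuming the chat model is on the Hugging Face hub under the announced name and that its tokenizer ships a chat template (both unverified here):

    from transformers import pipeline

    generator = pipeline("text-generation", model="LeoLM/leo-hessianai-13b-chat")

    messages = [
        # The system prompt steers the style ("answer in Viennese dialect").
        {"role": "system", "content": "Antworte auf Wienerisch."},
        {"role": "user", "content": "Wie wird das Wetter morgen?"},
    ]
    prompt = generator.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(generator(prompt, max_new_tokens=200)[0]["generated_text"])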
It would be interesting to know what is more difficult for AI to learn. German has a more flexible word order, which makes it more difficult for some learners.
"Man bites dog" and "dog bits man" have two different meanings based on word order. In German, both orders are possible and it still has the same meaning.
> "Man bites dog" and "dog bits man" have two different meanings based on word order. In German, both orders are possible and it still has the same meaning.
That's not correct. Both orders are possible but they don't have the same meaning. The word order in this example has the same effect in German as it has in English.
English tends to rely mostly on word order to indicate the grammatical function of a word or phrase, while German uses inflections. The German endings, such as those indicating the nominative, accusative, dative, and genitive cases in three different genders, allow for some greater flexibility in clause construction. Hence "Der Hund beißt den Mann" and "Den Mann beißt der Hund" both mean "The dog bites the man" (as opposed to "The man bites the dog"). Nevertheless, German word order is extremely important, even when it is not vital to meaning, and its correctness plays a major role in how a foreigner's command of the language is evaluated.
At the same time, word order is an infinitely complex aspect of language, never wholly mastered by non-native speakers. Very few rules cover all possibilities, and context often trumps other considerations. When Robert Frost writes, "Something there is that doesn't love a wall," it's poetic; if someone with a foreign accent says the same thing in conversation, it sounds like Yoda.
Yes, but I am not sure that "Mann beißt Hund" is even a sentence. Sounds like a newspaper headline. I don't think a German teacher would let this slip in an essay, except if it is done on purpose for style reasons.
The man bites the dog:
Der Mann beißt den Hund.
Den Hund beißt der Mann.
The dog bites the man:
Der Hund beißt den Mann.
Den Mann beißt der Hund.
This is an easy example because noun inflections are not really important here; the articles carry the case. Nouns can also change to indicate things like possession.
Chair / house: Stuhl / Haus, for example.
Der Stuhl des Hauses ("the chair of the house"): Haus changes to Hauses, indicating possession (genitive case).
This looks great! I'm currently building something like a "GPT-powered Anki" in which you can infinitely generate sentences for each word you are learning, but German GPT sentences are sometimes grammatically incorrect. Not the best for a language-learning app.
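FWIW, the generation step can be as simple as the sketch below (legacy openai-python interface; prompt wording and model choice are purely illustrative):

    import openai  # legacy openai-python (<1.0); assumes openai.api_key is set

    def example_sentences(word: str, n: int = 3) -> str:
        """Ask the model for n simple, grammatically correct German
        example sentences using the given vocabulary word."""
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": (
                    f"Schreibe {n} einfache, grammatikalisch korrekte "
                    f"deutsche Beispielsätze mit dem Wort '{word}'."
                ),
            }],
            temperature=0.7,
        )
        return resp.choices[0].message.content

    print(example_sentences("der Stuhl"))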
None, pure coincidence. leo.org actually comes from "link everything online" and the use of the lion in the Bavarian coat of arms. It used to be an FTP site, which transformed into an HTTP archive site, which transformed, if my memory does not betray me, into some kind of web directory. Then a couple of students added various helper scripts, among them a dictionary; nowadays only the dictionary service survives, and it was commercialized.
So this seems completely open source. Naive question: if I wanted to run this at home just for fun (basically one slightly patient user), what kind of machine(s) would I need?
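Rough sizing, not authoritative: a 13B model quantized to 4 bits is about 13e9 × 0.5 bytes ≈ 6.5 GB of weights, so roughly 8-10 GB of free RAM or VRAM should do for one patient user. A sketch with llama-cpp-python (the GGUF file name is hypothetical; any 4-bit quant of the 13B chat model would do):

    from llama_cpp import Llama

    # Hypothetical file name for a 4-bit GGUF quantization of the model.
    llm = Llama(model_path="leo-hessianai-13b-chat.Q4_K_M.gguf", n_ctx=2048)

    out = llm("Was für Leute trifft man auf Hacker News?", max_tokens=200)
    print(out["choices"][0]["text"])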
The verb sits in the second position in a main clause. The difference from English is that the first position can be filled by almost anything in German; if it's not the subject, the subject follows the verb.
This comment was the last straw that made me fire off an email to dang regarding the recent decline in comment quality, along with a few structural changes that I think may help; let's see what comes of it.
To be fair, I have read HN regularly since 2016 but only created an account after the Reddit fiasco, looking to engage more actively. Could be a similar situation.
There's no need for a cut-off date; just take highly upvoted comments off Reddit and find similar comments on HN as determined by word vectorization, and then train a model off of that. Or maybe even skip the similarity step; after all, the goal is to find Reddit-like comments.
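The matching step could be as plain as TF-IDF plus cosine similarity; a sketch with placeholder comment lists:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    reddit_comments = ["..."]  # highly upvoted Reddit comments
    hn_comments = ["..."]      # candidate HN comments

    # Fit one vocabulary over both corpora so the vectors are comparable.
    vec = TfidfVectorizer().fit(reddit_comments + hn_comments)
    sim = cosine_similarity(
        vec.transform(reddit_comments), vec.transform(hn_comments)
    )
    best = sim.argmax(axis=1)  # index of the closest HN comment per Reddit comment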
Primarily caps on karma earning and on voting, which should (at least I think) nudge people towards more thoughtful voting patterns and comments.
The unintended effect may be that high-contribution users are disincentivized from participating.