
Not an ML researcher or anything (I'm basically only a few Karpathy videos into ML, so please someone correct me if I'm misunderstanding this), but it seems that you're getting this backwards:

> One misconception is that predicting the next word means there is no internal idea on the word after next. The simple disproof of this is that models put 'an' instead of 'a' ahead of words beginning with vowels.

My understanding is that the model doesn't put "an" ahead of a word that starts with a vowel; rather, the model (or more accurately, the sampler) picks "an", and then the model will never predict a word that starts with a consonant after that. It's not like it "knows" in advance that it wants to put a word starting with a vowel and then anticipates that it needs to put "an". It generates a probability for both tokens "a" and "an", picks one, and when it generates the following token, it necessarily takes its previous choice into account: it will never put a word starting with a vowel after it has already chosen "a".
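To make that concrete, here's a toy sketch of that view (entirely made-up numbers, not a real model): the "model" returns a distribution over next tokens conditioned on everything emitted so far, so once "an" is sampled, consonant-initial continuations get essentially zero mass.

```python
import random

# Toy sketch (made-up probabilities, not a real LLM): at each step the
# "model" returns a distribution conditioned on the tokens emitted so far.
def next_token_probs(context):
    if context and context[-1] == "an":
        # after "an", only vowel-initial continuations remain plausible
        return {"alligator": 0.9, "apple": 0.1}
    if context and context[-1] == "a":
        # after "a", only consonant-initial continuations remain plausible
        return {"crocodile": 0.9, "caiman": 0.1}
    # the article is chosen first; its probability already reflects
    # what continuations the model "expects"
    return {"an": 0.7, "a": 0.3}

def sample(probs, rng):
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights)[0]

rng = random.Random(0)
context = []
for _ in range(2):
    context.append(sample(next_token_probs(context), rng))
print(context)
```

Whichever article the sampler happens to pick, the next step's distribution is consistent with it, which is the point being made: the consistency can come from conditioning on the already-emitted token.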



The model still has some representation of whether the word after an/a is more likely to start with a vowel or not when it outputs a/an. You can trivially understand this is true by asking LLMs to answer questions with only one correct answer.

"The animal most similar to a crocodile is:"

https://chatgpt.com/share/67d493c2-f28c-8010-82f7-0b60117ab2...

It will always say "an alligator". It chooses "an" because somewhere in the next word predictor it has already figured out that it wants to say alligator when it chooses "an".

If you ask the question the other way around, it will always answer "a crocodile" for the same reason.


Again, I don't think that's a good example, because everything about the answer is in the prompt: from the start, "alligator" is already high, and the model is just waiting for an "an" to occur to have an occasion to emit it.

That doesn't mean it knows "in advance" what it wants to say; it's just that at every step the alligator is lurking in the logits because it directly derives from the prompt.


You write: "it's just that at every step the alligator is lurking in the logits because it directly derives from the prompt" - but isn't that the whole point? At the moment the model writes "an", it isn't just spitting out a random article (or a 50/50 distribution of articles, or other words for that matter); rather, "an" gets a high probability because the model internally knows that "alligator" is the correct thing to follow. While it can only emit one token at this step, it emits "an" to stay consistent with its "lurking" alligator knowledge. And by the way, while not even directly relevant, the word alligator isn't in the prompt. Sure, it derives from the prompt, but so does everything an LLM generates, and the same goes for any other AI mechanism for generating answers.


> While it can only emit one token in this step, it will emit "an" to make it consistent with its alligator knowledge "lurking".

It will also emit "a" from time to time without issue though; it just will never spit out "alligator" right after that. That's it.

> Sure, it derives from the prompt but so does everything an LLM generates, and same for any other AI mechanism for generating answers.

Not really. Because of the autoregressive nature of LLMs, the longer the response, the more it depends on its own output rather than on the prompt. That's why you can see totally opposite responses from an LLM to the same query if you aren't asking basic factual questions. I saw a tool on reddit a few months ago that let you see which words in the generation were the most "opinionated" (where the sampler had to choose between alternative words that were close in probability), and it was easy to see that you could dramatically affect the result by just changing certain words.

> "an" gets a high probability because the model internally knows that "alligator" is the correct thing after that.

This is true, though it only works with this kind of prompt because the output of the LLM has little impact on the generation.

Globally I see what you mean, and I don't disagree, but at the same time I think that saying LLMs have a sense of anticipating future tokens misses their ability to be driven astray by their own output: they have some information that will affect further tokens, but any token that gets emitted can, and will, change that information in a way that can dramatically change the "plans". That's why I think trivial questions aren't a good illustration: they sweep this effect under the rug.


yunwal has provided one example. Here's another using a much smaller model.

https://chat.groq.com/?prompt=If+a+person+from+Ontario+or+To...

The response: "If a person from Ontario or Toronto is a Canadian, a person from Sydney or Melbourne would be an Australian!"

It seems mighty unlikely that it chose Australian as the country because of the "an", or that it chose to put the "an" at that point in the sentence for any reason other than that the word Australian was going to be next.

For any argument that you think shows this does not mean they have some idea of what is to come, try to come up with a test to see whether your hypothesis is true or not, then give that test a try.


No, the person you're responding to is absolutely right. The easy test (which has been done in papers again and again) is the ability to train linear probes (or non-linear classifier heads) on the current hidden representations to predict the nth-next token, and the fact that these probes have very high accuracy.
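The probing methodology can be sketched on synthetic data (this is a toy illustration of the technique, not results from a real LLM): pretend each hidden state linearly encodes the identity of the token two positions ahead plus noise, and fit a closed-form least-squares linear probe. If the probe predicts the future token well above chance, the representation "contains" it.

```python
import numpy as np

# Synthetic stand-in for LLM hidden states (not a real model).
rng = np.random.default_rng(0)
n, d, vocab = 2000, 32, 4

future = rng.integers(0, vocab, size=n)       # "n-th next" token ids
directions = rng.normal(size=(vocab, d))      # how each id is encoded
hidden = directions[future] + 0.3 * rng.normal(size=(n, d))  # noisy states

# One-hot targets; closed-form least-squares linear probe W: hidden @ W ≈ one_hot
one_hot = np.eye(vocab)[future]
W, *_ = np.linalg.lstsq(hidden[:1500], one_hot[:1500], rcond=None)

# Evaluate on held-out states
preds = (hidden[1500:] @ W).argmax(axis=1)
acc = (preds == future[1500:]).mean()
print(f"probe accuracy: {acc:.2f} (chance = {1/vocab:.2f})")
```

On real models the probes are trained on actual hidden states at each layer, but the logic is the same: high probe accuracy on the n-th next token is evidence that the information is already present before that token is emitted.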



