
> Because some of the things it gets right can not be attributed to anything else other than the concept of "understanding".

Or just a large dataset. The new thing we got was parsing natural language; then we built a Markov chain on top of that so it outputs semantically correct follow-ups based on what people on the internet would likely do or say in similar situations, and you get this result.

It is very easy to see that it works this way if you play around a bit with what it can and can't do: just identify what sort of conversation it used as a template, and you can make it print nonsense by inputting values that won't work in that template.

Edit: Also, generating the next state based on the previous state is literally what the model does, and that is the definition of a Markov chain. A Markov chain is a statistical concept, not just a word chain.
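
To make the "next state from previous state" claim concrete, here's a minimal word-level sketch in Python (toy corpus and names of my own; a real LLM conditions on a long window of subword tokens rather than just the previous word, so the analogy is loose):

    import random
    from collections import defaultdict

    # Toy first-order Markov chain over words: the next state depends only
    # on the current state, sampled from observed follow-ups in the corpus.
    corpus = "the cat sat on the mat the cat ate the fish".split()

    transitions = defaultdict(list)
    for current_word, next_word in zip(corpus, corpus[1:]):
        transitions[current_word].append(next_word)

    def generate(start, length=8):
        word, output = start, [start]
        for _ in range(length):
            followers = transitions.get(word)
            if not followers:
                break
            word = random.choice(followers)
            output.append(word)
        return " ".join(output)

    print(generate("the"))  # e.g. "the cat sat on the mat the cat ate"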



Arguing with AI (especially LLM) proponents is beginning to feel like arguing with a religious person ... it's almost easier to just let them believe in their god.

Pointing out the failures of their favorite LLM to prove to them that it's not doing what they think it's doing just falls on deaf ears as they go digging for more "proof" that ChatGPT actually understands what it's saying.


It's funny because I feel exactly the same about the LLM denigrators.

It feels like you will only be happy if we are able to prove that the LLM has a soul.


I just want its capabilities to be accurately advertised.

A lot of people are assigning it capabilities it doesn't really have.


>Pointing out the failures of their favorite LLM to prove to them that it's not doing what they think it's doing just falls on deaf ears

It's not falling on deaf ears. It's that it's stupid to think the "failures" are proof.

Should I point out all the failures in human intelligence? Humans fuck up all the time. Humans make stupid mistakes, assumptions, biases, errors in reasoning and leaps in logic all the time.

According to your logic that's proof that humans don't understand anything.


> According to your logic that's proof that humans don't understand anything.

Well, it's reasonable to conclude that people who are constantly making "stupid mistakes, assumptions, biases, errors in reasoning and leaps in logic" don't actually know what they are talking about.


Right, but can you make the claim that they don't understand anything at all? That they're stochastic parrots? Because that's the direction your own logic is taking you.

I'll concede one thing: LLMs can't use vicious insults and subtle slights to cover up the lack of a good argument. That's a human specialty, practiced by people who are too scared to admit that they're wrong.


We learn that skill in our training. We are highly trained LLMs, skilled in the nuances.

Everyone is acting like the LLM is the finished product.

It is a baby. Do you remember your nuanced language capability at 5, 8, 10, 20?

Are you the finished product? Do you still make massive leaps of awareness? Yes.


> claim that they don't understand anything at all?

They certainly don't understand enough to get things right without a few thousand examples already demonstrating the solution to a specific problem.


Taking a long time to learn something doesn't mean that the thing wasn't learned or understood. Two separate topics.

Even chatGPT understands this:

   ME: Does someone or something requiring a thousand examples to understand something imply that it doesn't understand anything?  
   
   CHATGPT: No, requiring a thousand examples to understand something does not necessarily imply that someone or something doesn't understand anything. It simply suggests that the thing being understood is complex and requires a significant amount of data or examples to fully grasp.

   CHATGPT: For example, in the context of machine learning, a large dataset of examples may be required to train a model to perform a specific task accurately. This does not mean that the model doesn't understand anything; rather, it suggests that it needs a lot of data to be able to generalize and apply its understanding to new situations.

   CHATGPT: Similarly, in the case of human learning, some concepts or ideas may be challenging to understand without multiple examples or explanations. This does not mean that the person doesn't understand anything; it simply suggests that they need more exposure to the concept to fully comprehend it.

   CHATGPT: In summary, the amount of data or examples required to understand something does not necessarily indicate a lack of understanding. It may simply reflect the complexity of the topic or the learning process.
  
Why don't you understand it?


I have to disagree with that. As a toy example to demonstrate what I'm talking about, imagine I was teaching you addition mod 100 and gave you a description of the operation f(x, y) = (x + y) mod 100 for x, y in Z_100. If you take more than 100^2 samples to learn the function, I'm not sure you understand the function. Obviously, with that many samples, you could have just built a lookup table without understanding what the operation is doing or what the underlying domain is.
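
To make that concrete, here's a minimal sketch in Python (my own illustrative names, not anyone's actual implementation): a table memorized from all 100^2 pairs matches the one-line rule on every in-domain query while encoding nothing about the underlying structure.

    # Two ways to "learn" addition mod 100 from the description above.

    # 1. The rule itself: one line, and it generalizes beyond the table.
    def add_mod_100(x, y):
        return (x + y) % 100

    # 2. Pure memorization: a lookup table over all 100^2 = 10,000 pairs.
    lookup = {(x, y): (x + y) % 100 for x in range(100) for y in range(100)}

    # Both agree on every in-domain query...
    assert lookup[(37, 88)] == add_mod_100(37, 88) == 25

    # ...but only the rule has anything to say outside the memorized domain.
    print(add_mod_100(137, 88))   # 25
    print(lookup.get((137, 88)))  # None: the table has no opinion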

Part of why sample efficiency is interesting is that humans have high sample efficiency: they somehow perform reasoning, and it generalizes well to some pretty abstract spaces. As someone who's worked with ML models, I'm genuinely envious of the generalization capabilities of humans, and I think it's something researchers are going to have to work on. I'm pretty sure there's still a lot of skepticism in academia that scale is all that's needed to get better models, and a sense that we're still missing a lot.

Some of my skepticism around claims of LLMs reasoning or doing human-like things is that they really don't appear to generalize well. Lots of the incredible examples people have shown are only very slightly outside the bounds of what's on the internet. When you start asking it for hard logic, or to really synthesize something novel outside the domain of the internet, it rapidly begins to fail, seemingly in proportion to how little the internet has to say about it.

How might we differentiate a really good soft/fuzzy lookup table of the internet, one that can fuzzily mix language together, from genuine knowledge and generalization? GPT's apparent capabilities might just be a testament to the sheer scope and size of the internet.

This isn't to say they can never be useful; a lot of work is derivative. But I think a large portion of the claim that it understands things is unwarranted. Last I checked, chatGPT was giving wrong answers for the sums of very large numbers, which is odd if it understands addition.


You're describing overfitting to a lookup table.

That can't be what's happening here, because the examples LLMs are answering are well outside the bounds of the "100^2" training data.

The internet is huge but it's not that huge. One can easily find chatGPT saying, doing or creating things that obviously come from a generalized model.

It's actually trivial to find examples of chatGPT answering questions with responses that are wholly unique and distinct from the training data, as in the answer it gave you could not have existed anywhere on the internet.

Clearly humans don't need that much training data. We can form generalizations from a much smaller sample size.

It does not follow that generalization doesn't exist in LLMs, when the answers clearly demonstrate that it does.


Like, yes, to some extent there is a mild amount of generalization, in that it is not literally regurgitating the internet and it mixes text together really well. But I don't think that's obviously the full-on generalization of understanding that humans have.

These models are obviously more sample-efficient at learning relationships than a literal lookup table, but as I've already said, my example was deliberately extreme to illustrate that sample efficiency does seem to matter. If you used 100^2 - 1 samples, I'm still not confident you truly understand the concept. However, if you use 5 samples, I'm pretty sure you've generalized; I was hoping to illustrate a gradient.

I want to re-emphasize another part of my comment: it really does seem that when you step outside the domain of the internet, the error rates rise dramatically, especially when there is no analogous situation at all. Furthermore, the further you get from the internet's samples, the more likely the error seems to be, which shouldn't happen if it understood these concepts well enough to generalize. Do you have links to examples you'd be willing to discuss?

Many examples I see are directly one of the top results on Google. The more impressive ones mix multiple results with some coherency. Sometimes people ask for something novel but there's a weirdly close parallel on the internet.

For example, people thought Sumplete was a new game but it turned out to be derivative: https://www.neowin.net/news/chatgpt-made-a-browser-puzzle-ga...

I don't think this is as impressive, at least as evidence of generalization. It seems to stitch concepts together pretty haphazardly, like the novel language above that doesn't seem to respect its own description (after all, why use brackets in a supposedly indentation-based language?). However, many languages do use brackets. It suggests it correlates probable answers rather than reasoning about them.


>I want to re-emphasize another part of my comment: it really does seem that when you step outside the domain of the internet, the error rates rise dramatically, especially when there is no analogous situation at all.

This is not surprising. A human would suffer from similar errors at a similar rate if it were exclusively fed an interpretation of reality that only consisted of text from the internet.

>These models are obviously more sample-efficient at learning relationships than a literal lookup table, but as I've already said, my example was deliberately extreme to illustrate that sample efficiency does seem to matter. If you used 100^2 - 1 samples,

Even within the context of the internet, there are enough conversational scenarios where you can get chatGPT to answer things in ways that are far more generalized than "minor".

Take for example: https://www.engraved.blog/building-a-virtual-machine-inside/

Read it to the end. In the beginning you could say that the terminal emulation does exist as a similar copy in some form on the internet. But the structure that was built in the end is unique enough that it could be said nothing like it has ever existed on the internet.

Additionally, you have to realize that while bash commands and their results do exist on the internet, chatGPT cannot simply copy the logic and interactive behavior of the terminal from text. In order to do what it did (even in the beginning), it must "understand" what a shell is, and it has to derive that understanding from internet text.


> This is not surprising. A human would suffer from similar errors at a similar rate if it were exclusively fed an interpretation of reality that only consisted of text from the internet.

I think it is surprising, at least if the bot actually understands, especially in domains like math. It makes errors (like in adding large numbers) that shouldn't occur if it weren't just smearing together internet data. We would expect many homework examples on the internet of adding relatively small numbers, but fewer of large ones. A large part of what makes math interesting is that many of the structures we care about show up in large examples as well as small ones (though not always), so if you understand the structure, it should guide you pretty far. Presumably most humans (assuming they understand natural language) can read a description of addition and then, with some trial and error, get it right for small cases; then, when presented with a large case, they generalize easily. I don't guess the output; I internally generate an algorithm and follow it.
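
To spell out the kind of algorithm I mean, here's grade-school column addition as a Python sketch (purely illustrative); the same procedure works unchanged whether the inputs have 2 digits or 300 digits, which is the sort of generalization you'd expect from something that understands addition:

    def add_decimal_strings(a: str, b: str) -> str:
        # Column addition with carries, digit by digit from the right.
        # The procedure never looks at how long the numbers are.
        a, b = a[::-1], b[::-1]
        digits, carry = [], 0
        for i in range(max(len(a), len(b))):
            da = int(a[i]) if i < len(a) else 0
            db = int(b[i]) if i < len(b) else 0
            carry, d = divmod(da + db + carry, 10)
            digits.append(str(d))
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    print(add_decimal_strings("95", "7"))       # 102
    print(add_decimal_strings("9" * 300, "1"))  # 1 followed by 300 zeros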

> Take for example: https://www.engraved.blog/building-a-virtual-machine-inside/

When I first saw that a while back, I thought it was a more impressive example, but only marginally more so than the natural language examples. The way these models are trained with supervised learning implies that they should be able to capture relationships between text well. Like you said, there's a lot of content associating the output of a terminal with the input.

Maybe this is where we're miscommunicating. I don't think that even for natural language it's purely copying text from the internet. It is capturing correlations, and I would argue that simply capturing correlations doesn't imply understanding. To some extent, it knows what the output of curl is supposed to look like and can use attention to figure out the website and then generate what the intended website is supposed to look like. Maybe the sequential nature of the commands is kind of impressive, but I would argue that, at least for the jokes.txt example, that particular sequence is probably very analogous to some tutorial on the internet. It's difficult to find, since I would want to limit myself to content from before 2021.

It can correlate the output of a shell to the input, and to some extent the relationships between a command and its output are well reproduced; its training has suffused it with information about what terminals print (is this what you are referring to when you say it has to derive understanding from internet text?). But it doesn't seem to be reasoning about the terminal, despite probably being trained on a lot of documentation about these commands.

Like we can imagine that this relationship is also not too difficult to capture. A lot of internet websites will have something like

    | command |

    some random text

    | result |

where the bit in the middle varies but the result remains more consistent. So you should be able to treat that command/result pair as a sort of sublanguage.
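
As a caricature of that point (Python; the page text and names are invented for illustration, and this is obviously not how an LLM works internally), pairing each command with the output printed below it lets you "answer" shell prompts with zero model of a filesystem:

    import re

    # Imagined tutorial page: commands prefixed with "$ ", some prose in
    # between, and the printed result further down. (Invented text.)
    page = "\n".join([
        "$ pwd",
        "Run this to see where you are.",
        "/home/user",
        "",
        "$ ls",
        "This lists the directory contents.",
        "Documents  Downloads  notes.txt",
    ])

    # Pair each "$ command" with the last line of its block.
    pairs = {}
    for block in re.split(r"\n(?=\$ )", page):
        lines = [line for line in block.splitlines() if line.strip()]
        pairs[lines[0][2:]] = lines[-1]

    print(pairs["pwd"])  # "/home/user" -- a plausible answer, no shell semantics involved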

As a preliminary consistency check, I just ran the same prompt and then did a couple of checks that arguably show behavior that's confusing if it's not just smearing together popular text.

I asked it for a fresh Linux installation and checked that golang wasn't installed (it wasn't). However, when I ran "find / -name go", it found a Go directory (/usr/local/go), but running "cd /usr/local/go" told me I can't cd into the directory because no such file exists, which would be confusing behavior if it were actually understanding what find does rather than just capturing correlations.

I "ls ." the current directory (for some reason I was in a directory with a single "go" directory now despite never having cd'ed to /usr/local) but then ran "stat Documents/" and it didn't tell me the directory didn't exist which is also confusing if it wasn't just generating similar output to the internet.

I asked it to "curl -Z http://google.com" (-Z is not a valid option) and it told me http is not a valid protocol for libcurl. Funnily enough, running "curl http://google.com" does in fact let me fetch the webpage.

I'm a bit suspicious that the commands that the author ran are actually pretty popular so it can sort of fuzz out what the "proper" response is. I would argue that the output appears mostly to be a fuzzed version of what is popular output on the internet.


Keep in mind there's a token limit. Once you pass that limit it no longer remembers.

Yes. You are pointing out various flaws, which, again, are quite obvious. Everyone knows about the inconsistencies of these LLMs.

To this I again say that the LLM understands some things and doesn't understand others; its understanding is inconsistent and incomplete.

The only thing needed to prove understanding is to show chatGPT building something that can only be built through pure understanding. If you see one instance of this, it's sufficient to say that on some level chatGPT understands aspects of your query, rather than doing the trivial query-response correlation you're implying is possible here.

Let's examine the full structure that was built here:

chatGPT was running an emulated terminal with an emulated internet with an emulated chatGPT with an emulated terminal.

It's basically a recursive model of a computer and the internet relative to itself. There is literally no exact copy of this anywhere in its training data. chatGPT had to construct this model by correctly composing multiple concepts together.

The composition cannot occur correctly without chatGPT understanding how the components compose.

It's kind of strange that this was ignored; it was the main point of the example. I didn't emphasize it because this structure is obviously the heart of the argument if the article is read to the end.

To generate the output of the final example, chatGPT literally has to parse the bash input, execute the command over a simulated internet against a simulated version of itself, and then parse the bash sub-command again. It needs an internal stack to put all the output together into a final JSON output.

So while it is possible for simple individual commands to be correlated with similar training data, for the highly recursive command in the final prompt there is zero explanation for how chatGPT could pick this up from some correlation. There is virtually no identical structure on the internet. It has to understand the user's query and compose the response from different components. That is the only explanation left.


Failures are proof, successes are not; a broken clock is right twice a day, after all.


So if a human fails at something it's proof that the human doesn't understand anything and that a human is a stochastic parrot?

I think your clock analogy doesn't fit. A car with a broken mirror still runs.


The output of GPT is “random” in a sense that output from humans is not.

I can ask it logic puzzles and sometimes it’ll get the logic puzzle right by chance, and other times it won’t. I can’t use the times it gets the logic puzzle right, as evidence that it understood the puzzle.

All of these blog posts that are popping up suffer from survivorship bias; nobody is sharing blog posts of GPT's failures.


> I can’t use the times it gets the logic puzzle right, as evidence that it understood the puzzle.

No, this is bias on your end. It really depends on the puzzle. You need to give it a puzzle with trillions of possible answers. In that case, if it gets a right answer even once, the probability of this happening by chance is so low that it means one aspect of the model understands the concept, while another aspect of the model doesn't.
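
As a rough illustration of the scale involved (assuming the "trillions" above means roughly 10^12 equally likely candidate answers and blind guessing):

    # Chance of blindly guessing the right answer to a puzzle with ~10^12
    # equally likely candidate answers (the "trillions" above).
    p = 1 / 10**12
    print(p)  # 1e-12, i.e. about one in a trillion per attempt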

It's possible for even humans to have contradictory thinking and desires.

Therefore a claim cannot be made that it understands nothing.


What you described (with superfluous and ornamental technobabble) works perfectly well for a functioning human with "understanding" as well; that's why people can be brainwashed or tricked into saying a lot of stupid stuff. None of this proves that there is no understanding.


> (with superfluous and ornamental technobabble)

I know how the model works; there was no technobabble there. People who don't understand how it works might view it as magic, the way they view all technology they don't understand as magic, but that doesn't mean it is magic, and we shouldn't listen to such crackpots.


There was absolutely zero technobabble in that person's comment. Perhaps you need to ask yourself how well you understand the space.


>Edit: Also, generating the next state based on the previous state is literally what the model does, and that is the definition of a Markov chain. A Markov chain is a statistical concept, not just a word chain.

There's research (as in actual scientific papers) showing that in LLMs, while the Markov chain is the low-level representation of what's going on, at a higher macro level there are other structures at play here. Emergent structures. This is of course similar to the emergence of a macro intelligence from the composition of simple summation-and-threshold machines (neurons) that the human brain is made out of. I can provide those papers if you so wish.

>Or just a large dataset.

Even in a giant dataset, it's easy to identify output that could not possibly exist in the training data. Simply do a Google search for it. You will find it can produce novel output for things that simply don't exist in the training data.


> at a higher macro level there are other structures at play here. Emergent structures.

Yes, this is a neural net model, that is what such models do and have done for decades already. I'm not sure why this is relevant. Do you argue that stable diffusion is intelligent since it has emergent structures? Or an image recognition system is intelligent since it has emergent structures? Those are the same things.

> Even in a giant dataset, it's easy to identify output that could not possibly exist in the training data.

Markov chains veer in different directions; they don't reproduce the data.


>Yes, this is a neural net model, that is what such models do and have done for decades already. I'm not sure why this is relevant. Do you argue that stable diffusion is intelligent since it has emergent structures? Or an image recognition system is intelligent since it has emergent structures? Those are the same things.

No, I am saying there are models for intelligence within the neural net that are explicitly different from a stochastic parrot over English vocabulary. For example, in one instance they identified a structure in an LLM that logically models the rules and strategy of an actual board game.

Obviously I'm not referring to papers on plain old "neural networks"; that shit is old news. I'm referring to new research on LLMs. Again, I can provide you with papers, provided you want evidence that will flip your stubborn viewpoint on this. It just depends on whether your bias is flexible enough to accept such a sudden deconstruction of your own stance.


The fact that it adapts its state to fit the data isn't interesting in itself. An image recognition system forms a lot of macro structures around shapes or different logical parts of the image. Similarly, an LLM forms a lot of macro structures around different kinds of text structures or words, including chess game series, song compositions, or programming language tutorials. It is exactly the same kind of structure; it's just that some people think those structures are a sign of intelligence when they are applied to text.

Can such macro structures model intelligence in theory? Yes. But as we see in practice they aren't very logical. For example, in this article we see that its Markov chain didn't have enough programming language descriptions, so it veered into printing brace-scoped code when it said the language had whitespace-based scoping. Similarly, in popular puzzles, just change the words around and it will start printing nonsense, since it cares about what words you use and not what those words mean.

Edit: The point is that the existence of such structures doesn't make a model smart. You'd need to show that these structures are smarter than what came before.


>Can such macro structures model intelligence in theory? Yes.

So you agree it's possible.

>Yes. But as we see in practice they aren't very logical. For example, in this article we see that its Markov chain didn't have enough programming language descriptions,

Well, as humans we have many separate models for the different components and aspects of the world around us. Clearly LLMs form many models that in practice are not accurate. But that does not mean all the models are defective. The fact that it can write blog posts indicates that many of these models are remarkably accurate and that it understands the concept of a "blog post".

There is literal evidence of chatGPT answering questions as if it has an accurate underlying model of "understanding", as well as actual identified structures within the neural net itself.

There is also evidence for your point of chatGPT clearly forming broken and inaccurate models by answering questions with wrong answers that don't make sense.

What gets me is that even when there is clear and abundant evidence for both cases, some people have to make the claim that chatGPT doesn't understand anything. The accurate answer is that LLMs understand some things and don't understand other things.


> I can provide those papers if you so wish.

I'd like to see them but I don't have a background in AI or theoretical computer science. Can you post a few of them?


- GPT style language models try to build a model of the world: https://arxiv.org/abs/2210.13382

- GPT style language models end up internally implementing a mini "neural network training algorithm" (gradient descent fine-tuning for given examples): https://arxiv.org/abs/2212.10559


Link the papers.


- GPT style language models try to build a model of the world: https://arxiv.org/abs/2210.13382

- GPT style language models end up internally implementing a mini "neural network training algorithm" (gradient descent fine-tuning for given examples): https://arxiv.org/abs/2212.10559



