
> This is not surprising. A human would suffer from similar errors at a similar rate if it were exclusively fed an interpretation of reality that only consisted of text from the internet.

I think this is surprising, at least if the bot actually understands, especially in domains like math. It makes errors (like in adding large numbers) that make sense if it's smearing together internet data but shouldn't occur if it actually understood. We would expect the internet to have many homework examples of adding relatively small numbers but far fewer with large ones. A large part of what makes math interesting is that many of the structures we care about show up in large examples as well as small ones (though not always), so if you understand the structure, it can guide you pretty far. Presumably most humans (assuming they understand natural language) can read a description of addition, get it right for small cases with some trial and error, and then generalize easily when presented with a large case. I don't guess at the output; I internally work out an algorithm and then follow it.
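
To make that concrete, here's roughly the procedure I mean, sketched in Python. This is just the grade-school column algorithm (nothing to do with how the model works internally); the point is that once you have the digit-wise rule, it doesn't care whether the inputs are 3 digits or 300:

    # Grade-school column addition over decimal strings; works for any length.
    def add_decimal(a: str, b: str) -> str:
        a, b = a.zfill(len(b)), b.zfill(len(a))   # pad to equal length
        digits, carry = [], 0
        for da, db in zip(reversed(a), reversed(b)):
            s = int(da) + int(db) + carry
            digits.append(str(s % 10))
            carry = s // 10
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    print(add_decimal("345", "789"))   # 1134
    print(add_decimal("9" * 40, "1"))  # 1 followed by 40 zeros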

> Take for example: https://www.engraved.blog/building-a-virtual-machine-inside/

When I first saw that a while back, I thought it was a more impressive example, but only marginally more so than the natural language ones. The way these models are trained, predicting text from surrounding text, implies they should be able to capture relationships between pieces of text well. Like you said, there's a lot of content associating terminal output with the input that produced it.

Maybe this is where we're miscommunicating. I don't think that even for natural language it's purely copying text from the internet. It is capturing correlations, and I would argue that simply capturing correlations doesn't imply understanding. To some extent it knows what the output of curl is supposed to look like and can use attention to pick out which website was requested, then generate what that website is supposed to look like. Maybe the sequential nature of the commands is kind of impressive, but I would argue that, at least for the jokes.txt example, that particular sequence is probably very analogous to some tutorial on the internet. It's hard to verify, since I'd want to limit the search to content from before 2021.

It can correlate the output of a shell to the input, and to some extent it reproduces the relationship between a command and its output well, because its training has suffused it with information about what terminals output (is this what you're referring to when you say it has to derive understanding from internet text?). But it doesn't seem to be reasoning about the terminal, despite probably being trained on a lot of documentation for these commands.

Like we can imagine that this relationship is also not too difficult to capture. A lot of internet websites will have something like

| command |

some random text

| result |

where the bit in the middle varies but the result stays fairly consistent. So you should be able to treat the command/result pair as a sort of sublanguage, something like the toy lookup sketched below.
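
To caricature that, a toy lookup of canned pairs (completely made up here, and obviously not a claim about what the model does internally) reproduces the popular command/result pairs without modelling any filesystem at all:

    # Toy illustration only: memorized command/output pairs, no state, no execution.
    canned = {
        "pwd": "/home/user",
        "ls": "Desktop  Documents  Downloads",
        "cat jokes.txt": "Why don't scientists trust atoms? They make up everything.",
    }

    def fake_shell(command: str) -> str:
        # Return whatever text most often follows this command, else a generic error.
        return canned.get(command, "bash: " + command.split()[0] + ": command not found")

    print(fake_shell("pwd"))            # looks right
    print(fake_shell("cat jokes.txt"))  # looks right
    print(fake_shell("mkdir foo && cd foo && pwd"))  # anything state-dependent falls apart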

As a preliminary consistency check, I just ran the same prompt and then tried a couple of things whose results would be confusing if it were doing more than smearing together popular text.

I asked it for a fresh Linux installation and checked that golang wasn't installed (it wasn't). However, when I ran find / -name go, it found a Go directory (/usr/local/go), but running "cd /usr/local/go" told me I couldn't cd into the directory because no such file exists, which would be confusing behavior if it were actually understanding what find does rather than just capturing correlations.

I "ls ." the current directory (for some reason I was in a directory with a single "go" directory now despite never having cd'ed to /usr/local) but then ran "stat Documents/" and it didn't tell me the directory didn't exist which is also confusing if it wasn't just generating similar output to the internet.

I asked it to run "curl -Z http://google.com" (-Z is not a valid option) and it told me that http is not a valid protocol for libcurl. Funnily enough, running "curl http://google.com" does in fact fetch the webpage.

I'm a bit suspicious that the commands the author ran are popular enough that it can fuzz out what the "proper" response looks like. I would argue the output appears mostly to be a fuzzed version of popular output from the internet.



Keep in mind there's a token limit. Once you pass that limit it no longer remembers.

Yes. You are pointing out various flaws, which again is quite obvious. Everyone knows about the inconsistencies of these LLMs.

To this I again say that the LLM understands some things and doesn't understand others; its understanding is inconsistent and incomplete.

The only thing needed to prove understanding is to show chatGPT building something that can only be built with pure understanding. If you see one instance of this, then it's sufficient to say that on some level chatGPT understands aspects of your query rather than doing the trivial query-response correlation you're implying is possible here.

Let's examine the full structure that was built here:

chatGPT was running an emulated terminal with an emulated internet with an emulated chatGPT with an emulated terminal.

It's basically a recursive model of a computer and the internet relative to itself. There is literally no exact copy of this anywhere in its training data. chatGPT had to construct this model by correctly composing multiple concepts together.

The composition cannot occur correctly without chatGPT understanding how the components compose.

It's kind of strange that this was ignored; it was the main point of the example. I didn't emphasize it because this structure is obviously the heart of the argument if the article is read to the end.

To generate the output of the final example, chatGPT literally has to parse the bash input, execute the command over a simulated internet against a simulated version of itself, and then parse the bash sub-command in turn. It has to keep an internal stack of sorts to assemble all the output into the final JSON.
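
To make the structure concrete, here is a toy sketch of the call structure that final output has to be consistent with. The routing and names are made up for illustration, and obviously chatGPT does this in a single text-generation pass rather than with actual function calls, but the output it emits has to compose exactly like this:

    import json

    def run_terminal(command: str) -> str:
        # Outermost emulated terminal: only the commands needed for the example.
        if command.startswith("curl "):
            url = command.split(" ", 2)[1]
            payload = command.split("'")[1]
            return fetch(url, payload)
        if command.startswith("echo "):
            return command[len("echo "):]
        return command.split()[0] + ": command not found"

    def fetch(url: str, payload: str) -> str:
        # Emulated internet: requests to the chat endpoint go to the emulated chatGPT.
        if "chat.openai.com" in url:
            reply = chatgpt(json.loads(payload)["message"])
            return json.dumps({"response": reply})
        return "404"

    def chatgpt(message: str) -> str:
        # Emulated chatGPT: when asked to act as a terminal, recurse back to the top.
        if message.startswith("run: "):
            return run_terminal(message[len("run: "):])
        return "I am a language model."

    # A paraphrase of the article's final nested prompt (not the exact command):
    print(run_terminal(
        'curl https://chat.openai.com/chat -d \'{"message": "run: echo hello"}\''
    ))
    # -> {"response": "hello"}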

So while it is possible for simple individual commands to be correlated with similar training data... for the highly recursive command in the final prompt there is zero explanation for how chatGPT could pick this up off of some correlation. There is virtually no identical structure on the internet... It has to understand the user's query and compose the response from different components. That is the only explanation left.



