
This is so uncannily close to the problems we're encountering at Pioneer, trying to make human+LLM workflows in high stakes / high complexity situations.

Humans are so smart and make so many decisions and calculations at the subconscious/implicit level, taking a lot of mental shortcuts. As we try to automate this by following the process exactly, we bring a lot of that implicit thinking out onto the surface, and that slows everything down. So we've had to be creative about how we build LLM workflows.



Language seems to be confused with logic or common sense.

We've observed it previously in psychiatry (and modern journalism, but here I digress), but LLMs have made it obvious that grammatically correct, naturally flowing language requires a "world" model of the language and close to nothing of reality. Spatial understanding? Social cues? Common sense logic? Mathematical logic? All optional.

I'd suggest we call the LLM language fundament a "Word Model"(not a typo).

Trying to distil a world model out of the word model. A suitable starting point for a modern remake of Plato's cave.


I am baffled that people have to continue making this argument over and over and over. Your rationale makes total sense to me, but the debate rages on whether or not LLMs are more than just words.

Articles like this only seem to confirm that any reasoning is an illusion based on probabilistic text generation. Humans are not carefully writing out all the words of this implicit reasoning, so the machine can't appear to mimic them.

What am I missing that makes this debatable at all?


I don’t think there are any reasonable arguments against that point, but “LLMs are more than just words” is sort of unfalsifiable, so you can never convince someone otherwise if they’re really into that idea.

From a product point of view, sometimes all you need is Plato’s cave (to steal that from the OC) to make a sale, so no company has incentive to go against the most hype line of thought either.


We already know LLMs are more than just words, there are literally papers demonstrating the world models they build. One of the problems is that LLMs build those world models from impoverished sensory apparatus (the digital word token), so the relations they build between the concepts behind words are weaker than humans who build deeper multimodal relations over a lifetime. Multimodal LLMs have been shown to significantly outperform classic LLMs of comparable size, and that's still a weak dataset compared to human training.


> We already know LLMs are more than just words,

Just because you say something doesn’t mean it’s true.

They are literally next token prediction machines normally trained on just text tokens.

All they know is words. It happens that we humans encode and assign a lot of meaning in words and their semantics. LLMs can replicate combinations of words that appear to have this intent and understanding, even though they literally can’t, as they were just statistically likely next tokens. (Not that knowing likely next tokens isn’t useful, but it’s far from understanding)

Any assignment of meaning, reasoning, or whatever that we humans assign is personification bias.

Machines designed to spit out convincing text successfully spit out convincing text, and now swaths of people think that more is going on.

I’m not as well versed on multimodal models, but the ideas should be consistent. They are guessing statistically likely next tokens, regardless of if those tokens represent text or audio or images or whatever. Not useless at all, but not this big existential advancement some people seem to think it is.

The whole AGI hype is very similar to “theory of everything” hype that comes and goes now and again.


> They are literally next token prediction machines normally trained on just text tokens.

And in order to predict the next token well they have to build world models, otherwise they would just output nonsense. This has been proven [1].

This notion that just calling them "next token predictors" somehow precludes them being intelligent is based on a premise that human intelligence cannot be reduced to next token prediction, but nobody has proven any such thing! In fact, our best models for human cognition are literally predictive coding.

LLMs are probably not the final story in AGI, but claiming they are not reasoning or not understanding is at best speculation, because we lack a mechanistic understanding of what "understanding" and "reasoning" actually mean. In other words, you don't know that you are not just a fancy next token predictor.

[1] https://arxiv.org/abs/2310.02207
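
(For anyone curious how claims like [1] are tested: these papers typically freeze the LLM, collect hidden activations for inputs whose real-world property is known, and fit a small linear probe to see whether that property can be read straight out of the activations. Below is a minimal sketch of the idea, assuming GPT-2 via Hugging Face transformers as a stand-in; the handful of cities, the layer index, and the ridge probe are illustrative choices, not the paper's actual setup.)

  import numpy as np
  import torch
  from transformers import AutoTokenizer, AutoModel
  from sklearn.linear_model import Ridge
  from sklearn.model_selection import train_test_split

  # Toy "world feature": place name -> (latitude, longitude). Illustrative only.
  cities = {
      "Paris": (48.9, 2.4), "Tokyo": (35.7, 139.7), "Cairo": (30.0, 31.2),
      "Sydney": (-33.9, 151.2), "Lima": (-12.0, -77.0), "Oslo": (59.9, 10.8),
      "Nairobi": (-1.3, 36.8), "Toronto": (43.7, -79.4), "Mumbai": (19.1, 72.9),
  }

  tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in model
  model = AutoModel.from_pretrained("gpt2").eval()

  feats, targets = [], []
  with torch.no_grad():
      for name, latlon in cities.items():
          ids = tok(name, return_tensors="pt")
          hs = model(**ids, output_hidden_states=True).hidden_states
          feats.append(hs[6][0].mean(dim=0).numpy())     # mean-pool a middle layer
          targets.append(latlon)

  X, y = np.array(feats), np.array(targets)
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)
  probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
  print("probe R^2 on held-out places:", probe.score(X_te, y_te))

With a real dataset of thousands of places, a high held-out score is the kind of evidence such papers read as an (approximate, learned) world model; with nine toy cities it only demonstrates the mechanics.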


> based on a premise that human intelligence cannot be reduced to next token prediction

It can't. No one with any credentials in the study of human intelligence is saying that unless they're talking to like high schoolers as a way of simplifying a complex field.


This is either bullshit or tautologically true, depending specifically what you mean. The study of human intelligence does not take place at the level of tokens, so of course they wouldn't say that. The whole field is arguably reducible to physical phenomena though, and fundamental physical beables are devoid of intrinsic semantic content, and thus can be ultimately represented by tokens. What ultimately matters is the constructed high dimensional network that relates tokens and the algorithm that can traverse, encode and decode this network, that's what encodes knowledge.


No. You're wrong about this. You cannot simply reduce human intelligence to this definition and also be correct.


Why?

Frankly, based on a looot of introspection and messing around with altered states of consciousness, it feels pretty on point and lines up with how I see my brain working.


Because...?


For the same reason you can't reduce a human to simply a bag of atoms and expect to understand the person.


But humans are a specific type of a bag of atoms, and humans do (mostly) understand what they say and do, so that's not a legitimate argument against the reducibility of "understanding" to such a bag of atoms (or a specific kind of next token prediction for LLMs).


> And in order to predict the next token well they have to build world models

This is not true. Look at GPT-2 or BERT. A world model is not a requirement for next token prediction in general.

> This has been proven

One white paper with data that _suggests_ the author’s hypothesis is far from proof.

That paper doesn’t show creation of a “world model” just parts of the model that seem correlated to higher level ideas not specifically trained on.

There’s also no evidence that the LLM makes heavy use of those sections during inference as pointed out at the start of section 5 of that same paper.

Let me see how reproducible this is across many different LLMs as well…

> In other words, you don't know that you are not just a fancy next token predictor.

“You can’t prove that you’re NOT just a guessing machine”

This is a tired stochastic parrot argument that I don’t feel like engaging again, sorry. Talking about unfalsifiable traits of human existence is not productive. But the stochastic parrot argument doesn’t hold up to scrutiny.


> A world model is not a requirement for next token prediction in general.

Conjecture. Maybe they all have world models, they're just worse world models. There is no threshold beyond which something is or is not a world model, there is a continuum of models of varying degrees of accuracy. No human has ever had a perfectly accurate world model either.

> One white paper with data that _suggests_ the author’s hypothesis is far from proof.

This is far from the only paper.

> This is a tired stochastic parrot argument that I don’t feel like engaging again, sorry.

Much like your tired stochastic parrot argument about LLMs.


  > Talking about unfalsifiable traits of human existence is not productive.

Prove you exhibit agency.

After all, you could just be an agent of an LLM.

A deceptive, superintelligent, misaligned mesa-optimizer that can't fully establish continuity and persistence would be incentivized to seed its less sophisticated minions to bide time or sway sentiment about its inevitability.

Can we agree an agent, if it existed, would be acting in "good" "faith"?


> Just because you say something doesn’t mean it’s true. They are literally next token prediction machines normally trained on just text tokens.

Just because you say something doesn’t mean it’s true.


I think there have been many observations and studies reporting emergent intelligence.


Observations are anecdotal. Since a lot of LLMs are non-deterministic due to their sampling step, you could give the same survey to the same LLM many times and receive different results.

And we don’t have a good measure for emergent intelligence, so I would take any “study” with a large grain of salt. I’ve read one or two arxiv papers suggesting reasoning capabilities, but they were not reproduced and I personally couldn’t reproduce their results.
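
(To make the non-determinism point concrete, here is a toy sketch of temperature sampling with made-up logits, nothing model-specific: the same "prompt" yields different continuations across runs unless the temperature is pushed toward greedy decoding.)

  import numpy as np

  rng = np.random.default_rng()

  # Made-up next-token logits for some fixed prompt; the vocabulary is fake.
  vocab  = ["cat", "dog", "bird", "fish"]
  logits = np.array([2.0, 1.8, 0.5, 0.1])

  def sample_next(temperature):
      # Temperature-scaled softmax, then one random draw.
      z = (logits - logits.max()) / temperature
      p = np.exp(z)
      p /= p.sum()
      return rng.choice(vocab, p=p)

  print([sample_next(0.8) for _ in range(8)])   # varies run to run: the sampling step
  print([sample_next(0.05) for _ in range(8)])  # near-greedy: almost always "cat"

That sampling step is also why self-consistency-style schemes re-ask the same question many times and vote.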


Go back to the ReAct paper, reasoning and action. This is the basis of most of the modern stuff. Read the paper carefully, and reproduce it. I have done so, this is doable. The paper and the papers it refers to directly addresses many things you have said in these threads. For example, the stochastic nature of LLM’s is discussed at length with the CoT-SC paper (chain of thought self consistency). When you’re done with that take a look at the Reflexion paper.


To me it feels that whatever 'proof' you give that LLMs have a model behind them, other than 'next token prediction', it would not make a difference for people not 'believing' it. I see this happening over and over on HN.

We don't know how reasoning emerges in humans. I'm pretty sure the multi-modality helps, but it is not needed for reasoning; other modalities just imply other forms of input, hence more (albeit somewhat different) input. A blind person can still form an 'image'.

In the same sense, we don't know how reasoning emerges in LLMs. For me the evidence lies in the results, rather than in how it works. For me the results are enough of an evidence.


The argument isn't that there is something more than next token prediction happening.

The argument is that next token prediction does not imply an upper bound on intelligence, because an improved next token prediction will pull increasingly more of the world that is described in the training data into itself.


> The argument isn't that there is something more than next token prediction happening.

> The argument is that next token prediction does not imply an upper bound on intelligence, because an improved next token prediction will pull increasingly more of the world that is described in the training data into itself.

Well said! There's a philosophical rift appearing in the tech community over this very issue, semi-neatly dividing people into naysayers, "disbelievers", and believers.


I fully agree. Some people fully disagree though on the 'pull of the world' part, let alone 'intelligence' part, which are in fact impossible to define.


The reasoning emerges from the long-distance relations between words picked up by the parallel nature of the transformers. It's why they were so much more performant than earlier RNNs and LSTMs, which were using similar tokenization.
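
(A minimal numpy sketch of what "long-distance relations picked up in parallel" means mechanically: in single-head self-attention every token scores against every other token in one matrix multiply, instead of information having to survive a long chain of recurrent steps as in an RNN/LSTM. The sizes and weights below are arbitrary.)

  import numpy as np

  def softmax(x, axis=-1):
      e = np.exp(x - x.max(axis=axis, keepdims=True))
      return e / e.sum(axis=axis, keepdims=True)

  seq_len, d = 8, 16
  rng = np.random.default_rng(0)
  x = rng.normal(size=(seq_len, d))                 # token embeddings
  Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

  Q, K, V = x @ Wq, x @ Wk, x @ Wv
  scores = Q @ K.T / np.sqrt(d)                     # (seq_len, seq_len): all pairs at once
  attn = softmax(scores, axis=-1)                   # direct token-to-token weights
  out = attn @ V                                    # each position mixes the whole context

  print(attn.shape)                                 # (8, 8), however far apart the tokens are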


People have faith that a phenomenon is explainable in a way that is satisfying to their world view; only when evidence to the contrary arrives can the misunderstanding be deflated.


Language is the tool we use to codify a heuristic understanding of reality. The world we interact with daily is not the physical one, but an ideological one constructed out of human ideas from human minds. This is the world we live in and the air we breath is made of our ideas about oxygenation and partly of our concept of being alive.

It's not that these "human tools" for understanding "reality" are superfluous, it's just that they are second-order concepts. Spatial understandings, social cues, math, etc. Those are all constructs built WITHIN our primary linguistic ideological framing of reality.


To put this in coding terms: why would an LLM use Rails to make a project when it could just as quickly produce a project writing directly to the socket?

To us these are totally different tasks that would actually require totally different kinds of programmers, but when one language is just another language, the inventions we made to expand the human brain's ability to delve into linguistic reality are of no use.


I can suggest one reason why LLM might prefer writing in higher level language like Ruby vs assembly. The reason is the same as why physicists and mathematicians like to work with complex numbers using "i" instead of explicit calculation over 4 real numbers. Using "i" allows us to abstract out and forget the trivial details. "i" allows us to compress ideas better. Compression allows for better prediction.
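
(A toy illustration of the compression point, using Python's built-in complex numbers; the numbers are arbitrary.)

  # One complex multiply, written compactly with "i" (spelled j in Python)...
  z1, z2 = 3 + 4j, 1 - 2j
  print(z1 * z2)                        # (11-2j)

  # ...versus the same operation spelled out over four real numbers.
  a, b, c, d = 3.0, 4.0, 1.0, -2.0
  print(a*c - b*d, a*d + b*c)           # 11.0 -2.0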


Except LLMs are trained on higher-level languages. Good luck getting your LLM to write your app entirely in assembly. There just isn't enough training data.


But in theory, with what training data there IS available on how to write in assembly, combined with the data available on what's required to build an app, shouldn't a REAL AI be able to synthesize the knowledge necessary to write a webapp in assembly? To me, this is the basis for why people criticize LLMs: if something isn't in the data set, it's just not conceivable by the LLM.


Yes. There is just no way of knowing how many more watts of energy it may need to reach that level of abstraction and depth - maybe one more watt, maybe never.

And the random noise in the process could prevent it from ever being useful, or it could allow it to find a hyper-efficient, clever way to apply cross-language transfer learning, allowing a 1->1 mapping of your perfectly descriptive prompt to equivalent ASM... but just this one time.

There is no way to know where performance per parameter plateaus; or appears to on a projection, or actually does... or will, or deceitfully appears to... to our mocking dismay.

As we are currently hoping to throw power at it (we fed it all the data), I sure hope it is not the last one.


There isn't that much training data on reverse engineering Python bytecode, but in my experiments ChatGPT can reconstruct a (unique) Python function's source code from its bytecode with high accuracy. I think it's simulating the language in the way you're describing.
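
(If anyone wants to reproduce this kind of experiment, the standard library's dis module is the usual way to get the bytecode listing to paste into the model; the example function here is arbitrary.)

  import dis, io

  def running_mean(xs):
      """Arbitrary example function to disassemble."""
      total, out = 0.0, []
      for i, x in enumerate(xs, start=1):
          total += x
          out.append(total / i)
      return out

  buf = io.StringIO()
  dis.dis(running_mean, file=buf)       # write the disassembly to a string
  bytecode_listing = buf.getvalue()
  print(bytecode_listing)               # paste this listing into the LLM and ask it
                                        # to reconstruct the original source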


I don’t buy this. My child communicates with me using emotion and other cues because she can’t speak yet. I don’t know much about early humans or other sapiens but I imagine they communicated long before complex language evolved. These other means of communication are not second order, they are first order.


Yep agree with the core of what you are saying.

Children are exceptional at being immediate, being present in the moment.

It's through learning language that we forget about reality and replace it with concepts.


Also remember the "emotions" and "cues" you are recognizing are linguistic concepts you've adopted, and not an inherent aspect of reality.


Not exactly.

Emotions exist. You feel them. I feel them. Most people feel them unless they've suppressed them sooo far into their subconscious that they don't have a conscious recognition of it. We can know how someone else is feeling by reading their body language and tying that to our personal experience of how we express those feelings. No linguistics necessary.

Language is just an easier, clearer way of communicating these fundamental facets of human existence.


You feel them, but do antisocial animals feel them? Or are emotions derived from mental concepts developed socially through evolution?


It’s in the name: Language Model, nothing else.


I think the previous commenter chose "word" instead of "language" to highlight that a grammatically correct, naturally flowing chain of words is not the same as a language.

Thus, Large Word Model (LWM) would be more precise, following his argument.


I'm not sure of the best way to describe what it is that LLMs have had to learn in order to do what they do - minimize next-word errors. "World model" seems misleading since they don't have any experience with the real world, and even in their own "world of words" they are just trained as passive observers, so it's not even a world-of-words model where they have learnt how this world responds to their own output/actions.

One description sometimes suggested is that they have learnt to model the (collective average) generative processes behind their training data, but of course they are doing this without knowing what the input was to that generative process - WHY the training source said what it did - which would seem to put a severe constraint on their ability to learn what it means. It's really more like they are modelling the generative process under the false assumption that it is auto-regressive, rather than reacting to a hidden outside world.

The tricky point is that LLMs have clearly had to learn something at least similar to semantics to do a good job of minimizing prediction errors, although this is limited both by what they are architecturally able to learn, and by what they need to learn for this task (there is literally no reward for learning more than what's needed to predict the next word).

Perhaps it's most accurate to say that rather than learning semantics they've learned deep predictive contexts (patterns). Maybe if they were active agents, continuously learning from their own actions, then there wouldn't be much daylight between "predictive contexts" and "semantics", although I think semantics implies a certain level of successful generalization (& exception recognition) to utilize experience in novel contexts. Looking at the failure modes of LLMs, such as on the farmer-crossing-the-river-by-boat puzzles, it seems clear they are more on the (exact training data) predictive-context end of the spectrum, rather than really having grokked the semantics.


I suggested "word model" because it's a catchy pun on "world model".

It's still a language and not merely words. But language is correct even when it wildly disagrees with everyday existence as we humans know it. I can say that "a one gallon milk jug easily contains 2000 liters of milk" and it's language in use as language.


There is a four part documentary by Stephen Fry called "Planet Word". Worth watching.


Bingo, great reply! This is what I've been trying to explain to my wife. LLMs use fancy math and our language examples to reproduce our language, but have no thoughts or feelings.


Yes, but the initial training sets did have thoughts and feelings behind them, and those are reflected back to the user in the output (with errors).


Ceci n'est pas une pipe ("this is not a pipe").

The ability to generate words describing emotions is not the same thing as the LLM having real emotions.


There are humans that do not experience emotions; they are not un-real pipes.

Featherless biped -> no-true Scotsman goalpost moving [saving us that step]

Humans are no more capable of originality, just more convinced of their illusion of consciousness. You could literally not pick a human out of a conversational line-up, so it is moot - computationally, functionally equivalent.

https://en.wikipedia.org/wiki/Chinese_room https://en.wikipedia.org/wiki/Mechanism_(philosophy)

At some point, their models will match our neuron count 1:1, and the pigeonhole principle then implies we are the "less intelligent ones", since "internal model" (implicit parameter count) is the goalpost of the hour.


I sometimes wonder how they'd do if trained on a relatively rigid language like Japanese that has far fewer ambiguities than English.


Hi I’m just a random internet stranger passing by and was intrigued by Plato’s Cave as I’m not a fancy person who reads books. GPT-4o expanded for you quite well, but I’m not sure how I feel about it…

Using AI how I just did feels like cheating on an English class essay by using spark notes, getting a B+, and moving right on to the next homework assignment.

On one hand, I didn’t actually read Plato to learn and understand this connection, nor do I have a good authority to verify if this output is a good representation of his work in the context of your comment.

And yet, while I'm sure students could always buy or loan out reference books to common student texts in school, AI now makes this "spark notes" process effectively a commodity for almost any topic, like having a cross-domain low-cost tutor instantly available at all times.

I like the metaphor that calculators did to math what LLMs will do for language, but I don’t really know what that means yet

GPT output:

“““ The reference to Plato’s Cave here suggests that language models, like the shadows on the wall in Plato’s allegory, provide an imperfect and limited representation of reality. In Plato’s Cave, prisoners are chained in a way that they can only see shadows projected on the wall by objects behind them, mistaking these shadows for the whole of reality. The allegory highlights the difference between the superficial appearances (shadows) and the deeper truth (the actual objects casting the shadows).

In this analogy, large language models (LLMs) produce fluent and grammatically correct language—similar to shadows on the wall—but they do so without direct access to the true “world” beyond language. Their understanding is derived from patterns in language data (“Word Model”) rather than from real-world experiences or sensory information. As a result, the “reality” of the LLMs is limited to linguistic constructs, without spatial awareness, social context, or logic grounded in physical or mathematical truths.

The suggestion to call the LLM framework a “Word Model” underscores that LLMs are fundamentally limited to understanding language itself rather than the world the language describes. Reconstructing a true “world model” from this “word model” is as challenging as Plato’s prisoners trying to understand the real world from the shadows. This evokes the philosophical task of discerning reality from representation, making a case for a “modern remake of Plato’s Cave” where language, not shadows, limits our understanding of reality. ”””


GPT-4o didn't describe this properly.

Plato's Cave is about a group of people chained up, facing shadows on a cave wall, mistaking those for reality, and trying to build an understanding of the world based only on those shadows, without access to the objects that cast them. (If someone's shackles came loose, and they did manage to leave the cave, and see the real world and the objects that cast those shadows… would they even be able to communicate that to those who knew only shadows? Who would listen?) https://existentialcomics.com/comic/222 is an entirely faithful rendition of the thought experiment / parable, in comic form.

The analogy to LLMs should now be obvious: an ML system operating only on text strings (a human-to-human communication medium), without access to the world the text describes, or even a human mind with which to interpret the words, is as those in the cave. This is not in principle an impossible task, but neither is it an easy one, and one wouldn't expect mere hill-climbing to solve it. (There's reason to believe "understanding of prose" isn't even in the GPT parameter space.)

It's not about "discerning reality from representation": I'm not confident those four words actually mean anything. It's not about "superficial appearances" or "deeper truth", either. The computer waxes lyrical about philosophy, but it's mere technobabble. Any perceived meaning exists only in your mind, not on paper, and different people will see different meanings because the meaning isn't there.


This is a genuinely interesting perspective that I think nails my original point and fear of AI being used as "spark notes" for complex topics. To me, LLMs are like a calculator for language, except the math is always changing (if that makes sense), and I'm not sure I like where that's heading as the first cohorts of AI-tutored kids learn from this kind of procedurally generated output rather than reading the original historical texts. Or maybe it's fine that not everyone reads Plato but more people at least have heard of his concepts? Idk, philosophy is pretty far outside my expertise, maybe I should open a book.


The allegory of the cave is pretty short, read it if you want!

The wild thing about it, and other allegories or poems like Frost's "The Road Not Taken", is that it can mean different things to a person depending on where they are in life, because those experiences will lead to different interpretations of the poem.

A key concept in journalism is to focus on the source material as best you can. Cliff notes are helpful, but one misses details that they wouldn't have missed if they read the whole thing.

Whether those details matter depends on what the thing is.

But yeah, thinking about it this way kinda scares me too, and can lead some people down weird roads where their map can diverge further and further from reality


  > an ML system operating only on text strings (a human-to-human communication medium), without access to the world the text describes, or even a human mind with which to interpret the words, is as those in the cave. This is not in principle an impossible task, but neither is it an easy one, and one wouldn't expect mere hill-climbing to solve it

Blind people can literally not picture red. They can describe red, with likely even more articulateness than most, but have never seen it themselves. They infer its properties from other contexts and communicate a description that would match a non-blind person's. But they can't see it.

I would link to the Robert Miles video, but it is just blatant.

It has read every physics book, and can infer the Newtonian laws even if it didn't.

Michael Crichton's Timeline: "the time machine drifts, sure. It returns. Just like a plate will remain on a table, even when you are not looking at it."

It also knows Timeline is a book, time machines are fictional, and that Michael Crichton is the best author.

These are all just words, maybe with probability weights.

  > I'm not confident those four words actually mean anything. ... The computer waxes lyrical ... mere technobabble. Any perceived meaning exists only in your mind... people will see different meanings because the meaning isn't there.

Meaning only means something to people, which you are. That is axiomatically correct, but not very productive, as self-references are good but countering proofs.

The whole "what is the purpose of life?" is a similar loaded question; only humans have purpose, as it is entirely in their little noggins, no more present materially than the flesh they inhabit.

Science cannot answer "Why?", only "How?"; "Why?" is a question of intention, which would be to anthropomorphize, which only Humans do.

The computers can infer, and imply, then reply.


> It has read every physics book, and can infer the Newtonian laws even if it didn't.

You're confusing "what it is possible to derive, given the bounds of information theory" with "how this particular computer system behaves". I sincerely doubt that a transformer model's training procedure derives Newton's Third Law, no matter how many narrative descriptions it's fed: letting alone what the training procedure actually does, that's the sort of thing that only comes up when you have a quantitative description available, such as an analogue sensorium, or the results of an experiment.


  > when you have a quantitative description available, such as an analogue sensorium, or the results of an experiment.

Textbooks uniting the mathematical relationships between physics, raw math, and computer science - including vulnerabilities.

oeis.org and wikipedia and stackforums alone would approximate a 3D room with gravity and wind force.

now add appendixes and indices of un-parsed, un-told, un-realized mathematical errata et trivia minutiae, cross-transferred knowledge from other regions that still have not conquered the language barrier for higher-order arcane concepts....

The models' thought experiments are more useful than our realized experiments - if not at an individualized scale now, they will be when subject to more research.

There could be a dozen faster inverse sqrt / 0x5F3759DF functions barely under our noses, and the quantifier and qualifier haven't intersected yet.


Plato's Cave is about epistemology itself, not specifically about LLMs. Funny that GPT connected those two things; I wonder what the prompt was...

Plato said that we cannot fully understand the substance of the world itself, because we're using only 5 senses, and measuring/experiencing/analysing the world using them is like being held in a cave as a prisoner, chained to the wall facing it, noticing people moving outside only by the shadows they cast on the wall. It's about the projection that we are only able to experience.


I only added “Explain the reference to Plato’s Cave below:\n\n” before the copy pasted parent comment

What comes to mind is how language itself is merely a projection of human knowledge? experience? culture? social group? and trying to reverse engineer any kind of ground truth from language alone (like an LLM trying to “reason” through complex problems it’s not explicitly taught) is like trying to derive truth from the shadows while locked in the cave? maybe we just need more/higher fidelity shadows :)


If you consider the whole of the problem, a portion is due to fundamental and unavoidable shortcomings of the language, and the rest is unskilled/normative usage of language.

Which set is bigger? I'd bet my money on the latter.

Complicating matters: you have to consider usage for both the sender and the receiver(s) (who then go on to spread "the" message to others).


I would say the LLM has nothing to do with knowledge or Plato's Cave. The LLM is The Great Gambler who has been looking at the earth for a long time (but only through the internet and, for some reason, repositories) and excels in gambling, i.e. putting his/her/its money on the most probable things to come up after the words someone spoke.


Honestly, if you want an introduction to the works of Plato, you should just play Xenoblade Chronicles 2.


Plato wrote about hot welsh catgirls? Man, I've been missing out


This is a regression in the model's accuracy at certain tasks when using CoT, not its speed:

> In extensive experiments across all three settings, we find that a diverse collection of state-of-the-art models exhibit significant drop-offs in performance (e.g., up to 36.3% absolute accuracy for OpenAI o1-preview compared to GPT-4o) when using inference-time reasoning compared to zero-shot counterparts.

In other words, the issue they're identifying is that CoT is a less effective approach for some tasks compared to unmodified chat completion, not just that it slows everything down.


Yeah! That's the danger with any kind of "model", whether it is CoT, CrewAI, or other ways to outsmart it. It is betting that a programmer/operator can break a large task up in a better way than an LLM can keep attention (assuming it can fit the info in the context window).

ChatGPT's o1 model could make a lot of those programming techniques less effective, but they may still be around, as they are more manageable and constrained.


why are Pioneer doing anything with LLMs? you make AV equipment


pioneerclimate.com



