
I think that in the end predicting words is suboptimal; most of the things we want to do relate to the internal representations of concepts that live deeper in the layers.

I, at least, do not want to predict the next token; I want to predict the next concept in a chain of reasoning. But it seems that we are currently stuck using the same representation for autoregression that we use for training.

Maybe we can come up with a better way to construct these chains once we understand the models better.

Edit: Typo




The weird thing is that training on text also encodes information about related concepts.

from u/gmt2027

>An extreme version of the same idea is the difference between understanding DNA vs the genome of every individual organism that has lived on earth. The species record encodes a ton of information about the laws of nature, the composition and history of our planet. You could deduce physical laws and constants from looking at this information, wars and natural disasters, economic performance, historical natural boundaries, the industrial revolution and a lot more.

and u/thomastjeffery

>That entropy is the secret sauce: the extra data that LLMs are sometimes able to model. We don't see it, because we read language, not text.


How would you represent or interpret the "next concept" if not with some kind of token, though?

Language is a communicated abstraction of concepts, and it would seem that internal representations of those concepts can emerge from something optimized for token prediction. Or at least: internal representations of the speaker to be predicted, including the knowledge they may possess.


Language is indeed a communicated abstraction of concepts, but it emerged under a lot of constraints (our auditory system, our brain's inherent bias towards visual stimuli, etc.). Predicting within this constrained system is most likely suboptimal.

Imagine translating language into an optimized representation free from human constraints, doing autoregressive prediction in that domain, and only then translating back.

As far as I understand current models, this is not yet how they work.
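
Roughly, the shape I have in mind is something like this toy PyTorch sketch (every name and size here is made up, and the training loop is omitted): tokens are encoded into a latent "concept" space, autoregression happens there, and tokens only reappear at the decoding step.

    # Toy sketch: encode tokens into a latent "concept" space, autoregress
    # in that space, and only decode back to tokens at the end.
    # All module names and sizes are hypothetical, purely for illustration.
    import torch
    import torch.nn as nn

    VOCAB, D_TOK, D_CONCEPT = 32_000, 256, 512

    class ConceptAutoregressor(nn.Module):
        def __init__(self):
            super().__init__()
            # "Translate" language into a learned representation...
            self.encoder = nn.Sequential(nn.Embedding(VOCAB, D_TOK),
                                         nn.Linear(D_TOK, D_CONCEPT))
            # ...predict the next step directly in that representation...
            self.concept_rnn = nn.GRU(D_CONCEPT, D_CONCEPT, batch_first=True)
            # ...and only translate back to tokens at the very end.
            self.decoder = nn.Linear(D_CONCEPT, VOCAB)

        def forward(self, token_ids):               # (batch, seq)
            concepts = self.encoder(token_ids)      # (batch, seq, D_CONCEPT)
            next_concepts, _ = self.concept_rnn(concepts)
            return self.decoder(next_concepts)      # logits over next tokens

    model = ConceptAutoregressor()
    logits = model(torch.randint(0, VOCAB, (2, 16)))  # toy batch
    print(logits.shape)                               # torch.Size([2, 16, 32000])

Of course this still bottoms out in token logits for training, so it doesn't fully escape the objection about being stuck with the training representation.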


A chain of LLMs can work in that direction, using intermediate prompts that feed each answer into the next prompt. Make the LLM build a list of sections, then make it fill them in with examples, then make it enrich the text. Maybe add a last pass for error correction, clarity, removing mentions of "as an AI model", and so on.
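
In code it's just threading strings through successive calls; something like this sketch, where llm() is a stand-in for whatever completion API you use and the prompts are only illustrative:

    # Rough sketch of the chained-prompt idea. llm() is a placeholder for
    # whichever completion API you actually call.
    def llm(prompt: str) -> str:
        raise NotImplementedError("plug in your completion API here")

    def write_article(topic: str) -> str:
        outline = llm(f"List the section headings for an article about {topic}.")
        draft = llm(f"Write each section below and add concrete examples:\n{outline}")
        return llm("Edit this draft for clarity and remove any "
                   f"'as an AI model' phrasing:\n{draft}")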



