
I think that in the end predicting words is suboptimal; most of the things we want to do relate to the internal representations of concepts that live deeper in the layers.

I, at least, do not want to predict the next token; I want to predict the next concept in a chain of reasoning. But it seems that we are currently stuck using the same representation for autoregression that we use for training.

Maybe we can come up with a better way to construct these chains once we understand the models better.

Edit: Typo




The weird thing is that training on text also encodes information about related concepts.

from u/gmt2027

>An extreme version of the same idea is the difference between understanding DNA vs the genome of every individual organism that has lived on earth. The species record encodes a ton of information about the laws of nature, the composition and history of our planet. You could deduce physical laws and constants from looking at this information, wars and natural disasters, economic performance, historical natural boundaries, the industrial revolution and a lot more.

and u/thomastjeffery

>That entropy is the secret sauce: the extra data that LLMs are sometimes able to model. We don't see it, because we read language, not text.


How would you represent or interpret the "next concept" if not with some kind of token, though?

Language is a communicated abstraction of concepts, and it would seem that internal representations of those concepts can emerge from something optimized for token prediction. Or at least: internal representations of the speaker to be predicted, including the knowledge they may possess.


Language is indeed a communicated abstraction of concepts, but it emerged under a lot of constraints (our auditory system, our brain's inherent bias towards visual stimuli, etc.). Predicting within this constrained system is most likely suboptimal.

Imagine translating language into an optimized representation free from human constraints, doing autoregressive prediction in that domain, and only then translating back.

As far as I understand current models, this is not yet how they work.
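
Roughly, the shape I have in mind is something like this toy PyTorch sketch (every name and size here is made up, and the training loop is omitted): tokens are encoded into a latent "concept" space, autoregression happens there, and tokens only reappear at the decoding step.

    # Toy sketch: encode tokens into a latent "concept" space, autoregress
    # in that space, and only decode back to tokens at the end.
    # All module names and sizes are hypothetical, purely for illustration.
    import torch
    import torch.nn as nn

    VOCAB, D_TOK, D_CONCEPT = 32_000, 256, 512

    class ConceptAutoregressor(nn.Module):
        def __init__(self):
            super().__init__()
            # "Translate" language into a learned representation...
            self.encoder = nn.Sequential(nn.Embedding(VOCAB, D_TOK),
                                         nn.Linear(D_TOK, D_CONCEPT))
            # ...predict the next step directly in that representation...
            self.concept_rnn = nn.GRU(D_CONCEPT, D_CONCEPT, batch_first=True)
            # ...and only translate back to tokens at the very end.
            self.decoder = nn.Linear(D_CONCEPT, VOCAB)

        def forward(self, token_ids):               # (batch, seq)
            concepts = self.encoder(token_ids)      # (batch, seq, D_CONCEPT)
            next_concepts, _ = self.concept_rnn(concepts)
            return self.decoder(next_concepts)      # logits over next tokens

    model = ConceptAutoregressor()
    logits = model(torch.randint(0, VOCAB, (2, 16)))  # toy batch
    print(logits.shape)                               # torch.Size([2, 16, 32000])

Of course this still bottoms out in token logits for training, so it doesn't fully escape the objection about being stuck with the training representation.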


A chain of LLMs can work in that direction, using intermediate prompts that feed each answer into the next prompt. Make the LLM build a list of sections, then make it fill them in with examples, then make it enrich the text. Maybe add a last pass for error correction, clarity, removing mentions of "as an AI model", and so on.
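
In code it's just threading strings through successive calls; something like this sketch, where llm() is a stand-in for whatever completion API you use and the prompts are only illustrative:

    # Rough sketch of the chained-prompt idea. llm() is a placeholder for
    # whichever completion API you actually call.
    def llm(prompt: str) -> str:
        raise NotImplementedError("plug in your completion API here")

    def write_article(topic: str) -> str:
        outline = llm(f"List the section headings for an article about {topic}.")
        draft = llm(f"Write each section below and add concrete examples:\n{outline}")
        return llm("Edit this draft for clarity and remove any "
                   f"'as an AI model' phrasing:\n{draft}")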



