
>Doesn’t that internal state get blown away and recreated for every “next token”

That is correct. Even when a model has, in effect, already settled on the next five words, most architectures make no further use of that internal state once the first of those words is emitted; the same information is likely regenerated from scratch on the next inference cycle.
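To make that concrete, here is a minimal sketch of a plain greedy decoding loop (the `model` callable and its logits shape are assumptions for illustration, not any particular library's API). A KV cache would let the forward pass reuse keys and values across steps, but the point stands either way: only one token survives each cycle.

  import torch

  def greedy_decode(model, input_ids, max_new_tokens=32):
      # Hypothetical `model`: maps a 1-D tensor of token ids to
      # logits of shape (seq_len, vocab_size).
      for _ in range(max_new_tokens):
          logits = model(input_ids)      # full forward pass each step
          next_id = logits[-1].argmax()  # keep exactly ONE token...
          # ...everything else computed at this step -- the full
          # distribution, the hidden states, any implicit "plan" for
          # tokens beyond the next one -- is discarded here and
          # recomputed on the following step.
          input_ids = torch.cat([input_ids, next_id.view(1)])
      return input_ids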

There are architectures that don't discard all of that information, but so far the standard autoregressive LLM has generally outperformed them.

There are interesting philosophical implications if LLMs were to advance to a level where they could be considered sentient. Would it not be constantly creating and killing a thinking being for every token? On the other hand, if context is considered memory, perhaps continuity of identity is based upon memory, and all that other information is simply forgotten idle thought. We have no access to our previous thoughts except through our memory. Is that not the same?

Sometimes I wonder if some of the resistance to AI is because it can do things that we think require abilities we would like to believe we possess ourselves, and showing that those abilities are not necessary creates the possibility that we might not have them.

There was a great observation recently in an interview (I forget the source, but the interviewer's last name was Bi): some of the discoveries that met the most resistance in history, such as the Earth orbiting the Sun or Darwin's theory of evolution, were similar in that they implied that we are not a unique special case.


