Stop thinking about words. Think about concepts. As soon as you turn words/tokens into high-dimensional embeddings and start playing around with them, they stop being words.
Claiming LLMs manipulate 'high-dimensional concepts' rather than words, when all they can do with these 'concepts' is output words, is meaningless.
It's meaningful because the model can learn relationships between things (e.g. that a dog and a cat are often pets, and interchangeable in certain contexts) and then produce output it has never seen. See the word2vec paper for an example, and the sketch below.
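A minimal sketch of the kind of geometric relationship the word2vec paper describes, assuming gensim is installed; "glove-wiki-gigaword-100" is one of its hosted datasets (GloVe rather than word2vec vectors, but the same idea, and it downloads on first use):

```python
# Sketch: vector arithmetic over word embeddings (word2vec-style analogies).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # word -> 100-dim vector

# "king" - "man" + "woman" lands near "queen": a relationship the model was
# never explicitly told, recovered purely from the embedding geometry.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Same trick with the pet example: words used in similar contexts
# end up close together in the embedding space.
print(vectors.similarity("dog", "cat"))
```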
And an LLM can output whatever you want. Change the last linear layer from (embedding dimension -> vocab size) to (embedding dimension -> sentiment categories), for example, and hey presto, it can do sentiment analysis, with surprisingly few samples, because it has already learned the concepts in the transformer blocks during pre-training.
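A minimal sketch of that head swap, assuming PyTorch and the Hugging Face transformers library (in practice you'd also pick one token position to read the prediction from, which is roughly what transformers' own GPT2ForSequenceClassification does):

```python
# Sketch: repurposing a pretrained LM for sentiment by swapping the head.
import torch.nn as nn
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
hidden = model.config.n_embd  # embedding dimension: 768 for base GPT-2

# Original head: (embedding dim -> vocab size), i.e. next-token logits.
print(model.lm_head)  # Linear(in_features=768, out_features=50257, bias=False)

# Swap it: (embedding dim -> 3 sentiment classes). The transformer blocks,
# and everything they learned in pre-training, are untouched.
model.lm_head = nn.Linear(hidden, 3)
```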
You gotta actually have a clue about how they work. This stuff is no longer magic; there are plenty of resources for learning it in detail.
This isn't magic. Precisely. Calling it 'manipulating high-dimensional concepts' muddies the waters. It's not. It's merely using a whole lot of numbers to manipulate word tokens.
And it's quite limited at that, despite the impressive ability to store up language corner cases and regurgitate them.
"an LLM can output whatever you want" -> this right here is approaching magical thinking. I am really clear headed about it's capabilities and limitations, breathlessly describing it's internals as if it implies it's anything more then a first decent NLP interface is the problem and you seem to be indulging in it.
It is literally manipulating high-dimensional vectors. GPT-4's embedding dimension is 12288 (I think) and Llama-3's is 4096. Thousands of these high-dimensional vectors come into the model, and operations like +, -, x, /, exp, log, GELU, etc. are applied to them and to combinations of them. And during training, geometric relationships are literally created between the vectors, based on the concepts humans use. This isn't some pie-in-the-sky assertion made of hand-wavy words; those are concrete statements that don't muddy any water.
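Concretely, here's a toy NumPy sketch of exactly those operations inside one transformer block. Random weights stand in for trained ones, and layer norm and causal masking are omitted for brevity, so this only shows the arithmetic, not a working model:

```python
# Sketch of the literal arithmetic in one transformer block, in plain NumPy.
# Shapes are toy-sized; real models use d_model in the thousands
# (e.g. 4096 for Llama-3-8B).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))          # token embeddings

# Self-attention: dot products, a softmax (exp / sum), weighted sums.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
x = x + weights @ v                              # residual add

# MLP with GELU: multiplies, adds, a tanh/exp nonlinearity. Nothing magic.
W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
gelu = lambda z: 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))
x = x + gelu(x @ W1) @ W2                        # residual add
print(x.shape)  # (4, 8): same shape, different point in embedding space
```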
It might be magical to you, but it isn't magical to anyone actually working with these things that you can trivially change the last layer and have the outputs represent whatever you want. Get some samples, write a loss function, go to work. Suggesting they can only output words/tokens just displays a complete misunderstanding of how they work under the hood.
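Continuing the head-swap sketch from above: the "get some samples, write a loss function" step is just a standard classification loop. The `texts` and `labels` here are made-up stand-ins for real samples, and `model` is the GPT-2 with the swapped 3-way head from earlier:

```python
# Sketch: fine-tuning the swapped head from the earlier example.
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
texts = ["loved it", "utterly broken", "it works"]   # hypothetical samples
labels = torch.tensor([2, 0, 1])                     # neg=0, neutral=1, pos=2

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
for text, label in zip(texts, labels):
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits[:, -1, :]   # read the 3-way head at the last token
    loss = F.cross_entropy(logits, label.unsqueeze(0))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```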
Why do you keep saying it's magical to me? You're the one describing the underlying maths as if it makes it special. To the model, these 'high-dimensional embeddings' are just long lists of numbers.
I've said it before and I'll say it again: using terabytes of number lists to achieve decent NLP isn't all that impressive.
Because you don't understand it, and that's understandable. Grab a version of GPT-2 or something from Hugging Face and go through it layer by layer, as in the sketch below.
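A minimal sketch of that walkthrough, assuming the Hugging Face transformers library:

```python
# Sketch: walking a pretrained GPT-2 layer by layer.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

ids = tokenizer("the cat sat", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

# One hidden state per layer (plus the input embeddings): watch the same
# tokens get re-represented as they move through the 12 blocks.
for i, h in enumerate(out.hidden_states):
    print(f"layer {i:2d}: shape {tuple(h.shape)}, norm {h.norm():.1f}")

# And the blocks themselves are inspectable modules, not a black box:
print(model.h[0])  # attention + MLP of the first transformer block
```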
Anyone with a brain is impressed by state-of-the-art LLMs. What's next? The latest GPUs aren't impressive because "shoving a bunch of electrons around to do elementary arithmetic isn't impressive"? The difficulty is in getting a bajillion little things to come together in a way that's useful. Pointing out that some complex thing is not impressive because it's "just lots of little simple things" is dumb as hell.