Unless human cognition exceeds the Turing computable, the human brain is an existence proof that a sufficiently complex Turing machine can replicate human thought in a compact physical space.
That encoding a naive/basic UTM in an LLM would potentially be impractical is largely irrelevant in that case, because for any UTM you can "compress" a program by increasing the number of states or symbols, effectively embedding the steps required to implement a more compact representation in the machine itself.
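For concreteness, the state/symbol tradeoff is just information density on the tape: a cell drawn from a k-symbol alphabet carries log2(k) bits, so the same program occupies fewer cells as the alphabet grows. A minimal sketch (Python, with a made-up program size):

```python
import math

def tape_cells_needed(n_bits: int, n_symbols: int) -> int:
    """Cells needed to hold n_bits of program on a tape with n_symbols symbols.

    Each cell of a k-symbol alphabet carries log2(k) bits, so a larger
    alphabet shortens the tape at the cost of a bigger machine
    (more rows in the transition table).
    """
    return math.ceil(n_bits / math.log2(n_symbols))

program_bits = 1_000_000  # hypothetical program size in bits
for k in (2, 16, 256, 65_536):
    print(f"{k:>6} symbols -> {tape_cells_needed(program_bits, k):>9} cells")
```

The description doesn't vanish, of course; it moves from the tape into a larger transition table, which is the "embedding in the machine itself" above.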
While it is possible that current LLM architectures cannot encode a model efficiently enough to be physically practical, there's no reasonable basis for assuming the approach cannot translate.
You seem to be making a giant leap from “human thought can probably be emulated by a Turing machine” to “human thought can probably be emulated by LLMs in the actual physical universe.” The former is obvious; the latter I’m deeply skeptical of.
The machine part of a Turing machine is simple. People manage to build them by accident. Programming language designers come up with a nice-sounding type inference feature and discover that they’ve made their type system Turing-complete. The hard part is the execution speed and the infinite tape.
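To show how little machinery "the machine part" actually is, here is a complete interpreter; the incrementer table is a hypothetical textbook-style example, not anything from this thread:

```python
def run(table, tape, state="start", head=0, max_steps=10_000):
    """Run a Turing machine. `table` maps (state, symbol) -> (write, move, new_state).
    The whole "machine" is this loop plus a dict lookup."""
    tape = dict(enumerate(tape))            # sparse tape, blank = "_"
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "_")
        write, move, state = table[(state, symbol)]
        tape[head] = write
        head += {"L": -1, "R": 1}[move]
    return "".join(tape[i] for i in sorted(tape)).strip("_")

# Hypothetical example: increment a binary number (head starts at its left end).
incr = {
    ("start", "0"): ("0", "R", "start"),   # scan right to the end
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),   # fell off the right edge: go back
    ("carry", "1"): ("0", "L", "carry"),   # 1 + carry = 0, carry continues
    ("carry", "0"): ("1", "L", "halt"),    # 0 + carry = 1, done
    ("carry", "_"): ("1", "L", "halt"),    # overflow: grow a new digit
}
print(run(incr, "1011"))  # 11 + 1 -> "1100"
```

Speed and the unbounded tape, the hard parts named above, appear nowhere in this loop.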
Ignoring those problems, making AGI with LLMs is easy. You don’t even need something that big. Make a neural network big enough to represent the transition table of a Turing machine with a dozen or so states. Configure it to be a universal machine. Then give it a tape containing a program that emulates the known laws of physics to arbitrary accuracy. Simulate the universe from the Big Bang and find the people who show up about 13 billion years later. If the known laws of physics aren’t accurate enough, compare with real-world data and adjust as needed.
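A sketch of just the first step, assuming "represent the transition table" means an exact lookup: one linear layer over one-hot (state, symbol) pairs reproduces a machine step exactly. The three-state table here is arbitrary:

```python
import numpy as np

# Hypothetical tiny machine: 3 states, 2 symbols. The transition table maps
# (state, symbol) -> (new_state, write_symbol, move). Encode each input pair
# as a one-hot vector; the table is then literally a weight matrix, so a
# single linear layer performs one Turing-machine step exactly.
n_states, n_symbols = 3, 2
table = {(s, a): ((s + 1) % n_states, 1 - a, a)   # arbitrary demo transitions
         for s in range(n_states) for a in range(n_symbols)}

n_in = n_states * n_symbols
n_out = n_states + n_symbols + 2                   # one-hot: state, write, move
W = np.zeros((n_out, n_in))
for (s, a), (s2, w, m) in table.items():
    col = s * n_symbols + a
    W[s2, col] = 1                                 # new state
    W[n_states + w, col] = 1                       # symbol to write
    W[n_states + n_symbols + m, col] = 1           # move (0 = L, 1 = R)

def step(state, symbol):
    x = np.zeros(n_in)
    x[state * n_symbols + symbol] = 1              # one-hot input
    y = W @ x                                      # the "network" is one matmul
    new_state = int(np.argmax(y[:n_states]))
    write = int(np.argmax(y[n_states:n_states + n_symbols]))
    move = "LR"[int(np.argmax(y[n_states + n_symbols:]))]
    return new_state, write, move

assert step(0, 1) == table[(0, 1)][:2] + ("LR"[table[(0, 1)][2]],)
```

This shows expressibility only; everything after that step in the recipe is where the trouble starts.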
There’s the minor detail that simulating quantum mechanics takes time exponential in the number of particles, and the information needed to represent the entire universe can’t fit into that same universe and still leave room for anything else, but that doesn’t matter when you’re talking Turing machines.
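To put numbers on the exponential claim: an exact quantum state of n two-level particles takes 2^n complex amplitudes. A back-of-the-envelope, assuming 16 bytes per amplitude:

```python
# Exact state of n two-level particles: 2**n complex amplitudes,
# 16 bytes each (complex128). Illustrative sizes only.
for n in (30, 50, 300):
    amplitudes = 2 ** n
    print(f"n={n:>3}: 2^{n} amplitudes ~ {amplitudes * 16:.3e} bytes")
```

n=50 is already petabytes; n=300 wants more bytes than there are atoms in the observable universe (roughly 10^80).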
It does matter a great deal when talking about what might lead to actual human-level intelligent machines existing in reality, though.
I'm not making a leap there at all. Assuming we agree the brain is unlikely to exceed the Turing computable, I explained the stepwise reasoning: given Turing equivalence; given that for any given UTM there is a larger UTM that can express the same programs in less space; and given that the brain is an existence proof that a sufficiently compact UTM is possible, it is preposterous to think it would be impossible to construct an LLM architecture that can express the same thing compactly enough. I suspect you're assuming a very specific LLM architecture, rather than considering that an LLM can be implemented in the form of any UTM.
Current architectures may very well not be sufficient, but that is an entirely different issue.
> and given that the brain is an existence-proof that a sufficiently compact UTM is possible
This is where it goes wrong. You’ve got the implication backwards. The existence of a program and a physical computer that can run it to produce a certain behavior is proof that such behavior can be done with a physical system. (After all, that computer and program are themselves a physical system.) But the existence of a physical system does not imply that there can be an actual physical computer that can run a program replicating that behavior. If the laws of physics are computable (as they seem to be), then the existence of a system implies that there exists some Turing machine that can replicate its behavior, but this is “exists” in the mathematical sense; it’s very different from saying such a Turing machine could be constructed in this universe.
Forget about intelligence for a moment. Consider a glass of water. Can the behavior of a glass of water be predicted by a physical computer? That depends on what you consider to be “behavior.” The basic heat exchange can be reasonably approximated with a small program that would trivially run on a two-cent microcontroller. The motion of the fluid could be reasonably simulated with, say, 100-micron accuracy, on a computer you could buy today. 1-micron accuracy might be infeasible with current technology but is likely physically possible.
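The "two-cent microcontroller" level of fidelity really is a few arithmetic operations per time step: Newton's law of cooling integrated forward. A sketch with illustrative, made-up constants:

```python
# Newton's law of cooling, dT/dt = -k * (T - T_ambient), integrated with
# Euler steps. Constants are illustrative, not measured.
T, T_ambient, k, dt = 60.0, 20.0, 0.001, 1.0   # deg C, deg C, 1/s, s
for minute in range(0, 61, 10):
    print(f"t={minute:>2} min: T={T:5.1f} C")
    for _ in range(600):                        # 10 minutes of 1 s steps
        T += -k * (T - T_ambient) * dt
```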
What if I want absolute fidelity? Thermodynamics and fluid mechanics are shortcuts that give you bulk behaviors. I want a full quantum mechanical simulation of every single fundamental particle in the glass, no shortcuts. This can definitely be computed with a Turing machine, and I’m confident that there’s no way it can come anywhere close to being computed on any actual physical manifestation of a Turing machine, given that the state of the art for such simulations is a handful of particles and the complexity is exponential in the number of particles.
And yet there obviously exists a physical system that can do this: the glass of water itself.
Things that are true or at least very likely: the brain exists, physics is probably computable, there exists (in the mathematical sense) a Turing machine that can emulate the brain.
Very much unproven and, as far as I can tell, with no particular reason to believe they’re true: the brain can be emulated with a physical Turing-like computer; this computer is something humans could conceivably build at some point; the brain can be emulated with a neural network trained with gradient descent on a large corpus of token sequences; the brain can be emulated with such a network running on a computer humans could conceivably build. Talking about the computability of the human brain does nothing to demonstrate any of these.
I think non-biological machines with human-equivalent intelligence are likely to be physically possible. I think there’s a good chance that it will require specialized hardware that can’t be practically done with a standard “execute this sequence of simple instructions” computer. And if it can be done with a standard computer, I think there’s a very good chance that it can’t be done with LLMs.