My emerging conception is to split this into two separate questions:
1. Is the architecture _capable_, i.e. is it possible for a model with a given shape to perform some "reasoning"?
2. Is the architecture _trainable_, i.e. do we have the means to learn a configuration of the parameters that achieves what we know the architecture is capable of?
Recent interpretability work like that around Induction Heads [1] and the conclusion that transformers are Turing complete [2], combined with my own work hand-specifying transformer weights to do symbolic multi-digit addition (read: the same way we do it in grade school), has convinced me that reasoning over a finite domain is a capability of even the tiniest models.
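To make the addition example concrete, here's a minimal Python sketch of the grade-school carry procedure over digit sequences. It's only an illustration of the symbolic algorithm that the hand-specified weights would have to implement (the weight construction itself isn't shown, and the function name is made up for this sketch):

```python
def grade_school_add(a_digits, b_digits):
    """Add two numbers given as lists of decimal digits (most significant first),
    the way it's done by hand: right to left, one digit column at a time,
    propagating a single carry. Every intermediate value lives in a tiny finite
    domain (digits 0-9, carry 0 or 1), which is the point of the example."""
    # Pad the shorter number with leading zeros so the columns line up.
    n = max(len(a_digits), len(b_digits))
    a = [0] * (n - len(a_digits)) + a_digits
    b = [0] * (n - len(b_digits)) + b_digits

    result, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        s = da + db + carry
        result.append(s % 10)   # digit written in this column
        carry = s // 10         # carry passed to the next column (0 or 1)
    if carry:
        result.append(carry)
    return list(reversed(result))

# 478 + 64 = 542
print(grade_school_add([4, 7, 8], [6, 4]))  # -> [5, 4, 2]
```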
The emergent properties we see in models like GPT-4 are more a consequence of the fact that we've found a way to train a fairly efficient representation of a significant fraction of "world rules" into a large number of parameters in a finite amount of time.
That's a useful breakdown for thinking about it.
One angle I'm curious about is whether it's, to some extent, as much an artefact of how you regularise the model as of the number of parameters and other factors.
You can think about it like this: if you regularise the network enough, you force it, instead of fitting specific data points, to actually start learning logic internally, because that's the only thing generalisable enough to let it produce realistic text for such a diverse range of prompts. You have to have enough parameters that this is even possible, but once you do, the right training / regularisation essentially starts to inevitably force it into that approach rather than the more direct nearest-neighbour-style "produce something similar to what someone said once before" mechanism.
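A toy way to see the distinction being drawn here (purely illustrative: the task, split sizes, and distance metric are made up, and there's no actual training or regularisation involved): compare a pure lookup mechanism against the underlying rule on a small finite domain. The lookup only works where it happens to have seen something close enough; the rule generalises to every unseen input.

```python
import random

# Toy "world rule": modular addition on a small finite domain.
P = 11
def rule(a, b):
    return (a + b) % P

# Split all (a, b) pairs into things "someone said once before" (memory)
# and held-out pairs the predictor has never seen.
random.seed(0)
pairs = [(a, b) for a in range(P) for b in range(P)]
random.shuffle(pairs)
train, held_out = pairs[:60], pairs[60:]
memory = {(a, b): rule(a, b) for a, b in train}

def nearest_neighbour_predict(a, b):
    """Memorisation-style mechanism: return the answer stored for the most
    similar previously-seen pair, with no notion of *why* it was right."""
    na, nb = min(memory, key=lambda k: abs(k[0] - a) + abs(k[1] - b))
    return memory[(na, nb)]

def rule_predict(a, b):
    """Rule-style mechanism: apply the underlying logic directly."""
    return (a + b) % P

for name, predict in [("nearest-neighbour", nearest_neighbour_predict),
                      ("learned rule", rule_predict)]:
    correct = sum(predict(a, b) == rule(a, b) for a, b in held_out)
    print(f"{name}: {correct}/{len(held_out)} held-out pairs correct")
```

The rule gets every held-out pair right by construction, while the lookup is only right where a near-duplicate happens to give the same answer. The argument above is that sufficient regularisation is what pushes a big enough network from the first mechanism toward the second.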
[1] https://transformer-circuits.pub/2021/framework/index.html
[2] https://jmlr.org/papers/volume22/20-302/20-302.pdf