AGI is something fairly specific, yes, but depending on what you mean by “understand”, I don’t think it necessarily needs to “understand”. Behaving, for all practical purposes, as if it “understands” is good enough. For some senses of “understand”, behaving that way may be the same thing as understanding; for those senses, yes, it needs to “understand”.
It seems clear to me that, if we could programmatically sample from a satisfactory conditional probability distribution, this would be sufficient for it to, for all practical purposes, behave as if it “understands”, and moreover for it to count as AGI. (Doing so at a fast enough rate would make it both AGI and practically relevant.)
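To make “programmatically sample from a conditional probability distribution” concrete, here is a toy sketch: a hand-specified bigram model sampled token by token. The vocabulary and probabilities are made up purely for illustration; in a real model the table would be learned, not written by hand.

```python
import random

# A hypothetical, hand-specified conditional distribution p(next | current).
# In a real model this table would be learned, not written out by hand.
cond = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def sample_sequence(rng, max_len=10):
    """Sample token-by-token: each token is drawn conditional on the prefix
    (here just the previous token, since this toy model is a bigram model)."""
    out, tok = [], "<s>"
    for _ in range(max_len):
        dist = cond[tok]
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if tok == "</s>":
            break
        out.append(tok)
    return out

print(sample_sequence(random.Random(0)))
```

The point of the sketch is only that “sampling from the conditional distribution” is itself a simple mechanical loop; all the difficulty is in whether the distribution being sampled from is a satisfactory one.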
So, the question as I see it is whether the developments with ANNs, trained as they have been, constitute progress towards producing something that can sample from a conditional probability distribution in a way that would be satisfactory for AGI.
I don’t see much reason to conclude that they are not?
I suppose your claim is that the conditional probability distributions are not getting closer to being such that, for practical purposes, they are as if they exhibit understanding?
I guess this might be true…
It does seem like some things would be better served by having variables with a fixed identity but a changing value, rather than just producing ever more variables? That’s roughly the “pure functional programming vs. not” distinction, and of course, as pure functional programming shows, one can still compute whatever one wants using only immutable values, but one still usually builds something that behaves as if a value is changing.
And of course, for transformer models, tasks that take more than roughly O(N^2) work (… maybe O(N^3), since each of the N tokens is processed in ways depending on each pair of the results of processing the previous ones?) can’t be done while producing a single output token, so that’s a limitation there.
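To make the “each pair of previous results” point concrete, here is a minimal single-head causal attention computation in NumPy; the N×N score matrix is where the quadratic cost in sequence length comes from. The dimensions are arbitrary illustrative choices, and this is a bare sketch, not a full transformer layer.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 4                      # N tokens, d-dimensional representations

Q = rng.standard_normal((N, d))  # queries
K = rng.standard_normal((N, d))  # keys
V = rng.standard_normal((N, d))  # values

# Every token's query is scored against every token's key:
# this N x N matrix is the O(N^2) part of a single forward pass.
scores = Q @ K.T / np.sqrt(d)

# Causal mask: token i may only attend to tokens j <= i.
mask = np.tril(np.ones((N, N), dtype=bool))
scores = np.where(mask, scores, -np.inf)

# Row-wise softmax (stabilized by subtracting the row max).
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

out = weights @ V                # each output mixes all allowed values
print(scores.shape)              # (N, N): quadratically many interactions
```

So however deep the stack, the work available for emitting one token is polynomially bounded in the context length, which is the limitation being pointed at.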
I suppose the thing that is supposed to make transformers faster to train, letting the predictions for each token in a sequence be made in parallel, only really makes sense if you have a ground-truth sequence of tokens… though there is RLHF (and similar), where the fine-tuning is based on an estimated score for the final output… and I suppose possibly neither is great at producing behavior sufficiently similar to reasoning?
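The parallel-prediction point can be sketched as follows: given a ground-truth sequence, logits for every position are computed in one pass, and each position is scored against the next ground-truth token (teacher forcing). The “model” here is just a random linear map standing in for a real network; all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, N, d = 5, 6, 4
tokens = rng.integers(0, vocab, size=N)   # ground-truth sequence

E = rng.standard_normal((vocab, d))       # embedding table
W = rng.standard_normal((d, vocab))       # stand-in for the whole network

# One pass produces logits for ALL positions at once; this is only possible
# because the ground-truth prefix at every position is already known.
logits = E[tokens] @ W                    # shape (N, vocab)

# Position i predicts token i+1, so compare positions 0..N-2 with tokens[1:].
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(N - 1), tokens[1:]].mean()
print(logits.shape)
```

Without a ground-truth sequence there is no prefix to condition each position on, which is why generation (and score-based fine-tuning like RLHF) has to proceed token by token instead.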
(Note: when I say “satisfactory probability distribution” I don’t mean to imply that we have a nice specification of a conditional probability distribution for which we merely need to produce a sampling method. But there should exist, in the abstract (non-constructive) mathematical sense, probability distributions which would be satisfactory.)
I do not consider "understanding", which cannot be quantified, to be a feature of AGI.
In order for something to qualify as AGI, answering in a seemingly intelligent way is not enough. An AGI must be able to do what a competent human would do: given the task of accomplishing something that nobody has done before, conceive a detailed plan for how to achieve it, step by step. Then, after doing the first steps and discovering that they were much more difficult or much easier than expected, adjust the plan based on the accumulated experience, in order to increase the probability of reaching the target successfully.
Or else, one may realize that it is possible to reformulate the goal, replacing it with a related goal that does not much change the usefulness of the result, but which can be reached by a modified plan with much better chances of success. Or else, recognize that at this time it will be impossible to reach the initial goal, but that there is another, simpler-to-reach goal that is still desirable, even if it does not provide the full benefits of the initial goal. Then, establish a new plan of action to reach the modified goal.
For now this kind of activity is completely outside the abilities of any AI. Despite the impressive progress demonstrated by LLMs, nothing done by them has brought a computer any closer to having intelligence in the sense described above.
It is true, however, that there are a lot of human managers who would be just as clueless as an LLM about how to perform such activities.