Everyone wants to use less compute to fit more in, but (obviously?) the solution will be to use more compute and fit less. Attention isn't (topologically) attentive enough. All these RNN-lite approaches are doomed beyond saving costs; they're going to get cooked by some other architecture, one even more expensive than transformers.
Would you mind expanding upon your thesis? If that compute and all those parameters aren't "fitting" the training examples, what is it that the model is learning, and how should that be analyzed?
I think there are two distinct areas. One is the building of the representations, which is achieved by fitting. The other is loosely defined as "computing," which is some kind of search for a path through representation space. All of that is wrapped in a translation layer that can turn those representations into stuff we humans can understand and interact with. Current transformer architectures achieve all of this to some extent, but I'd guess some believe they are not quite effective enough at the "computation/search" stage.
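To make the distinction a bit more concrete, here's a minimal toy sketch (the names, the embeddings, and the tiny graph are all made up for illustration; this isn't any real architecture): the embedding table stands in for representations learned by fitting, the greedy walk is the "computation/search" over that space, and the final join back to tokens plays the role of the translation layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Representations built by fitting: a frozen random embedding table stands
#    in for whatever gradient descent would actually have learned.
vocab = ["start", "a", "b", "c", "goal"]
emb = {w: rng.normal(size=8) for w in vocab}

def encode(word: str) -> np.ndarray:
    return emb[word]

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# 2) "Computing" as search through representation space: a hand-written greedy
#    walk that always moves to the neighbour closest to the goal's representation.
def search_path(start: str, goal: str, neighbours: dict, max_steps: int = 10) -> list:
    path, current = [start], start
    for _ in range(max_steps):
        if current == goal:
            break
        current = max(neighbours[current], key=lambda w: cosine(encode(w), encode(goal)))
        path.append(current)
    return path

# 3) Translation layer: mapping the internal trajectory back into human-readable tokens.
graph = {"start": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["goal"], "goal": []}
print(" -> ".join(search_path("start", "goal", graph)))
```

The only point of the toy is that the fitted geometry and the search procedure over it are separable in principle; in a transformer the two are entangled in the same forward pass.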
But how does it get good at "computing"? The way I see it, we either program that search manually, or we use ML, in which case the model "fits" the computation based on training examples or environmental feedback, no? What am I missing?
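To be concrete about that second option, here's a tiny made-up sketch (a bandit-style value table with a fabricated reward rule, purely illustrative) of what I mean by a model "fitting" the computation from feedback rather than having the step-selection rule programmed in:

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions = 5, 3
q = np.zeros((n_states, n_actions))      # the "computation" being fitted

def environment(state: int, action: int) -> float:
    # hidden reward rule the learner only ever sees through feedback
    return 1.0 if action == state % n_actions else 0.0

lr, eps = 0.1, 0.1
for _ in range(5000):
    state = int(rng.integers(n_states))
    # epsilon-greedy: the next step is chosen from learned values, not a hand-coded heuristic
    action = int(rng.integers(n_actions)) if rng.random() < eps else int(q[state].argmax())
    reward = environment(state, action)
    q[state, action] += lr * (reward - q[state, action])   # simple bandit-style update

# after training, the greedy choice matches the hidden rule
print(q.argmax(axis=1))   # expected: [0 1 2 0 1]
```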