I'm reminded of Rich Sutton's essay, "The Bitter Lesson."
Moore's law is running on fumes at this point. The complexity of further scaling has reached geopolitical proportions. We need to get back to looking at more creative models in both the software and hardware domains.
> the model achieves competitive results on many NLP tasks and benchmarks without finetuning
The article dismisses this result with the following analogy:
“No, my 10-year-old math prodigy hasn’t proven any new theorems, but she can get a perfect score on the math SAT in under 10 minutes. Isn’t that groundbreaking?”
And I tend to agree. We've been playing a game of benchmarking brinkmanship for a while now. At what point are we going to see some groundbreaking applications?
> We need to get back to looking at more creative models in both the software and hardware domains.
That would be pretty foolish, given that every hand-crafted model eventually gets surpassed by brute force. A better use of time would be tackling whatever you mean by "the complexity of further scaling has reached geopolitical proportions". I'm not a fan of brute force, as it is terribly inelegant, but denying the years of consistent brute-force wins would just be silly.
The best strategy for any nation with an interest in AI (be it economic or something much more skynety) would be securing two things very quickly: fabrication capacity and nuclear power, because this stuff is going to be measured in megawatts, not ANN layers. Improving the efficiency of that conversion would certainly be helpful, but history has shown that to be a lower priority; just take a look at how ridiculously deep software stacks are compared to 20 years ago. I really wish the linguists had been proven right in the 1970s...
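To put the "megawatts, not ANN layers" point in rough numbers, here's a back-of-envelope sketch. The accelerator count, per-device draw, and overhead factor are made-up illustrative figures, not measurements of any real cluster.

```python
# Back-of-envelope power estimate for a large training cluster.
# All figures below are illustrative assumptions, not measured values.

accelerators = 10_000        # assumed number of GPUs/TPUs in the cluster
watts_per_accelerator = 400  # assumed board power per device, in watts
overhead_factor = 1.5        # assumed overhead for cooling, networking, host CPUs

total_watts = accelerators * watts_per_accelerator * overhead_factor
print(f"Sustained draw: {total_watts / 1e6:.1f} MW")   # -> Sustained draw: 6.0 MW

# Energy for a hypothetical 30-day training run at that draw:
joules = total_watts * 30 * 24 * 3600
print(f"Energy: {joules / 3.6e9:.0f} MWh")             # -> Energy: 4320 MWh
```

Even with generous assumptions, the answer comes out in megawatts sustained, which is why fabs and power generation look like the strategic chokepoints.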
> That would be pretty foolish, given that every hand-crafted model
Who said anything about hand-crafted AI models? I'm talking about revisiting our models of computation. Moore's law has long made it impossible to challenge the dominance of the von Neumann architecture. Perhaps what we need to make further progress is some sort of decentralized, busless computer? Who knows?
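For a toy picture of what "decentralized and busless" could mean, here's a hypothetical sketch of a chain of cells that only ever talk to their immediate neighbour, with no shared memory or global bus. It's purely illustrative, not a concrete proposal from anyone in this thread.

```python
# Toy "busless" dataflow: a chain of cells, each holding one polynomial
# coefficient, that only pass values to their immediate neighbour.
# Evaluates p(x) = c[0] + c[1]*x + ... + c[n]*x^n via Horner's rule.
# Hypothetical illustration only.

from dataclasses import dataclass

@dataclass
class Cell:
    coeff: float

    def step(self, x: float, acc: float) -> tuple[float, float]:
        # One multiply-add per cell; the only state exchanged is what the
        # neighbour hands over -- no shared memory, no central bus.
        return x, acc * x + self.coeff

def pipeline_eval(coeffs: list[float], x: float) -> float:
    # Cells are wired highest-order coefficient first (Horner ordering).
    cells = [Cell(c) for c in reversed(coeffs)]
    acc = 0.0
    for cell in cells:
        x, acc = cell.step(x, acc)
    return acc

# p(x) = 1 + 2x + 3x^2 at x = 2  ->  1 + 4 + 12 = 17
print(pipeline_eval([1.0, 2.0, 3.0], 2.0))  # 17.0
```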
If that was the first time you'd seen that essay mentioned, then you'd be forgiven for not knowing that it always precedes a discussion of hand-crafting vs. brute force. I've never seen it answered with a call for exotic computation, so my mistake. I've seen plenty of energy-requirement estimates for defeating various cryptographic algorithms, using spherical cows, etc. You'll need a lot more than an architectural change to make a noticeable dent; you'll need the discovery and industrialization of new physics, which is not the sort of thing you want to hang your hopes on.
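For reference, the "spherical cow" estimates usually look like the standard Landauer-limit calculation below: it assumes an ideal computer that spends exactly one minimal, irreversible bit operation per key tried and nothing else, which is the textbook idealization rather than anything buildable.

```python
import math

# Landauer limit: minimum energy to erase one bit at temperature T.
k_B = 1.380649e-23              # Boltzmann constant, J/K
T = 300.0                       # room temperature, K
e_bit = k_B * T * math.log(2)   # ~2.9e-21 J per irreversible bit operation

# Spherical-cow assumption: one bit operation per key tried, nothing else.
keys = 2 ** 128
total_joules = keys * e_bit
print(f"{total_joules:.2e} J")              # ~9.8e17 J
print(f"{total_joules / 3.6e15:.0f} TWh")   # ~270 TWh at the theoretical minimum
```

Even at the theoretical floor, the energy is on the order of a large country's annual electricity output, which is the point of quoting these estimates: architecture alone doesn't get you around thermodynamics.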
I never said otherwise. Like physicists, we need both new theory (for new insights, ideas, models, etc.) and new experiments (for replication, performance, scalability, etc.).
Also, Moore's Law is running on fumes, yes, but there's quite a bit of R&D focused on coming up with hardware that massively scales up (e.g., by more efficiently parallelizing) the dense and sparse multiply-sum operations common to so many AI models. I think Sutton's point about models that leverage computation is spot-on.
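To be concrete about what that hardware R&D is targeting, the "dense and sparse multiply-sum operations" boil down to kernels like the following. This is a naive sketch in plain Python (the sparse case uses the common CSR layout); real accelerators just do the same arithmetic in massively parallel hardware.

```python
# The two kernels most AI accelerator R&D is chasing, written naively.

def dense_matmul(A, B):
    """Dense multiply-sum: C[i][j] = sum_k A[i][k] * B[k][j]."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

def sparse_matvec(values, col_idx, row_ptr, x):
    """Sparse multiply-sum (CSR format): y = A @ x, skipping zeros."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# Tiny sanity checks:
print(dense_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
# A = [[1, 0], [0, 2]] in CSR form, x = [3, 4]  ->  [3.0, 8.0]
print(sparse_matvec([1.0, 2.0], [0, 1], [0, 1, 2], [3.0, 4.0]))
```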