The author seems hung up on the idea that because NLU involves apparent "discontinuities" -- places where small variations in interpretation completely transform tasks -- it will not be amenable to smooth, continuous notions like PAC learning, compressibility, and so on. While there's a directional insight there, the terms aren't well defined. And the big story of ML-based NLP in the last decade has been that many tasks that were presumed too jagged for curve fitting are in fact tractable, given large data sets and clever shifts in how discontinuities can be modeled (e.g. attention-based techniques like Transformers; a minimal sketch below).
Finally, since humans don't perform perfectly at these contrived tasks either, we must acknowledge that there is some degree of "approximate correctness" that satisfies our ideas of intelligence.
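To make the attention point concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a Transformer layer. It's illustrative only (plain NumPy, random stand-in embeddings, no learned projection matrices), but it shows the mechanism: each token's representation becomes a weighted mix of every other token's, so a small change in context can swing an interpretation without any explicit symbolic rule.

    import numpy as np

    def attention(Q, K, V):
        # similarity of each query to every key, scaled by sqrt(dimension)
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # softmax over positions: how much each token attends to each other token
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # each output row is a context-dependent blend of the value rows
        return weights @ V

    rng = np.random.default_rng(0)
    tokens = ["the", "bank", "of", "the", "river"]
    X = rng.normal(size=(len(tokens), 8))  # stand-in embeddings, one row per token
    out = attention(X, X, X)               # self-attention: queries = keys = values
    print(out.shape)                       # (5, 8)

The point of the toy: "bank" gets a different output vector depending on what else is in the sentence, which is exactly the kind of context-sensitivity that used to look "too jagged" for curve fitting.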
And the big story of ML-based NLP in the last decade has been that many tasks that were presumed too jagged for curve fitting are in fact tractable...
The thing is, I see many situations where large language models have been "impressive" but few situations where they have clearly succeeded in the real world. I'm less than impressed with online translation, Google's AI search is annoying, GPT-3 authored articles are impressive but often senseless, voice interfaces to various corporations are a disaster, etc.
It seems like the main "tractable task" is doing well on the benchmarks themselves. That's not to deny there's progress here; it just seems like the "jagged" aspects of language might still be a hurdle.
It's not just benchmark tasks, but I think perceptions of progress get skewed because we're in an uncanny period where NLP is good enough for many industrial applications but unsatisfying under detailed scrutiny.
- Online translation? Not for a literary piece, but it will still let your e-commerce site get the gist of a customer complaint.
- Authored text? Not good enough for direct consumption, but pass it through one intern and you get a much faster rate of e.g. satisfactory social media responses.
- Frustrating phone or bot interface? Average customer spends 10% more time, but the company saves 50% of its costs.
Most of these applications transfer some burden downstream, but not all of it... so it is having a big impact on the information supply chain. I don't expect those applications to be exciting to many people here, especially to AGI acolytes, but lots of technologies have gone through this maturity curve: (1) solve toy problems, (2) solve lame but valuable problems, (3) do interesting "real" things.
And in a few places, NLP is moving on to (3). Tools like Grammarly are actually a better experience than most human editing loops. I would also put NLP-backed search in this category -- anyone who Googles is having a much better experience because of modern NLP, without even needing to be aware of it.
Online translation? Not for a literary piece, but it will still let your e-commerce site get the gist of a customer complaint.
-- Uh, in my experience with FB translation, it gives a coherent "gist" 70% of the time. Which sounds good, except the remainder seems to involve 15% gibberish and 10% that is wrong in the sense of a plausible but incorrect meaning. How can a company act on a consumer complaint if there's a significant chance that what you're reading is totally off base? If "your product is too small" gets translated to "your product is too large", etc. Of course, a lot of companies ignore complaints or send form letters. Their approach wouldn't be impacted. Broadly, a lot of companies produce streams of barely meaningful, vacuous bs - and that really does serve some percentage of their purposes. This technology may allow the creation of this sort of thing in a more effective manner, but I would claim that most of the cost of this stuff is already in editing it to avoid saying things that can cause real problems, so even here the savings may be less than you'd think.
Frustrating phone or bot interface? Average customer spends 10% more time, but the company saves 50% of its costs.
-- The phone robots have been around for a while. The distinction really is between number pad robots and voice recognition robots. The primary advantage of the voice recognition robots is more choices; the primary disadvantage is that sometimes they just don't work at all, whereas the number pad robots are fairly robust.
lots of technologies have gone through this maturity curve: (1) solve toy problems, (2) solve lame but valuable problems, (3) do interesting "real" things.
-- And lots of other technologies have stopped somewhere along the way.
My point, getting back to OP, is mostly around the corpus-based approach. I wouldn't deny that there's progress here. But I'd agree with the OP that there isn't fundamental progress. A lot of what happens is that this approach is much cheaper: you turn a huge amount of data into an application with a small team and a bunch of compute, where previously you'd have had to hire many people for an equivalent. But equivalents existed previously and even had their advantages. Which isn't to say previous methods will come back - the cheapness of a brute-force solution isn't going to go away. But I would say fundamental progress needs more than this.
Indeed, translating from the #1 language to the #4 language (or back) is problematic. Google's translation is usually enough to approximate what was said, but often results in gibberish. As a small example, it consistently fails with the Spanish pronoun "su", which depending on context can mean "his", "her", "its", "your", "y'all's", or "their" -- which seems like a good example of the compressibility of natural language. How big does the corpus used to train their models need to be to get "su" right?
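A toy sketch of why "su" is hard (hypothetical and rule-based, not how any real MT system works): even a perfect dictionary gets you nowhere, because the right English possessive depends on resolving an antecedent that may sit sentences away. A corpus-based model only learns that resolution implicitly, from enormous amounts of context.

    # Hypothetical illustration: "su" can't be translated in isolation;
    # the rendering depends entirely on a (possibly distant) antecedent.
    SU_BY_ANTECEDENT = {
        "él":      "his",
        "ella":    "her",
        "ello":    "its",
        "usted":   "your",
        "ustedes": "y'all's / your (pl.)",
        "ellos":   "their",
        "ellas":   "their",
    }

    def translate_su(antecedent: str) -> str:
        """Pick an English possessive for 'su', given a resolved antecedent (if any)."""
        return SU_BY_ANTECEDENT.get(antecedent.lower(), "his/her/its/your/their (unresolved)")

    print(translate_su("ella"))     # her
    print(translate_su("ustedes"))  # y'all's / your (pl.)
    print(translate_su("???"))      # unresolved: every reading stays in play

The hard part isn't the lookup table; it's producing a correct antecedent for the lookup, and that's where "approximate" corpus statistics keep falling over.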