I have a similar opinion of Claude Sonnet. Superhuman knowledge; ability to apply it to solve new math/coding problems at roughly the level of a motivated high-schooler (but not corresponding exactly in profile to anything human); less ability to stay on track the longer the effort takes.
But ChatGPT a couple years ago was at more like grade-school level at problem-solving. What should I call this thing that the best LLMs can do better than the older ones, if it's not actual reasoning? Sparkling syllogistics?
Sorry, that's sarcastic, but... it's from a real exasperation at what seems like a rearguard fight against an inconvenient conclusion. I don't like it either! I think the rate of progress at building machines we don't understand is dangerous. (Understanding the training is not understanding the machinery that comes out.)
Compare the first previews of Copilot with current frontier "reasoning" models, and ask how this will develop in the next five years. Maybe it'll fizzle. If you're very confident it will, I'd like to be convinced too.
You said it sarcastically, but I like "syllogistic" a lot. We need more vocabulary to describe what LLMs do, and if I tell ChatGPT that A implies B and B implies C, then tell it A is true, and it concludes C, I can describe that as the LLM syllogizing without using the words "reasoning" or "thinking". That works for me.
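(For concreteness, the chain described there is just two applications of modus ponens. A minimal sketch in Lean, with placeholder proposition names, just to show how small the formal content is:)

    -- Given A → B, B → C, and A, conclude C by chaining modus ponens twice.
    example {A B C : Prop} (hab : A → B) (hbc : B → C) (ha : A) : C :=
      hbc (hab ha)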
As for whether it will fizzle: even if it does, what we have currently is already useful. Society will take time to adjust to ChatGPT-4's level of capabilities, never mind whatever OpenAI et al. release next. It can't yet replace a software engineer, but it makes possible projects that previously weren't attempted because they required too much investment. So unless you're financially exposed to AI directly (which you might be, many people are!), whether it's going to fizzle is more an academic question than something that demands a rigorous answer. Proving a negative is really hard; reusable rockets were "proven" impossible right up until they were empirically proven possible.
I don't see Tao claiming ChatGPT proved a theorem. Moreover, most of the questions seemed to be about things already discussed online, so it's plausible they were included in the training data. This is IMO a big issue with evaluating LLMs: you can't keep asking the same questions, because you can't be sure whether they're answering from memory or actually reasoning.