Chess is essentially a puzzle. There's a single explicit, quantifiable goal, and a solution either achieves the goal or it doesn't.
Solving puzzles is a specific cognitive task, not a general one.
Language is a continuum, not a puzzle. The problem with LLMs is that testing has been reduced to performance on language puzzles, mostly ones with hard edges (like bar exams or letter counting), and those puzzles are only a small subset of general language use.