The brute force approach is the expensive one ("probably, so far anyway": add these four words everywhere), and it is impossible to make "always correct" (aka the "usual blindsides"). They seem to be trying a bunch of specialized training ideas here and there in the system, just like a Mixture of Experts does, but in different places than was obvious so far, and with an eye toward reasoning. In particular, they are trying to build a reasoning-oriented training base from a minimal seed.

It's still not going to give an "always correct" result, but we are nowhere near the point where that's needed. We are only at the point where a new idea can get you a percentage point further in benchmarks. Some fundamental limits were baked into the previous assumptions, and they are easy to get past by leaving those assumptions behind.