Arguably, the last year was mainly inner harness improvement rather than model improvement. I could be wrong, it just feels that way to me.
What I mean is that the way it tackles a task at its core is generated differently, in the inner harness, whether through the system prompt or something deeper. E.g., instead of answering instantly, it goes through pre-defined steps: deciding which strategy to use, splitting the task, using thinking tokens, using tools, etc.
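To make that concrete, here is a rough sketch of the kind of loop I have in mind. Everything in it is made up for illustration: `toy_model` is a stand-in for a real model call, the `PLAN:` / `SPLIT:` / `THINK:` / `TOOL` prefixes are a hypothetical protocol, and `calc` is a toy tool. It is not how any actual product works, just the shape of "same model, different harness".

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: prompt in, text out.
Model = Callable[[str], str]


def toy_model(prompt: str) -> str:
    """Fake model that only behaves well when given harness-style prompts."""
    if prompt.startswith("PLAN:"):
        return "divide-and-conquer"
    if prompt.startswith("SPLIT:"):
        return "compute 2+3; compute 4*5"
    if prompt.startswith("THINK:"):
        expr = prompt.split("compute ", 1)[1]
        return f"TOOL calc {expr}"  # asks the harness to run a tool
    if prompt.startswith("ANSWER"):
        return prompt.split(": ", 1)[1]
    return "guess"  # instant answer with no structure


def calc(expr: str) -> str:
    """Toy calculator tool: handles a single '+' or '*' between two integers."""
    for op, fn in (("+", lambda a, b: a + b), ("*", lambda a, b: a * b)):
        if op in expr:
            left, right = expr.split(op)
            return str(fn(int(left), int(right)))
    raise ValueError(f"unsupported expression: {expr}")


def answer_directly(model: Model, task: str) -> str:
    """No harness: one call, whatever comes back is the answer."""
    return model(task)


def answer_with_harness(
    model: Model, task: str, tools: Dict[str, Callable[[str], str]], trace: List[str]
) -> str:
    """Same model, but driven through fixed steps: plan, split, think, tools."""
    strategy = model(f"PLAN: {task}")
    trace.append("plan")

    subtasks = [s.strip() for s in model(f"SPLIT: {task}").split(";") if s.strip()]
    trace.append("split")

    results = []
    for sub in subtasks:
        thought = model(f"THINK: {sub}")  # "thinking tokens" step
        trace.append("think")
        if thought.startswith("TOOL "):
            _, name, arg = thought.split(" ", 2)
            results.append(tools[name](arg))  # tool use step
            trace.append(f"tool:{name}")
        else:
            results.append(thought)

    trace.append("answer")
    return model(f"ANSWER using {strategy}: " + " | ".join(results))


if __name__ == "__main__":
    task = "compute 2+3 and 4*5"
    steps: List[str] = []
    print(answer_directly(toy_model, task))
    print(answer_with_harness(toy_model, task, {"calc": calc}, steps))
    print(steps)
```

The point of the toy is that `toy_model` never changes between the two calls; only the scaffolding around it does, which is the kind of improvement I am guessing at.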