When I saw these numbers back in the initial o3-ARC post, I immediately converted them into "$ per ARC-AGI-1 %" and concluded we may be at a point where each additional increment of 'real human-like novel reasoning' gets exponentially more costly in compute.
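A quick back-of-envelope sketch of that conversion in Python, using the per-task o3 figures quoted at the end of this thread (treating the $20 / $3.4K numbers as cost per task is my reading; the variable names are just for illustration):

```python
# Rough "$ per ARC-AGI-1 %" calculation from the reported o3 numbers.
# Assumes the $20 / $3.4K figures are cost per task, as quoted in the table below.
low  = {"score_pct": 75.7, "cost_usd": 20}     # o3 (low)
high = {"score_pct": 87.5, "cost_usd": 3400}   # o3 (high)

# Average dollars per percentage point at each compute setting
avg_low  = low["cost_usd"]  / low["score_pct"]    # ~$0.26 per %
avg_high = high["cost_usd"] / high["score_pct"]   # ~$39 per %

# Marginal cost of the extra ~11.8 points bought by the high-compute setting
marginal = (high["cost_usd"] - low["cost_usd"]) / (high["score_pct"] - low["score_pct"])

print(f"avg $/% (low):              {avg_low:.2f}")
print(f"avg $/% (high):             {avg_high:.2f}")
print(f"marginal $/% (low -> high): {marginal:.0f}")   # ~$286 per extra point
```

Going from roughly $0.26 per point at the low setting to roughly $286 per marginal point at the high setting is what made the cost of each additional increment look closer to exponential than linear to me.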
If Mike Knoop is correct, maybe R1 is pointing the way toward more efficient approaches. That would certainly be a good thing. This whole DeepSeek release and the reactions to it have shown that, by limiting the export of high-end GPUs to China, the US incentivized China to figure out how to make low-end GPUs work really well. The more subtle meta-lesson here is that the massive flood of investment capital being shoved toward leading-edge AI companies has fostered a drag-race mentality that prioritizes winning on top-line performance far above efficiency, cost, etc.
$3.4K is about what you might pay a magic circle lawyer for an opinion on a matter. Not saying o3 is an efficient use of resources, just saying that it’s not outlandish that a sufficiently good AI could be worth that kind of money.
You pay that price to a law firm to get good service and a "guarantee" of correctness. You get neither from an LLM. Not saying it isn't worth anything, but you can't compare it to a top law firm.
            Score   Tokens/task   Cost/task
o3 (low)    75.7%   335K          $20
o3 (high)   87.5%   57M           $3.4K