Opus doubled in speed with version 4.5, leading me to speculate that they had promoted a Sonnet-sized model. The new, faster Opus was the same speed as Gemini 3 Flash running on the same TPUs. I think Anthropic's margins are probably the highest in the industry, but they have to split that with Google by renting their TPUs.
People used to bet on ships sinking and sailors drowning.
Till they learned better.
Edit:
This was common until Parliament passed the Marine Insurance Act of 1745.
Before that, speculators could take out "wagering policies" on vessels they had no connection to. This created "coffin ships" - unseaworthy vessels sent to sea because the insurance payout for a wreck was worth more than the ship itself. The law introduced "insurable interest," meaning you cannot bet on a disaster unless you stand to lose something if it happens. This removed the incentive for sabotage and murder for profit.
Modern prediction markets are heading toward the same problem. Betting on train delays or bridge collapses without having any stake gives bad actors a reason to cause it. If the cost of sabotage is lower than the payout, the market effectively pays for the disaster to happen.
That was far crazier than I expected going into it... to the point that I've seen Hollywood plots people call unrealistic that are more believable than this.
I do this too, but then you need some method to handle it, because now you have to read and test and verify multiple work streams. It can become overwhelming. In the past week I had the following problems from parallel agents:
Gemini running a benchmark: everything ran smoothly for an hour, but on verification it turned out it had hallucinated the model used for judging, invalidating the whole run.
Another task used Opus, and I manually specified the model to use. It still used the wrong model.
This type of hallucination has happened to me at least 4-5 times in the past fortnight using Opus 4.6 and Gemini 3.1 Pro. GLM-5 does not seem to hallucinate as much.
So if you are not actively monitoring your agent and making the corrections, you need something else that is.
You need a harness, yes, and you need quality gates the agent can't mess with, ones that just kick the work back with a stern message to fix the problems. Otherwise you're wasting your time reviewing incomplete work.
Your point being? A proper harness will mostly catch things like that. Even a low-end model can be employed to write test plans and do consistency checks that weed out most of this stuff. Hence: you need a harness, or you'll spend your time worrying about dumb stuff like this.
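The gate-and-kick-back loop described above can be sketched generically. This is a minimal illustration, not anyone's actual harness: `produce` and `check` are hypothetical stand-ins for your agent call and for a verification step the agent cannot modify, and the toy agent/gate below exist only to make the block runnable.

```python
def gated_loop(produce, check, max_retries=3):
    """Generic quality gate: call the agent (`produce`), run a check
    the agent can't touch (`check`), and kick failures back to the
    agent with the error message until it passes or retries run out."""
    feedback = None
    for _ in range(max_retries):
        work = produce(feedback)
        ok, feedback = check(work)
        if ok:
            return work  # gate passed; ready for human review
    raise RuntimeError("Gate still failing after retries:\n" + str(feedback))

# Toy stand-ins: an "agent" that uses the wrong judge model until
# corrected, and a gate that verifies the run metadata.
def fake_agent(feedback):
    return {"judge_model": "pinned-judge" if feedback else "hallucinated-judge"}

def gate(work):
    if work["judge_model"] == "pinned-judge":
        return True, None
    return False, f"Wrong judge model: {work['judge_model']}; use pinned-judge."

result = gated_loop(fake_agent, gate)
```

The point of the structure is that the check runs outside the agent's reach, so a hallucinated model name gets caught before a human ever reviews the run.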
Glancing at what it's doing is part of your multitasking rounds.
Also, instead of just prompting, it helps to have the AI first write a quick plan of exactly what it will do: class names, branch names, file locations, specific tests, etc. I review that before I hit go, since the outline is smaller and quicker to correct than the code.
That takes more wall clock time per agent, but gets better results, so fewer redo steps.
The main thing I use XML tags for is separating content from instructions. Say I'm doing prompt engineering, so the content being operated on is itself a prompt; then I wrap it with:
<NO_OP_DRAFT>
draft prompt
</NO_OP_DRAFT>
instructions for modifying draft prompt
If I don't do this, a significant number of times it responds to the instructions in the draft instead of the ones outside it.