Hacker Newsnew | past | comments | ask | show | jobs | submit | irthomasthomas's commentslogin

Efficiency gains can be used to make existing models more profitable, or to make new larger and more intelligent models.

Some yes, others no. Distillation and quantization can't be used to make new base models since they require a preexisting one.

it enables models larger than was previously possible.

No because the base model from which the distilled or quantized models are derived is larger.

Its possibly just an SEO trick. People have been calling Thiel the antichrist for a long time.

A friend made a cli tool, ideal for agents, which does this and can aggregate intelligence across multiple platforms.

https://github.com/bm-github/owasp-social-osint-agent


Have you tried meta-prompts e.g. "Rewrite the prompt to improve the perceived taste and expertise of the author"


Opus doubled in speed with version 4.5, leading me to speculate that they had promoted a sonnet size model. The new faster opus was the same speed as Gemini 3 flash running on the same TPUs. I think anthropics margins are probably the highest in the industry, but they have to chop that up with google by renting their TPUs.


The conspiracy theorist side of me whispers "instead of the rumored Sonnet 5.0 you got Opus 4.6...suspicious"


They will rename it The Free Democratic Republic of America.


People used to bet on ships sinking and sailors drowning. Till they learned better.

Edit: This was common until Parliament passed the Marine Insurance Act of 1745.

Before that, speculators could take out "wagering policies" on vessels they had no connection to. This created "coffin ships" - unseaworthy vessels sent to sea because the insurance payout for a wreck was worth more than the ship itself. The law introduced "insurable interest," meaning you cannot bet on a disaster unless you stand to lose something if it happens. This removed the incentive for sabotage and murder for profit.

Modern prediction markets are heading toward the same problem. Betting on train delays or bridge collapses without having any stake gives bad actors a reason to cause it. If the cost of sabotage is lower than the payout, the market effectively pays for the disaster to happen.

Whoever downvoted this wants you to ignore centuries of legal precedent designed to prevent exactly this kind of blood money. Those who ignore the lessons of the past learn wisdom in blood... https://en.wikipedia.org/wiki/Coffin_ship_(insurance)#:~:tex... https://en.wikipedia.org/wiki/Marine_Insurance_Act_1745#:~:t...


They still do, they just call it insurance.


No they don't, the practice was banned some time ago. You now require a "insurable interest". https://en.wikipedia.org/wiki/Marine_Insurance_Act_1745#:~:t...


Could you elaborate?



That was far crazier than I expected going into it... To the point I've seen Hollywood movies with far more believable plots that people would find unrealistic.


I just noticed the Wikipedia article has a very relevant and interesting link: https://en.wikipedia.org/wiki/Coffin_ship_(insurance)


I do this too, but then you need some method to handle it, because now you have to read and test and verify multiple work streams. It can become overwhelming. In the past week I had the following problems from parallel agents:

Gemini running an benchmark- everything ran smoothly for an hour. But on verification it had hallucinated the model used for judging, invalidating the whole run.

Another task used Opus and I manually specified the model to use. It still used the wrong model.

This type of hallucination has happened to me at least 4-5 times in the past fortnight using opus 4.6 and gemini-3.1-pro. GLM-5 does not seem to hallucinate so much.

So if you are not actively monitoring your agent and making the corrections, you need something else that is.


You need a harness, yes, and you need quality gates the agent can't mess with, and that just kicks the work back with a stern message to fix the problems. Otherwise you're wasting your time reviewing incomplete work.


Here is an example where the prompt was only a few hundred tokens and the output reasoning chain was correct, but the actual function call was wrong https://x.com/xundecidability/status/2005647216741105962?s=2...


Your point being? A proper harness will mostly catch things like that. Even a low end model can be employed to do write tests plans and do consistency checks that mostly weed out stuff like that. Hence: You need a harness, or you'll spend your time worrying about dumb stuff like this.


Glancing at what it's doing is part of your multitasking rounds.

Also instead of just prompting, having it write a quick summary of exactly what it will do where the AI writes a plan including class names branch names file locations specific tests etc. is helpful before I hit go, since the code outline is smaller and quicker to correct.

That takes more wall clock time per agent, but gets better results, so fewer redo steps.


Here is an example where the prompt was only a few hundred tokens and the output reasoning chain was correct, but the actual function call was wrong https://x.com/xundecidability/status/2005647216741105962?s=2...


I as a human have typos too - and sometimes they're the hardest thing to catch in code review because you know what you meant.

Hopefully there is some of lint process to catch my human hallucinations and typos.


The main thing i use xml tags for is seperating content from instructions. Say I am doing prompt engineering, so that the content being operated on is itself a prompt then I wrap it with

<NO_OP_DRAFT> draft prompt </NO_OP_DRAFT>

instructions for modifying draft prompt

If I don't do this, a significant number of times it responds to the instructions in the draft.


Why would you want a duplicitous CEO in charge of your countries terminator systems?


Yes that’s precisely what I’m saying. The government should fully control the systems it buys.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: