Opus doubled in speed with version 4.5, leading me to speculate that they had promoted a Sonnet-sized model. The new, faster Opus was the same speed as Gemini 3 Flash running on the same TPUs. I think Anthropic's margins are probably the highest in the industry, but they have to split that with Google by renting their TPUs.
People used to bet on ships sinking and sailors drowning.
Till they learned better.
Edit:
This was common until Parliament passed the Marine Insurance Act of 1745.
Before that, speculators could take out "wagering policies" on vessels they had no connection to. This created "coffin ships" - unseaworthy vessels sent to sea because the insurance payout for a wreck was worth more than the ship itself. The law introduced "insurable interest," meaning you cannot bet on a disaster unless you stand to lose something if it happens. This removed the incentive for sabotage and murder for profit.
Modern prediction markets are heading toward the same problem. Betting on train delays or bridge collapses without having any stake gives bad actors a reason to cause it. If the cost of sabotage is lower than the payout, the market effectively pays for the disaster to happen.
That was far crazier than I expected going into it... to the point that I've seen Hollywood plots people call unrealistic that are more believable than this.
I do this too, but then you need some method to handle it, because now you have to read and test and verify multiple work streams. It can become overwhelming. In the past week I had the following problems from parallel agents:
Gemini running a benchmark: everything ran smoothly for an hour, but on verification it turned out it had hallucinated the model used for judging, invalidating the whole run.
Another task used Opus, and I manually specified the model to use. It still used the wrong model.
This type of hallucination has happened to me at least 4-5 times in the past fortnight using Opus 4.6 and Gemini 3.1 Pro. GLM-5 does not seem to hallucinate as much.
So if you are not actively monitoring your agent and making the corrections, you need something else that is.
You need a harness, yes, and you need quality gates the agent can't mess with, ones that just kick the work back with a stern message to fix the problems. Otherwise you're wasting your time reviewing incomplete work.
Your point being? A proper harness will mostly catch things like that. Even a low-end model can be employed to write test plans and do consistency checks that weed out most of this stuff. Hence: you need a harness, or you'll spend your time worrying about dumb stuff like this.
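The gate-and-kick-back loop described above can be sketched generically. This is a minimal illustration, not anyone's actual harness: `produce` and `check` are hypothetical stand-ins for your agent call and for a verification step the agent cannot modify, and the toy agent/gate below exist only to make the block runnable.

```python
def gated_loop(produce, check, max_retries=3):
    """Generic quality gate: call the agent (`produce`), run a check
    the agent can't touch (`check`), and kick failures back to the
    agent with the error message until it passes or retries run out."""
    feedback = None
    for _ in range(max_retries):
        work = produce(feedback)
        ok, feedback = check(work)
        if ok:
            return work  # gate passed; ready for human review
    raise RuntimeError("Gate still failing after retries:\n" + str(feedback))

# Toy stand-ins: an "agent" that uses the wrong judge model until
# corrected, and a gate that verifies the run metadata.
def fake_agent(feedback):
    return {"judge_model": "pinned-judge" if feedback else "hallucinated-judge"}

def gate(work):
    if work["judge_model"] == "pinned-judge":
        return True, None
    return False, f"Wrong judge model: {work['judge_model']}; use pinned-judge."

result = gated_loop(fake_agent, gate)
```

The point of the structure is that the check runs outside the agent's reach, so a hallucinated model name gets caught before a human ever reviews the run.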
Glancing at what it's doing is part of your multitasking rounds.
Also, instead of just prompting, it helps to have the AI first write a quick plan of exactly what it will do: class names, branch names, file locations, specific tests, etc. I review that before I hit go, since the outline is smaller and quicker to correct than the code.
That takes more wall clock time per agent, but gets better results, so fewer redo steps.
The main thing I use XML tags for is separating content from instructions. Say I'm doing prompt engineering, so the content being operated on is itself a prompt; then I wrap it with:
<NO_OP_DRAFT>
draft prompt
</NO_OP_DRAFT>
instructions for modifying draft prompt
If I don't do this, a significant number of times it responds to the instructions in the draft instead of the ones outside it.