Just tried it out for a prod issue I was experiencing. Claude never does this sort...

XCSme · 2026-04-24T19:56:44 1777060604

I feel like the last 2-3 generations of models (after gpt-5.3-codex) didn't really improve much, just changed stuff around and made different tradeoffs.

pixel_popping · 2026-04-24T19:59:42 1777060782

I disagree, it improved enormously, especially at staying consistent on long tasks. I have a task that has been running for 32 days (400M+ tokens) via Codex, and that's only since gpt-5.4.

ericpauley · 2026-04-24T20:02:23 1777060943

Has that task accomplished anything yet?

codemog · 2026-04-24T20:11:00 1777061460

I think the OP is in for a rude surprise when the task is “finished”.

hagbard_c · 2026-04-24T20:53:58 1777064038

It will go somewhat like this:

"You're really not going to like it," observed Codex.

"Tell us!"

"All right," said Codex. "The answer to your Great Question..."

"Yes...!"

"Is..." said Codex, and paused.

"Yes...!"

"Is..."

"Yes...!!!...?"

"Forty-two," said Codex, with infinite majesty and calm.

pixel_popping · 2026-04-24T20:55:19 1777064119

I bet you've asked Codex for that joke :p

xp84 · 2026-04-24T20:06:29 1777061189

Too soon to tell, give it a billion tokens before we make up our minds

pixel_popping · 2026-04-24T20:19:16 1777061956

Oh boy, you are far from what it requires, we are probably talking 3B+. But note that this is just Codex; obviously Codex is also doing automatic adversarial with the regular zoo (gemini-3.1-pro-preview, opus-4.6/4.7, gpt-5.3-codex, minimax-2.7, glm-5.1, mimo-2 (now 2.5) and so on, you get the gist) :)

fl4regun · 2026-04-24T20:35:20 1777062920

what is that task doing???

owebmaster · 2026-04-25T10:17:58 1777112278

What's interesting is that they had the opportunity to explain but decided that hyping it more made more sense. 3 billion tokens!!1!

anonym00se1 · 2026-04-25T01:46:15 1777081575

The correct question is: what isn't that task doing?

SecretDreams · 2026-04-24T20:16:00 1777061760

Kept the OP employed for a full extra month at their high AI metric firm, hopefully.

pixel_popping · 2026-04-24T21:50:05 1777067405

Just making Jensen proud is all.

elAhmo · 2026-04-24T23:17:55 1777072675

It made Sam richer.

pixel_popping · 2026-04-25T15:05:37 1777129537

I don't know their margin so I can't really say, but we do have 8 OpenAI accounts. I doubt they are making that much from us, seeing that there isn't a single hour where we don't saturate the accounts.

cheevly · 2026-04-25T17:18:57 1777137537

Wtf are you even talking about? Sam has zero stake in OpenAI.

elAhmo · 2026-04-25T19:17:14 1777144634

Of course he doesn't.

lowdude · 2026-04-24T20:21:23 1777062083

That’s actually crazy, what kind of task is that? And is that a recurring kind of task like some analysis, or coding related?

pixel_popping · 2026-04-24T20:43:08 1777063388

Coding (along with docs and tests, obviously): rewriting a huge chunk of the KVM hypervisor (in Kernel 7, started at -rc2), KSM, and other modules. Can't say too much about it yet (might do an announcement in the coming weeks). The coding is automated, but the plan took days of manual arguing (with all models possible) beforehand (while doing other things during waiting times, as I currently manage 70 repos for an upcoming release of our Beta).

I think users really underestimate the capabilities of "AI" when using the right tooling/combinations of models and procedures (and loops). That's with 2 decades of dev behind me. Genuinely, I'm not on the same page as people saying it produces slop of any kind. At this stage, it's mostly the fault of the prompter (or of the prompter not having enough tokens to do mass adversarial), and I can genuinely state that the code produced is overall the SAME quality as I would produce by being extremely meticulous.

I'm like a bot following 30+ threads concurrently. Sometimes it's fun, sometimes it feels like playing at a casino, sometimes it's boring, but this is truly an insane era if you have the funding for it. Obviously we stack many MANY accounts in rotation 24/7; the equivalent API cost for me alone is about $100K+ (a month), but we pay only a fraction of that cost thanks to the plans.

PS: I have 8 monitors in front of me to manage all that (portable monitors stacked together).

Urahandystar · 2026-04-24T21:06:01 1777064761

Please do an update when you're ready, this sounds like madness to me so I'd love to see what the output is. Whatever it is I have to know.

owebmaster · 2026-04-25T10:21:14 1777112474

Typical AI psychosis. They might notice it soon or stay in this condition for months.

pixel_popping · 2026-04-25T14:43:02 1777128182

I don't think you really grasp the direction the world is taking or even really understand AI capabilities when it's put together to reach high automation, you might not agree or embrace it yet, but you will be joining the loop wagon, soon enough.

owebmaster · 2026-04-25T19:19:50 1777144790

Yeah right. Sam Altman is as high as you on this drug, but you both are going to wake up soon.

pixel_popping · 2026-04-25T20:24:47 1777148687

Can you explain further? In particular, why do you see AI stopping anytime soon and not just getting insanely better and better for the next decades (whether through combinations of models or models alone, harnesses or whatever, that's just a technicality)?

Why would I need to "wake up"?

owebmaster · 2026-04-26T03:18:21 1777173501

Is what you're working on public? Publish it and let us know how it goes.

pixel_popping · 2026-04-26T08:15:55 1777191355

Yes, we are already public and funded, I was just describing "one task" among thousands to be fair. Can you elaborate on your point?

owebmaster · 2026-04-26T11:44:30 1777203870

Show us. I want to see how impressive the thing you're creating using AI is.

pixel_popping · 2026-04-26T15:49:52 1777218592

you do realize that all tech companies are mostly running with AI nowadays? what kind of take is this?

AlexCoventry · 2026-04-25T00:25:49 1777076749

Is it hitting intermediate milestones with solid pre-written and human-reviewed acceptance tests? If not, sounds like a very risky commitment.

ericreg92 · 2026-04-24T22:26:17 1777069577

Please do a post about this (though I realize that takes time). This sounds amazing. I have always dreamed of doing this too but just don't have the budget.

stirfish · 2026-04-25T02:46:12 1777085172

Specifically, write a post about this and do not have Claude write a post about this.

7thpower · 2026-04-24T23:08:30 1777072110

I have yet to talk to someone who is taking this approach and doesn’t end up with a dumpster fire, but here is to hoping this time is different.

Hope it works and you post about it.

Culonavirus · 2026-04-25T00:16:03 1777076163

I hope it doesn't work and they don't post about it.

ziml77 · 2026-04-25T01:07:26 1777079246

It's just too bad the subsidized costs mean they won't actually feel any real punishment for their failure. Like normally time wasted on its own is enough of a punishment for making a poor decision, but they're not even doing anything themselves here!

PeterStuer · 2026-04-25T16:11:44 1777133504

I'm also in that boat of not understanding how people fail to get a huge productivity boost from GenAI. And it's not just novices but sometimes seriously accomplished coders. It can't be that they're just typing 'Make me an ERP' and then going 'these things are dumb slop machines', right?

jamwil · 2026-04-24T22:43:44 1777070624

I’m vague on a specific reason for this feeling because there are a few to choose from and no one overpowers the other, but the emotion that comes to mind when I read this is disgust. As a society I feel we will look back on the subsidized opulence of this moment with total and utter contempt.

owebmaster · 2026-04-25T10:22:42 1777112562

There's no opulence in spending tokens for entertainment. Vibecoding your own game is the new viral game.

deaux · 2026-04-25T08:35:39 1777106139

I know exactly the feeling you mean. I get a much stronger feeling of that when I talk with friends who frequently take a plane for a 250 mile trip which has a world-class comfortable high-speed train connection with very frequent trains, each taking less than 3 hours. I'm sure you have friends who would do this in this situation - do you feel the same disgust when you hear them talking about such choices?

I still haven't seen a single person who actually cares about the environment and has willingly made significant sacrifices for it, who clamors about the environmental cost of AI. Every time I see someone do it it's someone who never cared about this before, and still doesn't really. Who buys plenty of new clothes and furniture, loves a good burger, has the latest iPhone, flies 4 times per year.

Maybe you're the unicorn in which case fair enough, you've earned the right to feel disgusted.

holmesworcester · 2026-04-24T22:56:35 1777071395

Or nostalgia for simpler times

jamwil · 2026-04-24T23:12:56 1777072376

That as well. But everyone reading GP’s posts knows in their bones that it’s unsustainable. It’s economically unsustainable and environmentally unsustainable, and in that context it strikes me as pure hoarding behaviour. Taking as much as they can for themselves before the house of cards crashes down.

I have no sympathy for OpenAI or Anthropic as corporations, but if these are the new tools of the trade, then platform abuse like GP is bragging about serves only to destroy the livelihoods of the rest of us who are content to use our fair share.

There’s no such thing as a free lunch, and the bill always comes at the end.

ragequittah · 2026-04-25T17:08:48 1777136928

I mostly hate it because the token crunch is now coming for us regular users because of people like this. A few people always ruin it for the rest of us.

jamwil · 2026-04-25T17:19:05 1777137545

Yea. It’s greed, pure and simple. And also a major misstep on the part of the inference providers to offer these subsidized plans and not anticipate these slop mills.

owebmaster · 2026-04-25T10:20:11 1777112411

> (might do an announcement in coming weeks).

Don't be surprised if/when people ignore your AI slop

r_lee · 2026-04-24T20:23:42 1777062222

...what? what kind of a task are you running?

ninkendo · 2026-04-25T02:23:53 1777083833

Sorry if I’m not getting it, but what was wrong exactly? Is the issue that it merely put “-- put the query here” in the reply, instead of repeating it again?

If so, I’m not sure I’d even consider that a problem. If the goal is for it to give you a query to run, and you ask it “let’s do it in a transaction”, it’s a reasonable thing for it to simply inform you, “yeah you can just type begin first” since it’s assuming you’re going to be pasting the query in anyway. And yeah, it does use fewer tokens, assuming the query was long. Similar to how, if it gave me a command to run, and I say “I’m getting a permission denied”, it would be reasonable for it to say “yeah do it as root, put sudo before the command”, and it’s IMO reasonable if it didn’t repeat the whole thing verbatim just with the word “sudo” first.

But if the context was that you actually expected it to run the query for you, and instead it just said “here, you run it”, then yeah that’s lazy and I’d understand the shock.

endymi0n · 2026-04-24T20:23:20 1777062200

OpenAI is the first company that has reached a level of intelligence so high, the model has finally become smart enough to make YOU do all the work. Emergent behavior in action.

All earnestness aside, OpenAI’s oddly specific singular focus on “intelligence per token” (also in the benchmarks) that literally no one else pushes so hard eerily reminds me of Apple’s MacBook anorexia era pre-M1. One metric to chase at the cost of literally anything else. GPT-5.3+ are some of the smartest models out there and could be a pleasure to work with, if they weren’t lazy bastards to the point of being completely infuriating.

syspec · 2026-04-24T19:55:54 1777060554

Can't tell if above is good or bad.

wincy · 2026-04-25T02:18:49 1777083529

I mean, I was doing triage, so I wanted an immediate fix. The actual issue is we’re getting some exploding complexity when double-checking that the action the API is taking is valid in the data. So that needs to be refactored. I suppose it reduces token usage, but Claude Opus will happily do exactly what I want it to.

hbn · 2026-04-24T20:24:07 1777062247

GPT-5.5 shatters benchmarks for the amount of faith it puts in the user.