Hacker News

Just tried it out for a prod issue I was experiencing. Claude never does this sort of thing. I had it write an update statement after doing some troubleshooting, and I said “okay, let’s write this in a transaction with a rollback,” and GPT-5.5 gave me the old “okay,

BEGIN TRAN;

-- put the query here

commit;

I feel like I haven’t had to prod a model to actually do what I told it to in a while, so that was a shock. I guess it does use fewer tokens that way; it’s just annoying when I’m paying for the “cutting edge” model to have it be lazy on me like that.

This is in Cursor: the model popped up, so I tried it out from the model selector.
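For reference, here is a minimal sketch of what a complete "update in a transaction with a rollback" answer might look like, rather than a placeholder comment. It uses Python's built-in sqlite3 as a stand-in for the poster's database (the snippet above uses T-SQL-style `BEGIN TRAN`). The `orders` table, its `status` column, and the `apply_update_with_rollback` helper are all hypothetical and not taken from the original post. The idea is to commit only when the statement touched the number of rows you expected, and roll back otherwise.

```python
import sqlite3


def apply_update_with_rollback(conn, sql, params, expected_rows):
    """Run one UPDATE inside an explicit transaction.

    Commits only if the statement touched exactly `expected_rows` rows;
    otherwise rolls back. Returns True on commit, False on rollback.
    """
    conn.execute("BEGIN")
    try:
        cur = conn.execute(sql, params)
        if cur.rowcount != expected_rows:
            conn.execute("ROLLBACK")
            return False
        conn.execute("COMMIT")
        return True
    except Exception:
        # Undo any partial work, but only if a transaction is still open.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# isolation_level=None: the sqlite3 module issues no implicit BEGIN, so the
# transaction boundaries are exactly the ones written out above.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Hypothetical schema and data, purely for illustration.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(1, "stuck"), (2, "stuck"), (3, "done")],
)

fix = "UPDATE orders SET status = 'done' WHERE status = ?"
count_stuck = "SELECT COUNT(*) FROM orders WHERE status = 'stuck'"

# Wrong expectation (1 row, but 2 match): the update is rolled back.
first = apply_update_with_rollback(conn, fix, ("stuck",), expected_rows=1)
stuck_after_first = conn.execute(count_stuck).fetchone()[0]

# Correct expectation: the update is committed.
second = apply_update_with_rollback(conn, fix, ("stuck",), expected_rows=2)
stuck_after_second = conn.execute(count_stuck).fetchone()[0]
```

On SQL Server the same shape would be written with `BEGIN TRAN`, a row-count check, and `ROLLBACK` or `COMMIT`; the exact syntax differs by database, so treat this as an illustration of the pattern only.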



I feel like the last 2-3 generations of models (after gpt-5.3-codex) didn't really improve much, just changed stuff around and made different tradeoffs.


I disagree. It improved enormously, especially at staying consistent on long tasks. I have a task that has been running for 32 days (400M+ tokens) via Codex, and that's only since gpt-5.4.


Has that task accomplished anything yet?


I think the OP is in for a rude surprise when the task is “finished”.


It will go somewhat like this:

"You're really not going to like it," observed Codex.

"Tell us!"

"All right," said Codex. "The answer to your Great Question..."

"Yes...!"

"Is..." said Codex, and paused.

"Yes...!"

"Is..."

"Yes...!!!...?"

"Forty-two," said Codex, with infinite majesty and calm.


I bet you've asked Codex for that joke :p


Too soon to tell, give it a billion tokens before we make up our minds


Oh boy, you are far from what it requires; we are probably talking 3B+. But note that this is just Codex. Obviously Codex is also doing automatic adversarial with the regular zoo (gemini-3.1-pro-preview, opus-4.6/4.7, gpt-5.3-codex, minimax-2.7, glm-5.1, mimo-2 (now 2.5) and so on, you get the gist) :)


what is that task doing???


What's interesting is that they had the opportunity to explain but decided that hyping it more made more sense. 3 billion tokens!!1!


The correct question is: what isn't that task doing?


Kept the OP employed for a full extra month at their high AI metric firm, hopefully.


Just making Jensen proud is all.


It made Sam richer.


I don't know their margin so I can't really say, but we do have 8 OpenAI accounts, and I doubt they are making that much from us, seeing that there isn't a single hour where we don't saturate the accounts.


Wtf are you even talking about? Sam has zero stake in OpenAI.


Of course he doesn't.


That’s actually crazy, what kind of task is that? And is that a recurring kind of task like some analysis, or coding related?


Coding (along with docs and tests, obviously): rewriting a huge chunk of the KVM hypervisor (in Kernel 7, started in the -rc2) and KSM and other modules. Can't say too much about it yet (might do an announcement in coming weeks). The coding is automated, but the plan took days of manual arguing (with all models possible) beforehand (while doing other things during waiting times, as I currently manage 70 repos for an upcoming release of our Beta).

I think users really underestimate the capabilities of "AI" when using the right tooling/combinations of models and procedures (and loops). I say that with 2 decades of dev behind me. Genuinely, I'm not in sync with people saying it produces slop of any kind; at this stage, it's mostly the fault of the prompter (or the prompter not having enough tokens to do mass adversarial). I can genuinely state that the code produced is overall the SAME quality as what I would produce by being extremely meticulous.

I'm like a bot following 30+ threads concurrently. Sometimes it's fun, sometimes it feels like playing at a casino, sometimes it's boring, but this is truly an insane era if you have the funding for it. Obviously we stack many MANY accounts in rotation 24/7; the equivalent API cost for myself alone is about $100K+ a month, but we pay only a fraction of that thanks to the plans.

PS: I have 8 monitors in front of me to manage all that (portable monitors stacked together).


Please do an update when you're ready, this sounds like madness to me so I'd love to see what the output is. Whatever it is I have to know.


Typical AI psychosis. They might notice it soon or stay in this condition for months.


I don't think you really grasp the direction the world is taking, or even really understand AI capabilities when it's all put together to reach high automation. You might not agree or embrace it yet, but you will be joining the loop wagon soon enough.


Yeah right. Sam Altman is as high as you on this drug, but you both are going to wake up soon.


Can you explain further? Most especially, why do you see AI stopping anytime soon and not just getting insanely better and better for the next decades (whether through combinations of models or models alone, harnesses or whatever, that's just a technicality)?

Why would I need to "wake up"?


Is what you're working on public? Publish it and let us know how it goes.


Yes, we are already public and funded. I was just describing "one task" among thousands, to be fair. Can you elaborate on your point?


Show us. I want to see how impressive the thing you're creating with AI is.


you do realize that all tech companies are mostly running with AI nowadays? what kind of take is this?


Is it hitting intermediate milestones with solid pre-written and human-reviewed acceptance tests? If not, sounds like a very risky commitment.


Please do a post about this (though I realize that takes time). This sounds amazing. I have always dreamed of doing this too but just don't have the budget.


Specifically, write a post about this and do not have Claude write a post about this.


I have yet to talk to someone who is taking this approach and doesn’t end up with a dumpster fire, but here’s to hoping this time is different.

Hope it works and you post about it.


I hope it doesn't work and they don't post about it.


It's just too bad the subsidized costs mean they won't actually feel any real punishment for their failure. Like normally time wasted on its own is enough of a punishment for making a poor decision, but they're not even doing anything themselves here!


I'm also in that boat of not understanding how people fail to get a huge productivity boost from GenAI. And it's not just novices but sometimes seriously accomplished coders. It can't be that they're just typing 'Make me an ERP' and then going 'these things are dumb slop machines', right?


I’m vague on a specific reason for this feeling because there are a few to choose from and no single one overpowers the others, but the emotion that comes to mind when I read this is disgust. As a society I feel we will look back on the subsidized opulence of this moment with total and utter contempt.


There's no opulence in spending tokens for entertainment. Vibecoding your own game is the new viral game.


I know exactly the feeling you mean. I get a much stronger feeling of that when I talk with friends who frequently take a plane for a 250-mile trip that has a world-class, comfortable high-speed train connection with very frequent trains, each taking less than 3 hours. I'm sure you have friends who would do this in this situation. Do you feel the same disgust when you hear them talking about such choices?

I still haven't seen a single person who actually cares about the environment and has willingly made significant sacrifices for it, who clamors about the environmental cost of AI. Every time I see someone do it it's someone who never cared about this before, and still doesn't really. Who buys plenty of new clothes and furniture, loves a good burger, has the latest iPhone, flies 4 times per year.

Maybe you're the unicorn in which case fair enough, you've earned the right to feel disgusted.


Or nostalgia for simpler times


That as well. But everyone reading GP’s posts knows in their bones that it’s unsustainable. It’s economically unsustainable and environmentally unsustainable, and in that context it strikes me as pure hoarding behaviour. Taking as much as they can for themselves before the house of cards crashes down.

I have no sympathy for OpenAI or Anthropic as corporations, but if these are the new tools of the trade, then platform abuse like what GP is bragging about serves only to destroy the livelihoods of the rest of us who are content to use our fair share.

There’s no such thing as a free lunch, and the bill always comes at the end.


I mostly hate it because the token crunch is now coming for us regular users because of people like this. A few people always ruin it for the rest of us.


Yea. It’s greed, pure and simple. And also a major misstep on the part of the inference providers to offer these subsidized plans and not anticipate these slop mills.


> (might do an announcement in coming weeks).

Don't be surprised if/when people ignore your AI slop


...what? what kind of a task are you running?


Sorry if I’m not getting it, but what was wrong exactly? Is the issue that it merely put “-- put the query here” in the reply, instead of repeating it again?

If so, I’m not sure I’d even consider that a problem. If the goal is for it to give you a query to run, and you ask it “let’s do it in a transaction”, it’s a reasonable thing for it to simply inform you, “yeah you can just type begin first” since it’s assuming you’re going to be pasting the query in anyway. And yeah, it does use fewer tokens, assuming the query was long. Similar to how, if it gave me a command to run, and I say “I’m getting a permission denied”, it would be reasonable for it to say “yeah do it as root, put sudo before the command”, and it’s IMO reasonable if it didn’t repeat the whole thing verbatim just with the word “sudo” first.

But if the context was that you actually expected it to run the query for you, and instead it just said “here, you run it”, then yeah that’s lazy and I’d understand the shock.


OpenAI is the first company that has reached a level of intelligence so high, the model has finally become smart enough to make YOU do all the work. Emergent behavior in action.

All earnestness aside, OpenAI’s oddly specific singular focus on “intelligence per token” (also in the benchmarks), which literally no one else pushes so hard, eerily reminds me of Apple’s MacBook anorexia era pre-M1. One metric to chase at the cost of literally anything else. GPT-5.3+ are some of the smartest models out there and could be a pleasure to work with, if they weren’t lazy bastards to the point of being completely infuriating.


Can't tell if above is good or bad.


I mean, I was doing triage, so I wanted an immediate fix. The actual issue is that we’re getting some exploding complexity when double-checking that the action the API is taking is valid against the data, so that needs to be refactored. I suppose it reduces token usage, but Claude Opus will happily do exactly what I want it to.


GPT-5.5 shatters benchmarks for amount of faith it puts in the user.



