
I've joked that the 16oz US pint was a long-play metric-system scheme to drive adoption of 500ml (~16.9oz) as a measure, a Pavlovian mechanic to trick beer-drinking Americans into thinking the metric system is actually better because it results in more beer. The joke's on them. We're all about 12oz cans! 33cl? pfft...

Germans have it nailed down with the Kölsch Stange, a 200ml glass that so readily disappears that it stays cold and you just get another from the Kranz.


No one drinks it that way, except for those who drive. And believe me, half a litre over half an hour still yields an acceptable beer temperature.

Nerds solving interesting problems find a way. Folks (myself included) worked on Privacy Pass for verifiable ad attestation, but the power of blinded attestation is meaningful in ways beyond advertising authenticity validation.

For folks at the implementation level, the problem is often the prize. I just need to get paid enough to not care about how much I get paid. And implicitly contributing to social good is a form of gamification that works well. I've encountered lots of folks who operate in the same way. Yes, we're paying a bit more of a "feed the beast" tax than the good old days, but we're still able to operate with a remarkable degree of latitude.


In a market with perfect price discovery, sure. However, over the years I have learned that even the best products for the job can (and will) lose without the right marketing, sales, distribution, etc.

Sometimes the entrenched default that collects an inertial premium doesn't get disrupted...

But, yes, anyone without a moat who operates with a presumption of retention runs the risk of being knocked off of their perch; their fate left to others.


Marketing, sales, distribution, branding, supply chain, intangibles... can all be moats, as long as the owner of that moat gets it and others don't.

That's the polite version of "we know where you live". Telling someone you have their phone number is a way of saying "we'll call you and expect immediacy if you break something."

Wanna be treated like an adult? Cool. You'll also be held accountable like an adult.


Awesome.... With Sonnet 4.5, I had Cline soft trigger compaction at 400k (it wandered off into the weeds at 500k). But the stability of the 4.6 models is notable. I still think it pays to structure systems to be comprehensible in smaller contexts (smaller files, concise plans), but this is great.

(And, yeah, I'm all Claude Code these days...)


"When a measure becomes a target, it ceases to be a good measure."

Goodhart's law shows up with people, in system design, in processor design, in education...

Models are going to be over-fit to the tests unless scruples or practical application realities intervene. It's a tale as old as machine learning.


This is because of the forbidden argument in statistics. Any statistic, even something so basic as an average, ONLY works if you can guarantee the independence of the individual facts it measures.

But there's a problem with that: of course the existence of the statistical measure itself is very much a link between all those individual facts. In other words: if there is ANY causal link between the statistical measure and the events measured ... it has now become bullshit (because the law of large numbers doesn't apply anymore).

So let's put it into practice: say there's a running contest, and you display the minimum, maximum, and average time of all runners who have had their turns. We all know what happens: of course the result is that the average trends up. And yet, that's exactly what statistics guarantees won't happen. The average should go up and down with roughly 50% odds when a new runner is added. This is because showing the average causes behavior changes in the next runner.

This means, of course, that basing a decision on something as trivial as what the average running time was last year can only be mathematically defensible ONCE. The second time the average is wrong, and you're basing your decision on wrong information.

But of course, not only will most people actually deny this is the case, but this is also how 99.9% of human policy making works. And it's mathematically wrong! Simple, fast ... and wrong.
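A toy simulation makes the failure visible. This is only a sketch of the scenario above; the pacing rule and all the numbers are made up for illustration:

    import random

    random.seed(1)
    TRUE_MEAN = 60.0  # a runner's "natural" time in minutes (made up)
    SPREAD = 5.0

    def final_average(feedback: bool, runners: int = 1000) -> float:
        times = []
        for _ in range(runners):
            if feedback and times:
                # The runner sees the displayed average and is content to
                # roughly match it, plus a little slack: independence is gone.
                target = sum(times) / len(times) + 2.0
            else:
                target = TRUE_MEAN
            times.append(random.gauss(target, SPREAD))
        return sum(times) / len(times)

    print("independent runners:", round(final_average(False), 1))  # stays pinned near 60
    print("average on display: ", round(final_average(True), 1))   # drifts upward, away from 60

With independent runners the law of large numbers does its job; once the displayed average feeds back into behavior, it stops being a useful measure.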


When I was a kid growing up in Texas, our ocean visits were to the Gulf of Mexico, off the Texas coast, and you would grab little alcohol wipes for when you got out of the ocean, to wipe the oil off.

Years later, swimming in Hawaii, I found myself looking for wipes. I mentioned it to a snorkel-outfit operator, and she looked at me like I was insane. They didn't even put damaging sunscreen in the water, and there was no expectation of little 1-2 inch sticky spots of oil.

The good old days, in the '80s, when we swam in oceans filled with slow-motion natural disasters. I wonder how much of it was place (Hawaiians seem to have a stronger relationship with the land and nature surrounding them) and how much of it was the time (20 years later).


Crude oil floating in the ocean used to be a big nuisance in parts of California. It is a natural phenomenon, created by oil deposits on the ocean floor leaking into the environment. Santa Barbara was particularly famous for it.

Extraction of that oil via commercial wells greatly reduced the natural seepage, which is why there is so little crude oil floating in that ocean water today. Oil drilling actually made the water cleaner.


To me, this "drilling is good for the environment" narrative sounded a bit misleading.

And not far down the rabbit hole one finds: the author of the study often cited by oil companies for the above narrative felt compelled to publish a clarifying statement: https://luyendyk.faculty.geol.ucsb.edu/Seeps%20pubs/Luyendyk...

Maybe stricter guidelines against operational "routine" spills led to a reduction of the sticky spots, plausible?


Per NOAA and USGS, ~20 million liters of crude oil naturally seeps into that part of the California ocean each year. That is more crude oil each year than the worst oil spill in California history[0].

You are projecting your biases. There was no "drilling is good for the environment" narrative. I was recounting an interesting fact about the environment there.

Many of these seeps are under considerable pressure as there is substantial natural gas mixed in. The seepage rate of each has been mapped and studied for many decades. It has long been observed that the introduction of drilling appears to have substantially reduced the seepage rate at many of these underwater sites. Drilling wells significantly reduces natural pressure in these reservoirs, likely leading to the observed reductions in seepage.

[0] https://en.wikipedia.org/wiki/1969_Santa_Barbara_oil_spill


> There was no "drilling is good for the environment" narrative.

> Oil drilling actually made the water cleaner.


For it to be a "narrative", there would need to be an additional claim that this specific case and context, which is factual, generalizes to most unrelated cases. That is not in evidence. Thinking that this was an attempt to create a narrative is a failure of reading comprehension.

This insistence that acknowledgement of facts has an ideological narrative is a pernicious strain of anti-science thinking.


To be clear, I have no skin in the game here. I thought the point you made sounded plausible and as I have zero experience or expertise, I wouldn't argue against it.

I just thought it's ridiculous - and kind of funny - to deny making the claim you literally made. I'm not sure you have a lot of legs to stand on, accusing others of "anti-science thinking" and a "failure of reading comprehension" when asking us to ignore the clear, textual evidence of that contradiction.

> For it to be a "narrative", there would need to be an additional claim that this specific case and context, which is factual, generalizes to most unrelated cases.

Says who? That seems a very narrow and unusual definition of what makes a "narrative", bent to your purpose. It seems to me, a "narrative" in common parlance just means "telling a story" or "relaying a sequence of events". I honestly have never seen someone use the word to imply generalization (doesn't mean no one ever did, of course).

In any case, given that you responded to a comment talking about the two examples of Texas and Hawaii with an example about California and an "actually", it seems pretty fair to me to say that you even fulfilled this artificially narrowed definition.

I mean, come on, you have got to admit that you have at least been unclear, if you didn't intend to make this argument. Instead of just defensively flinging insults.


"This insistence that acknowledgement of facts has an ideological narrative is a pernicious strain of anti-science thinking."

That is very well put. This should be added to the general list of fallacies in argument, and, like the other ones (the slippery slope, hasty generalization, post hoc ergo propter hoc, etc.), it deserves more general awareness.

The current wave of anti-science, anti-logic, rejection of objective data, etc. is like nothing I've experienced in my lifetime. This is a subjective observation, maybe it has always been this way and I never paid attention because I was caught up in whatever I used to be caught up in.


So you’re saying drilling destroyed crude oil’s natural habitat?


To this day, if you walk on the beach, your soles or the soles of your shoes will get sticky tar spots. You need baby oil wipes to clean them up before entering your home.

And some of it, if not most of it, is not natural seepage but the result of early environmental catastrophes in the 50s and 60s, particularly around Summerland.

(Source ex-resident)


Wtf I love Exxon now.


How did the wildlife adapt to that? There must be some cool species there


I don't know much about it but I have read that the local ecosystem is well-adapted to the oil seep environment.

That area has been like that for something like 100,000 years, which is a considerable amount of time in evolutionary terms.


I remember swimming in Santa Barbara growing up (well closer to the Ventura side really) and having to dodge oil on the sand and water.


Natural seepage is still just as big of an issue now as it was back then in those areas, including Santa Barbara.


There is still tons of tar on beaches in Santa Barbara county, mostly all from natural seeps.


For what it's worth you still need the alcohol wipes (mineral oil works well too) when swimming off the coast of Santa Barbara. It's naturally occurring oil that gets all over your feet in little annoying sticky spots.


Yeah, same for the Gulf Coast: oil just seeps right out of the ground at some beaches, or at some times. There's plenty of man-made pollution to go around though.


Roughly half (don't get hung up on being exact; the two sources are at least of similar orders of magnitude) of the oil that makes its way into the ocean is natural. That is, it leaks out of the ground into the water not at all as a result of human activity. Obviously, enormous anthropogenic oil spills make this a very spiky statistic one way or the other.

Oil production and natural oil seepage happen in the Gulf of Mexico because there's oil there; there's not much oil around Hawaii.

So there's likely both a human and non-human reason for this in Texas.


Growing up on the Atlantic coast of Florida, we kept a can of Renuzit solvent in the garage to wipe tar spots off our feet after coming home from the beach. I'm sure that stuff was toxic. The tar was everywhere for a few weeks, then gone for a while.

Hawaii has other problems. When I lived there, I went through a lot of Neosporin because every scrape you get from a reef pushes in bacteria that got into the ocean from the leaking sewer pipes.


Ha, yeah I remember the Galveston beaches as a kid. Left when I was 9, I can't imagine things have improved much since then...


Why stop there? Just call the LLM with the data and function description and get it to return the result!

(I'll admit that I've built a few "applications" exploring interaction descriptions with our Design team that do exactly this - but they were design explorations that, in effect, used the LLM to simulate a back-end. Glorious, but not shippable.)


That's basically how it works! (with human authored functions that validate the result, automatically providing feedback to the LLM if needed)


Because you often need the result not as a standalone artifact, but as a piece in a rigid process consisting of well-defined business logic and control flow, with which you can't trust AI yet.


What was the gap you discovered that made it not shippable? This is an experimental project, so I'm curious to know what sorts of problems you ran into when you tried a similar approach.


Three things:

1. Confirmable, predictable behavior (can we test it, can we make assurances to customers?).

2. Comparative performance (an LLM call taking hundreds of milliseconds to extract from a list, versus code doing it in <10ms).

3. Operating costs. LLM calls are spendy. Just think of them as hyper-unoptimized lossy function executors (along with being lossy encyclopedias), and the work starts to approach bogosort levels of execution cost for some small problems.

Buuuuuut.... I had working functional prototype explorations with almost no work on my end, in an hour.

We've now extended this thinking to some experience exploration builders, so it definitely has a place in the toolbox.
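To make points 2 and 3 concrete, here's a toy sketch of the trade-off. llm_call is a hypothetical stand-in for whichever model client you'd use, not a real API; only the shape of the comparison matters:

    from typing import Callable

    def extract_emails_code(lines: list[str]) -> list[str]:
        # Plain code: deterministic, unit-testable, microseconds, free.
        return [line.strip() for line in lines if "@" in line]

    def extract_emails_llm(lines: list[str], llm_call: Callable[[str], str]) -> list[str]:
        # The same job as a model call: hundreds of milliseconds, real money per
        # invocation, and the answer still needs validation before you trust it.
        prompt = "Return only the lines that are email addresses, one per line:\n" + "\n".join(lines)
        return [line.strip() for line in llm_call(prompt).splitlines() if line.strip()]

Great for prototyping an interaction in an afternoon; a hard sell as the permanent implementation of a trivial function.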


The author seems to think they've hit upon something revolutionary...

They've actually hit upon something that several of us have evolved to naturally.

LLMs are like unreliable interns with boundless energy. They make silly mistakes, wander into annoying structural traps, and have to be unwound if left to their own devices. It's like the genie that almost pathologically misinterprets your wishes.

So, how do you solve that? Exactly how an experienced lead or software manager does: you have the systems write things down before executing, explain things back to you, and ground all of their thinking in the code and documentation, rather than making assumptions about code after a superficial review.

When it was early ChatGPT, this meant function-level thinking and clearly described jobs. When it was Cline, it meant clinerules files that forced writing architecture.md files and vibe-code.log histories, demanding grounding in research and code reading.
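For context, a rules file like that is just standing instructions the agent reads before every task. A heavily trimmed, paraphrased sketch of the shape (not the actual file) looks something like:

    # Before touching code
    - Read architecture.md and every file you plan to modify; quote the lines that matter.
    - Write the plan to vibe-code.log first: goal, files touched, risks, how we verify.
    - No assumptions from file names or summaries; open the file and read it.

    # After the work
    - Append what actually changed (and what surprised you) to vibe-code.log.

The point isn't the specific wording; it's forcing the write-it-down, explain-it-back loop described above.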

Maybe nine months ago, another engineer said two things to me, less than a day apart:

- "I don't understand why your clinerules file is so large. You have the LLM jumping through so many hoops and doing so much extra work. It's crazy."

- The next morning: "It's basically like a lottery. I can't get the LLM to generate what I want reliably. I just have to settle for whatever it comes up with and then try again."

These systems have to deal with minimal context, ambiguous guidance, and extreme isolation. Operate with a little empathy for the energetic interns, and they'll uncork levels of output worth fighting for. We're Software Managers now. For some of us, that's working out great.


Revolutionary or not it was very nice of the author to make time and effort to share their workflow.

For those starting out using Claude Code it gives a structured way to get things done bypassing the time/energy needed to “hit upon something that several of us have evolved to naturally”.


It's this line that I'm bristling at: "...the workflow I’ve settled into is radically different from what most people do with AI coding tools..."

Anyone who spends some time with these tools (and doesn't black out from smashing their head against their desk) is going to find substantial benefit in planning with clarity.

It was #6 in Boris's run-down: https://news.ycombinator.com/item?id=46470017

So, yes, I'm glad that people write things out and share. But I'd prefer that they not lead with "hey folks, I have news: we should *slice* our bread!"


But the author's workflow is actually very different from Boris'.

#6 is about using plan mode whereas the author says "The built-in plan mode sucks".

The author's post is much more than just "planning with clarity".


> The author's post is much more than just "planning with clarity".

Not much more, though.

It introduces "research", which has been a central topic of LLMs since they first arrived. I mean, LLMs coined the term "hallucination" and turned grounding into a key concept.

In the past, building up context was thought to be the right way to approach LLM-assisted coding, but that concept is dead and proven to be a mistake: like debating the best way to force a round peg through a square hole, piling up expensive prompts to try to bridge the gap. Nowadays it's widely understood that it's far more effective and way cheaper to just refactor and rearchitect apps so that their structure is unsurprising and grounding issues are no longer a problem.

And planning mode. Each and every LLM-assisted coding tool has built its planning support as the central flow, one that explicitly features iterations and manual updates of the planning step. What's novel about the blog post?


A detailed workflow that's quite different from the other posts I've seen.


> A detailed workflow that's quite different from the other posts I've seen.

Seriously? Provide context with a prompt file, prepare a plan in plan mode, and then execute the plan? You get more detailed descriptions of this if you read the introductory how-to guides of tools such as Copilot.


Making the model write a research file, then the plan, iterating on it by editing the plan file, then adding the todo list, then doing the implementation, and doing all of that in a single conversation (instead of clearing contexts).

There's nothing revolutionary, but yes, it's a workflow that's quite different from other posts I've seen, and especially from Boris' thread that was mentioned which is more like a collection of tips.


> Making the model write a research file

Having LLMs write their prompt files was something that became a thing the moment prompt files became a thing.

> then the plan and iterate on it by editing the plan file, then adding the todo list, then doing the implementation, and doing all that in a single conversation (instead of clearing contexts).

That's literally what planning mode is.

Do yourself a favor and read the announcement of support for planning mode in Visual Studio. Visual Studio code supported it months before.

https://devblogs.microsoft.com/visualstudio/introducing-plan...


I'm not saying they invented anything. I'm saying it's a different workflow than what I've seen on HN.

I don't care about Visual Studio, I don't use it, but the page you've linked seems to describe yet another workflow (not very detailed).


For some time now, Claude Code's plan mode has also written the plan to a file that you can presumably edit, etc. It's located in ~/.claude/plans/ for me. Actually, there's a whole history of plans there.

I sometimes reference some of them to build context, e.g. after a few unsuccessful tries at implementing something, so that Claude doesn't try the same thing again.


The author __is__ Boris ...


They are different Borises. I was using the names already used in this thread.


I would say he’s saying “hey folks, I have news. We should slice our bread with a knife rather than the spoon that came with the bread.”


> Anyone who spends some time with these tools (and doesn't black out from smashing their head against their desk) is going to find substantial benefit in planning with clarity.

That's obvious by now, and the reason why all mainstream code assistants now offer planning mode as a central feature of their products.

It was baffling to read the blogger making claims about what "most people" do when anyone using code assistants already does it. I mean, the so-called frontier models are very expensive and time-consuming to run. It's a very natural pressure to make each run count. Why on earth would anyone presume people don't put some thought into those runs?


Flows of this kind have been documented in the wild for some time now. They started to pop up in the Cursor forums 2+ years ago... e.g.: https://github.com/johnpeterman72/CursorRIPER

Personally I have been using a similar flow for almost 3 years now, tailored for my needs. Everybody who uses AI for coding eventually gravitates towards a similar pattern because it works quite well (for all IDEs, CLIs, TUIs)


It's AI-written though; the tells are in pretty much every paragraph.


I don’t think it’s that big a red flag anymore. Most people use ai to rewrite or clean up content, so I’d think we should actually evaluate content for what it is rather than stop at “nah it’s ai written.”


>Most people use ai to rewrite or clean up content

I think your sentence should have been "people who use ai do so to mostly rewrite or clean up content", but even then I'd question the statistical truth behind that claim.

Personally, seeing something written by AI means that the person who wrote it did so just for looks and not for substance. Claiming to be a great author requires both penmanship and communication skills, and delegating either of them to a large language model inherently makes you less than that.

However, when the point is just the contents of the paragraph(s) and nothing more, then I don't care who or what wrote it. An example is the results of research, because I certainly won't care about the prose or effort that went into writing the thesis, but more about the results (is this about curing cancer now and forever? If yes, no one cares if it's written with AI).

With that being said, there's still the question of whether I get anywhere close to understanding the author behind the thoughts and opinions. I believe the way someone writes hints at the way they think and act. In that sense, using LLMs to rewrite something to make it sound more professional than how you would actually talk in appropriate contexts makes it hard for me to judge someone's character, professionalism, and mannerisms. Almost feels like they're trying to mask part of themselves. Perhaps they lack confidence in their ability to sound professional and convincing?


People like to hide behind AI so they can claim credit for its ideas. It's the same thing in job interviews.


I don't judge content for being AI written, I judge it for the content itself (just like with code).

However I do find the standard out-of-the-box style very grating. Call it faux-chummy LinkedIn corporate-workslop style.

Why don't people give the LLM a steer on style? Either based on their personal style or at least on a writer whose style they admire. That should be easier.


Because they think this is good writing. You can’t correct what you don’t have taste for. Most software engineers think that reading books means reading NYT non-fiction bestsellers.


While I agree with:

> Because they think this is good writing. You can’t correct what you don’t have taste for.

I have to disagree about:

> Most software engineers think that reading books means reading NYT non-fiction bestsellers.

There's a lot of scifi and fantasy in nerd circles, too. Douglas Adams, Terry Pratchett, Vernor Vinge, Charlie Stross, Iain M Banks, Arthur C Clarke, and so on.

But simply enjoying good writing is not enough to fully get what makes writing good. Even writing is not itself enough to get such a taste: thinking of Arthur C Clarke, I've just finished 3001, and at the end Clarke gives thanks to his editors, noting his own experience as an editor meant he held a higher regard for editors than many writers seemed to. Stross has, likewise, blogged about how writing a manuscript is only the first half of writing a book, because then you need to edit the thing.


My flow is to craft the content of the article in LLM speak, and then add to context a few of my human-written blog posts, and ask it to match my writing style. Made it to #1 on HN without a single callout for “LLM speak”!


> I don’t think it’s that big a red flag anymore. Most people use ai to rewrite or clean up content, so I’d think we should actually evaluate content for what it is rather than stop at “nah it’s ai written.”

Unfortunately, there's a lot of people trying to content-farm with LLMs; this means that whatever style they default to, is automatically suspect of being a slice of "dead internet" rather than some new human discovery.

I won't rule out the possibility that even LLMs, let alone other AI, can help with new discoveries, but they are definitely better at writing persuasively than they are at being inventive, which means I am forced to use "looks like LLM" as proxy for both "content farm" and "propaganda which may work on me", even though some percentage of this output won't even be LLM and some percentage of what is may even be both useful and novel.


If you want to write something with AI, send me your prompt. I'd rather read what you intend for it to produce rather than what it produces. If I start to believe you regularly send me AI written text, I will stop reading it. Even at work. You'll have to call me to explain what you intended to write.


And if my prompt is a 10 page wall of text that I would otherwise take the time to have the AI organize, deduplicate, summarize, and sharpen with an index, executive summary, descriptive headers, and logical sections, are you going to actually read all of that, or just whine "TL;DR"?

It's much more efficient and intentional for the writer to put the time into doing the condensing and organizing once, and review and proofread it to make sure it's what they mean, than to just lazily spam every human they want to read it with the raw prompt, so every recipient has to pay for their own AI to perform that task like a slot machine, producing random results not reviewed and approved by the author as their intended message.

Is that really how you want Hacker News discussions and your work email to be, walls of unorganized unfiltered text prompts nobody including yourself wants to take the time to read? Then step aside, hold my beer!

Or do you prefer I should call you on the phone and ramble on for hours in an unedited meandering stream of thought about what I intended to write?


Yeah, but it's not. This is a complete contrivance and you're just making shit up. The prompt is much shorter than the output and you are concealing that fact. Why?

Github repo or it didn't happen. Let's go.


[flagged]


It’s certainly more interesting than whatever the AI would turn it into.


tl;dr


Even though I use LLMs for code, I just can't read LLM written text, I kind of hate the style, it reminds me too much of LinkedIn.


Very high chance someone that’s using Claude to write code is also using Claude to write a post from some notes. That goes beyond rewriting and cleaning up.


I use Claude Code quite a bit (one of my former interns noted that I crossed 1.8 Million lines of code submitted last year, which is... um... concerning), but I still steadfastly refuse to use AI to generate written content. There are multiple purposes for writing documents, but the most critical is the forming of coherent, comprehensible thinking. The act of putting it on paper is what crystallizes the thinking.

However, I use Claude for a few things:

1. Research buddy, having conversations about technical approaches, surveying the research landscape.

2. Document clarity and consistency evaluator. I don't take edits, but I do take notes.

3. Spelling/grammar checker. It's better at this than regular spellcheck, due to its handling of words introduced in a document (e.g., proper names) and its understanding of various writing styles (e.g., comma inside or outside of quotes, one space or two after a period?)

Every time I get into a one hour meeting to see a messy, unclear, almost certainly heavily AI generated document being presented to 12 people, I spend at least thirty seconds reminding the team that 2-3 hours saved using AI to write has cost 11+ person-hours of time having others read and discuss unclear thoughts.

I will note that some folks actually put in the time to guide AI sufficiently to write meaningfully instructive documents. The part that people miss is that the clarity of thinking, not the word count, is what is required.


Well, real humans may read it, though. Personally I much prefer that real humans write real articles rather than all this AI-generated spam-slop. On YouTube this is especially annoying - they mix in real videos with fake ones. I see this when I watch animal videos - some animal behaviour is taken from older videos, then an AI fake is added. My own policy is that I never again watch anything from people who lie to the audience that way, so I had to begin filtering out such lying channels. I'd apply the same rationale to blog authors (but I am not 100% certain it is actually AI-generated; I just mention this as a safety guard).


ai;dr

If your "content" smells like AI, I'm going to use _my_ AI to condense the content for me. I'm not wasting my time on overly verbose AI "cleaned" content.

Write like a human, have a blog with an RSS feed and I'll most likely subscribe to it.


> I don’t think it’s that big a red flag anymore.

It is to me, because it indicates the author didn't care about the topic. The only thing they cared about was writing an "insightful" article about using LLMs. Hence this whole thing is basically LinkedIn resume-improvement slop.

Not worth interacting with, imo

Also, it's not insightful whatsoever. It's basically a retelling of other articles from around the time Claude Code was released to the public (March-August 2025).


The main issue with evaluating content for what it is is how extremely asymmetric that process has become.

Slop looks reasonable on the surface, and requires orders of magnitude more effort to evaluate than to produce. It’s produced once, but the process has to be repeated for every single reader.

Disregarding content that smells like AI becomes an extremely tempting early filtering mechanism to separate signal from noise - the reader’s time is valuable.


I think as humans it's very hard to abstract content from its form. So when the form is always the same boring, generic AI slop, it's really not helping the content.


And maybe writing an article or keynote slides is one of the few places we can still exercise some human creativity, especially when the core skill (programming) is almost completely in the hands of LLMs already.


>the tells are in pretty much every paragraph.

It's not just misleading — it's lazy. And honestly? That doesn't vibe with me.

[/s obviously]


So is GP.

This is clearly a standard AI exposition:

LLMs are like unreliable interns with boundless energy. They make silly mistakes, wander into annoying structural traps, and have to be unwound if left to their own devices. It's like the genie that almost pathologically misinterprets your wishes.


Then ask your own AI to rewrite it so it doesn't trigger you into posting uninteresting, thought-stopping comments that proclaim why you didn't read the article and don't contribute to the discussion.



Agreed. The process described is much more elaborate than what I do, but quite similar. I start by discussing in great detail what I want to do, sometimes asking the same question to different LLMs. Then a todo list, then manual review of the code, especially each function signature, checking whether the instructions have been followed and whether there are obvious refactoring opportunities (there almost always are).

The LLM does most of the coding, yet I wouldn't call it "vibe coding" at all.

"Tele coding" would be more appropriate.


I use AWS Kiro, and its spec-driven development is exactly this. I find it really works well, as it makes me slow down and think about what I want it to do.

Requirements, design, task list, coding.


I've also found that a bigger focus on expanding my agents.md as the project rolls on has led to fewer headaches overall and more consistency (unsurprisingly). It's the same as asking juniors to reflect on the work they've completed and to document important things that can help them in the future. Software Manager is a good way to put this.


AGENTS.md should mostly point to real documentation and design files that humans will also read and keep up to date. It's rare that something about a project is only of interest to AI agents.


It feels like retracing the history of software project management. The post is quite waterfall-like: writing a lot of docs and specs upfront, then implementing. Another approach is to just YOLO it (on a new branch), make it write up the lessons afterwards, then start a new, more informed try and throw away the first. Or any other combo.

For me what works well is to ask it to write some code upfront to verify its assumptions against actual reality, not just telling it to review the sources "in detail". It gains much more from real output from the code, and that clears up wrong assumptions. Do some smaller jobs, write up md files, then plan the big thing, then execute.


It makes an endless stream of assumptions. Some of them brilliant and even instructive to a degree, but most of them are unfounded and inappropriate in my experience.


'The post is quite waterfall-like. Writing a lot of docs and specs upfront then implementing' - It's only waterfall if the specs cover the entire system or app. If it's broken up into sub-systems or vertical slices, then it's much more Agile or Lean.


This is exactly what I do. I assume most people avoid this approach due to cost.


Please explain what do you mean by “cost”?


You burn a lot of money on tokens for a solution that you throw away.


Oh no, maybe the V-Model was right all along? And right-sizing increments, with control stops after them. No wonder these matrix multiplications start to behave like humans; that is what we wanted them to do.


So basically you’re saying LLMs are helping us be better humans?


Better humans? How and where?


> The author seems to think they've hit upon something revolutionary...

> They've actually hit upon something that several of us have evolved to naturally.

I agree, it looks like the author is talking about spec-driven development with extra time-consuming steps.

Copilot's plan mode also supports iterations out of the box, and drafts a plan that's executed only after manually reviewing and editing it. I don't know what the blogger was proposing that ventured outside of plan mode's happy path.


If you have a big rules file you're headed in the right direction, but still not there. Just as with humans, the key is that your architecture should make it very difficult to break the rules by accident and still be able to compile/run with a correct exit status.

My architecture is so beautifully strong that even LLMs and human juniors can’t box their way out of it.
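To make "hard to break by accident" concrete, here's a toy sketch (the domain, names, and escaping rule are invented for illustration). The only way to obtain a value the renderer accepts is through the one function that enforces the rule, so skipping the rule fails a CI type check or a quick review instead of silently shipping:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SanitizedHtml:
        value: str

    def sanitize(raw: str) -> SanitizedHtml:
        # The single place the escaping rule lives.
        escaped = raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
        return SanitizedHtml(escaped)

    def render(fragment: SanitizedHtml) -> str:
        # Accepts only the wrapper type; handing it a raw str gets flagged by any
        # type checker, whether the author is a junior or an LLM.
        return f"<section>{fragment.value}</section>"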


I've been doing the exact same thing for 2 months now. I wish I had gotten off my ass and written a blog post about it. I can't blame the author for gathering all the well deserved clout they are getting for it now.


Don’t worry. This advice has been going around for much more than 2 months, including links posted here as well as official advice from the major companies (OpenAI and Anthropic) themselves. The tools literally have had plan mode as a first class feature.

So you probably wouldn’t have any clout anyways, like all of the other blog posts.


I went through the blog. I started using Claude Code about 2 weeks ago and my approach is practically the same. It just felt logical. I think there are a bunch of us who have landed on this approach and most are just quietly seeing the benefits.


> LLMs are like unreliable interns with boundless energy.

This was a popular analogy years ago, but is out of date in 2026.

Specs and a plan are still a good basis; they are of equal or greater importance than the ephemeral code implementation.


It's alchemy all over again.


Alchemy involved a lot of do-it-yourself though. With AI it is like someone else does all the work (well, almost all the work).


It was mainly a jab at the protoscientific nature of it.


Reproducing experimental results across models and vendors is trivial and cheap nowadays.


Not if Anthropic goes further in obfuscating the output of Claude Code.


Why would you test implementation details? Test what's delivered, not how it's delivered. The thinking portion, synthesized or not, is merely implementation.

The resulting artefact, that's what is worth testing.


> Why would you test implementation details

Because this has never been sufficient, for everything from hard-to-test cases to readability and long-term maintenance. Reading and understanding the code is more efficient, and necessary for any code worth keeping around.


> LLMs are like unreliable interns with boundless energy

This isn't directed specifically at you but at the general community of SWEs: we need to stop anthropomorphizing a tool. Code agents are not human-capable, and scaling pattern matching will never hit that goal. That's all hype, and this is coming from someone who runs the gamut of daily CC usage. I'm using CC to its fullest capability while also being a good shepherd for my prod codebases.

Pretending code agents are human capable is fueling this koolaide drinking hype craze.


It's pretty clear they effectively take on the roles of various software-related personas: designer, coder, architect, auditor, etc.

Pretending otherwise is counter-productive. This ship has already sailed, it is fairly clear the best way to make use of them is to pass input messages to them as if they are an agent of a person in the role.


if only there was another simpler way to use your knowledge to write code...


I really like your analogy of LLMs as 'unreliable interns'. The shift from being a 'coder' to a 'software manager' who enforces documentation and grounding is the only way to scale these tools. Without an architecture.md or similar grounding, the context drift eventually makes the AI-generated code a liability rather than an asset. It's about moving the complexity from the syntax to the specification.


It's nice to have it written down in a concise form. I shared it with my team as some engineers have been struggling with AI, and I think this (just trying to one-shot without planning) could be why.


Cool. Good for him. I've been building agentic and observational systems and have been working to make them safe and layered in defense. And, well, I probably should have just said "fuck it" and put a disclaimer sticker on the front to let it fly.

Yeah, these systems are going to get absolutely rocked by exploits. The scale of damage is going to be comical, and, well, that's where we are right now.

Go get 'em, tiger. It's a brave new world. But, as with my 10 year old, I need to make sure the credit cards aren't readily available. He'd just buy $1k of robux. Who knows what sort of havoc uncorked agentic systems could bring?

One of my systems accidentally observed some AWS keys last night. Yeah. I rotated them, just in case.

