I tried the agent thing on: - Large C codebase (new feature and bugfix) - Small ...

vitaflo · 2025-06-03T00:29:48 1748910588

It’s entirely possible that the people talking up agents also produced spaghetti code but don’t care because they are so much more “productive”.

An interesting thing about many of these types of posts is they never actually detail the tools they use and how they use them to achieve their results. It shouldn’t even be that hard for them to do, they could just have their agent do it for them.

phkahler · 2025-06-03T00:53:20 1748912000

>> It’s entirely possible that the people talking up agents also produced spaghetti code but don’t care because they are so much more “productive”.

You may be right. The author of this one even says if you spend time prettying your code you should stop yak shaving. They apparently don't care about code quality.

blibble · 2025-06-03T01:05:37 1748912737

> You may be right. The author of this one even says if you spend time prettying your code you should stop yak shaving. They apparently don't care about code quality.

brought to you by fly.io, where the corporate blog literally tells you to shove your concerns up your ass:

> Cut me a little slack as I ask you to shove this concern up your ass.

lumost · 2025-06-03T00:57:54 1748912274

The agent/model being used makes a huge difference. Cline with Claude 3.7 is ridiculously expensive but useful. Copilot is vaguely ok.

galangalalgol · 2025-06-03T01:10:13 1748913013

Even just doing the cut and paste thing for one shots, claude sonnet 4 writes good rust code, generally on the first try.

runjake · 2025-06-03T00:52:51 1748911971

You’re not providing a key piece of information to provide you with an answer: what were the prompts you used? You can share your sessions via URL.

A prompt like “Write a $x program that does $y” is generally going to produce some pretty poor code. You generally want to include a lot of details and desires in your prompt. And include something like “Ask clarifying questions until you can provide a good solution”.

A lot of the people who complain about poor code generation use poor prompting.

phkahler · 2025-06-03T00:54:53 1748912093

It'd be nice if the AI advocates shared prompts, or even recorded entire sessions. Then we could all see how great it really is.

runjake · 2025-06-03T01:04:38 1748912678

They do. All over the place.

Simon Willison has some great examples in his blog and on his GitHub. Check out Karpathy’s YouTube videos as well.

https://simonwillison.net/

jonas21 · 2025-06-03T01:06:24 1748912784

This one's been on the HN front page all day:

https://news.ycombinator.com/item?id=44159166

hermanradtke · 2025-06-03T00:55:45 1748912145

Many do. In their GitHub repos right next to the code.

foobarian · 2025-06-03T01:07:57 1748912877

"Infrastructure as Prompts" (IaP)

lexandstuff · 2025-06-03T00:56:23 1748912183

Prompt engineering isn't really that important anymore imo. If you're using a reasoning model, you can see if it understood your request by reading the reasoning trace.

simonw · 2025-06-03T01:10:00 1748913000

Disagree, effective prompting remains a crucial skill that's difficult to acquire.

I've been developing my prompting skills for nearly three years now and I still constantly find new and better ways to prompt.

I also consider knowing what "use a reasoning model" means to be part of that skill!

jacob019 · 2025-06-03T01:18:52 1748913532

That's a very dangerous thought. Prompt engineering evolved is just clear and direct communication. That's a hard thing to get right when talking to people. Heck, personally I can have a hard time with clear and coherent internal dialog. When I am working with models and encounter unexpected results, it often boils down to the model giving me what I asked for instead of what I want. I've never met anyone who always knows exactly what they want and is able to articulate it with perfect clarity. Some of the models are surprisingly good at figuring out intent, but complexity inevitably requires additional context. Whether you are working with a model or a person, or even your future self, you must spend time developing and articulating clear specifications, that is prompt engineering. Furthermore, models don't "think" like people--there's technique in how you struture specifications for optimal results.

lexandstuff · 2025-06-03T01:28:32 1748914112

Fair enough. I guess I was mainly thinking of how rarely I need to utilise the old prompt engineering techniques. Stuff like: "You are an expert software developer...", "you must do this or people will die." etc

I just tell the AI what I want, with sufficient context. Then, I check the reasoning trace to check it understood what I wanted. You need to be clear in your prompts, sure, but I don't really see it as "prompt engineering" any more.

runjake · 2025-06-03T01:06:40 1748912800

Yeah I strongly disagree with that. I think prompts are very critical.

As with any other project, it’s best to specify your wants and needs than to let someone or an LLM to guess.

phkahler · 2025-06-03T01:24:08 1748913848

There a many ways to do something wrong and few ways to do them right. It's on the AI advocates to show us session logs so we can all see how it's done right.

chucknthem · 2025-06-03T00:18:29 1748909909

How are you writing your prompts? I usually break a feature down to smaller task level before I prompt an agent (claude code in my case) to do anything. Feature level is often too hard to prompt and specify in enough detail for it to get right.

So I'd say claude 4 agents today are at smart but fresh intern level of autonomy. You still have to do the high level planning and task break down, but it can execute on tasks (say requiring 10 - 200 lines of code excluding tests). Any asking it to write much more code (200+ lines) often require a lot of follow ups and disappointment.

jvanderbot · 2025-06-03T00:22:14 1748910134

This is the thing that gets me about LLM usage. They can be amazing revolutionary tech and yes they can also be nearly impossible to use right. The claim that they are going to replace this or that is hampered by the fact that there is very real skill required (at best) or just won't work most the time (at worst). Yes there are examples of amazing things, but the majority of things seem bad.

presentation · 2025-06-03T00:42:07 1748911327

I have not had a ton of success getting good results out of LLMs but this feels like a UX problem. If there’s an effective way to frame a prompt why don’t we get a guided form instead of a single chat box input?

presentation · 2025-06-03T00:40:38 1748911238

Coding agents should take you through a questionnaire before working. Break down what you are asking for into chunks, point me to key files that are important for this change, etc etc. I feel like a bit of extra prompting would help a lot of people get much better results rather than expecting people to know the arcane art of proompting just by looking at a chat input.

consumer451 · 2025-06-03T00:53:21 1748912001

I am just a muggle, but I have been using Windsurf for months and this is the only way for me to end up with working code.

A significant portion of my prompts are writing and reading from .md files, which plan and document the progress.

When I start a new feature, it begins with: We need to add a new feature X that does ABC, create a .md in /docs to plan this feature. Ask me questions to help scope the feature.

I then manually edit the feature-x.md file, and only then tell the tool to implement it.

Also, after any major change, I say: Add this to docs/current_app_understanding.md.

Every single chat starts with: Read docs/current_app_understanding.md to get up to speed.

The really cool side benefit here is that I end up with solid docs, which I admittedly would have never created in the past.

cyral · 2025-06-03T01:17:24 1748913444

You can ask it to do this, in your initial prompt encourage it to ask questions before implementing if it is unsure. Certain models like o4 seem to do this more by default rather than Claude that tends to try to do everything without clarifying

jaggederest · 2025-06-03T00:59:46 1748912386

I mean if you ask Claude code to walk through what you should do next with you it'll ask lots of great questions and write you a great TODO.md file that it'll then walk down and check the boxes on.

You don't exactly need to know prompting, you just need to know how to ask the AI to help you prompt it.

Affric · 2025-06-03T00:53:36 1748912016

I feel like when you prompt an LLM the LLM should take it almost as "what would the best possible prompt for this prompt be and then do that"...

jacob019 · 2025-06-03T01:06:45 1748912805

I don't think it's fair to call that the agent thing. I've had profoundly positive results with agentic workflows for classification, analysis, and various business automations, including direct product pricing. You have to build an environment for the agent to make decisions in, with good instructions for what you want them to do. Then you wire it up so that the decisions have effects in the real world. You can acheieve really good results, and there is a lot of flexibility to tweak it and various tricks to optimize performance. Tools can allow agents to pull in relevant context as needed, or to execute complex multistep workflows. That is the agent thing.

Writing code is one thing that models can do when wired properly, and you can get a powerful productivity boost, but wielding the tools well is a skill of it's own, and results will vary by task, with each model having unique strengths. The most important skill is understanding the limitations.

Based on your task descriptions and the implied expectation, I'm unsurprised that you are frustrated with the results. For good results with anything requiring architecture decisions have a discussion with the model about architecture design, before diving in. Come up with a step by step plan and work through it together. Models are not like people, they know everything and nothing.

turtlebits · 2025-06-03T00:21:25 1748910085

Have it make small changes. Restrict it to a single file and scope it to <50 lines or so. Enough that you can easily digest without making it a chore.

declan_roberts · 2025-06-03T00:42:48 1748911368

A small change scoped to <50 lines is something easy to write for a normal software engineer. When do the LLMs start doing the hard part?

skydhash · 2025-06-03T01:04:05 1748912645

A small change around 50 lines is the size of an advent of code solution (the hardest part). Most of the code you write around that is for defensive coding (error handling, malformed input, expected output,…) which is the other hard part. Then you connect these cores to form a system and that’s another tough problem. And it needs to evolve to.

We’ve built tools to help us with the first part, framework with the second, architecture principles with the third and software engineering techniques for the fourth. Where do LLMs help?

kasey_junk · 2025-06-03T00:51:49 1748911909

When you wire them up to your cicd process with pull requests and the github gui as your interface, rather than sitting there passively riding along as it prompts you the changes it’s going to make.

With my async agent I do not care about how easy it is for me, it’s easier to tell the agent to do the workflow and comeback to it later when I’m ready to review it. If it’s a good change I approve the pr, if not I close it.

simonw · 2025-06-03T01:13:11 1748913191

Can you type that 50 line change in less than 15 seconds?

soraminazuki · 2025-06-04T02:50:07 1749005407

It'd take less than the time it takes to come up with The Perfect Prompt.

citizenpaul · 2025-06-03T00:30:50 1748910650

>these agent posts are paid advertising.

I'm 100% certain most if not all of them are, there is simply too much money flying around and I've seen things that marketing does in the past for way less hyped products. Though in this specific case I think the writer may simply be shilling AI to create demand for their service. Pay us monthly to one click deploy your broken incomplete AI slop. The app doesn't work? No problem just keep prompting harder and paying us more to host/build/test/deploy it...

I've also tried the agent thing and still am with only moderate success. Cursor, Claud-squad, goose, dagger AI agents. In other words all the new hotness, all with various features claiming to solve the fact that agents don't work. Guess what? they still don't.

But hey this is HN? most of the posters are tech fearing luddies right? All the contention on here must mean our grindset is wrong and we are not prompting hard enough.

There is even one shill Ghuntly that claims you need to be "redlining" ai at the cost of $500-$1000 per day to get the full benefits. LOL if that is not a veiled advertisement I don't know what is.

simonw · 2025-06-03T01:12:20 1748913140

Nobody has paid me to write anything about AI and if they did I would disclose it.

ghuntley · 2025-06-03T13:02:20 1748955740

ditto - https://ghuntley.com/disclosures

citizenpaul · 2025-06-03T17:50:23 1748973023

Forgive me if I don't consider your personal blog an authority of your honesty. "I'm not a liar, for real dude see it says right there"

Why are all your public projects "joke/toy projects" if AI is so awesome and production ready? My experience reflects this as the truth. Yet your work backs up my experience rather than your words.

To avoid only being snark. I think all software is about power/control and software has allowed unprecedented concentration of power Which is why it resists being official like other industries. No one with power wants restrictions on their power/software. Ultimately AI is good for small'ish projects and a productivity multiplier(eventually). I think it will lead a new revolution in distilling the current incumbents in the business world that are stagnant on vast proprietary software systems that previously could not be unseated. Small players will be able to codify/automate their business to make it competitive with big players. So I'm not "anti-ai".

edit: AI will simultaneously rot the proprietary software advantage from the inside out. As companies are further convinced that AI can solve their problem of having to pay people to maintain their software.

simonw · 2025-06-03T18:00:12 1748973612

I think it's pretty counter-productive to default to not trusting anyone under any circumstances.

Having a "disclosures" page on a personal website is a pretty strong quality signal for me - it's inspired me to set a task to add my own.

As with all of these things, the trick is to establish credibility over time. I've been following Geoff for a few years. Given his previous work I think he has integrity and I'm ready to believe his disclosures page.

citizenpaul · 2025-06-03T18:35:30 1748975730

I get what your saying.

However we seem to live in a time where integrity is basically valued at zero or more commonly as something to "bank" so you can cash in for an enormous payoff when the time comes. I agree he seems authentic, therefore valuable. Which means an AI company can come and offer him 7-8 figures to build hype. I think its hard for people to truly grasp just how much money is flying around in hype cycles. Those numbers are not unrealistic. That's set for life money, not many are in a position to refuse that kind of wealth. (he lives in van, just saying)

I hope he is one of the few authentic people left but the world has left me jaded.

simonw · 2025-06-03T21:31:22 1748986282

Secretly offering someone 7-8 figures to hype for you is a big business risk to take on.

If details of that deal leak, it's a big embarrassment for the company.

In the USA it is also illegal. There are substantial FTC fines to worry about. If it affects the stock price it could be classified as securities fraud (Matt Levine will happily tell you that "everything is securities fraud").

citizenpaul · 2025-06-04T22:13:26 1749075206

>If details of that deal leak, it's a big embarrassment for the company.

Intermediaries.

Also IMO the risk of someone whom is not already rich turning down that kinda money is so close to zero that it is effectively zero. No risk.

If everything is securities fraud then by that logic it would not be considered in making sketchy deals. Also as you double state, it only matters if the company is public anyway. Hmmm is openai public? Are any of the AI players besides MS,Oracle,Google? Short answer. No.

I'm not sure why with all the public unpunished criminal behavior we see nowadays you have such trouble believing that there really are lots of paid shills for such a hyped product.

simonw · 2025-06-04T23:00:09 1749078009

Probably because, as an unpaid shill, I understand why people would recommend this stuff without needing to be paid to do so.

citizenpaul · 2025-06-05T18:26:18 1749147978

I'm not accusing you of being a paid shill. My core argument is that there are lots of paid shills for AI being created for the last 2 years and going. I of course will never have the hard evidence to prove it so inferring is all I can do or point people to.

HN seems to have a very long tolerance of suspected/potential white collar crimes. So I don't expect many allies on here. Seems the mindset is the ends justifies the means prevails.

citizenpaul · 2025-06-04T22:14:34 1749075274

So you have no reply to my straightforward question? If AI has made you 1000x productive why do you only post self described toy/joke projects?

I know the answer though. Its because that is all it is good for. Everyone that has tried these tools knows that.

CapsAdmin · 2025-06-03T00:39:49 1748911189

This is my experience too most of the time. Though sometimes it does work, and sometimes a solution is found that I never thought of. But most of the time I have to change things around to my liking.

However, a counter argument to all this;

Does it matter if the code is messy?

None of this matters to the users and people who only know how to vibe code.

someguy101010 · 2025-06-03T00:43:03 1748911383

> Does it matter if the code is messy?

It matters proportionally to the amount of time I intend to maintain it for, and the amount of maintenance expected.

ComplexSystems · 2025-06-03T03:32:31 1748921551

Same here. I keep trying to figure out WTF agent that people are using to get these great results, because Copilot with Claude 4 and Gemini 2.5 has been a disastrous mess for me.