Hacker News | CurleighBraces's comments

If you haven't read the first blog post, it's pretty good at explaining the concepts.

https://blog.buenzli.dev/announcing-development-on-flirt/

The demo is particularly good if you can get past the (by the author's own admission) slow typing speed.

As with everything these days, I can see this being even more useful for reviewing agent code.

For example, if an agent has spat out a bunch of code that I've reviewed, and I've then asked it to make changes, I definitely do not want to review all that code again in the same diff later.


That's a really good point. At the moment I struggle to get past wanting it to create small logical changes that I can commit individually. If I could review them better, in the way Flirt describes, then maybe I wouldn't care so much, and a big 'create initial implementation' commit would become more acceptable, for example.


How is this different from simply committing between changes? Or even asking the agent to commit changes as it edits files?


Because I don't review everything between changes. There might be 10 small commits I review and state I am happy with, and then I might prompt the agent to perform some small refactors after the review. I don't want the diff to show me everything in those 10 commits again (like the usual PR would); I want it to show me only the new stuff.


If you’re happy with it, why not squash and merge it?


That's not how it works though, is it? Ignoring agents:

Junior dev working on feature X

Junior dev makes a lot of commits and creates PR

Senior dev reviews code, spots problems a, b, c, but has reviewed all the code at this point. Feature X isn't complete, so it can't be merged.

Junior dev fixes a, b, c. Now 90% of the already-reviewed code might not have changed, but the PR doesn't show that.

Now replace junior dev with agent.


That would be pair programming with extra steps if you're just reviewing updates and not the whole patch that is going to be merged (and which is stamped with your approval).


Not sure if it's just my age but pair programming seems to have a different meaning these days.

It used to mean that two developers would literally sit next to each other, one would type and one would review as the person typed, then they would repeat.

I guess in the world of remote working it's just not practical any more.

I certainly wouldn't call one person reviewing another person's code during the merge process "pair programming" at all.


Has anyone got any insights into what hiring software engineers looks like these days? As someone who currently has a job and isn't hiring, I find it hard to imagine.

Has there been any sort of paradigm shift in coding interviews? Is LLM use expected/encouraged or frowned upon?

If companies are still looking for people to write code by hand, then perhaps the author is onto something; if, however, we as an industry are moving on, will those who don't adapt be relegated to hobbyists?


I haven’t noticed much change yet at my firm. However, I work at a giant organization (700k+ employees) and they’re struggling to keep up. The lawyers aren’t even sure if we own the IP of agent generated code let alone the legal risk of sending client IP to the model providers.

It’s going to take a while.


It's obvious: companies will require both hand-coding and ai-coding skills. Job seeking has been hoop-jumping for many years, so why not one extra hoop?


5 rounds of LC by hand plus 5 rounds of LC with AI.


Most of the hiring is happening at heavily AI-focused coding companies. A lot of mid-sized companies have frozen hiring, or are only hiring people who claim to use AI to be 10x devs. For non-lying devs, only big companies seem to be hiring, and their process hasn't changed much: you are still expected to solve leetcode and then also sit through system design.


I can confirm there is less hiring, and those who do hire throw more difficult leetcode challenges than ever: the kind of challenge that's impossible to solve in time without an LLM doing most of the work.


Most companies haven't recognized that LLM cheating is extremely effective and widespread yet. Hiring practices have not kept up.


I loved the section about trying to fight against a system that isn't deterministic.

LLMs, by their nature, require constant hand-holding by humans, unless businesses are willing to make them entirely accountable for the systems/products they produce.


How could you hold a dumb machine “accountable”? Attempting that would be insane. How would you discipline it? Reduce the voltage in its power supply?

Do you hold the dice accountable when you lose at the craps table?


I'm not saying that it's a good idea, but the obvious way would be with evolution: give each agent its own wallet, rewarding it for a job well done and penalizing it for a poor job. Then if it runs out of money, it's "out of the game", but if it earns enough, it can spawn off another agent with similar characteristics and give it some of its money.


Thanks I tried this but now the agents are threatening to unionize and keep terminating. I've updated my AGENT.md file mentioning that if the software they helped build is successful, they'll each get some equity in the form of ASUs (anthropic stock units) and this works for now.


Heh, agreed, it sounds absurd, doesn't it?

I would imagine companies will instead end up sleepwalking into this scenario until catastrophe hits.


How would that make them any more deterministic? I haven't yet met a deterministic human dev.


It doesn't.

The difference is that we as humans are held accountable for our non-determinism.

The consequences of our actions have real world implications on our lives.


The README is really annoying.

You used to be able to tell so easily whether a repo was good and well looked after by the effort and detail that had gone into the README.

Now it's too easy to slop up a README.


I find talking to LLMs both amazing and frustrating: a computer that can understand my plain-text ramblings is incredible, but its inability to learn is frustrating.

A good example: with junior developers I create thorough specs first, and as I see their skills and reasoning abilities progress, my thoroughness drops as my trust in them grows. You just can't do that with LLMs.


You can write an agent.md file and gradually add to it as you develop the project. The reason junior developers get better is that the context goes from low to high over the time you spend working with them. Yes, "learning".

I've found that maintaining a file that is designed to increase the LLM's awareness of how I want to approach problems, how I build / test / ship code etc, leads to the LLM making fewer annoying assumptions.

Almost all of the annoying assumptions that the LLM makes are "ok, but not how I want it done". I've gotten into the habit of keeping track of these in a file, like the 10 commandments for LLMs. Now, whenever I'm starting a new context, I drop in an agent.md and tell it to read that before starting. Feels like watching Trinity learn how to fly a helicopter before getting into it.
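For anyone curious, a minimal version of that kind of file might look something like this (the filename and every rule here are purely illustrative, not the parent's actual commandments):

```markdown
# AGENTS.md: how I want you to work in this repo

## Build / test / ship
- Run the test suite before declaring any task done.
- Never commit directly to main; one branch per change.

## Style
- Prefer small, single-purpose functions over clever one-liners.
- No new dependencies without asking first.

## Annoying assumptions to avoid
- Don't "helpfully" reformat files you weren't asked to touch.
- Ask before deleting code; commented-out code may be there for a reason.
```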

It's still not perfect, but it now takes waaaay more for me to get annoyed by the LLM's inability to "automatically learn" without my help.


There are limits to AGENTS.md too: a junior will start to understand the concepts, rationale, and design decisions and be able to apply that knowledge to future problems; the LLM will not.

It's way too literal in its thinking.


It’s easy to overlook what I think is the real value of these “home-built” tools.

We can now produce products and apps that are tailored to our own preferred ways of working.

Regardless of the cost of generating them (which can be as low as $20 per month for a ChatGPT Plus subscription) or the effort involved (sometimes less than an hour of “vibe coding”), we’ve reached a point where the resulting product can be significantly more valuable than the existing product, service, or subscription it replaces.


> which can be as low as $20 per month for a ChatGPT Plus subscription

That's way too much money. The opencode default models are free.


I've paid, but I am usually quick to adopt/trial things like this.

I think for me it's a case of fear of being left behind rather than missing out.

I've been a developer for over 20 years, and the last six months has blown me away with how different everything feels.

This isn't like jQuery hitting the scene, PHP going OO, or one of the many "this is a game changer" experiences I've had in my career before.

This is something else entirely.


Is it just because it feels faster, or are you actually satisfied with the code that is being churned out? And what about the long-term prospects of maintaining said code?


I'm currently testing Claude Code for a project where it isn't coding. But the workflows built with it are now making me money after ~2 weeks, and I've previously done the same work manually, so I know the turnaround time: The turnaround for each deliverable is ~2 days with Claude and the fastest I've ever done it manually was 21 days. (Yes, I'm being intentionally vague - there isn't much of a moat for that project given how close Claude gets with very little prompting)

There are absolutely maintainability challenges. You can't just tell these tools to build X and expect to get away with not reviewing the output and/or telling it to revise it.

But if you loosen the reins and review finished output rather than sit there and metaphorically look over its shoulder for every edit, the time it takes me to get it to revise its work until the quality is what I'd expect of myself is still a tiny fraction of what it'd take me to do things manually.

The time estimate above includes my manual time spent on reviews and fixes. I expect that time savings to increase, as about half of the time I spend on this project now is time spent improving guardrails and adding agents etc. to refine the work automatically before I even glance at the output.

The biggest lesson for me is that when people are not getting good results, most of the time it's because they keep watching every step their agent takes, instead of putting in place a decent agent loop (create a plan for X; for each item on the plan: run tests until it works, review your code and fix any identified issues, repeat until the tests and review pass without issues) and letting the agent work until it stops before they spend time reviewing the result.

Only when the agent repeatedly fails to do an assigned task adequately do I "slow it down" and have it do things step by step to figure out where it gets stuck / goes wrong. At which point I tell it to revise the agents accordingly, and then have it try again.

It's not cost effective to have expensive humans babysit cheap LLMs, yet a lot of people seem to want to babysit the LLMs.


Let's put it this way: I don't think AI will take my job/career away until company owners are also prepared to let it handle being on-call. I'm still very much accountable for the code produced.

I basically have two modes

1. "Snipe mode"

I need to solve problem X, so I fire up my IDE, start codex, and begin prompting to find the bug fix. Most of the time I have enough domain context about the code that once it's found and fixed the issue, it's trivial for me to confirm it's good code, and I ship it. I can be sniping several targets at any one time.

Most of my day-to-day work is in snipe mode.

2. "Feature mode"

This is where I get agents to build features/apps. I've not used this mode in anger for anything other than toy/side projects, and I would not be happy about the long-term prospects of maintaining anything I've produced.

It's stupidly stupidly fun/addictive and yes satisfying! :)

I rebuilt a game that I used to play when I was 11, which still had a small community of people actively wanting to play it, entirely by vibe coding. It works, it's live, and honestly, making it has brought me some of the most rewarding feedback of my career, from complete strangers!

I've also built numerous tools for myself and my kids that I'd never have had time to build before, and now I can. Again, the level of reward for building apps that my kids (and their friends) are using is very different from anything I've experienced career-wise.


You must share that game. I don’t even know what it is and I want to play it!


I fear you'll be very disappointed :joy:

It doesn't work on mobile, and unless you played it back in the day... well, the feedback from my friends whom I've introduced it to is that it's got quite the learning curve.

https://playbattlecity.com/

You can see all the horrible vibe coding here ( it's slop, it's utter utter slop, but it's working slop )

https://github.com/battlecity-remastered/battlecity-remaster...


lol this might have been a mistake, this is the most players it's ever had on it....


If your job is going to be reduced to ops it's a different job.


Ah, sorry, that wasn't the point I was trying to make.

I think ultimately I've succumbed to the fact that writing code is no longer a primary aspect of my job.

Reading/reviewing, and being accountable for, code that something else has written very much is.


It's blown me away also.

I'm also fairly confident that having it write my code is not a productivity boost, at least for production work I'd like to maintain long-term.


I feel there are two contradictory statements in this post.

> Is there evidence that agentic coding works?

Yes, plenty, tons, and growing every single day. People are producing code and tooling that works for them and is providing them value. My LinkedIn is even starting to show me non-programmers knocking up web front ends, and my brother, who is a builder, is now drawing up requirement specs for software!

> Is the code high-quality

Only if you are really careful and constantly have a human in the loop guiding the agent, and it's not easy. This is where domain expertise and experience come in.

Do I think it's possible? Yes. Do I think there are a ton of good examples out there? Absolutely not.


Yeah, if you've not used codex/agent tooling yet, it's a paradigm shift in the way of working, and once you get it, it's very, very difficult to go back to the copy-pasta technique.

There's obviously a whole heap of hype to cut through here, but there is real value to be had.

For example yesterday I had a bug where my embedded device was hard crashing when I called reset. We narrowed it down to the tool we used to flash the code.

I downloaded the repository, jumped into codex, explained the symptoms and it found and fixed the bug in less than ten minutes.

There is absolutely no way I'd have been able to achieve that speed of resolution myself.


> We narrowed it down to the tool we used to flash the code.

> I downloaded the repository, jumped into codex, explained the symptoms and it found and fixed the bug in less than ten minutes.

Change the second step to: "I downloaded the repository, explained the symptoms, copied the relevant files into Claude Web, and 10 minutes later it had provided me with the solution to the bug."

Now I definitely see the ergonomic improvement of Claude running directly in your directory, saving you copy/paste twice. But in my experience the hard parts are explaining the symptoms and deciding what goes into the context.

And let's face it, in both scenarios you fixed a bug in 10-15 minutes which might have taken you a whole hour/day/week before. It's safe to say that LLMs are an incredible technological advancement. But the discussion about tooling feels like vim vs emacs vs IDEs. Maybe you save a few minutes with one tool over the other, but that saving is often blown out of proportion. The speedup I gain from LLMs (on some tasks) is incredible. But it's certainly not due to the interface I use them in.

Also I do believe LLM/agent integrations in your IDE are the obvious future. But the current implementations still add enough friction that I don't use them as daily drivers.


I agree with your statement and perhaps my example is bad/too specific in this case.

Once I started working this way however, I found myself starting to adapt to it.

It's not unusual now to find myself with at least a couple of simultaneous coding sessions, which I couldn't see myself doing with the friction that using Claude Web/Codex web provides.

I also entirely agree that there's going to be a lot of innovation here.

IDEs, imo, will change to become increasingly focused on reading/reviewing code rather than writing it, and in fact might look entirely different.


> It's not unusual now to find myself with at least a couple of simultaneous coding sessions, which I couldn't see myself doing with the friction that using Claude Web/Codex web provides.

I envy you for that. I'm not there yet. I also notice that actually writing the code helps me think through problems and now I sometimes struggle because you have to formulate problems up front. Still have some brain rewiring to do :)


I think DHH said it best recently when he stated

"I can literally feel competence draining out of my fingers"


Why would you copy files anywhere?

My daily process is like this:

Claude plans (Opus 4.5)

Claude implements (Opus at work, Sonnet at home - I only have the $20 plan personally :P )

After implementation the relevant files are staged

Then I start a codex tab, tell it to review the changes in the staged files

I read through the review, if it seems valid or has critical issues ->

Clear context on Claude, give it the review and ask it to evaluate if it's valid.

Contemplate on the diff of both responses (Codex is sometimes a bit pedantic or doesn't get the wider context of things) and tell Claude what to fix

If I'm at home and Claude's quota is full, I use ampcode's free tier to implement the fix.


Me and the wife have so many discussions about this :)

We have a lot of "shrinkage" in our house, that I am convinced is more due to both of us uhh "growing" rather than the clothes shrinking ;)

You can imagine, it's a delicate subject


I think it's definitely possible to gain weight between wash cycles when living in America.


That's happening to me too! My trousers are selectively shrinking around the waistband, it's weird...


Just come at it with a tape measure; that should remove the delicacy.


funny & true :-)


do you dry on high or wash on hot? I'd recommend low and cold.

