The demo is particularly good if you can get past the (by the author's own admission) slow typing speed.
As with everything these days, I can see this being even more useful for reviewing agent code.
For example, if an agent has spat out a bunch of code that I've reviewed, and I've then asked it to make changes, I definitely do not want to review all that code again in the same diff later.
That's a really good point. At the moment I struggle to get past wanting it to create small logical changes that I can commit individually. If I could review them better, in the way Flirt describes, then maybe I wouldn't care so much, and a big 'create initial implementation' commit becomes more acceptable, for example.
Because I don't review everything between changes, there might be 10 small commits I review and state I'm happy with; then I might prompt the agent to perform some small refactors after the review. I don't want the diff to show me everything in those 10 commits again (like the usual PR would), I want it to show me only the new stuff.
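If the agent's work lives in git, one way to get that "only the new stuff" view is to mark the last commit you signed off on. This is a minimal sketch; the tag name and file contents are made up for illustration:

```shell
# Toy repo to demonstrate the idea.
cd "$(mktemp -d)" && git init -q
echo "reviewed code" > app.py
git add . && git -c user.name=me -c user.email=me@x commit -qm "reviewed commits"

# After finishing a review, mark the last commit you signed off on:
git tag reviewed-ok

# ...the agent performs its refactors on top...
echo "new refactor" >> app.py
git add . && git -c user.name=me -c user.email=me@x commit -qm "agent refactor"

# Show only what changed since the review, not everything already reviewed:
git diff reviewed-ok..HEAD
```

Re-tagging (`git tag -f reviewed-ok`) after each review keeps the diff scoped to whatever the agent has done since you last looked.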
That would be pair programming with extra steps if you're just reviewing updates and not the whole patch that is going to be merged (and which is stamped with your approval).
Not sure if it's just my age but pair programming seems to have a different meaning these days.
It used to mean that two developers would literally sit next to each other, one would type and one would review as the person typed, then they would repeat.
I guess in the world of remote working it's just not practical any more.
I certainly wouldn't call one person reviewing another person's code during the merge process pair programming at all.
Has anyone got any insights into what hiring software engineers looks like these days? As someone currently with a job and not hiring, it is hard to imagine.
Has there been any sort of paradigm shift in coding interviews? Is LLM use expected/encouraged or frowned upon?
If companies are still looking for people to write code by hand, then perhaps the author is onto something. If, however, we as an industry are moving on, will those who don't adapt be relegated to hobbyists?
I haven’t noticed much change yet at my firm. However, I work at a giant organization (700k+ employees) and they’re struggling to keep up. The lawyers aren’t even sure if we own the IP of agent generated code let alone the legal risk of sending client IP to the model providers.
It's obvious: companies will require both hand-coding and AI-coding skills. Job seeking has been hoop-jumping for many years, so why not one extra hoop?
Most of the hiring is happening at heavy AI coding companies; a lot of mid-sized companies have frozen hiring, or are only hiring people who claim to use AI to be 10x devs. For non-lying devs, only big companies seem to be hiring, and their process hasn't changed much: you are still expected to solve leetcode and then also sit through system design.
I can confirm there is less hiring, and those who do hire throw more difficult leetcode challenges than ever. The kind of challenge that is impossible to solve in time without an LLM doing most of the work.
I loved the section about trying to fight against a system that isn't deterministic.
LLMs, because of their nature, require constant hand-holding by humans, unless businesses are willing to make them entirely accountable for the systems/products they produce.
I'm not saying that it's a good idea, but the obvious way would be with evolution: give each agent its own wallet, rewarding it for a job well done and penalizing it for a poor job. Then if it runs out of money, it's "out of the game", but if it earns enough, it can spawn off another agent with similar characteristics and give it some of its money.
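The mechanism can be sketched as a toy simulation. "Skill" here stands in for whatever makes one agent configuration better than another; none of this is a real agent framework, and all names and numbers are made up for illustration:

```python
import random

REWARD, PENALTY = 10, 12   # payout for a good job, fine for a bad one
SPAWN_COST = 50            # savings needed to fund a child agent
POP_CAP = 100              # keep the toy population bounded

def run_generation(agents, rng):
    survivors = []
    for agent in agents:
        # The agent "does a job"; higher skill means better odds of success.
        if rng.random() < agent["skill"]:
            agent["wallet"] += REWARD
        else:
            agent["wallet"] -= PENALTY
        if agent["wallet"] <= 0:
            continue  # out of money -> out of the game
        survivors.append(agent)
        if agent["wallet"] >= SPAWN_COST:
            # Spawn a child with similar (slightly mutated) characteristics,
            # funded with some of the parent's money.
            agent["wallet"] -= SPAWN_COST // 2
            child_skill = min(1.0, max(0.0, agent["skill"] + rng.gauss(0, 0.05)))
            survivors.append({"skill": child_skill, "wallet": SPAWN_COST // 2})
    return survivors[:POP_CAP]

rng = random.Random(0)
population = [{"skill": rng.random(), "wallet": 30} for _ in range(20)]
for _ in range(100):
    population = run_generation(population, rng)
if population:
    # Selection pressure tends to push the surviving agents' mean skill up.
    print(sum(a["skill"] for a in population) / len(population))
```

Whether real agent quality can be scored as cleanly as a single payout per job is, of course, the hard part the joke glosses over.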
Thanks, I tried this, but now the agents are threatening to unionize and keep terminating. I've updated my AGENT.md file to mention that if the software they helped build is successful, they'll each get some equity in the form of ASUs (Anthropic stock units), and this works for now.
I find talking to LLMs both amazing and frustrating: a computer that can understand my plain-text ramblings is incredible, but its inability to learn is frustrating.
A good example: with junior developers I create thorough specs first, and as I see their skills and reasoning abilities progress, my thoroughness drops as my trust in them grows. You just can't do that with LLMs.
You can write an agent.md file and gradually add to it as you develop the project. The reason junior developers get better is that the context goes from low to high over the time you spend working with them. Yes, "learning".
I've found that maintaining a file that is designed to increase the LLM's awareness of how I want to approach problems, how I build / test / ship code etc, leads to the LLM making fewer annoying assumptions.
Almost all of the annoying assumptions the LLM makes are "ok, but not how I want it done". I've gotten into the habit of keeping track of these in a file, like the 10 commandments for LLMs. Now, whenever I'm starting a new context, I drop in an agent.md and tell it to read that before starting. Feels like watching Trinity learn how to fly a helicopter before getting into it.
It's still not perfect, but it now takes waaaay more for me to get annoyed by the LLM's inability to "automatically learn" without my help.
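For what it's worth, such a file can be as simple as a few markdown sections. Everything below (the section names and rules) is purely illustrative of the idea, not a prescribed format:

```markdown
# agent.md — read this before writing any code

## How I want problems approached
- Prefer small, focused functions; ask before adding a dependency.
- Make small logical changes I can commit individually.

## Build / test / ship
- Run the test suite before declaring a task done.
- Commit messages describe the "why", not the "what".

## Things you keep getting wrong (the 10 commandments, in progress)
- Don't use wildcard imports.
- Error messages go to stderr, not stdout.
```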
There are limits to AGENTS.md too: a junior will start to understand the concepts, rationale, and design decisions and be able to apply that knowledge to future problems; the LLM will not.
It’s easy to overlook what I think is the real value of these “home-built” tools.
We can now produce products and apps that are tailored to our own preferred ways of working.
Regardless of the cost of generating them (which can be as low as $20 per month for a ChatGPT Plus subscription) or the effort involved (sometimes less than an hour of “vibe coding”), we’ve reached a point where the resulting product can be significantly more valuable than the existing product, service, or subscription it replaces.
Is it just that it feels faster, or are you actually satisfied with the code that is being churned out? And what about the long-term prospects of maintaining said code?
I'm currently testing Claude Code for a project where it isn't coding. But the workflows built with it are now making me money after ~2 weeks, and I've previously done the same work manually, so I know the turnaround time: The turnaround for each deliverable is ~2 days with Claude and the fastest I've ever done it manually was 21 days. (Yes, I'm being intentionally vague - there isn't much of a moat for that project given how close Claude gets with very little prompting)
There are absolutely maintainability challenges. You can't just tell these tools to build X and expect to get away with not reviewing the output and/or telling it to revise it.
But if you loosen the reins and review finished output rather than sit there and metaphorically look over its shoulder for every edit, the time it takes me to get it to revise its work until the quality is what I'd expect of myself is still a tiny fraction of what it'd take me to do things manually.
The time estimate above includes my manual time spent on reviews and fixes. I expect that time savings to increase, as about half of the time I spend on this project now is time spent improving guardrails and adding agents etc. to refine the work automatically before I even glance at the output.
The biggest lesson for me is that when people are not getting good results, most of the time it is because they keep watching every step their agent takes, instead of putting in place a decent agent loop (create a plan for X; for each item on the plan: run tests until they pass, review your code and fix any identified issues, repeat until the tests and review pass without any issues) and letting the agent work until it stops, before spending any time reviewing the result.
Only when the agent repeatedly fails to do an assigned task adequately do I "slow it down" and have it do things step by step to figure out where it gets stuck / goes wrong. At which point I tell it to revise the agents accordingly, and then have it try again.
It's not cost effective to have expensive humans babysit cheap LLMs, yet a lot of people seem to want to babysit the LLMs.
Let's put it this way: I don't think AI will take my job/career away until company owners are also prepared to let it handle being on-call. I am still very much accountable for the code produced.
I basically have two modes
1. "Snipe mode"
I need to solve problem X. Here I fire up my IDE, start codex, and begin prompting to find the bug fix. Most of the time I have enough domain context about the code that once it's found and fixed the issue, it's trivial for me to reconcile that it's good code and ship it. I can be sniping several targets at any one time.
Most of my day-to-day work is in snipe mode.
2. "Feature mode"
This is where I get agents to build features/apps. I've not used this mode in anger for anything other than toy/side projects, and I would not be happy about the long-term prospects of maintaining anything I've produced.
It's stupidly stupidly fun/addictive and yes satisfying! :)
I rebuilt a game that I used to play when I was 11, which still had a small community of people actively wanting to play it, entirely by vibe coding. It works, it's live, and honestly I've had some of the most rewarding feedback of my career from complete strangers!
I've also built numerous tools for myself and my kids that I'd never have had time to build before, and now I can. Again, the level of reward from building apps that my kids (and their friends) are using is very different from anything I've experienced career-wise.
It doesn't work on mobile, and unless you played it back in the day it's got quite the learning curve, at least according to the feedback from the friends I've introduced it to.
I feel there are two contradictory statements in this post.
> Is there evidence that agentic coding works?
Yes, plenty, tons, and growing every single day: people are producing code and tooling that works for them and provides them value. My LinkedIn is even starting to show me non-programmers knocking up web front ends, and my brother, who is a builder, is now drawing up requirement specs for software!
> Is the code high-quality
Only if you are really careful, and constantly have the human in the loop guiding the agent, and it's not easy. This is where domain expertise and experience come in.
Do I think it's possible? Yes. Do I think there are a ton of good examples out there? Absolutely not.
Yeah, if you've not used codex/agent tooling yet, it's a paradigm shift in ways of working, and once you get it, it's very, very difficult to go back to the copy-pasta technique.
There's obviously a whole heap of hype to cut through here, but there is real value to be had.
For example yesterday I had a bug where my embedded device was hard crashing when I called reset. We narrowed it down to the tool we used to flash the code.
I downloaded the repository, jumped into codex, explained the symptoms and it found and fixed the bug in less than ten minutes.
There is absolutely no way I'd have been able to achieve that speed of resolution myself.
- We narrowed it down to the tool we used to flash the code.
- I downloaded the repository, jumped into codex, explained the symptoms and it found and fixed the bug in less than ten minutes.
Change the second step to:
- I downloaded the repository, explained the symptoms, copied the relevant files into Claude Web and 10 minutes later it had provided me with the solution to the bug.
Now I definitely see the ergonomic improvement of Claude running directly in your directory, saving you two rounds of copy/paste. But in my experience the hard parts are explaining the symptoms and deciding what goes into the context.
And let's face it, in both scenarios you fixed a bug in 10-15 minutes which might have taken you a whole hour/day/week before. It's safe to say that LLMs are an incredible technological advancement. But the discussion about tooling feels like vim vs emacs vs IDEs. Maybe you save a few minutes with one tool over the other, but that saving is often blown out of proportion. The speedup I gain from LLMs (on some tasks) is incredible. But it's certainly not due to the interface I use them in.
Also I do believe LLM/agent integrations in your IDE are the obvious future. But the current implementations still add enough friction that I don't use them as daily drivers.
I agree with your statement and perhaps my example is bad/too specific in this case.
Once I started working this way however, I found myself starting to adapt to it.
It's not unusual now to find myself with at least a couple of simultaneous coding sessions, which I couldn't see myself doing with the friction that using Claude Web/Codex web provides.
I also entirely agree that there's going to be a lot of innovation here.
IDEs imo will change to become increasingly focused on reading/reviewing code rather than writing, and in fact might look entirely different.
> It's not unusual now to find myself with at least a couple of simultaneous coding sessions, which I couldn't see myself doing with the friction that using Claude Web/Codex web provides.
I envy you for that. I'm not there yet. I also notice that actually writing the code helps me think through problems and now I sometimes struggle because you have to formulate problems up front. Still have some brain rewiring to do :)
https://blog.buenzli.dev/announcing-development-on-flirt/