I am an experienced programmer, and just recently started using ChatGPT and Cursor to help me code. Some things it does like magic, and it's hard to say what n-fold improvement there is. I'd put the lower limit at 3x and the upper, on certain tasks, at 20x.
The project I am currently working on took me about 16 hours to get to an MVP. A hobby project of similar size I did a few years ago took me about 80 hours. A lot of the work is NOT coding work that an LLM can help me with.
10x over everything is overstating it. However, if I can take my star developer who is already delivering significantly more value per dollar than my average guys and 3x him, that's quite a boost.
I was doing a short-term contract over the summer porting some Verilog code to a newer FPGA. The Verilog code wasn't well documented and had been written over a period of about 20 years. There were no testbenches. Since I was making some requested changes to the design, I wanted to be able to simulate it, which requires creating a testbench. Verilog isn't my "native" HDL - I've done a lot more VHDL - so I figured I'd see if an LLM could generate some of the boilerplate code required for a testbench (hooking up wires to the component under test, setting up clocks, etc.).

I was using DeepSeek Coder for this, and to my surprise it generated a pretty decent first-cut testbench. It even collected repeated code into Verilog tasks, and it figured out that there were some registers in the design that could be written to and read from. The testbench compiled the first time, which really surprised me.

There were some issues - for example, you can't write to certain registers in the design until you've set a particular bit in a particular register, and it hadn't done that - but I was able to debug this pretty quickly in simulation. I figure it saved me most of a day's work getting the testbench set up and the design simulating. My expectations were definitely exceeded.
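For readers who haven't written one, the boilerplate being described - clock generation, wiring up the device under test, and collecting repeated register accesses into tasks - looks roughly like this. All module, port, and register names here are hypothetical, not from the actual design:

```verilog
`timescale 1ns/1ps
module tb;
  reg         clk = 0;
  reg         rst_n;
  reg  [7:0]  addr;
  reg  [31:0] wdata;
  wire [31:0] rdata;
  reg         wr_en;

  // Free-running 100 MHz clock
  always #5 clk = ~clk;

  // Device under test (hypothetical port list)
  dut u_dut (
    .clk   (clk),
    .rst_n (rst_n),
    .addr  (addr),
    .wdata (wdata),
    .rdata (rdata),
    .wr_en (wr_en)
  );

  // Repeated register-write sequence collected into a task
  task write_reg(input [7:0] a, input [31:0] d);
    begin
      @(posedge clk);
      addr  <= a;
      wdata <= d;
      wr_en <= 1'b1;
      @(posedge clk);
      wr_en <= 1'b0;
    end
  endtask

  initial begin
    rst_n = 0;
    wr_en = 0;
    repeat (4) @(posedge clk);
    rst_n = 1;
    // The gotcha mentioned above: a particular bit in a control
    // register must be set first, or later writes are ignored.
    write_reg(8'h00, 32'h0000_0001);  // set (hypothetical) enable bit
    write_reg(8'h10, 32'hDEAD_BEEF);  // now this write takes effect
    #100 $finish;
  end
endmodule
```

The enable-bit write at the start of the `initial` block is the kind of design-specific precondition the LLM missed and that had to be found in simulation.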
I love working with Cursor/GenAI code, but there's one claim I see repeated all the time that I've simply found to be false: that junior/bad programmers put out tons of garbage code using GenAI assistants, which seniors later have to clean up.
In my experience (mostly with Claude) generated code tends to be clean, nice looking, but ranges from somewhat flawed to totally busted. It's not ugly, messy code that works, but clean, nice code that doesn't. An inexperienced programmer would probably have no hope of fixing it up.
What I've seen is that the output is cosmetically clean code, but the design itself is messy.
Probably the most annoying example of this is emitting one-off boilerplate code when an existing library function should have been used. For example AI-generated code loves to bypass facades and bulkheads, which degrades efforts to keep the system loosely coupled and maintainable.
Just yesterday I was cleaning up some Copilot-generated code that had a bunch of call-site logic that completely duplicated the functionality of an existing method on the object it was working with. I can guarantee you that I spent a lot more time figuring out what that code was doing than its author saved by using Copilot instead of learning how the class they were interacting with works.
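The duplication pattern being described looks something like this sketch. The class and method names are hypothetical, chosen only to illustrate the shape of the problem:

```python
class Account:
    """Existing class the generated code was interacting with."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        """Already validates and applies a withdrawal."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


# What the generated call site did: re-implement the method's
# validation and update logic inline, byte for byte.
def pay_invoice_generated(account, amount):
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > account.balance:
        raise ValueError("insufficient funds")
    account.balance -= amount
    return account.balance


# What it should have done: call the method that already exists.
def pay_invoice(account, amount):
    return account.withdraw(amount)
```

Both versions behave identically today, which is exactly why the duplicate slips through review - the cost only shows up later, when `withdraw` changes and the inlined copy silently doesn't.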
> that junior/bad programmers put out tons of garbage code
That's what code review and mentoring are for. Don't blame tools; as a senior it's your role to pull the rest of the team up. And in terms of process, it's OK to reject a PR or to delay it - business deadlines are not EVERYTHING. As a senior you're also in charge of what gets merged and when. You're the one responsible for the health of the codebase and tech debt.
I think that this actually gets to the crux of the problem.
On a high-functioning team working on a brownfield project, the bottleneck typically isn't churning out code; it's quality-control activities such as code review. The larger masses of lower-quality code that people are able to produce with Copilot make that problem worse. If the seniors are doing their jobs well, AI-generated code actually slows things down in more mature codebases: it takes longer to review, so the rate of code getting into production drops even as the rate at which developers can write it in the first place increases.
I would bet a lot of money that the people who claim huge productivity improvements with tools like Copilot are mostly self-reporting their own experience with solo and greenfield projects. That's certainly strongly implied by the bulk of positive anecdotes that include sufficient detail to be able to hazard a guess. It's also consistent with the studies we're seeing.
I was explaining it to my wife yesterday. The code is clean, but occasionally the changes are in a completely wrong file. Or sometimes it changes something that is completely irrelevant.
It's as if I had a decent programmer who occasionally loses his mind and does something nonsensical.
This is dangerous because it's easy to get lulled into a false sense of security. Most of the code looks good, but you have to be very vigilant.
It's the same as when LLMs generate writing. It sounds good and is grammatically correct, but sometimes it's plain false.
You've got to consider second order effects, though.
Where I'm working the star developers don't have a great culture of being conscientious about writing code their non-star colleagues can read and modify without too much trouble. (In fairness, I've worked exactly one place where that wasn't the case, and that was only because avoiding this problem was a personal vendetta for the CTO.)
We gave them Copilot and it only compounded the problem. They started churning out code that's damn near unreadable at a furious pace. And that pushed the productivity of the people who are responsible for supporting and maintaining their code down toward zero.
If you're only looking at how quickly people clear tickets, it looks like Copilot was an amazing productivity boost. But that's not a full accounting of the situation. A perspective that values teamwork and sustainability over army-of-one heroism and instant gratification can easily lead to the opposite conclusion.
> Where I'm working the star developers don't have a great culture of being conscientious about writing code their non-star colleagues can read and modify without too much trouble.
I'd argue that you're "starring" the wrong people, then.
I've known and worked with developers who are mind-bogglingly productive - meaning they solve problems very quickly and consistently. Usually, that's at the expense of maintainability.
I've also known and worked with developers whose raw output is much lower, but significantly higher quality. It may take them a week to solve the problem the other group can get out the door in half a day - but there are solid tests, great documentation, they effectively communicate the changes to the group, and six months from now when something breaks anyone can go in and quickly fix it because the problem is isolated and easy to adapt.
I try to be part of the second group; I'd rather get six story points done in a sprint and have them done _right_ than knock out 15 points and kick the can down the road.
I've known exactly one developer who was in both groups - i.e., they were incredibly productive and produced top-tier output at the same time. He was 19 when I met him, which would make him 27 or so now. At the time my assumption was that his pace wasn't sustainable over the long term. I should look him up again and see how he's doing these days...
IMO, if your star developers aren’t writing readable, maintainable code (at least most of the time, some things are inherently complex), I don’t think it’s right to call them star developers.
FWIW, the problem you’re talking about isn’t one I’ve encountered very often at most of the companies I’ve worked at, and I’ve worked at small startups and large enterprises. I genuinely wonder why that is.
The star programmer was not a mature programmer. They were proficient and smart, but they were incapable of working with a team: their code was so inaccessible that it was difficult to collaborate with them.
A principal engineer is not someone who generates the most working code. It's someone who moves the net productivity and product impact of the whole team forward.
Ugh I am dealing with an amazingly productive platform team who churn out so much stuff that the product teams just can’t keep on top of all the changes to tools and tech.
That one team is super productive at the cost of everyone else grinding to a halt.
Some of the coolest demos from my Lean Six Sigma training were all about demonstrating that oftentimes the easiest way to increase the end-to-end throughput of your value chain is to find the team member with the greatest personal productivity and force them to slow down.
You don't necessarily even have to do anything more than that. Just impose the rate limit on them and they'll automatically and unconsciously stop doing all sorts of little things - mostly various flavors of corner cutting - that make life harder for everyone around them.
They can get you 95% of the way there; the question is whether fixing the last 5% will take more time than doing all of it yourself. I presume this depends on your familiarity with the domain.
I think I agree. It's maybe 2x for me after the project has been solidly established. However, it's very useful to have the AI during the beginning, when you're writing a lot of boilerplate or integrating with APIs that you don't know super well. The AI can surprise you and connect some dots.
The lower bound is certainly lower than 3x. I do very domain-specific work (in Rust/Python) and usually the output is useless and ignoring the garbage suggestions can have a higher mental load than not using it. It does work well for boilerplate though and since boilerplate is the most annoying code to write, I'll tolerate it.
I agree that the upper bound is pretty high: with more straightforward MVPs, it can generate large swaths of code for you from a description. Also, for little DSLs that I haven't memorized, generating a pyproject.toml from a bunch of constraints is so much nicer/quicker than reading the docs again every few months.
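As a concrete example of the kind of output meant here, a constraint set like "Python 3.10+, requests below 3.0, pydantic 2.6-compatible" turns into a file like this. The project name, versions, and dependencies are illustrative, not from any actual project:

```toml
[project]
name = "example-app"          # hypothetical project
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "requests>=2.31,<3",      # allow minor/patch updates below 3.0
    "pydantic~=2.6",          # compatible release: >=2.6, <3.0
]

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"
```

Remembering that `~=` is the compatible-release operator, and which table `requires-python` lives in, is exactly the sort of detail that's easier to ask for than to re-look-up every few months.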
And it's not just the speed. The mental energy I save from having AI do some work vs. me doing it is allowing me to feel better, have mental energy for different things, be more productive, or just have some rest.