"Despite this plethora of negative experiences, executives are aggressively mandating the use of AI6. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce its use.
Being forced to sit and argue with a robot while it struggles and fails to produce a working output, while you have to rewrite the code at the end anyway, is incredibly demoralizing. This is the kind of activity that activates every single major cause of burnout at once.
But, at least in that scenario, the thing ultimately doesn’t work, so there’s a hope that after a very stressful six month pilot program, you can go to management with a pile of meticulously collected evidence, and shut the whole thing down."
The counterpoint to this is that _SOME_ people are able to achieve force multiplication (even at the highest levels of skill, it's not just a juniors-only phenomenon), and _THAT_ is what is driving management adoption mandates. They see that 2-4x increases in productivity are possible under the correct circumstances, and they're basically passing down mandates for the rank and file to get with the program and figure out how to reproduce those results, or find another job.
> even at the highest levels of skill, it's not just a juniors-only phenomenon
AI has the most effect for people with less experience or low performance. It has less of an effect for people on the high end. It is indeed closing the skill gap and it does so by elevating the lower side of it.
This is important to know because it helps explain why people react as they do. Those who feel the most lift will be vocal about AI being good while those that don't are confused by anyone thinking AI is helpful at all.
It is not common for people on the high-skill side to experience a big lift, except when they use AI for the tedious stuff they don't really want to do. This is a sweet spot because all of the competence is there, but the willingness to do the work is not.
I have heard Dr Lilach Mollick, dir of Pedagogy at Wharton, say this has been shown numerous times. People who follow her husband, Ethan, are probably aware already.
> It is indeed closing the skill gap and it does so by elevating the lower side of it.
That's my "criticism", it's not closing the skill gap. Your skills haven't change, your output has. If you're using AI conservatively I'd say you're right, it can remove all the tedious work, which is great, but you'll still need to check that it's correct.
I'm more and more coming to the idea that for e.g. some coding jobs, CoPilot, Claude, whatever can be pretty helpful. I don't need to write a generic API call, handle all the error codes and hook up messages to the user, the robot can do that. I'll check and validate the code anyway.
Where I'm still not convinced is for communicating with other humans. Writing is hard, communication is worse. If you still have basic spelling errors, after decades of using a spellchecker, I doubt that your ability to communicate clearly will change even with an LLM helping you.
Same with arts. If you can't draw, no amount of prompting is going to change that. Which is fine, if you only care about the output, but you still don't have the skills.
My concern is the uncritical application of LLMs to all aspects of people's daily lives. If you can use an LLM to do your job faster, fine. If you can't do it without an LLM, you shouldn't be doing it with one.
"If you can use an LLM to do your job faster, fine. If you can't do it without an LLM, you shouldn't be doing it with one."
This.
People need to take responsibility for what they produce. It's too easy, and especially irresponsible, to blindly delegate everything to AI.
> I have heard Dr Lilach Mollick, dir of Pedagogy at Wharton, say this has been shown numerous times. People who follow her husband, Ethan, are probably aware already.
I'd be curious to see the sources.
Basically every study I have ever read making some claim about programming (efficacy of IDEs, TDD, static typing, pair programming, formal CS education, ai assistants, etc...) has been a house of cards that falls apart with even modest prodding. They are usually premised on one or more inherently flawed metrics like number of github issues, LoC or whatever. That would be somewhat forgivable since there are not really any good metrics to go on, but then the studies invariably make only a perfunctory effort to disentangle even the most obvious of confounding variables, making all the results kind of pointless.
Would be happy if anyone here knew of good papers that would change my mind on this point.
> Basically every study I have ever read making some claim about programming (efficacy of IDEs, TDD, static typing, pair programming, formal CS education, ai assistants, etc...) has been a house of cards that falls apart with even modest prodding.
Isn't this true about most things in software? I mean is there anything quantifiable about "microservices" vs "monolith"? Test-driven development, containers, whatever?
I mean all of these things are in some way good, in some contexts, but it seems impossible to quantify the benefits of any of them. I'm a believer that most decisions made in software are somewhat arbitrary, driven by trends and popularity, and it seems like little effort is expended to come to overarching, data-backed results. When such results do appear, they're rare and, like you said, fall apart under investigation or scrutiny.
I've always found this strange about software in general. Even every time there's a "we rewrote $THING in $NEW_LANG" and it improved memory use/speed/latency whatever, there's a chorus of (totally legitimate) criticism and inquiry about how things were measured, what attempts were made to optimize the original solutions, if changes were made along the way outside of the language choice that impacted performance etc etc.
To be clear I am not arguing that tools and practices like TDD, microservices, ai assistants, and so on have no effect. They almost certainly have an effect (good or bad).
It’s just the unfortunate reality that quantitatively measuring these effects in a meaningful way seems to be basically impossible (or at least I’ve never seen it done). With enough resources I can’t think of any reason it shouldn’t be possible, but apparently those resources are not available, because there are no good studies on these topics.
Thus my skepticism of the “studies” referenced earlier in the thread.
So, basically, you think all the pro-AI folks are "bad," and defensive because they feel like anti-AI folks are attacking the thing that makes them not bad? Hard to want to engage with that.
I'll bite anyhow.
AI is very, very good at working at short length-scales. It tends to be worse at working at longer length-scales (Gemini is a bit of an outlier here but even so, it holds). People who are hyper-competent/elite-skill in their domain who achieve force multiplication with gen-AI understand this, and know how to decompose challenging long length-scale problems into a number of smaller short length-scale problems efficiently. This isomorphic transform lets AI tackle the original problem in the form it's maximally efficient at, thus side-stepping its inherent weaknesses.
You can think of this sort of like mathematical transformations that make data analysis easier.
>So, basically, you think all the pro-AI folks are "bad," and defensive because they feel like anti-AI folks are attacking the thing that makes them not bad?
I break the pro-AI crowd into 3 main categories and 2 sub categories:
1. those who don't really know how to code, but AI lets them output something more than what they could do on their own. This seems to be what the GP is focused on
2. The ones who can code but are financially invested to hype up the bubble. Fairly self explanatory; the market is rough and if you're getting paid the big bucks to evangelize, it's clear where the interests lie.
3. Executives and product teams that have no actual engagement with AI, but know bringing it up excites investors. A hybrid of 1 and 2, but they aren't necessarily pretending they use it themselves. It's the latest means to an end (the end being money).
and then the smaller sects:
1. those who genuinely feel AI is the future and are simply prepping for it, trying to adapt their workflow and knowledge base around it. They may feel it can already replace people, or may feel it's a while out but progressing that way. These are probably the most honest party, but I personally feel they miss a critical aspect: what is currently used as the backbone for AI may radically change by the time it is truly viable.
2. those who are across the spectrum of AI, but see it as a means to properly address the issue of copyright. If AI wins, they get their true goal of being able to yoink many more properties without regulations to worry about.
>People who are hyper-competent/elite-skill in their domain who achieve force multiplication with gen-AI understand this,
are there real examples of this? The main issue I see is that people seem to judge "competency" based on speed and output, but not on the quality, maintainability, or conciseness of that output. If we just needed engineers to slap together something that "works", we could be "more productive".
Well, just to give you context on my position, because I don't feel I fit into any of those molds:
I was already a very high performer before AI, leading teams, aligning product vision and technical capabilities, architecting systems and implementing at top-of-stack velocity. I have been involved in engineering around AI/ML since 2008, so I have a pretty good understanding of the complexities/inconsistencies of model behavior.

When I observed the ability of GPT3.5 to often generate working (if poorly written, in general) code, I knew this was a powerful tool that would eventually totally reshape development once it matured, but that I had to understand its capabilities and non-uniform expertise boundary to take advantage of its strengths without having to suffer its weaknesses. I basically threw myself fully into mastering the "art" of using LLMs, both in terms of prompting and knowing when/how to use them, and while I saw immediate gains, it wasn't until Gemini Pro 2.5 that I saw the capabilities in place for a fully agentic workflow.

I've been actively polishing my agentic workflow since Gemini 2.5's release, and now I'm at the point where I write less than 10% of my own code. Overall my hand-written code is still significantly "neater/tighter" than that produced by LLMs, but I'm ok with the LLM nailing the high-level patterns I outline and being "good enough" (which I encourage via detailed system prompts and enforce via code review, though I often have the AI rewrite its own code given my feedback rather than manually edit it).
I liken it to assembly devs who could crush the compiler in performance (not as much of a thing now in general, but it used to be), who still choose to write most of the system in c/c++ and only implement the really hot loops in assembly because that's just the most efficient way to work.
> I was already a very high performer before AI, leading teams, aligning product vision and technical capabilities, architecting systems and implementing at top-of-stack velocity
Indeed, he did not list "out-of-touch suit-at-heart tech leads that failed upwards and have misplaced managerial ambitions" as a category, but that category certainly exists, and it drives me insane.
You're making a lot of baseless assumptions with your implication there, and sadly it says more about you than me.
You might find your professional development would stop being retarded if you got the chip off your shoulder and focused on delivering maximum value to the organizations that employ you in any way possible.
Baseless? You just told us in your own words what you do and how this makes you special. Do you know what baseless means? That's before we even touch on the incredible irony of you assuming I'm a professional failure despite knowing nothing about me.
But that's just how being a manager-man goes; so focused on self-aggrandizement and surrounding yourself with yes-men that you lost your edge. Hope you can cover up your incompetence until the next promotion is due, because if ever a rung breaks away from under your feet, it'll probably be a loong way down to a position that matches your actual skill level.
> Successful people aren't bitter, that's a trait that losers develop.
>So, basically, you think all the pro-AI folks are "bad," and defensive because they feel like anti-AI folks are attacking the thing that makes them not bad?
Thanks, I needed a laugh. I have indeed grown bitter over the years, but life always has a way of keeping something in store to make me smile.
Yeah, exactly: shorter files, good context for the AI agents, good rules for the AI agents on how to navigate around the codebase, reuse functions and components, keep everything always perfectly organized, with no effort from the person themselves. It is truly amazing to witness. No people can match that ability to reuse and organize.
> AI has the most effect for people with less experience or low performance. It has less of an effect for people on the high end.
I actually think that it benefits high performance workers as AI can do a lot of heavy lifting that frees them to focus on things where their skills make a difference.
Also, for less skilled or less experienced developers, they will have a harder time spotting the mistakes and inconsistencies generated by AI. This can actually become a productivity sink.
The main issue is that you end up looking down the barrel of begging Claude, for the fifth time this session, to do it right - or you just do it yourself in half the total time you've wasted so far.
Typically, I've been asking it to do "heavy lifting" for me.
It generally generates defective code, but it doesn't really matter all that much, it is still useful that it is mostly right, and I only need to make a few adjustments. It saves me a lot of typing.
Would I pay for it? Probably not. But it is included in my IntelliJ subscription, so why not? It is there already.
With AI code I find it very easy to understand: since I prompted it, I know what to expect and what it will likely do. Far easier than other people's code.
If your argument is that agentic coding workflows will retard the skill development of junior and midrange engineers, I'm not entirely in disagreement with you; that's a problem that has to be solved in relation to how we use gen AI across every domain.
Skill retardation is actually beside the point. I'm largely raising a counterexample for why the following (rough paraphrasing) is not sound: "SOME people figured out how to use these tools to go 2x to 4x faster, you do the same, or you're fired!".
Let's say "n" is the sum complexity of a system. While some developers can take an approach that yields a development output of: (1.5 * log n), the AI tools might have a development output of: (4 * log n)^4/n. That is, initially more & faster, but eventually a lot less and slower.
The parable of the Soviet beef farmer comes to mind. In this parable, the USSR mandated that its beef farmers increase beef output by 20% YoY, every year. The first year, the heroic farmer improves the health of their livestock, buys a few extra cows, and hits their target. The next year, to achieve 20% YoY, the farmer cuts every corner and maximizes every efficiency; they even trade away all their possessions to buy some black-market cows. The third year, the farmer can't make the 20% increase, and they slaughter almost all of their herd. The fourth year, the farmer has essentially no herd; they can't come close to last year's output, let alone increase it. So far short of quota, the heroic beef farmer shoots himself.
(Side note: this is also analogous to people not raising their skill levels, but that's not my main point - I'm more thinking about how development slows down relative to the complexity and size of a software system. The 'not-increasing skills' angle is arguably there too. The main point is short-term trade-offs to achieve goals rather than selecting long-term and sustainable targets, and the relationship of those decisions to a blind demand to increase output.)
So, instead of working on the insulation of the home, instead of upgrading the heating system, to heat the home faster we burn the furniture. It works... to a point. Like, what happens when you run out of furniture, or the house catches fire? Seemingly that will be a problem for Q2 of next year; for now, we are moving faster!!
I think this ties into the programming industry quite heavily, because managers often want things to work just long enough for them to be promoted. It doesn't have to work well for years, doesn't have to have the support tools needed for that - nope, just long enough that they can collect the quarterly reward and move on without worrying about the support mess left behind. To boot, the feedback cycle for whether something was a good idea in software or not is slow, oftentimes years. AI tools have not been out for long, just a couple of years themselves; it'll be another few before we see what happens when a system is grown to 5M lines through mostly AI tooling and the codebase itself is 10 years old - will that system be too brittle to update?
FWIW, I'm of the point of view that quality, time and cost are not an iron triangle - it is not a choose-two situation. Instead, quality is a requirement for low cost and low time. You cannot move quickly when quality is low (from my experience, the slowdown from low quality can manifest quickly too, on the order of hours: a shortcut taken now can reduce velocity later that same day).
Thus, mandates from management to move 2x to 4x faster, when it's not clear that AI tools actually deliver 2x to 4x benefits over the longer term (perhaps not even in the shorter term), feels a lot like the soviet beef farmer parable, or burning furniture to stay warm.
If your AI scaling statement is accurate then the problem will eventually solve itself as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
My experience so far is that if you architect your systems properly AI continues to scale very well with code base size. It's worth noting that the architecture to support sustained AI velocity improvement may not be the architecture that some human architects have previously grown comfortable with as their optimal architecture for human productivity in their organization. This is part of the learning curve of the tools IMO.
> If your AI scaling statement is accurate then the problem will eventually solve itself as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
I find one of the biggest differences between junior engineers and seniors is they think differently about how complexity scales. Juniors don't think about it as much and do very well in small codebases where everything is quick. They do less well when the complexity grows and sometimes the codebase just simply falls over.
It's like billiards. A junior just tries to make the most straightforward shot and get a ball in the pocket. A senior does the same, but they think about where they will leave the cue ball for the next shot, and they take the shot that leaves them in a good position to make the next one.
I don't see AI as being able to possess the skills a senior has, to say "no, this previous pattern is no longer the way we do things because it has stopped scaling well. We need to move all of these hardcoded values to a database now and then approach the problem that way." AFAIK, AI is not capable of that at all; it lacks a key skill of a senior engineer. Thus, it can't build a system that scales well with respect to complexity, because it is not forward-thinking.
I'll posit as well that knowing how to change a system so that it scales better is an emergent property. It's impossible to do that architecture up front; therefore an AI must be able to say "gee, this is not going well anymore - we need to switch this up from hardcoded variables to a database, NOW, before we implement anything else." I don't know of any AI that is capable of that. I could agree that when that point is reached, and a human starts prompting on how to refactor the system (which is a sign the complexity was not managed well), then it's possible to reduce the interest cost of outsized complexity by using an AI to start managing the AI-induced complexity...
>as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
You're assuming organizations are operating with the goals of quality and velocity in mind. We saw that WFH made people more productive and gave them a higher quality of life; companies are still trying to enforce RTO as we speak. The productivity was deemed not worth it compared to other factors like real estate, management ego, and punishing the few who abused the privilege.
We're in weird times and sadly many companies have mature tech by now. They can afford to lose productivity if it helps make number go up.
> If your AI scaling statement is accurate then the problem will eventually solve itself as organizations that mandated AI usage will start to fall behind their non-AI mandating peers.
All things being equal, I would agree. Things are not equal, though. The slowdown can manifest as needing more developers for the same productivity, or as lots of new projects to do things like "break the AI monolith into microservices" - all the things a company needs to do when growing from 50 employees to 200 employees. Having a magically different architecture is kind of just a different reality; there's too much chaos to always say that one approach would really be different. One thing, though: it does often take 2 to 5 years before knowing whether the chosen approach was 'bad' or not (and why).
Companies that are trying to scale - almost no two are alike. So it'll be difficult to do a peer-to-peer comparison, it won't be apples to apples (and if so, the sample size is absurdly small). Did architecture kill a company, or bad team cohesion? Did good team cohesion save the company despite bad architecture? Did AI slop wind up slowing things down so much that the company couldn't grow revenue? Very hard to make peer-to-peer comparisons when the problem space is so complex and chaotic.
It's also amazing what people and companies can do with just sheer stubbornness. Facebook has over (I hear) 1000+ engineers just for their mobile app.
> My experience so far is that if you architect your systems properly AI continues to scale very well with code base size. It's worth noting that the architecture to support sustained AI velocity improvement may not be the architecture that some human architects have previously grown comfortable with as their optimal architecture for human productivity in their organization
I fear this is the start of a no-true-scotsman argument. That aside, what is the largest code base size you have reached so far? Would you mind providing some/any insight into the architecture differences for an AI-first codebase? Are there any articles or blog posts that I could read? I'm very interested to learn more where certain good architectures are not good for AI tooling.
AI likes modular function grammars with consistent syntax and interfaces. In practice this means you want a monolithic service architecture or a thin function-as-a-service architecture with a monolithic imported function library. Microservices should be avoided if at all possible.
The goal there is to enable straightforward static analysis and dependency extraction. With all relevant functions and dependencies defined in a single codebase or importable module, you can reliably parse the code and determine exactly which parts need to be included in context for reasoning or code generation. LLMs are bad at reasoning across service boundaries, and even if you have OpenAPI definitions the language shift tends to confuse them (and I think they're just less well trained on OpenAPI specs than other common languages).
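As a rough illustration of the kind of dependency extraction I mean, here's a throwaway sketch for a single-module Python codebase (the module and function names are made up):

```python
# Sketch only: given one function in a single-file Python module, list the
# other functions it calls so they can be pulled into the LLM's context.
import ast

def called_names(source: str, func_name: str) -> set[str]:
    """Names of functions called (by simple name) inside `func_name`."""
    calls: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.add(inner.func.id)
    return calls

if __name__ == "__main__":
    source = open("billing.py").read()              # hypothetical module
    print(called_names(source, "compute_invoice"))  # hypothetical function
```

With a monolith, this kind of walk stays inside one codebase; with microservices the same question turns into cross-repo, cross-API archaeology that the model handles much worse.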
Additionally, to use LLMs for debugging you want to have access to a single logging stream, where they can see the original sources of the logging statements in context. If engineers have to collect logs from multiple locations and load them into context manually, and go repo hopping to find the places in the code emitting those logging statements, it kills iteration speed.
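A minimal sketch of what I mean by a single stream that carries source locations (the names are invented; the point is just that every line says where it came from):

```python
# Sketch: one handler, one format, and every record carries the emitting
# file and line so a human or an agent can jump straight to the source.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(pathname)s:%(lineno)d %(name)s - %(message)s",
)

log = logging.getLogger("orders")  # hypothetical subsystem
log.info("payment captured")       # printed with the file:line that emitted it
```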
Finally, LLMs _LOVE_ good documentation even more than humans, because the humans usually have the advantage of having business/domain context from real world interactions and can use that to sort of contextually fumble their way through to an understanding of code, but AI doesn't have that, so that stuff needs to be made as explicit in the code as possible.
The largest individual repo under my purview currently is around 250k LoC, my experience (with Gemini at least) is that you can load up to about 10k LoC functionally into a model at a time, which should _USUALLY_ be enough to let you work even in huge repos, as long as you pre-summarize the various folders across the repo (I like to put a README.md in every non-trivial folder in a repo for this purpose). If you're writing pure, functional code as much as possible you can use signatures and summary docs for large swathes of the repo, combined with parsed code dependencies for stuff actively being worked on, and instruct the model to request to get full source for modules as needed, and it's actually pretty good about it.
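To give a sense of the pre-summarization habit, here's a throwaway sketch that concatenates those per-folder README.md files into one context pack (the paths and output filename are made up):

```python
# Sketch: gather every per-folder README.md into a single markdown file that
# can be handed to the model before any real source is loaded.
from pathlib import Path

def build_context_pack(repo_root: str, out_file: str = "context_pack.md") -> None:
    root = Path(repo_root)
    sections = []
    for readme in sorted(root.rglob("README.md")):
        rel = readme.relative_to(root)
        sections.append(f"## {rel.parent}\n\n{readme.read_text(encoding='utf-8')}\n")
    Path(out_file).write_text("\n".join(sections), encoding="utf-8")

if __name__ == "__main__":
    build_context_pack(".")  # run from the repo root
```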
I worked at an Indian IT services firm which, even into the mid-2010s, didn't give internet access to people at work. Their argument was that use of the internet would make the developers dumb.
The argument was always: assume you had an internet outage for days and had to code; you would value your skills then. Well, guess what, it's been decades now, and I don't think that situation has ever come to pass - heck, not even something close to it has come to pass.
Sometimes how you do things changes. During the peak of the Perl craze, my team lead often told me that people who didn't use C++ weren't as smart, and that people who used Perl would eventually have their thinking skills atrophy when Perl was no longer around. That doomsday scenario hasn't happened either. People have said similar things about Java, IDEs, package managers, Docker, etc.
Businesses don't even care about these things. A real estate developer wants to sell homes; their job is not to build web sites. So as long as a working site is available, they don't care who builds it, you or AI. Make of this what you will.
I would love to see even a semi-scientific study where two companies are funded with the same amount of capital to build the same product, one extensively using AI tools and the other not. Then after some set amount of time, the resulting products are compared across measures of quality, profitability, customer satisfaction, and so on.
Hopefully market forces will tell us that sooner or later. It might just take a while until the gold rush of VC money stops and reality kicks back in.
Code =/= Product should be kept in mind. That said, I do not have a hard position on the topic, though I am certain about the detrimental, generational skill atrophy.
I very much agree, and I think people who are in denial about the usefulness of these tools are in for a bad time.
I've seen this firsthand multiple times: people who really don't want it to work will (unconsciously or not) sabotage themselves by writing vague prompts or withholding context/tips they'd naturally give a human colleague.
Then when the LLM inevitably fails, they get their "gotcha!" moment.
I think the people who are in denial about the uselessness of these tools are in for a bad time.
I've been playing with language models for seven years now. I've even trained them from scratch. I'm playing with aider and I use the chats.
I give them lots of context and ask specific questions about things I know. They always get things wrong in subtle ways that make me not trust them for things I don't know. Sometimes they can point me to real documentation.
gemma3:4b on my laptop with aider can merge a diff in about twenty minutes of 4070 GPU time. incredible technology. truly groundbreaking.
call me in ten years if they figure out how to scale these things without just adding 10x compute for each 1x improvement.
I mean hell, the big improvements over the last year aren't even to do with learning. Agents are just systems code. RAG is better prompting. System prompts are just added context. call me when GPT 5 drops, and isn't an incremental improvement
Exactly. And if you consider AI to be the inevitable source of unprecedented productivity gains then this filtering of employees by success with/enthusiasm for AI makes sense.
10x developers are a thing, but it's very context dependent. Put John Carmack in a room with your average FAANG principal and ask them both to build a 3D game engine with some features above and beyond what you can get out of the box with OpenGL, and I would be very surprised if John didn't 10x (or more) his competition.
AI as a force multiplier is also a thing, with well structured codebases one high level dev can guide 2-3 agents through implementing features simultaneously, and each of those agents is going to be outputting code faster than your average human. The human just needs to provide high level guidance on how the features are implemented, and coach the AI on how to get unstuck when they inevitably run into things they're unable to handle.
The mental model you want for force multiplication in this context is a human lead "managing" a team of AI developers.
I'm not going to bother, if I did you'd obviously keep moving goalposts and trying to weasel out of owning your error at all costs. It's not like there's anything you've demonstrated that makes me care what you think, so there's really no point in wasting my time trying to convince you of anything. I'll let your manager in ~3 years do it, when you get PIPd.