Can we get a video of a workday conducted by these people?
Unless there's a significant sense of what people are working on, and how LLMs are helping, there's no point engaging -- there's no detail here.
Sure, if your job is to turn out tweaks to a WordPress theme, presumably that's now 10x faster. If it's to work on a new in-house electric motor in C for some machine, presumably that's almost entirely unaffected.
No doubt junior web programmers working on a task backlog, specifically designed for being easy for juniors, are loving LLMs.
I use LLMs all the time, but each non-trivial programming project that has to move out of the draft stage needs rewriting. In several cases, to such a degree that the LLM was a net impediment.
Not exactly what you're asking for, but https://news.ycombinator.com/item?id=44159166 from today is not a junior web programmer working through the backlog, and the commit history contains all the prompts.
Sure, thanks. I mean it's a TypeScript OAuth library, so perhaps we might say mid-level web programmer developing a library from scratch, with excellent pre-existing references and a known-good reference API to hand. I'd also count that as a good use case for an LLM.
I watched someone do a similar demonstration (https://news.ycombinator.com/item?id=44159166) live at an event. They ended up doing something like 3 pull requests to get the original change in, then had to do 4 more to get it to fix and put back things it had removed. Not exactly efficient, and it was painful to sit there thinking I could have done this manually 20x over by now while we painfully waited for the AI to do the changes.
I've never been able to get it to work reliably myself either.
The internet just tells me to prompt harder. Lots of "grind-set" mentality energy around AI if you ask me. Very little substance.
I think you mistook my comment. Insofar as it's anything, it was a concession to that use case.
I gave an example below: debugging a microservice-based request flow from a front-end, through various middle layers and services, to a back-end, perhaps triggering other systems along the way. Something similar to what I worked on in 2012 for the UK Olympics.
Unless I'm mistaken, and happy to be, I'm not sure where the LLM is supposed to offer a significant productivity factor here.
Overall, my point is -- indeed -- that we cannot really have good faith conversations in blog posts and comment sections. These are empirical questions which need substantial evidence from both sides -- ideally, videos of a workday.
It's very hard to guess what anyone is really talking about at the level of abstraction that all this hot air is conducted at.
As far as I can tell, the people hyping LLMs the most are juniors, data scientists who do not do software engineering, and people working on greenfield/blank-page apps.
These groups never address the demand from these sceptical senior software engineers -- for obvious reasons.
I recently had to do something similar: I pasted New Relic logs for a trace ID and copy-pasted the Swagger pages for all the microservices involved, and in round two the code for the service it suspected. Gemini helped me debug without even putting my full head into the problem.
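That workflow is easy to script, too. Here's a minimal sketch of the context-assembly step, assuming you've already exported the logs and specs to local files (every file name and the trace ID below are hypothetical, not anything from my actual setup):

    # Assemble one big debugging prompt from trace logs plus OpenAPI specs.
    # File names and the trace ID are made up for illustration.
    from pathlib import Path

    TRACE_ID = "4f2a9c"  # hypothetical trace ID of the failing request

    def build_prompt(log_file: str, spec_files: list[str]) -> str:
        # Keep only the log lines for this trace so the context stays small.
        logs = [line for line in Path(log_file).read_text().splitlines()
                if TRACE_ID in line]
        specs = "\n\n".join(Path(p).read_text() for p in spec_files)
        return (f"Logs for trace {TRACE_ID}:\n" + "\n".join(logs)
                + "\n\nOpenAPI specs for the services involved:\n" + specs
                + "\n\nWhich service is most likely at fault, and why?")

    # Hypothetical inputs: a New Relic log export and two Swagger specs.
    prompt = build_prompt("newrelic_export.log",
                          ["orders-service.yaml", "payments-service.yaml"])

From there it's one paste into whatever chat model you're using.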
Most of my day job is worrying about the correctness of compiler optimizations. LLMs frequently can't even accurately summarize the language manual (especially on the level of detail I need).
I have done everything from architecture design for a DSP (Qualcomm), to training models that render photos on Pixel phones, to redoing Instagram's comments ranking system. I can't imagine doing anything without LLMs today; they would have made me much more productive at all of those things, whether it be Verilog, C++, Python, ML, etc. I use them constantly now.
I use LLMs frequently also. But my point, with respect to the scepticism from some engineers, is that we need to know what people are working on.
You list what look like quite greenfield projects: very self-contained, and very data-science-oriented. These are significantly uncharacteristic of software engineering in the large. They have nothing to do with interacting systems, each with 100,000s of lines of code.
Software engineers working on large systems (e.g., many microservices, data integration layers, etc.) are working on very different problems. Debugging a microservice system isn't something an LLM can do -- it has no ability, e.g., to trace a request through various APIs from, e.g., a front-end into a backend layer, into some db, to be transferred to some other db, etc.
This was all common enough stuff for software engineers 20 years ago, and was part of some of my first jobs.
A very large amount of this Pollyanna view of LLMs, where it isn't coming from junior software engineers, is coming from data scientists who are extremely unfamiliar with software engineering.
Every codebase I listed was over 10 years old and had millions of lines of code. Instagram is probably the world's largest and most-used Python codebase, and the camera software I worked on was 13 years old and had millions of lines of C++ and Java. I haven't worked on many self-contained things in my career.
LLMs can help with these things if you know how to use them.
Tbf, you're also the CTO of a startup selling AI tools, saying in a nonspecific way that you're sure LLMs would have been helpful on large codebases you worked on years ago. Maybe so, but that's not at all what they were asking for in the root comment.
OK, great. All I'm saying is until we really have videos (or equivalent empirical analysis) of these use cases, it's hard to assess these claims.
Jobs comprise different tasks, some more amenable to LLMs than others. My view is that where scepticism exists amongst professional senior engineers, it's probably well-founded and grounded in the kinds of tasks that they are engaged with.
I'd imagine everyone in the debate is using LLMs to some degree; and that it's mostly about what productivity factor we imagine exists.
The article you’re replying to is just one of many examples of people who profess to be productive with these tools, but are spending significant time and energy attempting to convince skeptics to use them.
You are missing the point. The person I replied to is complaining about the lack of empirical analysis. There is no empirical analysis in the article. Subjective blog articles are not scientific studies.
Someone makes a blog post with the sole purpose of convincing AI skeptics to use AI and wants so badly to convince people to use AI that he even resorts to calling AI skeptics insane.
Someone else responds that video of the author actually using the tools would be more convincing.
Then you respond with essentially “no one wants to convince you and they’re too busy to try”.
Now if you misspoke and you’d like to change what you said originally to “many AI users do want to convince AI skeptics to use AI, but they only have enough time to write blog posts not publish any more convincing evidence”, then sure that could be the case.
But that ain’t what you said. And there’s no way to interpret what you said that way.
The commenter you replied to was specifically talking about AI proponents who do in fact want to convince people to use AI and have spent countless hours trying to do so. The OP was highlighting a more convincing way of doing this.
If someone says “people who have a stated intention to accomplish X could do it better by doing Y”, and you respond with a generalization that “most people don’t want to accomplish X” that is nonsensical. If your comment is a generalization it makes even less sense.
Also, we aren’t talking about individual developers publishing studies. The OP is asking for a livestream of a coding session. When we are talking about a blog from a well-funded company, or articles from multi-billion-dollar companies, or even just startup founders, a video is hardly a high bar.
> it has no ability, e.g., to trace a request through various APIs
That's a function of your tooling more than of your LLM. If you provide your LLM with tool-use facilities to do that querying, I don't see why it couldn't go off and perform that investigation -- but I haven't tried it yet. Off the back of this comment, though, it's now high on my todo list. I'm curious.
TFA covers a similar case:
>> But I’ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we’ve been complaining about for months. Am I better than an LLM agent at interrogating OpenSearch logs and Honeycomb traces? No. No, I am not.
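For what it's worth, here's roughly the shape of that hookup: a sketch assuming an agent framework that accepts JSON-schema tool definitions. The query_trace function and the backend behind it are hypothetical stand-ins for whatever observability stack you actually run:

    # A hypothetical trace-querying tool exposed to an LLM agent.
    # The schema shape is plain JSON Schema, as most function-calling
    # APIs accept; query_trace is a stub standing in for a real
    # OpenSearch/Honeycomb/New Relic query.
    TRACE_TOOL = {
        "name": "query_trace",
        "description": "Return all log entries and spans for a trace ID, "
                       "ordered by timestamp, across every service that saw it.",
        "parameters": {
            "type": "object",
            "properties": {
                "trace_id": {"type": "string"},
                "service": {
                    "type": "string",
                    "description": "Optional: restrict results to one service.",
                },
            },
            "required": ["trace_id"],
        },
    }

    def query_trace(trace_id: str, service: str | None = None) -> list[dict]:
        # In a real setup this would call your tracing backend's search API.
        # Stubbed here so the agent loop can be wired up and tested first.
        raise NotImplementedError("wire this to your observability backend")

With something like that registered, the model can iteratively pull spans for a trace ID itself instead of relying on whatever you pasted up front.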
For the first 10 years of my career I was a contractor walking into national and multinational orgs with large existing codebases, working within pre-existing systems, not merely "codebases". Both hardware systems (e.g., new 4G networking devices just as they were released) and distributed software systems.
I can think of many daily tasks I had across these roles that would not be very significantly sped up by an LLM. I can also see that there are a few that would be. I also shudder to think what time would have been wasted trying to learn 4G networking from LLM summarisation of new docs, or working from improperly summarised code, etc.
I don't think senior software engineers are so sceptical here that they're saying LLMs are not, locally, helpful to their jobs. The issue is how local this help seems to be.
I worked on debugging modem software at Qualcomm in 2011, also prerelease 4G networking. I believe that LLMs would have dramatically improved my productivity across nearly all tasks involved (if I'd been allowed to use an LLM from inside the Faraday cage).
I write embedded firmware for wireless mesh networks and satcom. Blend of Rust and C.
I spent ~4 months using Copilot last year for hobby projects, and it was a pretty disappointing experience. At its best, it was IntelliSense but slower. At its worst, it was trying to inject 30 lines of useless BS.
I only realized there was an "agent" in VS Code because they hijacked my ctrl+i shortcut in a recent update. You can't point it at a private API without doing some GitHub org-level nonsense. As far as my job is concerned, it's a non-feature until you can point it at your own API without jumping through hoops.
You used one AI tool that was never more than autocomplete a year ago and you think you have a full hold of all that AI offers today? That's like reviewing Thai food when you've only had Chinese food.
>you think you have a full hold of all that AI offers today?
I absolutely don't, and I'd love it if you could highlight a spot where I suggested I did. As I said, the problem isn't that I don't want to try using an agent; the problem is that I can't, because one incredibly basic feature is missing from VS Code's agent thing.
I'll occasionally use chatbots, mostly for spitballing non-professional stuff. They seem to do well with ideation questions like "I'm making X, what are some approaches I could take to do Y?" In other words, I've found that they're good at bullshitting and making lists. I like R1-1776, but that's only because Perplexity Playground seems less restricted than some of the other chatbots.
It's also nice for generating some boilerplate bash stuff, when I need that kind of thing. I don't need that very often, though.