I've had little success with Agentic coding, and what success I have had has bee...

aschobel · 2025-07-29T14:43:33 1753800213

Bingo, it's magical but the learning curve is very very steep. The METR study on open-source productivity alluded to this a bit.

I am definitely at a point where I am more productive with it, but it took a bunch of effort.

haar · 2025-07-29T15:41:27 1753803687

Apologies if I was unclear.

The more I've used it, the more I've disliked how poor its results have been, and the more I've realised I would have been better served by doing it myself and following a methodical path for things I didn't have experience with.

It's easier to step through a problem as I'm learning and making small changes than to have an LLM announce "It's done, and production ready!" when it just straight up doesn't work for 101 different tiny reasons.

airspresso · 2025-07-30T09:03:27 1753866207

My preferred approach to avoid that outcome is to divide & conquer the problem. Ask the LLM to implement each small bit in the order you'd implement it yourself given what you know about the codebase.

devmor · 2025-07-29T15:18:40 1753802320

The subjects in the study you are referencing also believed that they were more productive with it. What metrics do you have to convince yourself you aren't under the same illusory bias they were?

simonw · 2025-07-29T15:20:20 1753802420

Yesterday I used ffmpeg to extract the frame at the 13-second mark of a video as a JPEG.

If I didn't have an LLM to figure that out for me I wouldn't have done it at all.
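For reference, a minimal sketch of the kind of command being described, written as a Python helper that only builds the argument list. The file names and the `frame_extract_argv` name are hypothetical, and actually running it assumes ffmpeg is on your PATH. `-ss` is placed before `-i` so ffmpeg seeks in the input, and `-frames:v 1` stops after one video frame:

```python
import shlex


def frame_extract_argv(video: str, seconds: float, jpeg: str) -> list[str]:
    """Build an ffmpeg argv that writes the single frame at `seconds` to `jpeg`.

    -ss before -i makes it an input option, so ffmpeg seeks in the input
    instead of decoding everything up to that point.
    """
    return [
        "ffmpeg",
        "-ss", str(seconds),  # seek position, in seconds
        "-i", video,          # input video (hypothetical name in the usage below)
        "-frames:v", "1",     # stop after writing one video frame
        jpeg,                 # output; the image format is inferred from the extension
    ]


if __name__ == "__main__":
    argv = frame_extract_argv("input.mp4", 13, "frame.jpg")
    print(shlex.join(argv))
    # To actually run it (requires ffmpeg installed):
    # import subprocess; subprocess.run(argv, check=True)
```

`shlex.join` gives a copy-pasteable shell line; passing the list straight to `subprocess.run` avoids shell quoting issues with odd file names.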

throwworhtthrow · 2025-07-29T15:58:30 1753804710

LLMs still give subpar results with ffmpeg. For example when I asked Sonnet to trim a long video with ffmpeg, it put the input file parameter before the start time parameter, which triggers an unnecessary decode of the video file. [1]

Sure, use the LLM to get over the initial hump. But ffmpeg is no exception to the rule that LLMs produce subpar code. It's worth spending a couple of minutes reading the docs to understand what it did, so you can do it better, and unassisted, next time.

[1] https://ffmpeg.org/ffmpeg.html#:~:text=ss%20position
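The point above is about where `-ss` goes relative to `-i`. Per the linked ffmpeg documentation, as an input option (before `-i`) `-ss` seeks in the input file, while as an output option (after `-i`) it decodes and discards input until the timestamps reach the position. A minimal Python sketch that builds both orderings for a trim; the function and file names are hypothetical, and it only constructs the argv (running it needs ffmpeg installed):

```python
def trim_argv(src: str, start: float, duration: float, dst: str,
              input_seek: bool = True) -> list[str]:
    """Build an ffmpeg argv that keeps `duration` seconds starting at `start`.

    input_seek=True:  -ss before -i, so ffmpeg seeks in the input file.
    input_seek=False: -ss after -i, so ffmpeg decodes and discards input
                      until it reaches `start` (slow on long videos).
    """
    seek = ["-ss", str(start)]
    if input_seek:
        return ["ffmpeg", *seek, "-i", src, "-t", str(duration), dst]
    return ["ffmpeg", "-i", src, *seek, "-t", str(duration), dst]


if __name__ == "__main__":
    # Keep 30 seconds starting one hour into a long recording.
    print(trim_argv("long.mp4", 3600, 30, "clip.mp4"))
    print(trim_argv("long.mp4", 3600, 30, "clip.mp4", input_seek=False))
```

Both orderings are valid commands; the difference is how much of the file ffmpeg has to decode before it starts writing output.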

CamperBob2 · 2025-07-29T16:27:09 1753806429

That says more about suboptimal design on ffmpeg's part than it does about the LLM. Most humans can't deal with ffmpeg command lines, so it's not surprising that the LLM misses a few tricks.

nottorp · 2025-07-29T17:13:38 1753809218

Had an LLM generate 3 lines of working C++ code that was "only" one order of magnitude slower than what I edited it into in 10 minutes.

If you're happy with results like that, sure, LLMs miss "a few tricks"...

ben_w · 2025-07-29T17:56:27 1753811787

You don't have to leave LLM code alone, it's fine to change it — unless, I guess, you're doing some kind of LLM vibe-code-golfing?

But this does remind me of a previous co-worker. Wrote something to convert from a custom data store to a database, his version took 20 minutes on some inputs. Swore it couldn't possibly be improved. Obviously ridiculous because it didn't take 20 minutes to load from the old data store, nor to load from the new database. Over the next few hours of looking at very mediocre code, I realised it was doing an unnecessary O(n^2) check, confirmed with the CTO it wasn't business-critical, got rid of it, and the same conversion on the same data ran in something like 200ms.
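The comment doesn't say what the O(n^2) check actually was. Purely as a hypothetical illustration, assuming it was a pairwise uniqueness check over record keys, this is the usual way quadratic work of that kind becomes linear (both function names are made up for the example):

```python
def has_duplicates_quadratic(keys: list[str]) -> bool:
    # Compares every pair of keys: n*(n-1)/2 comparisons, i.e. O(n^2).
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if keys[i] == keys[j]:
                return True
    return False


def has_duplicates_linear(keys: list[str]) -> bool:
    # One pass with a hash set: O(n) expected time, O(n) extra memory.
    seen: set[str] = set()
    for key in keys:
        if key in seen:
            return True
        seen.add(key)
    return False


if __name__ == "__main__":
    sample = [str(i) for i in range(1000)] + ["42"]
    print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))
```

In the anecdote the check was removed outright once it was confirmed not to matter; the set-based version is the common fallback when the check is actually needed.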

Over a decade before LLMs.

nottorp · 2025-07-29T17:59:49 1753811989

We all do that, sometimes where it’s time-critical and sometimes where it isn’t.

But I keep being told “AI” is the second coming of Ahura Mazda so it shouldn’t do stuff like that right?

ben_w · 2025-07-29T19:47:39 1753818459

> Ahura Mazda

Niche reference, I like it.

But… I only hear of scammers who say, and psychosis sufferers who think, LLMs are *already* that competent.

Future AI? Sure, lots of sane-seeming people also think it could go far beyond us. Special purpose ones have in very narrow domains. But current LLMs are only good enough to be useful and potentially economically disruptive, they're not even close to wildly superhuman like Stockfish is.

CamperBob2 · 2025-07-29T20:49:52 1753822192

Sure. If you ask ChatGPT to play chess, it will put up an amateur-level effort at best. Stockfish will indeed wipe the floor with it. But what happens when you ask Stockfish to write a Space Invaders game?

ChatGPT will get better at chess over time. Stockfish will not get better at anything except chess. That's kind of a big difference.

ben_w · 2025-07-29T21:11:23 1753823483

> ChatGPT will get better at chess over time

Oddly, LLMs got worse specifically at chess: https://dynomight.net/chess/

But even to the general point, there's absolutely no agreement how much better the current architectures can ultimately get, nor how quickly they can get there.

Do they have potential for unbounded improvements, albeit at exponential cost for each linear incremental improvement? Or will they asymptotically approach the level of someone with 5 years' experience, 10 years' experience, a lifetime of experience, or a higher level than any human?

If I had to bet, I'd say current models have asymptotic growth converging to a merely "ok" performance; and separately claim that even if they're actually unbounded with exponential cost for linear returns, we can't afford the training cost needed to make them act like someone with even just 6 years' professional experience in any given subject.
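One way to make the two hypotheses concrete, with P as capability and C as training cost. The functional forms and the symbols a, b, k, α and P_∞ are illustrative assumptions, not fitted to any data: under A every fixed gain Δ multiplies the cost by a constant factor, while under B capability never exceeds a ceiling no matter how much is spent.

```latex
\begin{aligned}
&\text{A (unbounded):}  && P_A(C) = a + b \ln C,\ b > 0
  \;\Rightarrow\; P_A(C') - P_A(C) = \Delta \iff C' = C\,e^{\Delta/b} \\
&\text{B (asymptotic):} && P_B(C) = P_\infty - k\,C^{-\alpha},\ k, \alpha > 0
  \;\Rightarrow\; \lim_{C \to \infty} P_B(C) = P_\infty
\end{aligned}
```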

Which is still a lot. Especially as it would be acting like it had about as much experience in every other subject at the same time. Just… not a literal Ahura Mazda.

CamperBob2 · 2025-07-29T21:59:52 1753826392

> If I had to bet, I'd say current models have asymptotic growth converging to a merely "ok" performance

(Shrug) People with actual money to spend are betting twelve figures that you're wrong.

Should be fun to watch it shake out from up here in the cheap seats.

ben_w · 2025-07-29T22:26:20 1753827980

Nah, a trillion dollars is about right for "ok". That's roughly a percentage point of the global economy in cost; automate 2 percent of it and you get a huge margin. We literally set more than that on actual fire each year.

For "pretty good", it would be worth 14 figures spread over two years. Global GDP is 15 figures (roughly $100 trillion a year). Even if this only automated 10% of the economy, it would pay for itself within a decade.

For "Ahura Mazda", it would easily be worth 16 figures, what with that being the principal God and god of the sky in Zoroastrianism, and the only reason it stops at 16 is the implausibility of people staying organised for longer to get it done.
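The payback arithmetic in this comment can be checked with a back-of-envelope function. This is a crude sketch under loud assumptions: global GDP of roughly $100 trillion a year, value captured each year equal to the automated share of GDP, and no discounting, growth, or running costs. The dollar figures in the usage lines are illustrative picks for the "ok" and "pretty good" tiers, not numbers from anyone's budget:

```python
def payback_years(cost: float, gdp_per_year: float, automated_share: float) -> float:
    """Years until cumulative captured value equals the up-front cost.

    Crude assumption: value captured per year = automated_share * gdp_per_year,
    with no discounting, margins, growth, or operating costs.
    """
    return cost / (automated_share * gdp_per_year)


GDP = 100e12  # assumption: global GDP of roughly $100 trillion per year

if __name__ == "__main__":
    # "ok": a $1T build-out that automates 2% of the economy.
    print(payback_years(1e12, GDP, 0.02))
    # "pretty good": $99T (near the top of the 14-figure range), 10% automated.
    print(payback_years(99e12, GDP, 0.10))
```

Even at the top of the 14-figure range, 10% automation recovers the cost in just under ten years under these assumptions, which is the shape of the argument above.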

nottorp · 2025-07-30T09:04:34 1753866274

> People with actual money to spend are betting

... but those "people with actual money to spend" have burned money on fads before. Including on "AI", several times before the current hysterics.

If you're a good actor/psychologist, it's probably a good business model to figure out how to get VC money and how to justify your startup failing so they give you money for the next startup.

CamperBob2 · 2025-07-29T18:29:11 1753813751

"I'm taking this talking dog right back to the pound. It told me to short NVDA, and you should see the buffer overflow bugs in the C++ code it wrote. Totally overhyped. I don't get it."

nottorp · 2025-07-29T18:39:20 1753814360

"We hear you have been calling our deity a talking dog. Please enter the red door for reeducation."

dingnuts · 2025-07-29T15:31:05 1753803065

It is nice to use LLMs to generate ffmpeg commands, because those can be pretty tricky, but really, you wouldn't have just used the man page before?

That the author is allergic to man pages explains a lot about Django, lol

ben_w · 2025-07-29T17:49:20 1753811360

I remember when I was a kid, people asking a teacher how to spell a word, and the answer was generally "look it up in a dictionary"… which you can only do if you already have a shortlist of possible spellings.

*nix man pages are the same: if you already know which tool can solve your problem, they're easy to use. But you have to already have a shortlist of tools that can solve your problem, before you even know which man pages to read.

adastra22 · 2025-07-29T23:29:08 1753831748

That’s what GNU info is for, of course.

aebtebeten · 2025-07-30T13:55:21 1753883721

man -k (or apropos)
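`man -k` and `apropos` address the shortlist problem by searching the one-line NAME descriptions of installed man pages for a keyword (e.g. `apropos video`), so you don't need the tool's name up front. A toy Python model of that lookup; the index is hand-written and hypothetical, with paraphrased descriptions standing in for the real database, and the case-insensitive substring match is a simplification of what the real tool does:

```python
def apropos(keyword: str, index: dict[str, str]) -> list[str]:
    """Return "name - description" lines whose name or description mentions keyword.

    Simplification: case-insensitive substring match over a tiny in-memory index.
    """
    kw = keyword.lower()
    return [
        f"{name} - {desc}"
        for name, desc in sorted(index.items())
        if kw in name.lower() or kw in desc.lower()
    ]


# Hypothetical index with paraphrased descriptions; the real database is built
# from the NAME section of each installed man page.
WHATIS = {
    "ffmpeg": "convert and process video and audio files",
    "jq": "command-line JSON processor",
    "osascript": "execute AppleScript and other OSA language scripts",
    "grep": "print lines that match patterns",
}

if __name__ == "__main__":
    for line in apropos("video", WHATIS):
        print(line)
```

The key design point is that the search runs over descriptions, which is exactly the information you have when you know *what* you want to do but not *which* tool does it.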

ben_w · 2025-07-30T20:26:12 1753907172

`apropos` would itself be an example of a *nix tool that I didn't know existed and therefore wouldn't have known to find out more about.

simonw · 2025-07-29T15:33:03 1753803183

I just took a look, and the man page DOES explain how to do that!

... on line 3,218: https://gist.github.com/simonw/6fc05ea7392c5fb8a5621d65e0ed0...

(I am very confident I am not the only person who has been deterred by ffmpeg's legendarily complex command-line interface. I feel no shame about this at all.)

lexh · 2025-07-30T01:22:49 1753838569

To be a little more fair... that example is tidily slotted into the EXAMPLES section, under the heading "You can extract images from a video, or create a video from many images".

I don't think most people read the man pages top to bottom. And even if they did, for as much grief as you're giving ffmpeg, llm has an even larger burden: no man page, and docs that weigh in at over 8k lines.

I get the general point that ffmpeg is a powerful, complex tool... but this is a weird fight to pick.

simonw · 2025-07-30T01:29:14 1753838954

I could not be more confident that "ffmpeg is difficult to figure out" is not a weird fight to pick. It's notorious!

quesera · 2025-07-29T17:14:46 1753809286

ffmpeg is genuinely complicated! And the CLI is convoluted (in ways that are both justifiable and unfortunate).

But if you approach ffmpeg from the perspective of "I know this is possible", you are always correct, and can almost always reach the "how" in a handful of minutes.

Whether that's worth it or not will vary. :)

otabdeveloper4 · 2025-07-30T07:14:25 1753859665

The correct solution here would have been to feed the man page to an LLM summarizer.

Alas, instead of correct and easy solutions to problems, we are focused on sci-fi robot assistant bullshit.

devmor · 2025-07-29T15:26:12 1753802772

You wouldn't have just typed "extract frame at timestamp as jpeg ffmpeg" into Google and used the StackExchange result that comes up first that gives you a command to do exactly that?

simonw · 2025-07-29T15:29:43 1753802983

Before LLMs made ffmpeg no-longer-frustrating-to-use I genuinely didn't know that ffmpeg COULD do things like that.

devmor · 2025-07-29T18:38:50 1753814330

I'm not really sure what you're saying an LLM did in this case. Inspired a lost sense of curiosity?

simonw · 2025-07-29T19:45:19 1753818319

My general point is that people say things like "yeah, but this one study showed that programmers over-estimate the productivity gain they get from LLMs so how can you really be sure?"

Meanwhile I've spent the past two years constantly building and implementing things I never would have done because of the reduction in friction LLM assistance gives me.

I wrote about this first two years ago - AI-enhanced development makes me more ambitious with my projects - https://simonwillison.net/2023/Mar/27/ai-enhanced-developmen... - when I realized I was hacking on things with tech like AppleScript and jq that I'd previously avoided.

It's hard to measure the productivity boost you get from "wouldn't have built that thing" to "actually built that thing".

aschobel · 2025-07-30T20:31:30 1753907490

"You can just do things".

Agreed on all fronts. jq and AppleScript syntax are a total mystery to me, but now I use them all the time since Claude Code has figured them out.

It's so powerful knowing the shape of a solution and not having to care about the details.

Philpax · 2025-07-29T19:30:17 1753817417

Translated a vague natural language query ("cli, extract frame 13s into video") into something immediately actionable with specific examples and explanations, surfacing information that I would otherwise not know how to search for.

That's what I've done with my ffmpeg LLM queries, anyway - can't speak for simonw!

wizzwizz4 · 2025-07-29T20:50:56 1753822256

DuckDuckGo search results for "cli, extract frame 13s into video" (no quotes):

• https://stackoverflow.com/questions/10957412/fastest-way-to-...

• https://superuser.com/questions/984850/linux-how-to-extract-...

• https://www.aleksandrhovhannisyan.com/notes/video-cli-cheat-...

• https://www.baeldung.com/linux/ffmpeg-extract-video-frames

• https://ottverse.com/extract-frames-using-ffmpeg-a-comprehen...

Search engines have been able to translate "vague natural language queries" into search results for a decade, now. This pre-existing infrastructure accounts for the vast majority of ChatGPT's apparent ability to find answers.

stelonix · 2025-07-29T23:40:06 1753832406

Yet the interface is fundamentally different: the output feels much more like bro pages[0], and it's one click from the clipboard, one Ctrl-V away from extracting the screenshot at the 13-second mark. I've been using Google for the past 24 years and my google-fu has always left people amazed; yet I can no longer be bothered to go through Stack Exchange's results when an LLM not only spits the answer out so nicely, but also does the equivalent of an explainshell[1].

It's not comparable, and I fail to see why going through Google's ads and results would be better.

[0] https://github.com/pombadev/bropages

[1] https://github.com/idank/explainshell

wizzwizz4 · 2025-07-30T05:11:59 1753852319

DuckDuckGo insists on shoving "AI Assist" entries in its results, so I have a reasonable idea of how often LLMs are completely wrong even given search results. The answer's still "more than one time in five".

I did not suggest using Google Search (the company's on record as deliberately making Google Search worse), but there are other search engines. My preferred search engines don't do the fancy "interpret natural language queries" pre-processing, because I'm quite good at doing that in my head and often want to research niche stuff, but there are many still-decent search engines that do, and don't have ads in the results.

Heck, you can even pay for a good search engine! And you can have it redirect you to the relevant section of the top search result automatically: Google used to call this "I'm feeling lucky!" (although it was before URI text fragments, so it would just send you to the top of the page). All the properties you're after, much more cheaply, and you keep the information about provenance, and your answer is more-reliably accurate.
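URI text fragments are what make "jump straight to the relevant passage" possible; the [1] link in throwworhtthrow's comment (`#:~:text=ss%20position`) uses one. A minimal Python sketch that builds the simplest form, `#:~:text=<start>`, assuming the base URL has no fragment yet. In the text-fragment syntax `-`, `,` and `&` are delimiters, so they must be percent-encoded inside the text; prefix, suffix and range forms are left out:

```python
from urllib.parse import quote


def text_fragment_url(url: str, text: str) -> str:
    """Append a URL text fragment (#:~:text=...) that targets `text`.

    Assumes `url` has no existing fragment. quote(..., safe="") escapes
    spaces, ',' and '&' but leaves '-' alone, so '-' is encoded by hand.
    """
    encoded = quote(text, safe="").replace("-", "%2D")
    return f"{url}#:~:text={encoded}"


if __name__ == "__main__":
    print(text_fragment_url("https://ffmpeg.org/ffmpeg.html", "ss position"))
```

Browsers that support text fragments scroll to and highlight the first match; others simply ignore the fragment and load the page normally.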

delian66 · 2025-07-30T16:53:22 1753894402

> Heck, you can even pay for a good search engine!

Can you recommend one?

0x457 · 2025-07-29T19:34:46 1753817686

LLM somewhat understood ffmpeg documentation? Not sure what is not clear here.

dfedbeef · 2025-07-30T12:36:28 1753878988

Was the answer:

ffmpeg -ss 00:00:13 -i myvideo.avi -frames:v 1 myimage.jpeg

Because this is on stack overflow and it took maybe one second to find.

I've found reading the man page for a tool is usually a better way to learn what a tool can do for you now and also in the future.

kamranjon · 2025-07-30T12:55:59 1753880159

This is the rub for me… people are so quick to forget the original source for a lot of the data these models were trained on, and how easy and useful these platforms were. Now Google will summarize this question for you in an AI overview before you even land on Stack Overflow. It’s killing the network effect of the open web and destroying our crowdsourced platforms in favor of a lossy compression algorithm that will eventually be regurgitating its own entrails.

dfedbeef · 2025-07-30T13:17:33 1753881453

Well, maybe. People will just stop using them and will make fun of people who do. You can only bullshit people for so long.