
My experiences somewhat confirm these observations, but I also had one that was different. Two weeks of debugging IPSEC issues with Gemini. Initially, I imported all the IPSEC documentation from OPNsense and pfSense into Gemini and informed it of the general context in which I was operating (in reference to 'keeping your context clean'). Then I added my initial settings for both sides (sensitive information redacted!). Afterwards, I entered a long feedback loop, posting logs and asking and answering questions.

At the end of the two weeks, I observed that the LLM was much less likely to become distracted. Sometimes I would dump whole forum threads or SO posts into it, and it would say "this is not what we are seeing here, because of [earlier context or finding]". I eliminated all dead ends logically and informed it of this (yes, it can help with the reflection, but I had to make the decisions). In the end, I found the cause of my issues.

This somewhat confirms what another user here on HN said a few days ago: LLMs are good at compressing complex information into simpler forms, but not at expanding simple ideas into complex ones. As long as my input was larger than the output (in either complexity or length), I was happy with the results.

I could have done this without the LLM. However, it was helpful in that it retained facts from the outset that I had either forgotten or been unable to retrieve quickly in new contexts. It also made it easier to identify time patterns in large log files, which helped me debug my site-to-site connection. I also optimized many other settings along the way, not just the most problematic one, so in addition to fixing my problem I learned quite a bit. The 'state' was only occasionally incorrect about my current parameter settings, but this was always easy to correct. This confirms what others have already seen: if you know where you are going and treat it as a tool, it is helpful. However, don't try to offload decisions to it or let it steer you in the wrong direction.
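
To give a flavor of the time-pattern hunting, this is roughly the kind of thing I mean, sketched in Python; the syslog-style timestamp format and the keywords are just assumptions for illustration, not my actual tooling:

    import re
    from collections import Counter

    # Bucket IPsec log lines by minute and count "interesting" events, so that
    # periodic failures (e.g. every rekey interval) stand out visually.
    TIMESTAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}")  # "Nov 19 03:14"
    KEYWORDS = ("retransmit", "timeout", "rekey", "dpd")           # illustrative only

    def bucket_events(path):
        buckets = Counter()
        with open(path, errors="replace") as fh:
            for line in fh:
                if not any(k in line.lower() for k in KEYWORDS):
                    continue
                m = TIMESTAMP.match(line)
                if m:
                    buckets[m.group(1)] += 1  # events per minute
        return buckets

    for minute, count in sorted(bucket_events("ipsec.log").items()):
        print(minute, "#" * count)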

Overall, I used about 350k tokens (roughly 300k words). Here's a related blog post [1] describing my overall path, though it doesn't correspond directly to this specific issue. (Please don't recommend WireGuard; I am aware of it.)

    [1]: https://du.nkel.dev/blog/2021-11-19_pfsense_opnsense_ipsec_cgnat/


Recently, Gemini helped me fix a bug in a PPP driver (Zephyr OS) without any prior knowledge of PPP, or really even of driver development. I would copy-paste logs of raw PPP frames in hex and it would just decode everything and explain the meaning of each byte. In about an hour, I knew enough about PPP to fix the bug and submit a patch.

https://g.co/gemini/share/7edf8fa373fe
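
For anyone curious what that decoding amounts to, here is a rough Python sketch of the same idea (not Gemini's output; the protocol table is abbreviated and the hex frame is a made-up LCP Configure-Request):

    # RFC 1661/1662-style framing: 7E flag | FF 03 | protocol (2 bytes) | payload | FCS | 7E
    PROTOCOLS = {0xC021: "LCP", 0xC023: "PAP", 0xC223: "CHAP", 0x8021: "IPCP", 0x0021: "IPv4"}

    def unstuff(data):
        # Undo async byte-stuffing: 0x7D escapes the next byte XOR 0x20.
        out, esc = bytearray(), False
        for b in data:
            if esc:
                out.append(b ^ 0x20)
                esc = False
            elif b == 0x7D:
                esc = True
            else:
                out.append(b)
        return bytes(out)

    def decode(frame_hex):
        body = unstuff(bytes.fromhex(frame_hex).strip(b"\x7e"))
        proto = int.from_bytes(body[2:4], "big")
        print("addr=%02X ctrl=%02X proto=%s payload=%s fcs=%s" % (
            body[0], body[1], PROTOCOLS.get(proto, hex(proto)),
            body[4:-2].hex(), body[-2:].hex()))  # FCS is not verified here

    decode("7e ff 03 c0 21 01 01 00 04 ab cd 7e")  # -> proto=LCP payload=01010004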


Or you could just read the PPP RFC [0].

I’m not saying that your approach is wrong. But most LLM workflows are either brute-forcing the solution or settling into a local minimum and getting stuck there. It’s like running thousands of experiments on falling objects to figure out gravity while there’s a physics textbook nearby.

[0]: https://datatracker.ietf.org/doc/html/rfc1661


Ironically, I could’ve read all 50 pages of that RFC and still missed the actual issue. What really helped was RFC 1331[0], specifically the "Async-Control-Character-Map" section.

That said, I’m building a product - not a PPP driver - so the quicker I can fix the problem and move on, the better.

[0] https://datatracker.ietf.org/doc/html/rfc1331
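
For anyone following along, the ACCM is just a 32-bit map where bit N set means control character N has to be 0x7D-escaped on the async line. A tiny illustrative sketch (the 0x000A0000 value is the usual "escape XON/XOFF" example; the payload bytes are made up):

    def chars_to_escape(accm):
        return [c for c in range(32) if accm & (1 << c)]

    def escape(payload, accm):
        out = bytearray()
        for b in payload:
            if b in (0x7D, 0x7E) or (b < 0x20 and accm & (1 << b)):
                out += bytes([0x7D, b ^ 0x20])  # flag/escape chars plus mapped controls
            else:
                out.append(b)
        return bytes(out)

    accm = 0x000A0000
    print([hex(c) for c in chars_to_escape(accm)])  # ['0x11', '0x13'] -> XON/XOFF
    print(escape(b"\x11\x41\x13", accm).hex(" "))   # 7d 31 41 7d 33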


I could also walk everywhere, but sometimes technology can help.

There’s no way I could fully read that RFC in an hour. And that’s before you even know which parts of the reading to focus your attention on, so you’re just being a worse LLM at that point.


The difference is you’d remember some of the context from reading the thing, whereas an LLM is starting from scratch every single time it comes up.


There are opportunity costs to consider, along with relevance. Suppose you are staying at my place. Are you going to read the manual for my espresso machine cover to cover, or are you going to ask me to show you how to use it, or just make one for you?

In any case, LLMs are not magical forgetfulness machines.

You can use a calculator to avoid learning arithmetic but using a calculator doesn’t necessitate failing to learn arithmetic.

You can ask a question of a professor or fellow student, but failing to read the textbook to answer that question doesn’t necessitate failing to develop a mental model or incorporate the answer into an existing one.

You can ask an LLM a question and blindly use its answer but using an LLM doesn’t necessitate failing to learn.


There’s plenty to learn from using LLMs, including how to interact with an LLM.

However, even outside of using an LLM, the temptation is always to keep the blinders on, do a deep dive for a very specific bug, and repeat as needed. It’s the local minimum of effort, and very slowly you do improve as those deep dives occasionally come up again, but what keeps it from being a global minimum is that these systems aren’t suddenly going away. It’s not a friend’s espresso machine; it’s now sitting in your metaphorical kitchen.

As soon as you’ve dealt with, say, a CSS bug, the odds of seeing another one in the future are dramatically higher. Thus, even accounting for diminishing returns, spending a few hours learning the basics of any system or protocol you encounter is just a useful strategy. If you spend 1% of your time on a strategy that makes you 2% more efficient, that’s a net win.
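
Back-of-the-envelope version of that last sentence (a deliberately crude model, numbers taken straight from the claim above):

    learning = 0.01   # fraction of time spent learning the basics
    speedup = 1.02    # 2% more efficient on everything else
    total = learning + (1 - learning) / speedup
    print(round(total, 4))  # ~0.9806 -> same work in ~2% less total time, a net win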


Sometimes learning means understanding, aka a deep dive on the domain. Only a few domains are worth that. For the others, it's only about placing landmarks so you can quickly recognize a problem and find the relevant information before solving it. I believe the best use case of LLMs is when you have recognized the problem and know the general shape of the solution, but have no time to wrangle the specifics of the implementation. So you can provide the context and its constraints in order to guide the LLM's generation, as well as recognize wrong outputs.

But that's not learning, or even problem solving. It's just a time-saving trick, and one that's not reliable.

And the fact is that there's a lot of information about pretty much anything. But I see people trying to skip the foundation (not glamorous enough, maybe) and go straight for the complicated stuff. And LLMs are good at providing the illusion that this can be the right workflow.


> Only a few domains are worth that. For the others, it's only about placing landmarks so you can quickly recognize a problem and find the relevant information before solving it.

Well said. You can only spend years digging into the intricacies of a handful of systems in your lifetime, but there’s still real reward from a few hours here and there.


I would remember the reply from the LLM, and the cross-references back to the particular parts of the RFC it identified as worth focusing time on.

I’d argue that’s a more effective capture of what I would remember anyway.

If I want to learn more (in a general sense), I can take the manual away with me and study it, which I can do more effectively on its own terms, in a comfy chair with a beer. But right now I have a problem to solve.


Reading it at some later date means you also spent time with the LLM without having read the RFC. So reading it in the future means it’s going to be useful fewer times and thus less efficient overall.

I.e., LLM then RFC takes more time than RFC then solving the issue.


Only if you assume a priori that you are going to read it anyway, which misses the whole point.

Because you should have read RFC 1331.

Even then, your argument assumes that optimising for total time (including your own learning time) is the goal, and not solving the business case (your actual problem) as a priority. That assumption may not hold when you have a patch to submit. What you solve, and at what point in time, is the general question; there’s no single optimum.


You’re assuming your individual tasks perfectly align with what’s best for the organization, which is rarely the case.

Having a less skilled worker is the tradeoff for getting one very specific task accomplished sooner. That might be worth it, especially if you plan to quit soon, but it’s hardly guaranteed.


No, just basic judgement and prioritisation, which are valuable skills for an employee to have. The OP was effective at finding the right information they needed to solve the problem at hand: In about an hour, the OP knew enough about PPP to fix the bug and submit a patch.

Whereas it's been all morning and you're still reading the RFC, and it's the wrong RFC anyway.

I know who I'd hire.


And now it’s obvious why you’re working for someone else.

This time it worked, but I’ve been forced to fire people with this kind of attitude before.


You produced a passive-aggressive taunt instead of addressing the argument. For clarity: nobody was asking about your business decisions, nobody is intimidated by your story, and what your personal opinions about "attitude" are is irrelevant to what's being discussed (LLMs allowing optimal time use in certain cases). Also, unless your boss made the firing decision, you weren't forced to do anything.


You’re still not getting it: not having a boss means I have a very different view of business decisions. Most people have an overly narrow view of tasks, IMO, and think speed, especially for minor issues, is vastly more important than it is.

> LLMs allowing optimal time use in certain cases

I never said it was slower; I’m asking what the tradeoff is. I’ve had this same basic conversation with multiple people, and after that failed, the only real option was to remove them. E.g.: if you don’t quite understand why what you wrote seemingly fixes a bug, don’t commit it yet; "seems to work" isn’t a solution.

Could be I’m not explaining very well, but ehh fuck em.


All of the above is true, but between solving quicker, and admitting we gave context:

I do agree with you that an LLM should not always start from scratch.

In a way it is like an animal which we have given the ultimate human instinct.

What has nature given us? Homo erectus was 2 million years ago.

A weird world we live in.

What is context.


Interesting that it works for you. I tried something similar several times with frames from a 5G network, and it mixed fields from 4G and 5G in its answers (or even fields from non-cellular network protocols, because they had features similar to the 5G protocol I was looking at). Occasionally, the explanation was completely invented or based on discussions of planned features for future versions.

I have really learned to mistrust and double-check every single line those systems produce. Same for writing code. Everything they produce looks nice and reasonable on the surface, but when you dig deeper it falls apart unless it's something very, very basic.


Similarly, I've found the results pretty mixed whenever a library or framework with a lot of releases/versions is involved. The LLM tends to mix and match features from across versions.


Yes, it feels like setting the `-h` flag for logs (human readable).


That's some impressive prompt engineering skills to keep it on track for that long, nice work! I'll have to try out some longer-form chats with Gemini and see what I get.

I totally agree that LLMs are great at compressing information; I've set up the docs feature in Cursor to index several entire large documentation websites for major libraries and it's able to distill relevant information very quickly.


In Gemini, it is really good to have the large 1M-token window. However, around 100,000 tokens it starts to make mistakes and refactor its own code.

Sometimes it is good to start a new chat or switch to Claude.

And it really helps to be very precise with the wording of the specification of what you want to achieve, or to repeat it sometimes with some added request lines.

GIGO in reality :)


Oh my, I hate it when it rewrites >1k LOC. I have to instruct it to "modify only ..., do not touch the rest" and so forth, but GPT often does not listen to this, while Claude does. I dunno about Gemini.


In terms of "does useless refactors I didn't ask for and that don't improve anything", my own ranked list goes something like: Gemini > Claude > GPT. I don't really experience this at all with the various GPT models used via the API; overall the GPTs seem to stick to the system prompt way better than the rest. Claude does OK too, but Gemini is out of control, writes so much code, and does so much you didn't ask for, really acting like an overly eager junior developer.


The first time I used Claude, it rewrote >1k LOC without asking for it, but in retrospect, I was "using it wrong". With GPT, even when I told it to not do it, it still did that, but that was some time ago and it was not done via the API, so I dunno. I think I do agree with your list, but I haven't used Gemini that much.

Yeah, they do come across as "overly eager junior devs", good comparison. :D


> With GPT, even when I told it to not do it, it still did that, but that was some time ago and it was not done via the API, so I dunno.

Personally, I think it's a lot better via the API than through ChatGPT. ChatGPT doesn't let you edit the "system prompt", which is really where you want to put "how to" instructions so that the model actually follows them. Instructions put in the user message aren't followed as closely as ones in the system prompt, which is probably why it still did something if you were using ChatGPT.
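
A minimal sketch of what that looks like via the API (OpenAI's Python client; the model name and the instruction text are just examples):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[
            # "How to behave" goes in the system message, not the user message.
            {"role": "system",
             "content": "Modify only the function I point to. Do not refactor or touch any other code."},
            {"role": "user",
             "content": "Here is the file and the bug I want fixed: ..."},
        ],
    )
    print(resp.choices[0].message.content)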


I received this gem from Gemini just now:

I am giving up on providing code, and on checking is it working, because it is very time consuming. Tell me when it starts working. Good luck.

:)


I love it when models give up; it gives me some hope that humans will still be required for the time being lol


It is right, it is time consuming. I do not blame it. :D


LLMs are good at interpolating but bad at extrapolating.


To be fair, all AI/ML and even statistical methods are bad at extrapolating.
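
A toy way to see that point: fit a polynomial to sin(x) on [0, 2π] and it interpolates fine, but it goes off the rails as soon as you ask about points outside the range it was fitted on.

    import numpy as np

    x_train = np.linspace(0, 2 * np.pi, 50)
    coeffs = np.polyfit(x_train, np.sin(x_train), deg=7)  # fit inside [0, 2*pi]

    for x in (np.pi, 3 * np.pi):  # interpolation vs extrapolation
        print(f"x={x:.2f}  fit={np.polyval(coeffs, x):+.3f}  true={np.sin(x):+.3f}")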



