
My experiences somewhat confirm these observations, but I also had one that was different. Two weeks of debugging IPSEC issues with Gemini. Initially, I imported all the IPSEC documentation from OPNsense and pfSense into Gemini and informed it of the general context in which I was operating (in reference to 'keeping your context clean'). Then I added my initial settings for both sides (sensitive information redacted!). Afterwards, I entered a long feedback loop, posting logs and asking and answering questions.

At the end of the two weeks, I observed that the LLM was much less likely to become distracted. Sometimes I would dump whole forum threads or SO posts into it, and it would say "this is not what we are seeing here, because of [earlier context or finding]". I eliminated all dead ends logically and informed it of this (yes, it can help with the reflection, but I had to make the decisions). In the end, I found the cause of my issues.

This somewhat confirms what another user here on HN said a few days ago: LLMs are good at compressing complex information into simpler forms, but not at expanding simple ideas into complex ones. As long as my input was larger than the output (in either complexity or length), I was happy with the results.

I could have done this without the LLM. However, it was helpful in that it retained facts from the outset that I had either forgotten or been unable to retrieve quickly in new contexts. It also made it easier to identify time patterns in large log files, which helped me debug my site-to-site connection. I also optimized many other settings along the way, not just the most problematic one, so in addition to fixing my problem I learned quite a bit. The 'state' was only occasionally incorrect about my current parameter settings, but this was always easy to correct. This confirms what others have already seen: if you know where you are going and treat it as a tool, it is helpful. However, don't try to offload decisions to it or let it steer you in the wrong direction.
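
To give a flavor of the time-pattern hunting, this is roughly the kind of thing I mean, sketched in Python; the syslog-style timestamp format and the keywords are just assumptions for illustration, not my actual tooling:

    import re
    from collections import Counter

    # Bucket IPsec log lines by minute and count "interesting" events, so that
    # periodic failures (e.g. every rekey interval) stand out visually.
    TIMESTAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}")  # "Nov 19 03:14"
    KEYWORDS = ("retransmit", "timeout", "rekey", "dpd")           # illustrative only

    def bucket_events(path):
        buckets = Counter()
        with open(path, errors="replace") as fh:
            for line in fh:
                if not any(k in line.lower() for k in KEYWORDS):
                    continue
                m = TIMESTAMP.match(line)
                if m:
                    buckets[m.group(1)] += 1  # events per minute
        return buckets

    for minute, count in sorted(bucket_events("ipsec.log").items()):
        print(minute, "#" * count)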

Overall, I used about 350k tokens (roughly 300k words). Here's a related blog post [1] describing my overall path, though it doesn't correspond directly to this specific issue. (Please don't recommend WireGuard; I am aware of it.)

    [1]: https://du.nkel.dev/blog/2021-11-19_pfsense_opnsense_ipsec_cgnat/


Recently, Gemini helped me fix a bug in a PPP driver (Zephyr OS) without any prior knowledge of PPP, or really even of driver development. I would copy-paste logs of raw PPP frames in hex and it would just decode everything and explain the meaning of each byte. In about an hour, I knew enough about PPP to fix the bug and submit a patch.

https://g.co/gemini/share/7edf8fa373fe
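
For anyone curious what that decoding amounts to, here is a rough Python sketch of the same idea (not Gemini's output; the protocol table is abbreviated and the hex frame is a made-up LCP Configure-Request):

    # RFC 1661/1662-style framing: 7E flag | FF 03 | protocol (2 bytes) | payload | FCS | 7E
    PROTOCOLS = {0xC021: "LCP", 0xC023: "PAP", 0xC223: "CHAP", 0x8021: "IPCP", 0x0021: "IPv4"}

    def unstuff(data):
        # Undo async byte-stuffing: 0x7D escapes the next byte XOR 0x20.
        out, esc = bytearray(), False
        for b in data:
            if esc:
                out.append(b ^ 0x20)
                esc = False
            elif b == 0x7D:
                esc = True
            else:
                out.append(b)
        return bytes(out)

    def decode(frame_hex):
        body = unstuff(bytes.fromhex(frame_hex).strip(b"\x7e"))
        proto = int.from_bytes(body[2:4], "big")
        print("addr=%02X ctrl=%02X proto=%s payload=%s fcs=%s" % (
            body[0], body[1], PROTOCOLS.get(proto, hex(proto)),
            body[4:-2].hex(), body[-2:].hex()))  # FCS is not verified here

    decode("7e ff 03 c0 21 01 01 00 04 ab cd 7e")  # -> proto=LCP payload=01010004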


Or you could just read the PPP RFC [0].

I’m not saying that your approach is wrong. But most LLM workflows are either brute-forcing the solution or settling into a local minimum and getting stuck there. It’s like running thousands of experiments on falling objects to figure out gravity while there’s a physics textbook nearby.

[0]: https://datatracker.ietf.org/doc/html/rfc1661


Ironically, I could’ve read all 50 pages of that RFC and still missed the actual issue. What really helped was RFC 1331[0], specifically the "Async-Control-Character-Map" section.

That said, I’m building a product - not a PPP driver - so the quicker I can fix the problem and move on, the better.

[0] https://datatracker.ietf.org/doc/html/rfc1331
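
For anyone following along, the ACCM is just a 32-bit map where bit N set means control character N has to be 0x7D-escaped on the async line. A tiny illustrative sketch (the 0x000A0000 value is the usual "escape XON/XOFF" example; the payload bytes are made up):

    def chars_to_escape(accm):
        return [c for c in range(32) if accm & (1 << c)]

    def escape(payload, accm):
        out = bytearray()
        for b in payload:
            if b in (0x7D, 0x7E) or (b < 0x20 and accm & (1 << b)):
                out += bytes([0x7D, b ^ 0x20])  # flag/escape chars plus mapped controls
            else:
                out.append(b)
        return bytes(out)

    accm = 0x000A0000
    print([hex(c) for c in chars_to_escape(accm)])  # ['0x11', '0x13'] -> XON/XOFF
    print(escape(b"\x11\x41\x13", accm).hex(" "))   # 7d 31 41 7d 33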


I could also walk everywhere, but sometimes technology can help.

There’s no way I could fully read that RFC in an hour. And that’s before you even know which parts of the reading to focus your attention on, so you’re just being a worse LLM at that point.


The difference is you’d remember some of the context from reading the thing, whereas an LLM is starting from scratch every single time it comes up.


There are opportunity costs to consider, along with relevance. Suppose you are staying at my place. Are you going to read the manual for my espresso machine cover to cover, or are you going to ask me to show you how to use it, or just make one for you?

In any case, LLMs are not magical forgetfulness machines.

You can use a calculator to avoid learning arithmetic but using a calculator doesn’t necessitate failing to learn arithmetic.

You can ask a question of a professor or fellow student, but failing to read the textbook to answer that question doesn’t necessitate failing to develop a mental model or incorporate the answer into an existing one.

You can ask an LLM a question and blindly use its answer but using an LLM doesn’t necessitate failing to learn.


There’s plenty to learn from using LLMs, including how to interact with an LLM.

However, even outside of using an LLM, the temptation is always to keep the blinders on, do a deep dive for a very specific bug, and repeat as needed. It’s the local minimum of effort, and very slowly you do improve as those deep dives occasionally come up again, but what keeps it from being a global minimum is that these systems aren’t suddenly going away. It’s not a friend’s espresso machine; it’s now sitting in your metaphorical kitchen.

As soon as you’ve dealt with, say, a CSS bug, the odds of seeing another one in the future are dramatically higher. Thus, even accounting for diminishing returns, spending a few hours learning the basics of any system or protocol you encounter is just a useful strategy. If you spend 1% of your time on a strategy that makes you 2% more efficient, that’s a net win.
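
Back-of-the-envelope version of that last sentence (a deliberately crude model, numbers taken straight from the claim above):

    learning = 0.01   # fraction of time spent learning the basics
    speedup = 1.02    # 2% more efficient on everything else
    total = learning + (1 - learning) / speedup
    print(round(total, 4))  # ~0.9806 -> same work in ~2% less total time, a net win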


Sometimes learning means understanding, aka a deep dive on the domain. Only a few domains are worth that. For the others, it's only about placing landmarks so you can quickly recognize a problem and find the relevant information before solving it. I believe the best use case of LLMs is when you have recognized the problem and know the general shape of the solution, but have no time to wrangle the specifics of the implementation. So you can provide the context and its constraints in order to guide the LLM's generation, as well as recognize wrong outputs.

But that's not learning, or even problem solving. It's just a time-saving trick, and one that's not reliable.

And the fact is that there's a lot of information about pretty much anything. But I see people trying to skip the foundation (not glamorous enough, maybe) and go straight for the complicated stuff. And LLMs are good at providing the illusion that this can be the right workflow.


> Only a few domains are worth that. For the others, it's only about placing landmarks so you can quickly recognize a problem and find the relevant information before solving it.

Well said. You can only spend years digging into the intricacies of a handful of systems in your lifetime, but there’s still real reward from a few hours here and there.


I would remember the reply from the LLM, and the cross-references back to the particular parts of the RFC it identified as worth focusing time on.

I’d argue that’s a more effective capture of what I would remember anyway.

If I want to learn more (in a general sense), I can take the manual away with me and study it, which I can do more effectively on its own terms, in a comfy chair with a beer. But right now I have a problem to solve.


Reading it at some later date means you also spent time with the LLM without having read the RFC. So reading it in the future means it’s going to be useful fewer times and thus less efficient overall.

I.e., LLM then RFC takes more time than RFC then solving the issue.


Only if you assume a priori that you are going to read it anyway, which misses the whole point.

Because you should have read RFC 1331.

Even then, your argument assumes that optimising for total time (including your own learning time) is the goal, and not solving the business case (your actual problem) as a priority. That assumption may not hold when you have a patch to submit. What you solve, and at what point in time, is the general question; there’s no single optimum.


You’re assuming your individual tasks perfectly align with what’s best for the organization, which is rarely the case.

Having a less skilled worker is the tradeoff for getting one very specific task accomplished sooner. That might be worth it, especially if you plan to quit soon, but it’s hardly guaranteed.


No, just basic judgement and prioritisation, which are valuable skills for an employee to have. The OP was effective at finding the right information they needed to solve the problem at hand: In about an hour, the OP knew enough about PPP to fix the bug and submit a patch.

Whereas it's been all morning and you're still reading the RFC, and it's the wrong RFC anyway.

I know who I'd hire.


And now it’s obvious why you’re working for someone else.

This time it worked, but I’ve been forced to fire people with this kind of attitude before.


You produced a passive-aggressive taunt instead of addressing the argument. For clarity: nobody was asking about your business decisions, nobody is intimidated by your story, and what your personal opinions about "attitude" are is irrelevant to what's being discussed (LLMs allowing optimal time use in certain cases). Also, unless your boss made the firing decision, you weren't forced to do anything.


You’re still not getting it: not having a boss means I have a very different view of business decisions. Most people have an overly narrow view of tasks, IMO, and think speed, especially for minor issues, is vastly more important than it is.

> LLMs allowing optimal time use in certain cases

I never said it was slower; I’m asking what the tradeoff is. I’ve had this same basic conversation with multiple people, and after that failed, the only real option was to remove them. E.g.: if you don’t quite understand why what you wrote seemingly fixes a bug, don’t commit it yet; "seems to work" isn’t a solution.

Could be I’m not explaining very well, but ehh fuck em.


All of the above is true, but between solving quicker, and admitting we gave context:

I do agree with you that an LLM should not always start from scratch.

In a way it is like an animal which we have given the ultimate human instinct.

What has nature given us? Homo erectus was 2 million years ago.

A weird world we live in.

What is context.


Interesting that it works for you. I tried something similar several times with frames from a 5G network, and it mixed fields from 4G and 5G in its answers (or even fields from non-cellular network protocols, because they had features similar to the 5G protocol I was looking at). Occasionally, the explanation was completely invented or based on discussions of planned features for future versions.

I have really learned to mistrust and double-check every single line those systems produce. Same for writing code. Everything they produce looks nice and reasonable on the surface, but when you dig deeper it falls apart unless it's something very, very basic.


Similarly, I've found the results pretty mixed whenever a library or framework with a lot of releases/versions is involved. The LLM tends to mix and match features from across versions.


Yes, it feels like setting the `-h` flag for logs (human readable).


That's some impressive prompt engineering skills to keep it on track for that long, nice work! I'll have to try out some longer-form chats with Gemini and see what I get.

I totally agree that LLMs are great at compressing information; I've set up the docs feature in Cursor to index several entire large documentation websites for major libraries and it's able to distill relevant information very quickly.


In Gemini, it is really good to have the large 1M-token window. However, around 100,000 tokens it starts to make mistakes and refactor its own code.

Sometimes it is good to start a new chat or switch to Claude.

And it really helps to be very precise with the wording of the specification of what you want to achieve, or to repeat it sometimes with some added request lines.

GIGO in reality :)


Oh my, I hate it when it rewrites >1k LOC. I have to instruct it to "modify only ..., do not touch the rest" and so forth, but GPT often does not listen to this, while Claude does. I dunno about Gemini.


In terms of "does useless refactors I didn't ask for and that don't improve anything", my own ranked list goes something like: Gemini > Claude > GPT. I don't really experience this at all with the various GPT models used via the API; overall the GPTs seem to stick to the system prompt way better than the rest. Claude does OK too, but Gemini is out of control, writes so much code, and does so much you didn't ask for, really acting like an overly eager junior developer.


The first time I used Claude, it rewrote >1k LOC without asking for it, but in retrospect, I was "using it wrong". With GPT, even when I told it to not do it, it still did that, but that was some time ago and it was not done via the API, so I dunno. I think I do agree with your list, but I haven't used Gemini that much.

Yeah, they do come across as "overly eager junior devs", good comparison. :D


> With GPT, even when I told it to not do it, it still did that, but that was some time ago and it was not done via the API, so I dunno.

Personally, I think it's a lot better via the API than through ChatGPT. ChatGPT doesn't let you edit the "system prompt", which is really where you want to put "how to" instructions so that the model actually follows them. Instructions put in the user message aren't followed as closely as ones in the system prompt, which is probably why it still did something if you were using ChatGPT.
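
A minimal sketch of what that looks like via the API (OpenAI's Python client; the model name and the instruction text are just examples):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[
            # "How to behave" goes in the system message, not the user message.
            {"role": "system",
             "content": "Modify only the function I point to. Do not refactor or touch any other code."},
            {"role": "user",
             "content": "Here is the file and the bug I want fixed: ..."},
        ],
    )
    print(resp.choices[0].message.content)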


I received this gem from Gemini just now:

I am giving up on providing code, and on checking is it working, because it is very time consuming. Tell me when it starts working. Good luck.

:)


I love it when models give up; it gives me some hope that humans will still be required for the time being lol


It is right, it is time consuming. I do not blame it. :D


LLMs are good at interpolating but bad at extrapolating.


To be fair, all AI/ML and even statistical methods are bad at extrapolating.
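
A toy way to see that point: fit a polynomial to sin(x) on [0, 2π] and it interpolates fine, but it goes off the rails as soon as you ask about points outside the range it was fitted on.

    import numpy as np

    x_train = np.linspace(0, 2 * np.pi, 50)
    coeffs = np.polyfit(x_train, np.sin(x_train), deg=7)  # fit inside [0, 2*pi]

    for x in (np.pi, 3 * np.pi):  # interpolation vs extrapolation
        print(f"x={x:.2f}  fit={np.polyval(coeffs, x):+.3f}  true={np.sin(x):+.3f}")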



