More

Benjammer · 2025-07-16T13:41:39 1752673299

This isn't a financial model, they aren't selling the system itself, it's all tooling for data access and financial modeling. It's like they're setting up an OTB, not like they're selling you a system to pick winning horses at the track.

laughingcurve · 2025-07-16T16:33:10 1752683590

Benjammer is entirely correct. Sadly, Hacker News is no longer a place for rational discussion. Many people here nowadays are seemingly grognards who do not care to read or engage and leave snarky comments like 2010s era Reddit.

spaceman_2020 · 2025-07-16T20:01:16 1752696076

Anytime the words “AI” or “crypto” show up in a thread, the collective IQ of this place drops by 50%

If you want some of the worst takes possible on emerging tech, come to HN

But if you want an obscure hack that improves garbage collection in java by 0.1%, also come to HN

Kuinox · 2025-07-17T16:52:52 1752771172

Do you know another website ?

Benjammer · 2025-06-04T19:47:01 1749066421

The fact that Thiel backs him so hard is what worries me more than anything. Thiel has a way of making things happen when he's really committed to something on a personal level... (see the Gawker Media case)

Benjammer · 2025-05-18T02:37:12 1747535832

This kind of “hair splitting” is the foundation on current prompt engineering though…

Benjammer · 2025-05-15T02:53:44 1747277624

It's nice to see a paper that confirms what anyone who has practiced using LLM tools already knows very well, heuristically. Keeping your context clean matters, "conversations" are only a construct of product interfaces, they hurt the quality of responses from the LLM itself, and once your context is "poisoned" it will not recover, you need to start fresh with a new chat.

Helmut10001 · 2025-05-15T03:10:57 1747278657

My experiences somewhat confirm these observations, but I also had one that was different. Two weeks of debugging IPSEC issues with Gemini. Initially, I imported all the IPSEC documentation from OPNsense and pfSense into Gemini and informed it of the general context in which I was operating (in reference to 'keeping your context clean'). Then I added my initial settings for both sides (sensitive information redacted!). Afterwards, I entered a long feedback loop, posting logs and asking and answering questions.

At the end of the two weeks, I observed that: The LLM was much less likely to become distracted. Sometimes, I would dump whole forum threads or SO posts into it, when it said "this is not what we are seeing here, because of [earlier context or finding]. I eliminated all dead ends logically and informed it of this (yes, it can help with the reflection, but I had to make the decisions). In the end, I found the cause of my issues.

This somewhat confirms what some user here on HN said a few days ago. LLMs are good at compressing complex information into simple one, but not at expanding simple ideas into complex ones. As long as my input was larger than the output (either complexity or length), I was happy with the results.

I could have done this without the LLM. However, it was helpful in that it stored facts from the outset that I had either forgotten or been unable to retrieve quickly in new contexts. It also made it easier to identify time patterns in large log files, which helped me debug my site-to-site connection. I also optimized many other settings along the way, resolving not only the most problematic issue. This meant, in addition to fixing my problem, I learned quite a bit. The 'state' was only occasionally incorrect about my current parameter settings, but this was always easy to correct. This confirms what others already saw: If you know where you are going and treat it as a tool, it is helpful. However, don't try to offload decisions or let it direct you in the wrong direction.

Overall, 350k Tokens used (about 300k words). Here's a related blog post [1] with my overall path, but not directly corresponding to this specific issue. (please don't recommend wireguard; I am aware of it)

    [1]: https://du.nkel.dev/blog/2021-11-19_pfsense_opnsense_ipsec_cgnat/

olalonde · 2025-05-15T04:12:03 1747282323

Recently, Gemini helped me fix a bug in a PPP driver (Zephyr OS) without prior knowledge of PPP or even driver development really. I would copy-paste logs of raw PPP frames in HEX and it would just decode everything and explain the meaning of each bytes. In about an hour, I knew enough about PPP to fix the bug and submit a patch.

https://g.co/gemini/share/7edf8fa373fe

skydhash · 2025-05-15T04:46:11 1747284371

Or you could just read the PPP RFC [0].

I’m not saying that your approach is wrong. But most LLM workflows are either brute forcing the solution, or seeking a local minima to be stuck in. It’s like doing thousands of experiments of objects falling to figure out gravity while there’s a physics textbooks nearby.

[0]: https://datatracker.ietf.org/doc/html/rfc1661

olalonde · 2025-05-15T05:19:20 1747286360

Ironically, I could’ve read all 50 pages of that RFC and still missed the actual issue. What really helped was RFC 1331[0], specifically the "Async-Control-Character-Map" section.

That said, I’m building a product - not a PPP driver - so the quicker I can fix the problem and move on, the better.

[0] https://datatracker.ietf.org/doc/html/rfc1331

wrasee · 2025-05-15T10:13:50 1747304030

I could also walk everywhere, but sometimes technology can help.

There’s no way I could fully read that RFC in an hour. And that’s before you even know what reading to focus your attention on, so you’re just being a worse LLM at that point.

Retric · 2025-05-15T16:59:36 1747328376

The difference is you’d remember some of the context from reading the thing where an LLM is starting from scratch every single time it comes up.

cgriswald · 2025-05-15T17:33:41 1747330421

There are opportunity costs to consider along with relevance. Suppose you are staying at my place. Are you going to read the manual for my espresso machine in total or are you going to ask me to show you how to use it or make one for you?

In any case, LLMs are not magical forgetfulness machines.

You can use a calculator to avoid learning arithmetic but using a calculator doesn’t necessitate failing to learn arithmetic.

You can ask a question of a professor or fellow student, but failing to read the textbook to answer that question doesn’t necessitate failing to develop a mental model or incorporate the answer into an existing one.

You can ask an LLM a question and blindly use its answer but using an LLM doesn’t necessitate failing to learn.

Retric · 2025-05-15T18:11:50 1747332710

There’s plenty to learn from using LLM’s including how to interact with an LLM.

However, even outside of using a LLM the temptation is always to keep the blinders on do a deep dive for a very specific bug and repeat as needed. It’s the local minima of effort and very slowly you do improve as those deep dives occasionally come up again, but what keeps it from being a global minimum is these systems aren’t suddenly going away. It’s not a friend’s expresso machine, it’s now sitting in your metaphorical kitchen.

As soon as you’re dealt with say a CSS bug the odds of seeing another in the future are dramatically higher. Thus optimizing for diminishing returns means spending a few hours learning the basics of any system or protocol you encounter is just a useful strategy. If you spend 1% of your time on a strategy that makes you 2% more efficient that’s a net win.

skydhash · 2025-05-15T18:50:38 1747335038

Sometimes learning means understanding, aka a deep dive on the domain. Only a few domains are worth that. For the others, it's only about placing landmark so you can quickly recognize a problem and find the relevant information before solving it. I believe the best use case of LLMs is when you have recognized the problem and know the general shape of the solution, but have no time to wrangle the specifics of the implementation. So you can provide the context and its constraint in order to guide the LLM's generation, as well as recognize wrong outputs.

But that's not learning or even problem's solving. It's just a time saving trick. And one that's not reliable.

And the fact is that there's a lot of information about pretty much anything. But I see people trying to skip the foundation (not glamorous enough, maybe) and go straight for the complicated stuff. And LLMs are good for providing the illusion that it can be the right workflow.

Retric · 2025-05-15T22:10:34 1747347034

> Only a few domains are worth that. For the others, it's only about placing landmark so you can quickly recognize a problem and find the relevant information before solving it.

Well said. You can only spend years digging into the intricacies a handful of systems in your lifetime, but there’s still real rewards from a few hours here and there.

wrasee · 2025-05-17T11:02:57 1747479777

I would remember the reply from the LLM, and cross references back to the particular parts of the RFC it identified as worth focusing time on.

I’d argue that’s a more effective capture as to what I would remember anyway.

If wanted to learn more (in a general sense) I can take the manual away with me and study it, which I can do more effectively on its own terms, in a comfy chair with a beer. But right now I have a problem to solve.

Retric · 2025-05-17T15:56:31 1747497391

Reading it at some later date means you also spent time with the LLM without having read the RFC. So reading it in the future means it’s going to be useful fewer times and thus less efficient overall.

IE LLM then RFC takes more time then RFC then solving the issue.

wrasee · 2025-05-17T22:49:56 1747522196

Only if you assume a priori that you are going to read it anyway, which misses the whole point.

Because you should have read RFC 1331.

Even then your argument assumes that optimising for total time (to include your own learning time) is the goal, and not solving the business case as a priority (your actual problem). That assumption may not be the case when you have a patch to submit. What you solve at what time point is the general case, there’s no single optimum.

Retric · 2025-05-18T01:39:56 1747532396

You’re assuming your individual tasks perfectly align with what’s best for the organizations which is rarely the case.

Having a less skilled worker is a tradeoff for getting one very specific task accomplished sooner, that might be worth it especially if you plan to quit soon but it’s hardly guaranteed.

wrasee · 2025-05-18T10:15:55 1747563355

No, just basic judgement and prioritisation, which are valuable skills for an employee to have. The OP was effective at finding the right information they needed to solve the problem at hand: In about an hour, the OP knew enough about PPP to fix the bug and submit a patch.

Whereas it's been all morning and you're still reading the RFC, and it's the wrong RFC anway.

I know who i'd hire.

Retric · 2025-05-20T00:35:06 1747701306

And now it’s obvious why you’re working for someone else.

This time it worked, but I’ve been forced for fire people with this kind of attitude before.

halfadot · 2025-05-21T16:24:50 1747844690

You produced a passive-aggressive taunt instead of addressing the argument. For clarity: nobody was asking about your business decisions, nobody is intimidated by your story. what your personal opinions about "attitude" are is irrelevant to what's being discussed (LLMs allowing optimal time use in certain cases). Also, unless your boss made the firing decision, you weren't forced to do anything.

Retric · 2025-05-22T19:33:26 1747942406

You’re still not getting it, not having a boss means I have a very different view of businesses decisions. Most people have an overly narrow view of tasks IMO and think speed, especially for minor issues, is vastly more important than it is.

> LLMs allowing optimal time use in certain case

I never said it was slower, I’m saying what’s the tradeoff. I’ve had this same basic conversation with multiple people, and after that failed the only real option is to remove them. Ex: If you don’t quite understand why what you wrote seemingly fixes a bug don’t commit it yet, seems to work isn’t a solution.

Could be I’m not explaining very well, but ehh fuck em.

Macuyiko · 2025-05-15T17:48:25 1747331305

All of the above is true, but between solving quicker, and admitting we gave context:

I do agree with you that an LLM should not always start from scratch.

In a way it is like an animal which we have given the ultimate human instinct.

What has nature given us? Homo Erectus is 2 million years ago.

A weird world we live in.

What is context.

tralarpa · 2025-05-15T07:39:14 1747294754

Interesting that it works for you. I tried several times something similar with frames from a 5G network and it mixed fields from 4G and 5G in its answers (or even from non-cellular network protocols because they had similar features as the 5G protocol I was looking at). Occasionally, the explanation was completely invented or based on discussions of planned features for future versions.

I have really learned to mistrust and double check every single line those systems produce. Same for writing code. Everything they produce looks nice and reasonable on the surface but when you dig deaper it falls apart unless it's something very very basic.

foobarian · 2025-05-15T14:52:57 1747320777

Similarly I found the results pretty mixed whenever a library or framework with a lot of releases/versions is involved. The LLM tends to mix and match features from across versions.

Helmut10001 · 2025-05-15T04:36:41 1747283801

Yes, it fells like setting the `-h` flag for logs (human readable).

Benjammer · 2025-05-15T03:43:50 1747280630

That's some impressive prompt engineering skills to keep it on track for that long, nice work! I'll have to try out some longer-form chats with Gemini and see what I get.

I totally agree that LLMs are great at compressing information; I've set up the docs feature in Cursor to index several entire large documentation websites for major libraries and it's able to distill relevant information very quickly.

sixtyj · 2025-05-15T08:38:03 1747298283

In Gemini, it is really good to have large window with 1M tokens. However, around 100,000 it starts to make mistakes and refactor its own code.

Sometimes it is good to start new chat or switch to Claude.

And it really helps to be very precise with wording of specification what you want to achieve. Or repeat it sometimes with some added request lines.

GIGO in reality :)

johnisgood · 2025-05-15T11:52:30 1747309950

Oh my, I hate it when it rewrites >1k LOC. I have to instruct it to "modify only ..., do not touch the rest" and so forth, but GPT does not listen to this often, Claude does. I dunno about Gemini.

diggan · 2025-05-15T14:09:37 1747318177

In terms of "does useless refactors I didn't ask for nor improved anything", my own ranked list goes something like: Gemini > Claude > GPT. I don't really experience this at all with various GPT models used via the API, but overall GPTs seems to stick to the system prompt way better than the rest. Clause does OK too, but Gemini is out of control and writes soo much code and does so much you didn't ask for, really acts like a overly eager junior developer.

johnisgood · 2025-05-15T14:47:34 1747320454

The first time I used Claude, it rewrote >1k LOC without asking for it, but in retrospect, I was "using it wrong". With GPT, even when I told it to not do it, it still did that, but that was some time ago and it was not done via the API, so I dunno. I think I do agree with your list, but I haven't used Gemini that much.

Yeah, they do come across as "overly eager junior devs", good comparison. :D

diggan · 2025-05-15T15:28:27 1747322907

> With GPT, even when I told it to not do it, it still did that, but that was some time ago and it was not done via the API, so I dunno.

Personally I think it's a lot better via the API than ChatGPT. ChatGPT doesn't let you edit the "system prompt" which is really where you wanna put "how to" instructions, so it really follows them. Instructions put in the user message aren't followed as closely as when you use the system prompt, so probably why it still did something, if you were using ChatGPT.

sixtyj · 2025-05-15T16:36:41 1747327001

I received this gem in Gemini right now:

I am giving up on providing code, and on checking is it working, because it is very time consuming. Tell me when it starts working. Good luck.

:)

tough · 2025-05-15T17:44:37 1747331077

I love it when models give up, gives me some hope humans will still be required for the time being lol

johnisgood · 2025-05-15T17:17:30 1747329450

It is right, it is time consuming. I do not blame it. :D

tough · 2025-05-15T17:22:36 1747329756

LLM's are good at interpolation but bad at extrapolating

daveguy · 2025-05-15T17:34:43 1747330483

To be fair, all AI/ML and even statistical methods are bad at extrapolating.

morsecodist · 2025-05-15T03:03:03 1747278183

This matches my experience exactly. "poisoned" is a great way to put it. I find once something has gone wrong all subsequent responses are bad. This is why I am iffy on ChatGPT's memory features. I don't notice it causing any huge problems but I don't love how it pollutes my context in ways I don't fully understand.

somenameforme · 2025-05-15T05:57:16 1747288636

It's interesting how much the nature of LLMs fundamentally being self recursive next token predictors aligns with the Chinese Room experiment. [1] In such experiment it also makes perfect sense that a single wrong response would cascade into a series of subsequent ever more drifting errors. I think it all emphasizes the relevance of the otherwise unqualifiable concept of 'understanding.'

In many ways this issue could make the Chinese Room thought experiment even more compelling. Because it's a very practical and inescapable issue.

[1] - https://en.wikipedia.org/wiki/Chinese_room

jampekka · 2025-05-15T08:41:19 1747298479

I don't think the Chinese room thought experiment is about this, or performance of LLMs in general. Searle explicitly argues that a program can't induce "understanding" even if it mimicked human understanding perfectly because programs don't have "causal powers" to generate "mental states".

This is mentioned in the Wikipedia page too: "Although its proponents originally presented the argument in reaction to statements of artificial intelligence (AI) researchers, it is not an argument against the goals of mainstream AI research because it does not show a limit in the amount of intelligent behavior a machine can display."

keiferski · 2025-05-15T07:22:18 1747293738

Great comment on the Chinese room. That idea seems to be dismissed nowadays but the concept of “cascading failure to understand context” is absolutely relevant to LLMs. I often find myself needing to explain basic details over and over again to an LLM; when with a person it would be a five second, “no, I mean like this way, not that way” explanation.

OtherShrezzing · 2025-05-15T09:35:30 1747301730

I find using tools like LMStudio, which lets you edit your chat history on the fly, really helps deal with this problem. The models you can host locally are much weaker, but they perform a little better than the really big models once you need to factor in these poisoning problems.

A nice middle-ground I'm finding is to ask Claude an initial conversation starter in its "thinking" mode, and then copy/paste that conversation into LMStudio and have a weaker model like Gemma pick-up from where Claude left off.

AstroBen · 2025-05-15T04:55:09 1747284909

good point on the memory feature. Wow that sounds terrible

distances · 2025-05-15T07:26:44 1747294004

The memory is easy to turn off. It sounded like a very bad idea to cross-contaminate chats so I disabled it as soon as ChatGPT introduced it.

shrewduser · 2025-05-16T04:35:26 1747370126

I have very limited experience with llms but i've always thought of it as a compounding errors problem, once you get a small error early on it can compound and go completely off track later.

b800h · 2025-05-15T06:29:58 1747290598

I've been saying for ages that I want to be able to fork conversations so I can experiment with the direction an exchange takes without irrevocably poisoning a promising well. I can't do this with ChatGPT, is anyone aware of a provider that offers this as a feature?

stuffoverflow · 2025-05-15T06:36:49 1747291009

Google AI studio, ChatGPT and Claude all support this. Google AI studio is the only one that let's you branch to a separate chat though. For ChatGPT and claude you just edit the message you want to branch from.

giordanol · 2025-05-15T13:47:31 1747316851

Feels like a semi-simple UX fix could make this a lot more natural. Git-style forks but for chats.

Garlef · 2025-05-15T06:56:24 1747292184

Support: Yes. But the UX is not optimized for this.

Imagine trying to find a specific output/input that was good in the conversation tree.

layer8 · 2025-05-15T07:53:41 1747295621

Yes, it would be nice if you could at least bookmark a particular branch.

m4houk · 2025-05-15T07:27:47 1747294067

I once built something like this for fun as a side project.

You can highlight some text in a chat and fork the chat to talk about that text selection, so the LLM has context of that along with the previous chat history and it responds in a new chat (entire chat history up to that point from the parent chat gets copied over - basically inspired by the Unix `fork`).

Your text selection from the parent chat would get turned into a hyperlink to the new child chat so you can always get to it again if you're reading the parent chat.

lewdwig · 2025-05-15T07:53:33 1747295613

T3.chat supports convo forking and in my experience works really well.

The fundamental issue is that LLMs do not currently have real long term memory, and until they do, this is about the best we can do.

therockhead · 2025-05-15T09:20:15 1747300815

I need to think about this a bit more, but I think I would love a thread feature in ChatGPT, so that it has the context up to the point of creation but doesn’t affect the main conversation. It would help in two ways, it keeps the main topic from getting poisoned , and allow me to minimise text clutter when i go off on tangents during the conversation.

actualwitch · 2025-05-15T08:35:02 1747298102

I stumbled upon this issue myself when designing prompts for agentic systems and got mad at the lack of tools to support this flow, so I built one myself! I called it Experiment, it allows easy conversation forking and editing while retaining all logs.

https://github.com/actualwitch/experiment

bambax · 2025-05-15T07:49:53 1747295393

On Openrouter you can delete previous answers (and questions) and maintain a separate conversation with different models.

But it would indeed be nice to either disable answers (without deleting them) or forking a conversation. It wouldn't be hard to implement; I wonder if there's a market for just this?

a_e_k · 2025-05-15T07:20:56 1747293656

If you're happy running local models, llama.cpp's built-in web-server's interface can do this.

granra · 2025-05-15T06:32:34 1747290754

Some 3rd party UIs offer this, I use typingmind sometimes that does but AFAIK some open source ones do too.

anonexpat · 2025-05-15T06:32:15 1747290735

I believe Claude has forking in their web interface.

dr_dshiv · 2025-05-15T13:34:57 1747316097

The #1 tip I teach is to make extensive use of the teeny-tiny mostly hidden “edit” button in ChatGPT and Claude. When you get a bad response, stop and edit to get a better one, rather than letting crap start to multiply crap.

diggan · 2025-05-15T13:43:23 1747316603

Hear hear! Basically if the first reply isn't good/didnt understand/got something wrong, restart from the beginning with a better prompt, explaining more/better. Rinse and repeat.

forgotTheLast · 2025-05-15T15:46:00 1747323960

You can do even better by asking it to ask clarifying questions before generating anything, then editing your initial prompt with those clarifications.

cruffle_duffle · 2025-05-16T01:14:02 1747358042

It is also a great way to branch conversations from some shared “initial context”.

They really need to make that edit feature much more prominent. It is such an important way to interact with the model.

CobrastanJorji · 2025-05-15T05:06:25 1747285585

An interesting little example of this problem is initial prompting, which is effectively just a permanent, hidden context that can't be cleared. On Twitter right now, the "Grok" bot has recently begun frequently mentioning "White Genocide," which is, y'know, odd. This is almost certainly because someone recently adjusted its prompt to tell it what its views on white genocide are meant to be, which for a perfect chatbot wouldn't matter when you ask it about other topics, but it DOES matter. It's part of the context. It's gonna talk about that now.

dragonwriter · 2025-05-15T06:49:38 1747291778

> This is almost certainly because someone recently adjusted its prompt to tell it what its views on white genocide are meant to be

Well, someone did something to it; whether it was training, feature boosting the way Golden Gate Claude [0] was done, adjusting the system prompt, or assuring that it's internet search for contextual information would always return material about that, or some combination of those, is neither obvious nor, if someone had a conjecture as to which one or combination it was, easily falsifiable/verifiable.

[0] https://www.anthropic.com/news/golden-gate-claude

lolinder · 2025-05-15T13:14:06 1747314846

Source [0]. The examples look pretty clearly like they stuck it in the context window, not trained it in. It consistently seems to structure the replies as though the user they're replying to is the one who brought up white genocide in South Africa, and it responds the way that LLMs often respond to such topics: saying that it's controversial and giving both perspectives. That's not behavior I would expect if they had done the Golden Gate Claude method, which inserted the Golden Gate Bridge a bit more fluidly into the conversation rather than seeming to address a phantom sentence that the user supposedly said.

Also, let's be honest, in a Musk company they're going to have taking the shortest possible route to accomplishing what he wanted them to.

[0] https://www.cnn.com/2025/05/14/business/grok-ai-chatbot-repl...

CobrastanJorji · 2025-05-15T20:46:08 1747341968

When your boss is a crazy, drugged-up billionaire who has ADD and also runs the government, when he tells you to do something, you do it the fast way.

9dev · 2025-05-15T06:03:18 1747288998

Well, telling an AI chatbot to insist on discussing a white genocide seems like a perfectly Elon thing to do!

M4v3R · 2025-05-15T06:44:28 1747291468

> This is almost certainly because someone recently adjusted its prompt to tell it what its views on white genocide are

Do you have any source on this? System prompts get leaked/extracted all the time so imagine someone would notice this

Edit: just realized you’re talking about the Grok bot, not Grok the LLM available on X or grok.com. With the bot it’s probably harder to extract its exact instructions since it only replies via tweets. For reference here’s the current Grok the LLM system prompt: https://github.com/asgeirtj/system_prompts_leaks/blob/main/g...

lenkite · 2025-05-15T08:40:25 1747298425

Probably because it is now learning from a lot of videos posted on X by misc right-wingers showing rallying cries of South African politicians like Julius Malema, Paul Mashatile etc. Not very odd.

As merely 3 of over a dozen examples:

https://x.com/DefiantLs/status/1922213073957327219

https://x.com/PPC4Liberty/status/1922650016579018855

https://x.com/News24/status/1920909178236776755

micromacrofoot · 2025-05-15T12:22:37 1747311757

nah, llms don't learn like this — they specifically added it to the system prompt

stevedonovan · 2025-05-15T06:52:07 1747291927

Ah, Elon paying attention to hid companies again!

Context poisoning is not a uniquely LLM problem

ezst · 2025-05-15T05:25:55 1747286755

The heck??

CobrastanJorji · 2025-05-15T16:34:37 1747326877

Yeah, things are a little weird on Twitter these days. https://www.nbcnews.com/tech/tech-news/elon-musks-ai-chatbot...

unshavedyak · 2025-05-15T03:15:45 1747278945

Has any interface implemented a .. history cleaning mechanism? Ie with every chat message focus on cleaning up dead ends in the conversation or irrelevant details. Like summation but organic for the topic at hand?

Most history would remain, it wouldn’t try to summarize exactly, just prune and organize the history relative to the conversation path?

nosefurhairdo · 2025-05-15T03:21:20 1747279280

I've had success having a conversation about requirements, asking the model to summarize the requirements as a spec to feed into a model for implementation, then pass that spec into a fresh context. Haven't seen any UI to do this automatically but fairly trivial/natural to perform with existing tools.

dep_b · 2025-05-15T14:54:26 1747320866

Doing the same. Though I wish there was some kind of optimization of text generated by an LLM for an LLM. Just mentioning it’s for an LLM instead of Juan consumption yields no observably different results.

ithkuil · 2025-05-15T06:29:06 1747290546

"Every problem in computer science can be solved with another level of indirection."

One could argue that the attention mechanism in transformers is already designed to do that.

But you need to train it more specifically with that in mind if you want it to be better at damping attention to parts that are deemed irrelevant by the subsequent evolution of the conversation.

And that requires the black art of ML training.

While thinking of doing this as a hack on top of the chat product feels more like engineering and we're more familiar with that as a field.

olalonde · 2025-05-15T04:17:04 1747282624

Not sure if that's what you mean but Claude Code has a /compact command which gets triggered automatically when you exceed the context window.

The prompt it uses: https://www.reddit.com/r/ClaudeAI/comments/1jr52qj/here_is_c...

QuadmasterXLII · 2025-05-15T04:42:53 1747284173

the problem is that it needs to read the log to prune the log, and so if there is garbage in the log, which needs to be pruned to keep from poisoning the main chat, then the garbage will poison the pruning model, and it will do a bad job pruning.

Benjammer · 2025-05-15T03:30:23 1747279823

I mean, you could build this, but it would just be a feature on top of a product abstraction of a "conversation".

Each time you press enter, you are spinning up a new instance of the LLM and passing in the entire previous chat text plus your new message, and asking it to predict the next tokens. It does this iteratively until the model produces a <stop> token, and then it returns the text to you and the PRODUCT parses it back into separate chat messages and displays it in your UI.

What you are asking the PRODUCT to now do is to edit your and its chat messages in the history of the chat, and then send that as the new history with your latest message. This is the only way to clean the context because the context is nothing more than your messages and its previous responses, plus anything that tools have pulled in. I think it would be sort of a weird feature to add to a chat bot to have the chat bot, each time you send a new message, go back through the entire history of your chat and just start editing the messages to prune out details. You would scroll up and see a different conversation, it would be confusing.

IMO, this is just part of prompt engineering skills to keep your context clean or know how to "clean" it by branching/summarizing conversations.

rrr_oh_man · 2025-05-15T04:44:54 1747284294

Or delete / edit messages in AI Studio or Open Router.

hobofan · 2025-05-15T06:32:28 1747290748

Not a history cleaning mechanism, but related to that, Cursor in the most recent release introduced a feature to duplicate your chat (so you can saveguard yourself against poisoning and go back to and unpoisoned point in history), which seems like an addmision of the same problem.

kqr · 2025-05-15T04:20:47 1747282847

Isn't this what Claude workbench in the Anthropic console does? It lets the user edit both sides of the conversation history.

bredren · 2025-05-15T14:27:05 1747319225

This is why I created FileKitty, which lets you quickly concatenate multiple source code files into markdown-formatted copy-pasta:

https://github.com/banagale/FileKitty

When getting software development assistance, relying on LLM products to search code bases etc leaves too much room for error. Throw in what amounts to lossy compression of that context to save the service provider on token costs and the LLM is serving watered down results.

Getting the specific context right up front and updating that context as the conversation unfolds leads to superior results.

Even then, you do need to mind the length of conversations. I have a prompt designed to capture conversational context, and transfer it into a new session. It identifies files that should be included in the new initial prompt, etc.

For a bit more discussion on this, see this thread and its ancestry: https://news.ycombinator.com/item?id=43711216

CompoundEyes · 2025-05-15T03:55:16 1747281316

Agreed poisoned is a good term. I’d like to see “version control” for conversations via the API and UI that lets you rollback to a previous place or clone from that spot into a new conversation. Even a typo or having to clarify a previous message skews the probabilities of future responses due to the accident.

mh- · 2025-05-15T03:56:20 1747281380

"Forking" or "branching" (probably better received outside of SWEs) a conversation really ought to be a first class feature of ChatGPT et Al.

HaZeust · 2025-05-15T03:59:01 1747281541

It is in Google Gemini, which I really hate to say - but I've been using a lot more than GPT. I reckon I'll be cancelling my Pro if Gemini stays with this lead for my everyday workflows.

energy123 · 2025-05-15T04:18:27 1747282707

How? I use the Gemini web app and don't see it.

voidspark · 2025-05-15T04:26:07 1747283167

http://aistudio.google.com

crooked-v · 2025-05-15T04:59:20 1747285160

AI Studio is borderline unusable for long conversations. I don't know what in the world it's doing but it sure looks like a catastrophic memory leak in the basic design.

voidspark · 2025-05-15T05:00:03 1747285203

I have been using it up to 100k tokens so far without issues. Never needed to go further than that. But much of that was in uploaded documents.

drittich · 2025-05-15T05:02:27 1747285347

Also exists in LM Studio.

wunderwuzzi23 · 2025-05-15T06:39:47 1747291187

This was part of ChatGPT from pretty much the beginning, maybe not the initial version but few weeks later- don't recall exactly

layer8 · 2025-05-15T06:11:04 1747289464

This has been in ChatGPT from pretty early on? Just edit any prompt, it creates a new branch, and you can switch back and forth.

b800h · 2025-05-15T06:36:19 1747290979

Blimey, I didn't realise the entire thread was saved when you edited a prompt. Very good! Mind you, it feels "unsafe". I'd like to be able to clone a thread.

cruffle_duffle · 2025-05-16T01:21:18 1747358478

You can toggle between different branches in both GPT and Claude.

gdudeman · 2025-05-15T05:30:39 1747287039

It is!

It exists in Claude as a true branch - you can see the old threads - and in ChatGPT as without the history.

Edit a previous reply and hit “go” to see it in action.

gdudeman · 2025-05-15T05:28:51 1747286931

This exists in Claude. Edit any previous message and it will fork the conversation.

MattGaiser · 2025-05-15T03:00:55 1747278055

Yep. I regretted leaving on memory as it is poisoned my conversations with irrelevant junk.

neom · 2025-05-15T03:10:41 1747278641

You can go in and delete memory items

Adambuilds · 2025-05-15T09:10:02 1747300202

I agree—once the context is "poisoned," it’s tough to recover. A potential improvement could be having the LLM periodically clean or reset certain parts of the context without starting from scratch. However, the challenge would be determining which parts of the context need resetting without losing essential information. Smarter context management could help maintain coherence in longer conversations, but it’s a tricky balance to strike.Perhaps using another agent to do the job?

pseudocomposer · 2025-05-15T13:25:50 1747315550

I mostly just use LLMs for autocomplete (not chat), but wouldn’t this be fixed by adding a “delete message” button/context option in LLM chat UIs?

If you delete the last message from the LLM (so now, you sent the last message), it would then generate a new response. (This would be particularly useful with high-temperature/more “randomly” configured LLMs.)

If you delete any other message, it just updates the LLM context for any future responses it sends (the real problem at hand, context cleanup).

I think seeing it work this way would also really help end users who think LLMs are “intelligent” to better understand that it’s just a big, complex autocomplete (and that’s still very useful).

Maybe this is standard already, or used in some LLM UI? If not, consider this comment as putting it in the public domain.

Now that I’m thinking about it, it seems like it might be practical to use “sub-contextual LLMs” to manage the context of your main LLM chat. Basically, if an LLM response in your chat/context is very long, you could ask the “sub-contextual LLM” to shorten/summarize that response, thus trimming down/cleaning the context for your overall conversation. (Also, more simply, an “edit message” button could do the same, just with you, the human, editing the context instead of an LLM…)

dr_dshiv · 2025-05-15T13:33:38 1747316018

This is how Claude’s UI used to work, in practice, where you could edit the context directly.

amelius · 2025-05-15T06:50:54 1747291854

I suppose that the chain-of-thought style of prompting that is used by AI chat applications internally also breaks down because of this phenomenon.

Macuyiko · 2025-05-15T17:44:41 1747331081

Weirdly it has gotten so far that I have embedded this into my workflow and will often prompt:

> "Good work so far, now I want to take it to another step (somewhat related but feeling it too hard): <short description>. Do you think we can do it in this conversation or is it better to start fresh? If so, prepare an initial prompt for your next fresh instantiation."

Sometimes the model says that it might be better to start fresh, and prepares a good summary prompt (including a final 'see you later'), whereas in other cases it assures me it can continue.

I have a lot of notebooks with "initial prompts to explore forward". But given the sycophancy going on as well as one-step RL (sigh) post-training [1], it indeed seems AI platforms would like to keep the conversation going.

[1] RL in post-training has little to do with real RL and just uses one shot preference mechanisms with an RL inspired training loop. There is very little work in terms of long-term preferences slash conversations, as that would increase requirements exponentially.

senordevnyc · 2025-05-15T17:58:14 1747331894

Is there any reason to think that LLMs have the introspection ability to be able to answer your question effectively? I just default to having them provide a summary that I can use to start the next conversation, because I’m unclear on how an LLM would know it’s losing the plot due to long context window.

jimmySixDOF · 2025-05-15T13:08:01 1747314481

>"conversations" are only a construct of product interfaces

This seems to be in flux now due to RL training on multiturn eval datasets so while the context window is evergreen every time, there will be some bias towards interpreting each prompt as part of a longer conversation. Mutliturn post training is not scaled out yet in public but I think it may be the way to keep on the 'double time spent on goal every 7 months curve'

bentt · 2025-05-15T13:13:06 1747314786

Yes even when coding and not conversing I often start new conversations where I take the current code and explain it new. This often gives better results than hammering on one conversation.

This feels like something that can be fixed with manual instructions which prompt the model to summarize and forget. This might even map appropriately to human psychology. Working Memory vs Narrative/Episodic Memory.

freehorse · 2025-05-15T10:47:37 1747306057

Which is why I really like zed's chat UX experience: being able to edit the full prior conversation like a text file, I can go back and clean it up, do small adjustments, delete turns etc and then continue the discussion with a cleaner and more relevant context.

I have made zed one of my main llm chat interfaces even for non-programming tasks, because being able to do that is great.

yaur · 2025-05-15T13:48:15 1747316895

One of the most frustrating features of ChatGPT is “memories” which can cause that poisoning to follow you around between chats.

aleksituk · 2025-05-15T13:54:17 1747317257

Yarp! And "poisoning" can be done with "off-topic" questions and answers as well as just sort of "dilution". Have noticed this when doing content generation repeatedly, tight instructions get diluted over time.

QuantumGood · 2025-05-15T14:37:28 1747319848

" 'conversations' are only a construct of product interface" is so helpful maintain top-of-mind, but difficult because of all the "conversational" cues

veunes · 2025-05-15T06:33:05 1747290785

What surprised me is how early the models start locking into wrong assumptions

djmips · 2025-05-15T04:27:57 1747283277

Happens with people too if you think about it.

kfarr · 2025-05-15T06:52:21 1747291941

Who gets lost in multi-turn conversations?

TheOtherHobbes · 2025-05-15T07:43:13 1747294993

Everyone?

How often in meetings does everyone maintain a running context of the entire conversation, instead of responding to the last thing that was said with a comment that has an outstanding chance of being forgotten as soon as the next person starts speaking?

djmips · 2025-05-15T17:26:10 1747329970

Indeed - and since human's are susceptible to injection prompt, all it needs is one derailing comment to take things off course.

oaeirjtlj · 2025-05-15T15:41:28 1747323688

And now that chatgpt has a "memory" and can access previous conversations, it might be poisoned permanently. It gets one really bad idea, and forever after it insists on dumping that bad idea into every subsequent response ever after you repeatedly tell it "THAT'S A SHIT IDEA DON'T EVER MENTION THAT AGAIN". Sometimes it'll accidentally include some of its internal prompting, "user is very unhappy, make sure to not include xyz", and then it'll give you a response that is entirely focused around xyz.

Benjammer · 2025-05-09T16:15:50 1746807350

Avogadro's number has a 10^23 in it to account for this atom-->physical matter sort of "scale up" conversion. Atoms are really small...

not_kurt_godel · 2025-05-09T16:42:19 1746808939

Sometimes I have a hard time wrapping my head around reconciling that with the estimated number of protons in the observable universe which is "only" ~10^80 (https://en.m.wikipedia.org/wiki/Eddington_number). Seems like it "should" be much higher, but orders of magnitude are sometimes deceptive to our intuition.

philsnow · 2025-05-09T17:32:25 1746811945

Unrelated, but I moved to a more rural area a while back and I’m surrounded by orchards and fields a fair amount of time, and my mind just can’t wrap itself around the scale of agriculture.

One avocado tree can produce around 200 avocados per year, and the orchards around here are probably around 150 trees/acre, so 30k avocados/acre/year.

Each avocado has about 250 calories (and that is just the parts that we eat, the tree has to put energy and mass into the pit and skin etc). These are food calories / kcal, so that’s 250k calories per avocado, or ~7.5 billion calories per year per acre.

7.5B calories/year is just about exactly 1kW, so that orchard is converting sunlight (and water, air, and trace minerals) to avocado calories at a continuous rate of 1kW. It’s incredible. The USDA says that as of 2022 there were about 880M acres of farmland in the United States alone.

dredmorbius · 2025-05-09T23:34:15 1746833655

1 acre is about 4,050 m^2, and incident sunlight has an average intensity of 1kW/m^2.

So your avocado orchard is converting incident sunlight to food calories with an efficiency of about 0.025%.

(This ... isn't wildly inefficient for photosynthesis, though typical values range from 1--3% AFAIU, though I've not computed this on a per-acre / per-hectare basis.)

Mind too that you're getting more than just avocado meat, there are also the skins and pits as you note, as well as leaves and wood, all of which could be used as fuel should we really want to.

Ecologists look at the net total energy conversion of ecosystems, often expressed not in terms of energy but as carbon fixation --- how much CO2 is captured from the atmosphere and converted to biomass.

And that amount is ... surprisingly limited. We'll often hear that humans use only a small fraction of the sunlight incident on the Earth's surface, but once you start accounting for various factors, that becomes far less comforting than it's usually intended. Three-quarters of Earth's surface is oceans (generally unsuitable for farming), plants and the biosphere require a certain amount of that activity, etc., etc. It turns out that humans already account for about 40% of net primary productivity (plant metabolism) of the biosphere. Increasing our utilisation of that is ... not likely, likely greatly disruptive, and/or both.

Another interesting statistic: In 1900, just as the Model T Ford was being introduced, and local transport (that is, exclusive of inter-city rail and aquatic transport) was principally dependent on human feet or horse's hooves, twenty percent of the US grain crop went to animal feed. (And much of that ended up on city streets.) We had a biofuel-based economy, and it consumed much of our food supply.

(Stats are for the US but would be typical of other countries of the time.)

This isn't an argument that fossil fuels are "good", or that renewables are "bad". It does point out, however, that changing our present system is hard, and any solution will cause pain and involve compromises.

colechristensen · 2025-05-09T21:54:14 1746827654

It takes a bit to accept your (10^0 m) place in the universe on the length scale between the Planck length (10^-35 m), the width of a proton (10^-15 m) and the diameter of the observable universe (10^27 m).

kbelder · 2025-05-09T23:24:37 1746833077

I wonder if there's any reason we're roughly in the middle.

colechristensen · 2025-05-11T21:44:37 1746999877

Well the ratio of the strong force, vs electromagnetism and the speed of light define the size of the atom. Life requires machinery to self replicate and the distance between a DNA base pair is a sugar molecule attached in a chain so that's about as small as possible. Intelligence requires a certain amount of complexity of something like a brain, and it has to be made of cells and doubtful it could be made more than an order of magnitude smaller.

Could intelligent life exist based on some other physical phenomena than a self-replicating string of atoms? Maybe some unknown quantum phenomena inside neutron stars or something big and slow on galactic scales or something new which fills the dark matter gap...

But otherwise it's physics driving where units of "stuff" can exist, and the correct scales for long term complexity/turbulence can happen, like the thin film of goo on the outside of the frozen crust of a molten rock we are.

sebastiennight · 2025-05-11T18:39:59 1746988799

"Roughly" still being off by a factor of 10^5 means an amoeba, or bacteria in general, would be acing the "middle" better than us though.

HPsquared · 2025-05-09T16:43:41 1746809021

My brain says that's only 4 times as many.

Geee · 2025-05-09T16:30:21 1746808221

Avogadro was a weird looking guy https://en.wikipedia.org/wiki/File:Amadeo_Avogadro.png

frainfreeze · 2025-05-09T16:40:40 1746808840

Aren't we all a bit weird looking? I'm more entertained that URL was already in my browser history

baruz · 2025-05-09T17:04:35 1746810275

Huh. It was grayed out for me as well, but I have no recollection of having had to look up moles, Avogadro, or even chemistry-related topics in Wikipedia for at least several months.

evertedsphere · 2025-05-09T17:00:56 1746810056

it's hn greying out the post

DonHopkins · 2025-05-09T17:45:45 1746812745

It looks like his MIND=BLOWN, then popped and re-inflated in Theme Hospital. It just goes to show how dangerous it is to think about such big numbers.

https://www.youtube.com/watch?v=Le_znuXcP2M

dhosek · 2025-05-09T16:38:20 1746808700

He was obviously an alien.

maratc · 2025-05-09T19:07:33 1746817653

Ah, the source of "hey girls, take my number" meme.

Benjammer · 2025-05-06T15:38:07 1746545887

I've found that heavily commented code can be better for the LLM to read later, so it pulls in explanatory comments into context at the same time as reading code, similar to pulling in @docs, so maybe it's doing that on purpose?

koakuma-chan · 2025-05-06T15:50:14 1746546614

No, it's just bad. I've been writing a lot of Python code past two days with Gemini 2.5 Pro Preview, and all of its code was like:

```python

def whatever():

  --- SECTION ONE OF THE CODE ---

  ...

  --- SECTION TWO OF THE CODE ---

  try:
    [some "dangerous" code]
  except Exception as e:
     logging.error(f"Failed to save files to {output_path}: {e}")
     # Decide whether to raise the error or just warn
     # raise IOError(f"Failed to save files to {output_path}: {e}")

```

(it adds commented out code like that all the time, "just in case")

It's terrible.

I'm back to Claude Code.

NeutralForest · 2025-05-06T16:29:15 1746548955

I'm seeing it trying to catch blind exceptions in Python all the time. I see it in my colleagues code all the time, it's driving me nuts.

JoshuaDavid · 2025-05-06T17:43:29 1746553409

The training loop asked the model to one-shot working code for the given problems without being able to iterate. If you had to write code that had to work on the first try, and where a partially correct answer was better than complete failure, I bet your code would look like that too.

In any case, it knows what good code looks like. You can say "take this code and remove spurious comments and prefer narrow exception handling over catch-all", and it'll do just fine (in a way it wouldn't do just fine if your prompt told it to write it that way the first time, writing new code and editing existing code are different tasks).

NeutralForest · 2025-05-06T21:11:53 1746565913

It's only an example, there's pretty of irrelevant stuff that LLMs default to which is pretty bad Python. I'm not saying it's always bad but there's a ton of not so nice code or subtly wrong code generated (for example file and path manipulation).

jerkstate · 2025-05-06T16:45:00 1746549900

There are a bunch of stupid behaviors of LLM coding that will be fixed by more awareness pretty soon. Imagine putting the docs and code for all of your libraries into the context window so it can understand what exceptions might be thrown!

maccard · 2025-05-06T17:12:56 1746551576

Copilot and the likes have been around for 4 years, and we’ve been hearing this all along. I’m bullish on LLM assistants (not vibe coding) but I’d love to see some of these things actually start to happen.

kenjackson · 2025-05-06T17:35:52 1746552952

I feel like it has gotten better over time, but I don't have any metrics to confirm this. And it may also depend on what type of you language/libraries that you use.

maccard · 2025-05-06T21:49:56 1746568196

I feel like there was a huge jump when cursor et al appeared, and things have been “changing” since then rather than improving.

NeutralForest · 2025-05-06T21:13:14 1746565994

It just feels to me like trying to derive correct behavior without a proper spec so I don't see how it'll get that much better. Maybe we'll collectively remove the pathological code but otherwise I'm not seeing it.

tclancy · 2025-05-06T17:06:01 1746551161

Well, at least now we know who to blame for the training data :)

des429 · 2025-05-10T01:21:15 1746840075

What’s a blind exception?

brandall10 · 2025-05-06T15:54:29 1746546869

It's certainly annoying, but you can try following up with "can you please remove superfluous comments? In particular, if a comment doesn't add anything to the understanding of the code, it doesn't deserve to be there".

diggan · 2025-05-06T16:23:02 1746548582

I'm having the same issue, and no matter what I prompt (even stuff like "Don't add any comments at all to anything, at any time") it still tries to add these typical junior-dev comments where it's just re-iterating what the code on the next line does.

tough · 2025-05-06T17:32:54 1746552774

you can have a script that drops them all

shawabawa3 · 2025-05-06T16:45:50 1746549950

You don't need a follow up

Just end your prompt with "no code comments"

brandall10 · 2025-05-06T20:23:54 1746563034

I prefer not to do that as comments are helpful to guide the LLM, and esp. show past decisions so it doesn't redo things, at least in the scope of a feature. For me this tends to be more of a final refactoring step to tidy them up.

breppp · 2025-05-06T17:38:19 1746553099

I always thought these were there to ground the LLM on the task and produce better code, an artifact of the fact that this will autocomplete better based on past tokens. Similarly always thought this is why ChatGPT always starts every reply with repeating exactly what you asked again

rst · 2025-05-06T18:14:27 1746555267

Comments describing the organization and intent, perhaps. Comments just saying what a "require ..." line requires, not so much. (I find it will frequently put notes on the change it is making in comments, contrasting it with the previous state of the code; these aren't helpful at all to anyone doing further work on the result, and I wound up trimming a lot of them off by hand.)

Benjammer · 2025-04-18T02:44:57 1744944297

One problem is how can you even set up a "fair" competition between an AI and Rainbolt? He does ones where it flashes for a fraction of a second and then he guesses the country. How do you simulate "only saw it for a fraction of a second" to an AI?

al_borland · 2025-04-18T03:01:53 1744945313

Maybe limit the time the AI is allowed to think? In the post it showed the AI thought for almost a minute.

I’ve seen Rainbolt ID an image based on some dirt and nothing else. I’d want to see AI be able to do that before saying it’s a solved problem.

sandermvanvliet · 2025-04-18T10:18:51 1744971531

“This is the gradient of Senegal”

dewey · 2025-04-18T02:58:53 1744945133

It’s fair, it just means he’s better and AI isn’t there yet.

Benjammer · 2025-04-15T21:31:26 1744752686

This is the nerdiest way I've ever seen someone talk about John Cena

whycome · 2025-04-15T21:40:21 1744753221

What's funny is that the described action didn't click until your comment.

Benjammer · 2025-04-06T15:33:46 1743953626

I think the only reason that demand is endless is due to how profitable it is to sell that attention to advertisers...

Benjammer · 2025-04-05T20:59:35 1743886775

If the influencer is actually giving good advice because it's illegal to pay them to promote a product, is that such a bad thing? What sci fi are you talking about?