Hacker News

Agreed. All it takes is a simple reply of “you’re wrong.” to Claude/ChatGPT/etc. and it will start to crumble on itself and get into a loop that hallucinates over and over. It won’t fight back, even if it happened to be right to begin with. It has no backbone to be confident it is right.


> All it takes is a simple reply of “you’re wrong.” to Claude/ChatGPT/etc. and it will start to crumble on itself and get into a loop that hallucinates over and over.

Yeah, it seems to be a terrible approach to try to "correct" the context by adding clarifications or telling it what's wrong.

Instead, start from zero with the same initial prompt you used, but improve it so the LLM gets it right in the first response. If it still gets it wrong, start from zero again. The context seems to get "poisoned" really quickly if you're looking for accuracy in the responses, so it's better to begin from the beginning as soon as it veers off course.
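A minimal sketch of that "restart from zero" workflow, using a hypothetical `chat()` stub in place of a real LLM API call (the function names and toy behavior here are invented for illustration): each attempt starts a brand-new conversation with a refined prompt, rather than appending corrections to a possibly poisoned context.

```python
def chat(messages):
    """Hypothetical stand-in for an LLM API call; returns a reply string."""
    prompt = messages[-1]["content"]
    # Toy behavior: the model only answers cleanly once the prompt is
    # specific enough to mention the word "integer".
    return "42" if "integer" in prompt else "maybe forty-two?"

def retry_from_scratch(base_prompt, refinements, is_correct):
    """Restart with a fresh context on each attempt: no correction turns,
    no accumulated history, just a progressively improved initial prompt."""
    prompt, reply = base_prompt, ""
    for refinement in [""] + refinements:
        prompt = (base_prompt + " " + refinement).strip()
        # Fresh single-message context every time.
        reply = chat([{"role": "user", "content": prompt}])
        if is_correct(reply):
            return prompt, reply
    return prompt, reply  # best effort after all refinements

final_prompt, answer = retry_from_scratch(
    base_prompt="What is 6 * 7?",
    refinements=["Answer with just the integer."],
    is_correct=lambda r: r == "42",
)
print(answer)  # "42" once the refined prompt lands
```

The key design point is that `chat()` only ever sees a one-message history, so a bad earlier exchange can never contaminate the next attempt.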


You are suggesting a decent way to work around the limitations of the current iteration of this technology.

The grandparent comment was pointing out that this limitation exists, not that it can't be worked around.


> The grand-parent comment was pointing out that this limitation exists

Sure, I agree with that, but I was replying to the parent comment, whose author doesn't seem to use this workflow yet, which is why they're seeing "a loop that hallucinates over and over".


That's what I like about Deepseek. The reasoning output is so verbose that I often catch problems with my prompt before the final output is even generated. Then I do exactly what you suggest.


Yeah, it may be that in earlier training data the model was given a strong negative signal when the human trainer told it it was wrong. In more subjective domains this might lead to sycophancy: if the human is always right and the data is always right, but the data can be interpreted multiple ways (say, human psychology), the model just adjusts to the opinion of the human.

If the question is about harder facts which the human disagrees with, this may put it into an essentially self-contradictory state, where the locus of possibilities gets squished from each direction, and so the model is forced to respond with crazy outliers which agree with both the human and the data. The probability of an invented reference being true may be very low, but from the model's perspective, it may still be one of the highest-probability outputs among a set of bad choices.

What it sounds like they may have done is just have the humans tell it it's wrong when it isn't, and then award it credit for sticking to its guns.


I put in the ChatGPT system prompt to be not sycophantic, be honest, and tell me if I am wrong. When I try to correct it, it hallucinates more complicated epicycles to explain how it was right the first time.


> All it takes is a simple reply of “you’re wrong.” to Claude/ChatGPT/etc. and it will start to crumble on itself

Fucking Gemini Pro, on the other hand, digs in: it starts deciding it's in a testing scenario, gets adversarial, starts claiming it's using tools the user doesn't know about, etc. etc.



