
One thing I don't like about LLMs is that they vomit out a page of prose as filler around the key point, which could just be a short sentence.

At least that has been my experience. I admit I don't use LLMs very much.



It's time to bind "Please be concise in your answer and only mention important details. Use a single paragraph and avoid lists. Keep me in the discussion, I'll ask for details later." to F1.


You've just made me realize that I actually do need that as a macro. I probably type that ten times per day lately. Others might include "in one sentence" or "only answer yes or no, and link sources proving your assertion".
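If you end up scripting it instead of retyping it, the same effect is a prepended system message. A minimal sketch with the OpenAI Python SDK; the model name and the exact wording are just placeholders:

    # Minimal sketch of a reusable "be concise" macro via the OpenAI Python SDK.
    # The model name and the instruction text are placeholders; adjust to taste.
    from openai import OpenAI

    CONCISE = (
        "Please be concise in your answer and only mention important details. "
        "Use a single paragraph and avoid lists. Keep me in the discussion, "
        "I'll ask for details later."
    )

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask(question: str) -> str:
        # Prepend the conciseness rule as a system message on every call.
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": CONCISE},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content

    print(ask("What does the CAP theorem actually say?"))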


If you’re using ChatGPT, add it to your memory so it always remembers that you prefer that.


No matter how many times I get ChatGPT to write my rules to long-term memory (I checked, and several rules appear in LTM multiple times), it inevitably forgets some or all of them: after a while it can only see what's right in front of it, not (what should be) the defining schema you provided.


I haven't used ChatGPT in a while. I used to run into a problem that sounds similar. If you're talking about:

1. Rules that get prefixed in front of your prompt as part of the real prompt ChatGPT gets. Like what they do with the system prompt.

And

2. Some content makes your prompt too big for the context window, so the rules get cut off.

Then, it might help to measure the tokens in the overall prompt, have a max number, and warn if it goes over it. I had a custom chat app that used their APIs with this feature built in.

Another possibility is, when this is detected, having it ask if you want to switch to a model with a larger context window. Those cost more, so it would be presented as an option. My app let me select any of their models to do that manually.
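For what it's worth, the token-counting part is only a few lines. A rough sketch using the tiktoken library; the limits and model names below are illustrative assumptions, not my app's real values:

    # Rough sketch of "warn when the prompt exceeds a token budget".
    # Limits and model names are illustrative; cl100k_base is only an
    # approximation of what a given model actually uses.
    import tiktoken

    CONTEXT_LIMITS = {"gpt-4o-mini": 128_000, "gpt-4o": 128_000}  # approximate

    def check_prompt(prompt: str, model: str = "gpt-4o-mini",
                     reserve_for_reply: int = 4_000) -> bool:
        enc = tiktoken.get_encoding("cl100k_base")
        n_tokens = len(enc.encode(prompt))
        budget = CONTEXT_LIMITS[model] - reserve_for_reply
        if n_tokens > budget:
            print(f"Warning: {n_tokens} tokens, over the {budget}-token budget "
                  f"for {model}. Trim the prompt or pick a larger-context model.")
            return False
        print(f"OK: {n_tokens} tokens of {budget}.")
        return True

    check_prompt("my rules...\n" + "pasted content " * 50_000)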


Yep. Somehow I made mine noticeably grumpier, but I don't know which setting or memory piece did the job.

I really like not being complimented on literally everything with a wall of text anymore.


Yeah, but it kind of kneecaps the model. They need tokens to "think". It's better to have them create a long response and then distill it down later.


You need tokens to create more revenue for the company that is running the LLM. Nothing more, nothing less.


Is there a well-known benchmark for this? I don't feel that short vs long answers make any difference, but ofc feelings aren't what we can measure.

Also, if that works, why doesn't copilot/cursor write lots of excessive code mixed with lots of prose only to distill it later?


> I don't feel that short vs long answers make any difference

The “thinking” models are really verbose output models that summarise the thinking at the end. These tend to outperform non-thinking models, but at a higher cost.

Anthropic lets you see some/all of the thinking so you can see how the model arrived at the answer.
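If you're on the API, the extended-thinking feature returns those thinking blocks explicitly. A sketch with the Anthropic Python SDK; the model name and token budgets here are assumptions, so check the current docs:

    # Sketch: request extended thinking and print the visible thinking blocks.
    # Model name and token budgets are assumptions; see Anthropic's docs.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16_000,
        thinking={"type": "enabled", "budget_tokens": 8_000},
        messages=[{"role": "user", "content": "Is 2^61 - 1 prime? Answer briefly."}],
    )

    for block in message.content:
        if block.type == "thinking":
            print("--- thinking ---")
            print(block.thinking)
        elif block.type == "text":
            print("--- answer ---")
            print(block.text)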


So if I replace "answer" with "summarize" that should work then?


One problem with LLMs is that the amount of "thinking" they do when answering a question depends on how many tokens they use generating the answer. A big part of the power of models like DeepSeek R1 is that they figured out how to get a model to use a lot of tokens in a logical way to work towards solving a problem. The models don't know the answer; they come to it by generating it, and generating more helps them. In the future we'll probably see the trend continue where the model generates a "thinking" response first, then summarizes the answer concisely.
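That two-pass idea (long answer first, then a second call to compress it) is easy to try by hand today. A rough sketch, again with a placeholder model name:

    # Two-pass sketch: let the model answer at length, then distill it.
    # Model name is a placeholder; any chat model fits the same pattern.
    from openai import OpenAI

    client = OpenAI()

    def answer_then_distill(question: str) -> str:
        long_reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user",
                       "content": f"Work through this step by step:\n{question}"}],
        ).choices[0].message.content
        short_reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user",
                       "content": "Summarize the key conclusion of the following "
                                  f"in one short paragraph:\n\n{long_reply}"}],
        ).choices[0].message.content
        return short_reply

    print(answer_then_distill("Why does TCP need a three-way handshake?"))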



