
That is an interesting observation. I have not gotten to the point of overly long cycles, and I can think of two reasons for that.

First, maybe my use case is narrow enough that, in combination with a rather constraining and strict system message, an answer is easy to find.

Second, I have lately played a lot with locally running LLMs. Their answers often break the formatting required for the agent to automatically proceed. So maybe I just don't see spiraling into oblivion, because I run into errors early ;)



The use case we have is that we are asking the LLM to write articles.

As part of this, we tried having a reviewer agent "correct" the writer agent.

For example, in an article about a pasta-based recipe, the writer wrote a line like "grab your spoon and dig in" and then later wrote another line about "twirl your fork".

The reviewer agent is able to pick up this logical deviation and ask the writer to correct it. But even given an instruction like "it doesn't have to be perfect", the reviewer will keep finding fault with the writer's output on every revision, as long as the content is long enough.

One workaround is to have the reviewer only look at small paragraphs or sections instead of fixing one long article. The problem with this is that the final output can feel disjointed, since the writer is no longer working with the full context of the article. This can lead to repeated sentence structure or even full-on repeated phrases, since you're no longer applying the sampling settings across the full text.

In the end, it was more efficient and deterministic to simply do two discrete passes: 1) the writer writes the article, and 2) a separate call reviews and corrects it.
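A minimal sketch of that two-pass shape, assuming the OpenAI chat completions client; the model name, prompts, and the `complete` helper are placeholders rather than our actual setup:

```
from openai import OpenAI

client = OpenAI()

def complete(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: any chat-completion model works here
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

# Pass 1: the writer drafts the full article in one call.
draft = complete(
    "You are a food writer. Write a complete article.",
    "Write a 600-word article about a weeknight pasta recipe.",
)

# Pass 2: a single review-and-correct call, instead of an open-ended
# reviewer/writer loop that keeps finding new faults.
final = complete(
    "You are an editor. Fix logical inconsistencies (e.g. 'spoon' vs 'fork') "
    "and return only the corrected article. Do not rewrite beyond that.",
    draft,
)
print(final)
```

Because the review step runs exactly once, cost and latency are bounded, which is what made this more predictable than the looping reviewer.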


How do you get the output to be formatted correctly, without any branches?

Say for example I want a step-by-step instruction for an action.

The response will have steps 1, 2, 3, but if there are multiple pathways it produces a long answer with 2.a, b, c, d. This is not ideal; I would rather have the simplest case (2.a) plus a short summary of the other options. I have described this in the prompt but still cannot get a clean response without too many variations of the same step.
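To make the shape I'm after concrete, something like this hypothetical Pydantic model is what I'd want the answer to fit (not code I actually have):

```
from pydantic import BaseModel

class Step(BaseModel):
    number: int
    instruction: str              # the simplest / most common path only
    alternatives_note: str = ""   # one-line summary of other options, if any

class Instructions(BaseModel):
    steps: list[Step]
```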


I have not encountered this problem yet. When I was talking about the format of the answer, I meant the following: no matter whether you're using LangChain, LlamaIndex, something self-made, or Instructor (just to get JSON back), somewhere under the hood there is a request to the LLM to reply in a structured way, like "answer in the following JSON format" or "just say 'a', 'b' or 'c'". ChatGPT tends to obey this rather well; most locally running LLMs don't. They answer like:

> Sure my friend, here is your requested json:
> ```
> {
>   name: "Daniel",
>   age: 47
> }
> ```

Unfortunately, the introductory sentence breaks parsing the answer directly, which means extra coding steps or more prompt tweaking.
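For what it's worth, the "extra coding steps" usually amount to something like this sketch: cut away the chatty preamble and parse whatever object is left. It only handles the extra prose; if the model also emits invalid JSON (like the unquoted keys above), you still have to repair or re-prompt.

```
import json
import re

def extract_json(reply: str) -> dict:
    # Take everything from the first '{' to the last '}' and parse it,
    # ignoring any introductory sentence (and code fences, if present).
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the model reply")
    return json.loads(match.group(0))

reply = 'Sure my friend, here is your requested json: {"name": "Daniel", "age": 47}'
print(extract_json(reply))  # {'name': 'Daniel', 'age': 47}
```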


It's pretty easy to force a locally running model to always output valid JSON: when it gives you probabilities for the next tokens, discard all tokens that would result in invalid JSON at that point (basically reverse parsing), and then apply the usual techniques to pick the completion only from the remaining tokens. You can even validate against a JSON schema that way, so long as it is simple enough.

There are a bunch of libraries for this already, e.g.: https://github.com/outlines-dev/outlines
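The core idea, independent of any particular library, is roughly the sketch below. The `is_valid_json_prefix` check is a toy stand-in; real implementations compile the grammar or schema into an automaton and track parser state incrementally instead of re-validating strings.

```
import math
import random

def is_valid_json_prefix(text: str) -> bool:
    # Toy stand-in: a real checker would run an incremental JSON parser
    # (optionally constrained by a schema) over the candidate prefix.
    return text.count("}") <= text.count("{")

def sample_next(prefix: str, logprobs: dict[str, float]) -> str:
    # Keep only tokens that leave the output a valid (partial) JSON document.
    allowed = {tok: lp for tok, lp in logprobs.items()
               if is_valid_json_prefix(prefix + tok)}
    if not allowed:
        raise RuntimeError("no token keeps the output valid")
    # Softmax over the surviving tokens, then sample as usual.
    total = sum(math.exp(lp) for lp in allowed.values())
    r, acc = random.random() * total, 0.0
    for tok, lp in allowed.items():
        acc += math.exp(lp)
        if acc >= r:
            return tok
    return tok  # fallback for floating-point edge cases
```

Applying the same mask directly to the logits tensor before sampling is what makes this cheap enough to run on every token.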


If that's what you need, it would make a lot of sense to redo the instruction fine-tuning of the model, instead of fiddling with the prompt or post-processing to work around trained model behavior that goes counter to what you want.


At the very beginning of my journey I did some fine-tuning with LoRA on (I believe) a Falcon model, but I haven't looked at it since. My impression was that injecting knowledge via fine-tuning doesn't work, but tweaking behavior does. So your answer makes a lot of sense to me. Thanks for bringing that up! I will definitely try that out.
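In case it helps anyone going the same way, a rough sketch of behavior-tweaking with LoRA via Hugging Face peft looks like the following; the base model, target modules, and hyperparameters are placeholders, not a recommendation.

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "tiiuae/falcon-7b"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)  # needed for the training loop (not shown)

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# From here, train on (instruction, strictly formatted answer) pairs
# with the usual Trainer / SFT loop to nudge the output format.
```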



