> Lack of JSON schema restriction is a significant barrier to entry on hooking L...

robbiemitchell · on June 1, 2024

Asking even a top-notch LLM to output well formed JSON simply fails sometimes. And when you’re running LLMs at high volume in the background, you can’t use the best available until the last mile.

You work around it with post-processing and retries. But it’s still a bit brittle given how much stuff happens downstream without supervision.

fancy_pantser · on June 2, 2024

Constrained output with GBNF or JSON is much more efficient and less error-prone. I hope nobody outside of hobby projects is still using error/retry loops.

joatmon-snoo · on June 2, 2024

Constraining output means you don’t get to use ChatGPT or Claude though, and now you have to run your own stuff. Maybe for some folks that’s OK, but really annoying for others.

fancy_pantser · on June 2, 2024

You're totally right, I'm in my own HPC bubble. The organizations I work with create their own models and it's easy for me to forget that's the exception more than the rule. I apologize for making too many assumptions in my previous comment.

joatmon-snoo · on June 2, 2024

Not at all!

Out of curiosity- do those orgs not find the loss of generality that comes from custom models to be an issue? e.g. vs using Llama or Mistral or some other open model?

int_19h · on June 2, 2024

I do wonder why, though. Constraining output based on logits is a fairly simple and easy-to-implement idea, so why is this not part of e.g. the OpenAI API yet? They don't even have to expose it at the lowest level, just use it to force valid JSON in the output on their end.

jncfhnb · on June 2, 2024

… why would you have the LLM spit out a json rather than define the json yourself and have the LLM supply values?

esafak · on June 2, 2024

If the LLM doesn't output data that conforms to a schema, you can't reliably parse it, so you're back to square one.

jncfhnb · on June 2, 2024

It’s significantly easier to output an integer than a JSON with a key value structure where the value is an integer and everything else is exactly as desired

esafak · on June 2, 2024

That's because you've dumbed down the problem. If it was just about outputting one integer, there would be nothing to discuss. Now add a bunch more fields, add some nesting and other constraints into it...

jncfhnb · on June 2, 2024

The more complexity you add the less likely the LLM is to give you a valid response in one shot. It’s still going to be easier to get the LLM to supply values to a fixed scheme than to get the LLM to give the answers and the scheme

neverokay · on June 2, 2024

Is there a general model that got fine tuned on these json schema/output pairs?

Seems like it would be universally useful.

janpieterz · on June 2, 2024

How would I do this reliably? Eg give me 10 different values, all in one prompt for performance reasons?

Might not need JSON but whatever format it outputs, it needs to be reliable.

jncfhnb · on June 2, 2024

Don’t do it all in one prompt.

janpieterz · on June 2, 2024

Right, but now I’m basically running a huge performance hit, need to parallelize my queries etc.

I was parsing a document recently, 10-ish questions for 1 document, would make things expensive.

Might be what’s needed but not ideal.

jncfhnb · on June 3, 2024

LLM performance is a function of the number of tokens, not queries

yeahwhatever10 · on June 2, 2024

The phrase you want to search is "constrained decoding".

BoorishBears · on June 2, 2024

The best available actually have the fewest knobs for JSON schema enforcement (ie. OpenAI's JSON mode, which technically can still produce incorrect JSON)

If you're using anything less you should have a grammar that enforces exactly what tokens are allowed to be output. Fine Tuning can help too in case you're worried about the effects of constraining the generation, but in my experience it's not really a thing