How good is Gemma at structured output generation, JSON schema compliance, and tool use? Particularly the smaller versions, and particularly in foreign languages?
We will run our internal evals on it for sure, but just wanted to ask whether that's even a use case that the team considered and trained for.
Hey, I'm from the Gemma team. There are a couple of angles to your question.
We do care about prompted instructions, like JSON schema; it is something we eval for, and we encourage you to try it. Here's an example from Gemma 2 to guide folks looking to do what it sounds like you're interested in.
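To make the prompted-instructions approach concrete, here's a minimal sketch: embed the schema in the prompt, then parse and sanity-check the model's reply. The schema, field names, and validation helper are all made up for illustration; this is not from any Gemma documentation, and the check is far short of full JSON Schema validation.

```python
import json

# Illustrative schema -- not from the Gemma docs.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "amount": {"type": "integer"},
    },
    "required": ["name", "amount"],
}

def build_prompt(user_request: str) -> str:
    """Embed the schema in the prompt, per the 'prompted instructions' approach."""
    return (
        "Reply with a single JSON object matching this schema, no extra text:\n"
        f"{json.dumps(SCHEMA, indent=2)}\n\n"
        f"Request: {user_request}"
    )

def check_reply(reply: str) -> dict:
    """Minimal check: parse the reply and verify required keys and one type."""
    obj = json.loads(reply)
    for key in SCHEMA["required"]:
        if key not in obj:
            raise ValueError(f"missing key: {key}")
    if not isinstance(obj["amount"], int):
        raise ValueError("amount must be an integer")
    return obj
```

The point of the post-parse check is that prompted compliance is probabilistic: unlike token-constrained decoding, nothing forces the model to emit valid JSON, so you validate and retry on failure.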
The Ollama stuff is the old llama.cpp stuff that constrains output tokens.
It's great, I've used it to get outputs from as small a model as 1B.
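For anyone who hasn't seen it, that token-constraining works from a GBNF grammar; a minimal sketch that forces a single-key JSON object might look like this (illustrative only, not taken from the llama.cpp tree):

```
# Illustrative GBNF: the sampler masks out any token that would
# leave the output inconsistent with this grammar.
root   ::= "{" ws "\"amount\"" ws ":" ws number ws "}"
number ::= "-"? [0-9]+
ws     ::= [ \t\n]*
```

This guarantees syntactically valid output even from a 1B model, but it can't make the model put the *right* value in the field, which is where native tool-calling training matters.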
But it's a stark difference in quality from, say, Phi-4's native tool-calling.
If Gemma 3 is natively trained on tool-calling, i.e. y'all are benchmarking on, say, the Berkeley Function Calling Leaderboard, that'd be great to know out here.
Tangentially, github.com/ochafik is a Googler who landed an excellent overhaul of llama.cpp's tool-calling, might be worth reaching out to (if you're not working with him already!)
I notice in my testing (brief and probably full of user error; I'm an embedded dev, not an AI expert) that it, and pretty much every other small model, seems to have trouble interpreting numbers expressed as words when filling out a JSON object.
You might say something like "five thousand fifty-six", and it will fill in something like 556 or 5560.
It's as if it's transferring the digits one by one, not using the structure of the number to supply the implicit zero.
Which is very interesting since that seems like a mistake I would make too!
It doesn't do it all the time; I've only tried the Ollama-quantized versions, mostly the 1B models, and I've seen similar issues with other sub-2B models as well.
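For comparison, here's a sketch of the parse the models seem to be skipping: scale words like "thousand" close out a running group and multiply it, which is exactly what supplies the implicit zero in "five thousand fifty-six" → 5056. (This is just an illustration of the structure, not a suggested fix.)

```python
# Number-word parsing: units/tens accumulate into a group, and a scale
# word multiplies the group and banks it, supplying implicit zeros.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1_000, "million": 1_000_000}

def words_to_number(text: str) -> int:
    total, group = 0, 0
    for word in text.replace("-", " ").replace(" and ", " ").split():
        if word in UNITS:
            group += UNITS[word]
        elif word in TENS:
            group += TENS[word]
        elif word == "hundred":
            group *= 100
        elif word in SCALES:
            # Bank the group: this is where the implicit zeros come from.
            total += (group or 1) * SCALES[word]
            group = 0
        else:
            raise ValueError(f"unknown word: {word}")
    return total + group
```

Getting 556 out of "five thousand fifty six" corresponds to dropping the banking step and just concatenating the digits each word contributes.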
The other interesting thing is that in a chat, almost every model I've tried seems to interpret the numbers correctly: if you say "what's ten million and fifty times eight", it will start with "10,000,050 x 8 is...".
Sometimes they get the math wrong after that, but the number interpretation is right.
I wonder if there's something special about the "intro text" in chat mode that is actually acting like reasoning, or if the digit separators (which don't exist in JSON) help the models figure out what they're doing.
I wonder if it would be better for some applications to include a line of thoughts/summary/intro in the JSON format constraint?
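That idea can be sketched as a schema with a free-text field placed first. The field names here are made up, and this assumes the constrained decoder emits keys in schema order (many grammar-based samplers do), so the model has already written the number in digits once before committing to the integer field:

```python
import json

# Illustrative schema: a "thoughts" field first, so the model restates
# the number in digits (acting like a line of reasoning) before the
# grammar forces it to emit the integer field.
schema = {
    "type": "object",
    "properties": {
        # Emitted first, e.g. "The user said 5,056."
        "thoughts": {"type": "string"},
        # Emitted second, after the digits have been written out once.
        "amount": {"type": "integer"},
    },
    "required": ["thoughts", "amount"],
}

print(json.dumps(schema, indent=2))
```

The cost is a few extra output tokens per object, which may be worth it if the intro-text effect from chat mode carries over.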
Just tried gemma3:4b for structured output and it fails with a strange error (Ollama is the latest version):
Ollama error: POST predict: Post "http://127.0.0.1:49675/completion": read tcp 127.0.0.1:49677->127.0.0.1:49675: wsarecv: An existing connection was forcibly closed by the remote host.
Not sure whether this is an Ollama or a gemma3:4b problem. At the same time, gemma3:12b works fine for the same API request (100% identical; the only difference is the model id).