It's possible that it's not about "native knowledge" but about how the descriptions for each of the tools (which get mapped into the prompt) are set up, or even their order; LLM behavior can be very sensitive to not-obviously-important prompt differences.
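To make that concrete, here is a minimal sketch (in Python, with hypothetical tool names and wording) of how a toolchain might flatten tool metadata into the prompt. Both the description text and the list order land verbatim in the model's context, so either one is a mundane candidate explanation for behavior differences across toolchains:

    # Hypothetical tool metadata; real frameworks serialize a similar
    # structure (name + description) into the model's context.
    tools = [
        {"name": "read_file", "description": "Read a file from the workspace."},
        {"name": "grep", "description": "Search the workspace for a pattern."},
    ]

    def render_tool_prompt(tools: list[dict]) -> str:
        """Flatten tool metadata into prompt text. Reordering `tools` or
        rewording a description changes the rendered prompt character for
        character, which can shift which tool the model reaches for."""
        lines = ["You have access to the following tools:"]
        for i, tool in enumerate(tools, 1):
            lines.append(f"{i}. {tool['name']}: {tool['description']}")
        return "\n".join(lines)

    print(render_tool_prompt(tools))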
I'd be cautious about inferring generalizations about behavior, and then explanations for those generalizations, from observations of a particular LLM used via a particular toolchain.
That said, the fact that it behaves this way in that environment is still an interesting observation.