What you're saying sounds plausible, but I don't see how we can conclude that definitively without at least some empirical tests, say, a set of words that predictably produce counting errors along token boundaries.
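To make that concrete, here's a minimal sketch of how such a test set might be assembled. It assumes the model's tokenizer is a BPE one you can inspect through the tiktoken library; the encoding name, the word list, and the "letter crosses a token boundary" heuristic are all my own assumptions for illustration, not anything established about the glitch.

```python
# Sketch: flag words where a repeated letter is spread across token boundaries,
# one plausible trigger for miscounting. Assumes tiktoken is installed and that
# "cl100k_base" is a reasonable stand-in for the model's actual tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def token_pieces(word: str) -> list[str]:
    """Return the surface strings of the tokens a word is split into."""
    return [enc.decode([tok]) for tok in enc.encode(word)]

def letter_crosses_boundary(word: str, letter: str) -> bool:
    """True if occurrences of `letter` land in more than one token."""
    pieces = token_pieces(word)
    return sum(letter in piece for piece in pieces) > 1

# Illustrative candidates only; a real test set would be much larger.
candidates = ["strawberry", "bookkeeper", "mississippi", "lollipop"]
for word in candidates:
    for letter in sorted(set(word)):
        if word.count(letter) >= 2 and letter_crosses_boundary(word, letter):
            print(f"{word!r} splits as {token_pieces(word)}; "
                  f"'{letter}' x{word.count(letter)} crosses a token boundary")
```

If the tokenization hypothesis holds, words flagged by something like this should show a noticeably higher error rate than words where the repeated letter sits inside a single token.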
The thing is, there are many ways a model can arrive at an answer to the same question; it doesn't just depend on the architecture but also on how the training data is structured.
For example, if tokenization turned out to be the cause of this glitch, it could conceivably be fixed by adding enough training documents containing letter-count data, giving the model another path to the right output.
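As a rough sketch of what that remedial data could look like, assuming plain-text statements are enough for the model to pick up the pattern (the word list and phrasing here are invented, not from any real training corpus):

```python
# Sketch: generate synthetic documents that spell out letter counts explicitly,
# the kind of data that might give the model a memorized path to the answer.
import random

words = ["strawberry", "bookkeeper", "mississippi", "lollipop"]

def letter_count_document(word: str) -> str:
    letter = random.choice(sorted(set(word)))
    return f'The word "{word}" contains the letter "{letter}" {word.count(letter)} time(s).'

synthetic_docs = [letter_count_document(w) for w in words]
print("\n".join(synthetic_docs))
```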