Not 100% so for chain-of-thought models: they should recognize that they need to spell the word out letter by letter in some separated form and then count the tokens in that form. The Qwen distill seems to do exactly this really well:
> Step-by-step explanation:
> 1. Break down each word: "not", "really", "a", "tokenizer", "issue".
> 2. Count 'e's in each word:
> - "not": 0
> - "really": 1
> - "a": 0
> - "tokenizer": 2
> - "issue": 1
> 3. Sum the counts: 0 + 1 + 0 + 2 + 1 = 4.
>
> Answer: There are 4 E's in the phrase.
In the thought portion it broke the words up every which way you could think to check, then validated the total by listing the letters in a numbered list by index, counting that, and comparing it against the sum from the per-word pass.
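That check maps onto something very simple once the letters are separated out. A minimal sketch in Python (the phrase and letter are from the example above; the function name and structure are mine):

```python
def count_letter(phrase: str, letter: str) -> int:
    """Count per word, then sum, mirroring the model's step-by-step answer."""
    words = phrase.lower().split()
    per_word = {w: w.count(letter) for w in words}
    total = sum(per_word.values())

    # Cross-check: enumerate every letter individually and tally again,
    # the same validation the thought portion performed.
    letters = [c for c in phrase.lower() if c.isalpha()]
    recount = sum(1 for c in letters if c == letter)
    assert recount == total, (per_word, recount)
    return total

print(count_letter("not really a tokenizer issue", "e"))  # 4
```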
"Be trained how to map" implies someone is feeding in a list of every token and what the letters for that token are as training data and then training that. More realistically, this just happens automatically during training as the model figures out what splits work with which tokens because that answer was right when it came across a spelling example or question. The "reasoning" portion comes into play by its ability to judge whether what it's doing is working rather than go with the first guess. E.g. feeding "zygomaticomaxillary" and asking for the count of 'a's gives a CoT
> <comes to an initial guess>
> Wait, is that correct? Let me double-check because sometimes I might miscount or miss letters.
> Maybe I should just go through each letter one by one. Let's write the word out in order:
> <writes one letter per line with the conclusion for each>
> *Answer:* There are 3 "a"s in "zygomaticomaxillary."
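Spelled out, that second pass is just an indexed enumeration, and the token view shows why the count isn't directly readable from the model's input. A sketch only: tiktoken and the cl100k_base encoding are my choices for illustration, not anything the model above actually uses.

```python
import tiktoken  # assumption: any BPE tokenizer would illustrate the same point

word = "zygomaticomaxillary"

# Token view: the word arrives as a few multi-letter chunks,
# so per-letter counts aren't directly visible in the input.
enc = tiktoken.get_encoding("cl100k_base")
print([enc.decode([t]) for t in enc.encode(word)])

# CoT view: one letter per line with a running index, then tally.
hits = [(i, c) for i, c in enumerate(word, 1) if c == "a"]
for i, c in hits:
    print(f"{i:2d}: {c}")
print(len(hits), "a's")  # 3
```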
This kind of check isn't the only way to judge a model, but there are more ways to accurately answer this problem than "hardcode the tokenizer data in the training," and heavily trained CoT models should be expected to hit on at least several of those other ways; otherwise it's suspect that they'd miss similar types of things elsewhere.