See line 233: https://github.com/google/sentencepiece/blob/master/src/unig...
I would suspect the n-gram counts don't cross pre-token boundaries, but I don't have time to find that in the code right now.
https://github.com/google/sentencepiece/blob/master/doc/opti...
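To make the suspicion concrete, here is a toy sketch of what "n-gram counts don't cross pre-token boundaries" would mean. This is not SentencePiece's code: `count_ngrams` and its `respect_boundaries` parameter are made up for illustration, and plain whitespace splitting stands in for the pre-tokenization step.

```python
from collections import Counter

def count_ngrams(text, max_n=3, respect_boundaries=True):
    """Count character n-grams (lengths 1..max_n) in text.

    Hypothetical helper, not part of SentencePiece. With
    respect_boundaries=True the text is split on whitespace first and
    n-grams are counted inside each piece only. Otherwise they are
    counted over the raw string, spaces included.
    """
    pieces = text.split() if respect_boundaries else [text]
    counts = Counter()
    for piece in pieces:
        for n in range(1, max_n + 1):
            for i in range(len(piece) - n + 1):
                counts[piece[i:i + n]] += 1
    return counts

within = count_ngrams("new york", respect_boundaries=True)
across = count_ngrams("new york", respect_boundaries=False)

# "w y" straddles the space, so it only shows up when boundaries are ignored.
print(within["w y"], across["w y"])  # prints: 0 1
```

If the suspicion is right, the trainer would behave like the first call: no candidate piece could ever span two pre-tokens.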