
What about a tokenizer where the digits 0-9 just get the tokens 0-9, all other tokens are derived in the usual BPE manner (except that no token other than 0-9 is allowed to contain digits). Such a model wouldn't have a notable reduction in context size and should behave largely the same in terms of training speed and language tasks, but if sbierwagen is right it might be significantly better at math (and at least on an intuitive level this seems to make sense).
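
To make the proposal concrete, here is a minimal sketch of the two rules it describes: each digit becomes its own piece before BPE runs, and no learned merge may produce a token containing a digit. The names `isolate_digits` and `may_merge` are hypothetical and are not taken from any existing tokenizer library; a real BPE trainer would need to apply the merge check inside its own training loop.

```python
import re

# Each ASCII digit is matched alone; any run of non-digits stays together
# so that ordinary BPE can operate on it afterwards.
_PIECES = re.compile(r"[0-9]|[^0-9]+")


def isolate_digits(text):
    """Split text so that every digit 0-9 is its own piece (hypothetical helper)."""
    return _PIECES.findall(text)


def may_merge(left, right):
    """Return True only if merging two pieces would yield a digit-free token.

    This is the 'no token other than 0-9 may contain digits' rule
    from the comment above, expressed as a merge filter (an assumption
    about how one might enforce it).
    """
    return not any(ch.isdigit() for ch in left + right)


pieces = isolate_digits("In 1002 AD")
# pieces is ['In ', '1', '0', '0', '2', ' AD']
```

The split is lossless (joining the pieces gives back the input), so the only change relative to plain BPE is that multi-digit tokens can never form.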


This is pretty much what LLaMA did, for what it's worth. Unfortunately there are so many other architectural and training differences that comparing the arithmetic ability of LLaMA and GPT-3 would not settle this debate.


> This is pretty much what LLaMA did, for what it's worth.

I don't think that's exactly accurate. The GP proposed:

>> What about a tokenizer where the digits 0-9 just get the tokens 0-9, all other tokens are derived in the usual BPE manner (except that no token other than 0-9 is allowed to contain digits).

The paper says:

> We tokenize the data with the byte-pair encoding (BPE) algorithm (Sennrich et al., 2015), using the implementation from Sentence-Piece (Kudo and Richardson, 2018). Notably, we split all numbers into individual digits, and fallback to bytes to decompose unknown UTF-8 characters.

I read this as saying that a number like 1002 would be split into 1, 0, 0, 2 and then represented by the token IDs that point to those digits. These are not the same things.

Interestingly, the code FB provided doesn't seem to have any special handling for digits: https://github.com/facebookresearch/llama/blob/main/llama/to...


>These are not the same things.

You mean because the token IDs are in a different order? This is irrelevant to a transformer model: there is no inductive bias that similar tokens should have related token IDs.

>Interestingly, the code FB provided doesn't seem to have any special handling for digits

Yeah, token handling is done in sentencepiece.


There's nothing in a NN that would make the token order important.
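
A small sketch of why the ordering does not matter at the input layer, assuming a standard embedding lookup where the token ID is used only as a row index: if the IDs are shuffled and the table rows are moved along with them, every token still maps to the same vector. The vocabulary, dimension, and variable names here are toy values made up for illustration.

```python
import random

rng = random.Random(0)

# Toy vocabulary and embedding table: row i holds the vector for token ID i.
vocab = ["0", "1", "2", "the", "cat"]
dim = 4
table = [[rng.gauss(0, 1) for _ in range(dim)] for _ in vocab]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

# Relabel the tokens with shuffled IDs: perm[old_id] is the new ID.
perm = list(range(len(vocab)))
rng.shuffle(perm)
new_token_to_id = {tok: perm[i] for tok, i in token_to_id.items()}

# Move each row to its new index so the table matches the new labels.
new_table = [None] * len(vocab)
for old_id, new_id in enumerate(perm):
    new_table[new_id] = table[old_id]


def embed(tokens, mapping, tbl):
    """Look up one vector per token; the ID is only ever used as an index."""
    return [tbl[mapping[t]] for t in tokens]


tokens = ["1", "0", "0", "2"]
same = embed(tokens, token_to_id, table) == embed(tokens, new_token_to_id, new_table)
# same is True: the model receives identical vectors under either labeling.
```

This only covers the input embedding, but the same relabeling argument applies to any layer that is indexed by token ID rather than computed from its numeric value.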



