carom on Oct 23, 2024 | on: Probably pay attention to tokenizers
The trained embedding vectors for the tokens corresponding to W4 and W-4 would end up in a similar region of embedding space, because the two strings appear in the same contexts.
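To make this concrete, here is a minimal sketch of comparing learned token embeddings, assuming GPT-2 loaded via Hugging Face transformers (the thread names no specific model). Since neither string is guaranteed to map to a single token, it averages the per-token vectors:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2")
    # Learned input-embedding matrix: (vocab_size, hidden_dim)
    emb = model.get_input_embeddings().weight.detach()

    def token_mean(s: str) -> torch.Tensor:
        # Average the embedding vectors of whatever tokens s splits into.
        return emb[tok.encode(s)].mean(dim=0)

    sim = torch.cosine_similarity(token_mean("W4"), token_mean("W-4"), dim=0)
    print(f"cosine similarity: {sim.item():.3f}")

A high similarity would support the claim above; a low one would support the GP's concern that the two strings are handled quite differently.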
dangerlibrary on Oct 23, 2024
The point of the GP post is that a single "w-4" token produced very different results from ["w", "-4"] or similar tokenizations where the "w" and the "4" ended up in separate tokens.
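One way to see how a real BPE tokenizer actually splits these strings is a quick sketch with OpenAI's tiktoken library (the cl100k_base encoding is an assumption; the thread does not say which tokenizer was involved):

    # Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for text in ["W4", "W-4", "w-4"]:
        ids = enc.encode(text)
        # Decode each token id individually to see the split boundaries.
        pieces = [enc.decode([i]) for i in ids]
        print(f"{text!r} -> {len(ids)} token(s): {pieces}")

Whether a given string survives as one token or gets split varies by tokenizer and vocabulary, which is exactly why the two cases can behave so differently downstream.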