carom on Oct 23, 2024 | on: Probably pay attention to tokenizers
The trained embedding vectors for the tokens corresponding to W4 and W-4 would end up in a similar region of embedding space, because the two strings appear in the same contexts.
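To make this concrete, here is a minimal sketch of comparing learned token embeddings, assuming GPT-2 loaded via Hugging Face transformers (the thread names no specific model). Since neither string is guaranteed to map to a single token, it averages the per-token vectors:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2")
    # Learned input-embedding matrix: (vocab_size, hidden_dim)
    emb = model.get_input_embeddings().weight.detach()

    def token_mean(s: str) -> torch.Tensor:
        # Average the embedding vectors of whatever tokens s splits into.
        return emb[tok.encode(s)].mean(dim=0)

    sim = torch.cosine_similarity(token_mean("W4"), token_mean("W-4"), dim=0)
    print(f"cosine similarity: {sim.item():.3f}")

A high similarity would support the claim above; a low one would support the GP's concern that the two strings are handled quite differently.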
dangerlibrary on Oct 23, 2024
The point of the GP post is that a single "w-4" token produced very different results from ["w", "-4"] or similar tokenizations where the "w" and the "4" ended up in separate tokens.
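One way to see how a real BPE tokenizer actually splits these strings is a quick sketch with OpenAI's tiktoken library (the cl100k_base encoding is an assumption; the thread does not say which tokenizer was involved):

    # Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for text in ["W4", "W-4", "w-4"]:
        ids = enc.encode(text)
        # Decode each token id individually to see the split boundaries.
        pieces = [enc.decode([i]) for i in ids]
        print(f"{text!r} -> {len(ids)} token(s): {pieces}")

Whether a given string survives as one token or gets split varies by tokenizer and vocabulary, which is exactly why the two cases can behave so differently downstream.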