Hacker News

> The tokenizer covers the entire dataset.

Well, that's only trivially true. You can feed binary data to the LLM, and the tokenizer will probably fall back to tokens that each cover a single byte of it.
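A toy sketch of why coverage is trivial. It assumes a vocabulary containing all 256 single bytes (byte-level BPE tokenizers such as GPT-2's start from these as base symbols) plus a few hypothetical multi-byte entries standing in for learned merges. For simplicity it uses greedy longest-match rather than applying BPE merges in learned order, so treat it as an illustration only, not how any particular tokenizer works.

```python
# Hypothetical vocabulary: every single byte (0-255) plus a few made-up
# multi-byte entries standing in for merges learned from text.
BASE_VOCAB = {bytes([b]) for b in range(256)}
LEARNED = {b"the", b" the", b"ing", b" token", b"izer"}
VOCAB = BASE_VOCAB | LEARNED
MAX_LEN = max(len(t) for t in VOCAB)

def tokenize(data: bytes) -> list[bytes]:
    """Greedy longest-match tokenization over VOCAB (not real BPE).

    Because every single byte is in the vocabulary, this never fails:
    any input, text or binary, can always be covered.
    """
    tokens = []
    i = 0
    while i < len(data):
        # Try the longest candidate first, shrinking down to one byte,
        # which always matches.
        for length in range(min(MAX_LEN, len(data) - i), 0, -1):
            piece = data[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
    return tokens

text = b"the tokenizer"
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF, 0x1A])

print(tokenize(text))    # [b'the', b' token', b'izer']
print(tokenize(binary))  # seven one-byte tokens
```

The text collapses into a few multi-byte tokens, while the binary blob is "covered" only in the sense that every byte gets its own token.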