
Does anyone know why Mistral uses a 17-bit (131k) vocabulary? I'm sure it's more efficient at encoding text, but each token ID doesn't fit into a 16-bit register, which must make it less efficient computationally?



The tokens are immediately transformed into embeddings (very large vectors), so the 17-bit values are only used as lookup indices and never take part in the model's arithmetic.
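To make the lookup concrete, here is a minimal sketch in plain Python. The names `embedding_table` and `embed` and the tiny `EMBED_DIM` are assumptions for illustration only; this is not Mistral's code, and real models use thousands of dimensions and a tensor library.

```python
import random

VOCAB_SIZE = 131_072  # 2**17, the vocabulary size from the question
EMBED_DIM = 4         # hypothetical toy size; real models are far wider

random.seed(0)
# One row of floats per token ID. This table is what the model computes with.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids):
    """Return one embedding row per token ID. The ID is only an index."""
    return [embedding_table[t] for t in token_ids]

# 70_000 and 131_071 are both too large for an unsigned 16-bit integer.
token_ids = [0, 70_000, 131_071]
vectors = embed(token_ids)

# Number of bits needed to represent the largest token ID.
bits_needed = (VOCAB_SIZE - 1).bit_length()
```

The width of the ID only affects how the token sequence is stored (in practice usually as 32-bit integers); everything after the lookup operates on the float vectors.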



