> You're just taking the translation to concepts that the LLM is certainly alrea...

mdp2021 · 2025-01-01T12:09:55 1735733395

Well, individual letters in the languages currently in use* do not convey specific meaning, while individual tokens do - so you cannot really construct a ladder that goes from letter to token, then from token to sentence.

That said, researching whether the search for concepts (in the solution space) works better than the search for tokens seems absolutely warranted, in the absence of a solid theory showing otherwise.

(*Sounds convey their own meaning in e.g. Proto-Indo-European according to some interpretations, but that becomes too remote in the current descendants - in English you cannot reconstruct the implicit sound-token in a word directly from its spelling.)

IanCal · 2025-01-01T12:30:29 1735734629

Is that true? I thought there was a desire to move towards byte-level work rather than tokens, and that the benefit of tokens was more that you reduce the context size for the same input.
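
To make the context-size point concrete, here is a rough sketch that compares the sequence length of the same input counted as UTF-8 bytes versus counted as coarser chunks. The `sequence_lengths` helper and its regex splitter are hypothetical stand-ins, not a real tokenizer: actual LLM tokenizers (BPE and similar) split text differently, so only the direction of the gap is meaningful here, not the exact ratio.

```python
import re


def sequence_lengths(text: str) -> tuple[int, int]:
    """Return (byte_count, chunk_count) for the same input text."""
    # Byte-level view: one sequence position per UTF-8 byte.
    byte_count = len(text.encode("utf-8"))
    # Assumption: words and single punctuation marks stand in for tokens.
    # A real subword tokenizer would produce different splits.
    chunks = re.findall(r"\w+|[^\w\s]", text)
    return byte_count, len(chunks)


text = "Tokens shorten the sequence the model has to attend over."
n_bytes, n_chunks = sequence_lengths(text)
print(f"bytes: {n_bytes}, chunks: {n_chunks}")
```

The byte-level sequence is several times longer for the same sentence, which is the cost a byte-level model has to absorb in exchange for dropping the tokenizer.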

fngjdflmdflg · 2025-01-01T18:02:43 1735754563

>there was a desire to move towards byte level work rather than tokens

Yeah, the latest work on this is from Meta last month.[0] It showed good results.

[0] https://ai.meta.com/research/publications/byte-latent-transf... (https://news.ycombinator.com/item?id=42415122)