
How is AI hallucinating words now? I thought that would have been the easiest thing to restrict with a sufficient dictionary.

Or maybe it's an ancient dictionary. I was kind of surprised at the sizes of dictionaries I could find while trying to test out a personal project.



IIUC the input to LLMs is tokenized not on word boundaries but on some kind of sub-word boundaries, so whatever the model has associated with "task" also applies to "tasking", "tasked", "taskmaster", etc. A model making up compounds that don't exist would therefore be fully possible and even desirable, especially since real humans do it in English all the time.
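A toy version of that kind of splitting can be sketched as greedy longest-match over a fixed vocabulary. (Real models learn their merges from data, e.g. with BPE; the vocabulary below is invented purely for illustration.)

```python
# Tiny illustrative vocabulary -- NOT a real model's token set.
VOCAB = {"task", "ing", "ed", "master", "un", "re", "s"}

def tokenize(word):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary piece matched: emit the single character as a token.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("tasking"))     # ['task', 'ing']
print(tokenize("taskmaster"))  # ['task', 'master']
```

Since the model only ever sees pieces like "task" + "master", gluing pieces into a compound no human has written is the same operation as producing a real word.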


They’re called “lemmas”


The intent is the same, but as I understand it, LLMs don't tokenize based on lemmas, though some of the tokens probably line up with them.



