Grammar is surprisingly easy to learn from unstructured data, to an extent. (source: I spent a lot of my PhD doing this kinda thing.)
Continual learning seems to be a tough problem though, from what I'm seeing of friends who work on it. Like I said in another comment, just doing gradient updates from new data is fraught with problems. RL has a bunch of techniques to mitigate the issues that arise, but I think it's still an active area of research.
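For anyone curious what "mitigate" looks like concretely, here's a minimal sketch of one standard trick, a replay buffer that mixes old examples back into each gradient update so training doesn't only chase the newest data. This is illustrative only, not any specific lab's method, and the names here (model_step, ReplayBuffer) are made up for the example:

    # Sketch of experience replay for continual updates.
    # All names here are hypothetical, for illustration only.
    import random

    class ReplayBuffer:
        def __init__(self, capacity=10_000):
            self.capacity = capacity
            self.data = []

        def add(self, example):
            if len(self.data) >= self.capacity:
                # Reservoir-style eviction keeps a rough sample of history.
                self.data[random.randrange(self.capacity)] = example
            else:
                self.data.append(example)

        def sample(self, k):
            return random.sample(self.data, min(k, len(self.data)))

    def continual_update(model_step, buffer, new_batch, replay_ratio=1):
        # Mix each new batch with replayed old examples before the
        # gradient step, instead of training on the new data alone.
        replayed = buffer.sample(replay_ratio * len(new_batch))
        model_step(new_batch + replayed)  # one update on the mixture
        for ex in new_batch:
            buffer.add(ex)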
Well, yes, but the tokenization schemes cause these models to struggle with actually following syntactic rules in poetry, e.g. syllable counts in haikus, or rhymes. I actually wrote an entire paper, which gwern cited, about how to make LLMs always follow these kinds of constraints with no mistakes.
If you don't believe this is a problem, try getting ChatGPT to write a paragraph of correct English that omits every word containing the letter "e". Too bad you can't use my technique on ChatGPT, since they don't expose their output probability distribution...
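To make the "output probability distribution" point concrete, here's a minimal sketch of the general masking idea (not necessarily the paper's exact method): ban every token whose surface form contains "e" before picking the next token. It assumes a local HuggingFace causal LM (gpt2 here) and plain greedy decoding:

    # Sketch of constrained decoding by masking the next-token
    # distribution. General idea only, not the cited paper's method.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Precompute the ids of all tokens whose text contains "e".
    banned = torch.tensor([
        tok_id for tok_id in range(len(tokenizer))
        if "e" in tokenizer.decode([tok_id]).lower()
    ])

    prompt = "A paragraph without that fifth symbol:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(50):
            logits = model(input_ids).logits[0, -1]  # next-token scores
            logits[banned] = float("-inf")           # mask banned tokens
            next_id = torch.argmax(logits)           # greedy pick from the rest
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))

Because the mask is applied before the token is chosen, a banned token can never be emitted, which is why this kind of constraint holds with no mistakes; a prompt-only approach has no such guarantee. And it only works if you can see the distribution, hence the complaint about ChatGPT.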
Ya, I guess I was comparing the difficulty of learning to "produce mostly grammatically correct sentences in most cases" against continual learning. From the 'inside', it feels like just the opposite of everything OP said.