Grammar is surprisingly easy to learn from unstructured data, to an extent. (source: I spent a lot of my PhD doing this kinda thing.)
Continual learning seems to be a tough problem though, from what I'm seeing of friends who work on it. Like I said in another comment, just doing gradient updates from new data is fraught with problems. RL has a bunch of techniques to mitigate the issues that arise, but I think it's still an active area of research.
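For anyone curious what "mitigate" looks like concretely, here's a minimal sketch of one standard trick, a replay buffer that mixes old examples back into each gradient update so training doesn't only chase the newest data. This is illustrative only, not any specific lab's method, and the names here (model_step, ReplayBuffer) are made up for the example:

    # Sketch of experience replay for continual updates.
    # All names here are hypothetical, for illustration only.
    import random

    class ReplayBuffer:
        def __init__(self, capacity=10_000):
            self.capacity = capacity
            self.data = []

        def add(self, example):
            if len(self.data) >= self.capacity:
                # Reservoir-style eviction keeps a rough sample of history.
                self.data[random.randrange(self.capacity)] = example
            else:
                self.data.append(example)

        def sample(self, k):
            return random.sample(self.data, min(k, len(self.data)))

    def continual_update(model_step, buffer, new_batch, replay_ratio=1):
        # Mix each new batch with replayed old examples before the
        # gradient step, instead of training on the new data alone.
        replayed = buffer.sample(replay_ratio * len(new_batch))
        model_step(new_batch + replayed)  # one update on the mixture
        for ex in new_batch:
            buffer.add(ex)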
Well, yes, but the tokenization schemes cause these models to struggle with actually following syntactic rules in poetry, e.g. syllable counts in haikus, or rhymes. I actually wrote an entire paper, which gwern cited, about how to make LLMs always follow these kinds of constraints with no mistakes.
If you don't believe this is a problem, try getting ChatGPT to write a paragraph of correct English that omits every word containing the letter "e". Too bad you can't use my technique on ChatGPT, since they don't expose their output probability distribution...
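To make the "output probability distribution" point concrete, here's a minimal sketch of the general masking idea (not necessarily the paper's exact method): ban every token whose surface form contains "e" before picking the next token. It assumes a local HuggingFace causal LM (gpt2 here) and plain greedy decoding:

    # Sketch of constrained decoding by masking the next-token
    # distribution. General idea only, not the cited paper's method.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Precompute the ids of all tokens whose text contains "e".
    banned = torch.tensor([
        tok_id for tok_id in range(len(tokenizer))
        if "e" in tokenizer.decode([tok_id]).lower()
    ])

    prompt = "A paragraph without that fifth symbol:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(50):
            logits = model(input_ids).logits[0, -1]  # next-token scores
            logits[banned] = float("-inf")           # mask banned tokens
            next_id = torch.argmax(logits)           # greedy pick from the rest
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))

Because the mask is applied before the token is chosen, a banned token can never be emitted, which is why this kind of constraint holds with no mistakes; a prompt-only approach has no such guarantee. And it only works if you can see the distribution, hence the complaint about ChatGPT.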
Ya, I guess I was comparing the difficulty of learning to "produce mostly grammatically correct sentences in most cases" against continual learning. From the 'inside', it feels like just the opposite of everything OP said.