I take my quip both ways, so I would wager that even with finetuning, these models are only one generation ahead in esoteric-language performance and therefore _still not very good_. Am I correct?
You wrote, emphatically, that it would be "still not very good". Why do you believe it would still not be very good after training on a specific problem? LLMs can't do things outside their training data, vast as it is, but if something is in their training data, why are you so emphatic that the result would still not be very good? If I ask it to make something it only needs to copy from sample code, it would seem pretty good at that one very specific task to me.