It's "surprising" because in the article, the 2.7B-parameter model (Ada) didn't even produce coherent sentences, while the output from the 2.7B GPT Neo is closer in quality to that of GPT-3 Babbage (6.7B parameters).
As you can see from no. 10, GPT Neo did remember the context of trying to come up with pick-up lines. The fact that some of the lines feel "Markov chain level" likely stems from the network's poor grasp of what pick-up lines are. Pick-up lines are a rather difficult concept, which is what the OP article tries to demonstrate.
"The result" as in "the first one"? The rest is almost Markov chain level...