It's "surprising" because in the article, the 2.7B-parameter model (Ada) didn't even produce coherent sentences, while the output from the 2.7B GPT Neo is closer in quality to that of GPT-3 Babbage (6.7B parameters).
As you can see from no. 10, GPT Neo did remember the context of trying to come up with pick-up lines. The fact that some of the lines feel "Markov chain level" likely stems from the network's poor grasp of what pick-up lines are. Pick-up lines are a rather difficult concept, which is what the OP article tries to demonstrate.
"The result" as in "the first one"? The rest is almost Markov chain level...