Hacker News

> The result is surprisingly good

"The result" as in "the first one"? The rest is almost Markov chain level...



It's "surprising" because, in the article, the 2.7B-parameter model (Ada) didn't even produce sentences, while the output from the 2.7B GPT Neo is closer in quality to what GPT-3 Babbage (6.7B parameters) produced.

As you can see from no. 10, GPT Neo did remember the context of trying to come up with pick-up lines. The fact that some of the lines feel "Markov chain level" is likely due to the network's poor understanding of what pick-up lines are. Pick-up lines are a rather difficult concept, which is what the OP article tries to demonstrate.


Why do you say Ada is 2.7B?


The sizes of the four GPT-3 variants were shared on Reddit by Stella Athena, one of the researchers behind GPT Neo: https://www.reddit.com/r/MachineLearning/comments/ma9kaw/p_e...



