The sudden lurch into paranormal book review at the end is interesting. I guess it's because it went down the path of, "How do we do it? How do we get there? Stay tuned and we’ll let you know." ... that last sentence is very likely to occur in conspiracy theory/paranormal/UFO texts.

It's very interesting that the model generated a basically coherent speech that could have come from any left-wing event or politician, given nothing more than "things are bad, what next" as a starting point. GPT-2 has correctly learned that Marxist thought is based on a form of catastrophism, as anyone who has read Marx will confirm.

It's going to be fascinating to see how people use this. My guess is "that sounds like an AI wrote it" will become an insult meaning predictable and content-free.

Even more fun will be putting the model into reverse and calculating a predictability score: if, given a real human-written speech as input, GPT-2 rates each next word as highly likely, then the overall speech can be said to be only N% insightful, where N is an actual, scientifically defined measurement.

Many people seem to adopt dystopian catastrophism about AI, but I feel somewhat optimistic. In the same way that automated spelling and grammar checkers can help people write better, a GPT-2 run in reverse could help people write clearer prose that gets to the point more quickly, or perhaps even force people to accept that they don't really have anything new to say. If a speaker doesn't use it, then someone in their audience will, after all.
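
For anyone who wants to try the reverse-scoring idea, here's a minimal sketch using the openly released GPT-2 checkpoint via the Hugging Face transformers library (the library choice and the predictability helper are my own assumptions, not anything OpenAI or TalkToTransformer ships):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def predictability(text):
        # Geometric mean of per-token probabilities (i.e. 1/perplexity):
        # a score of 1.0 would mean every word was exactly what GPT-2 expected.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
        return float(torch.exp(-loss))

A text scoring near 1 is the "sounds like an AI wrote it" case; genuinely surprising writing scores lower.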




Notice that "<|endoftext|>" delimiter. A lot of the samples I generated included it, and would then rapidly switch into a whole different tone or style. Maybe there was an error in their training where they somehow didn't separate training samples properly? I don't know enough about machine learning to say.

I also find it interesting that this sample got -4 points where the Sokal affair sample I posted got +4 points.

I imagine it has more to do with the emotions each sample evokes in various Hacker News readers. Could it be that they tend to have a distaste for postcolonialism but a fondness for materialist rationalism? I think so, based on years of reading the comments :)


On the <|endoftext|>: GPT-2 and this model were trained by sampling fixed-length segments of text from a set of web pages. So if a training sample happens to start near the end of one page, the rest of its length is filled with the beginning of another page, and the model learns to do the same. TalkToTransformer.com hides this by not showing anything that comes after the <|endoftext|> token.
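
In code, that packing scheme would look roughly like this (a hypothetical sketch of the preprocessing described above, not OpenAI's actual pipeline; the make_training_chunks helper is my invention, though 1024 is GPT-2's real context length):

    def make_training_chunks(pages, tokenizer, block_size=1024):
        # Join every page into one long token stream, separated by the
        # <|endoftext|> token, then slice it into fixed-length blocks.
        eot = tokenizer.eos_token_id  # <|endoftext|> in GPT-2's vocabulary
        stream = []
        for page in pages:
            stream.extend(tokenizer.encode(page))
            stream.append(eot)
        # A block that starts near the end of one page runs straight into
        # the start of the next, which is why samples can switch topic
        # mid-stream right after the delimiter.
        return [stream[i:i + block_size]
                for i in range(0, len(stream) - block_size + 1, block_size)]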


That explains why sometimes the talktotransformer samples are so short!



