GPT uses multi-head attention, so of course it's not as simple as pasting a few texts together, but I was still interested in finding similar texts in the training data (which is feasible here because the training data is only 1 MB).
That's a really interesting idea. Could you go into detail about how you're searching for similar texts using GPT?
It's true that the probability distribution acts as a sort of "edit distance". And GPT has already been used for text compression (https://bellard.org/nncp/gpt2tc.html), so it doesn't seem too much of a stretch to use it for similarity matching.
(Sure, perhaps there are more efficient or more effective techniques than using GPT for this, but I like the idea and am curious how it works.)
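To make the compressor-as-similarity idea concrete, here's a rough sketch using normalized compression distance, with zlib standing in for a GPT-based compressor like gpt2tc. The `ncd` function and the sample texts are my own illustration, not anything from the linked page:

```python
import zlib

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: near 0 for very similar texts,
    approaching 1 for unrelated ones. Intuition: if b shares structure
    with a, compressing their concatenation costs little extra."""
    ca = len(zlib.compress(a.encode()))
    cb = len(zlib.compress(b.encode()))
    cab = len(zlib.compress((a + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

query = "First Citizen: Before we proceed any further, hear me speak."
near = "First Citizen: Before we proceed any further, hear me out."
far = "import zlib  # an unrelated snippet of Python source code"

# Rank candidates by distance to the query; the most similar comes first.
ranked = sorted([near, far], key=lambda t: ncd(query, t))
```

Swapping zlib for a GPT-based compressor would, in principle, give the same kind of ranking but driven by the model's learned statistics rather than by literal byte repeats.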