This only calculates perplexity and burstiness. I don't think that's going to work very well. It would be much better to try to detect whether the distribution a piece of text was drawn from is closer to that of a human or that of a large language model.
But how would one go about detecting something like that? Well, one would need a model of human language trained to approximate the distribution of tokens in a large corpus of natural language... text...
Oh wait.
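For anyone wondering what those two numbers actually are, here's a rough sketch. It assumes you already have per-token natural-log probabilities from some language model (the inputs in the usage note are made up), and it assumes "burstiness" means the spread of per-sentence perplexity, which is one common reading and not necessarily what this tool does. The function names are mine.

```python
import math
from statistics import pstdev


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(sentence_logprobs):
    """Assumed definition: population std dev of per-sentence perplexities."""
    # Materialize the list so pstdev gets a plain sequence.
    per_sentence = [perplexity(lp) for lp in sentence_logprobs]
    return pstdev(per_sentence)
```

So `perplexity([math.log(0.5)] * 4)` is about 2, meaning the model was on average as uncertain as a coin flip per token. Note that both numbers are just summaries of what a language model thinks of the text, which is rather the point.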