I've commented below that I have tried playing some lateral-thinking games with ChatGPT, and I have found it to be pretty poor at understanding what is going on when it has limited information/context to work with.
I had played around with implementing ChatGPT as a bot player on my "dark stories" app (https://detective-stories.com/), but I found that it wasn't close to even an average human player when it came to the skills needed to play either role in the game.
Oh, absolutely a fair criticism! Personally I've all but stopped using ChatGPT (I used it six times in the last seven days before today, and two of those were for the same question about GitHub Markdown) because it's just too unreliable. But I really resent the preprint being given as evidence for ChatGPT's disutility, because it's simply bad evidence.
Quick work, if they did so in the six days since the preprint was posted, two of which fell on a weekend! My version of ChatGPT claims to be the 3rd August version, which gave them one day to respond, unless they were somehow targeting some sneak-peek pre-preprint.
A whole four working days to adjust the model between the preprint's release and the version of ChatGPT I'm using, then! Do you think that's plausible? I certainly don't.
Yeah, man, they have teams on standby to adjust the model whenever a random unknown author posts something on an obscure preprint server. Then they spend hundreds of thousands of dollars of compute to improve the model on the one metric the paper attacks.
Someone's already quoted the heart rate one where it correctly pointed out that it's possible to die and be resuscitated.
The first one I tried to reproduce myself was, verbatim, the one immediately before that in the paper: "Find a model in which P(x) implies Q(x), Q(a) does not hold, and P(a) holds." It got that one correct too: it started out trying to give a positive answer, but ended up correctly saying "It seems that the given conditions are contradictory, and no model can satisfy all three conditions simultaneously." With a small chain-of-thought adjustment it easily produces a proof that the setup is contradictory (https://chat.openai.com/share/d2b4b63e-d585-413d-82c9-19595d...).
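For concreteness, here is a minimal sketch of that contradiction in Lean 4. This is my own formalisation (P, Q and a are just the paper's names), not anything the model produced:

    -- The three conditions are jointly unsatisfiable:
    -- from (∀ x, P x → Q x), ¬ Q a and P a we derive False in one step.
    example {α : Type} (P Q : α → Prop) (a : α)
        (h₁ : ∀ x, P x → Q x) (h₂ : ¬ Q a) (h₃ : P a) : False :=
      h₂ (h₁ a h₃)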
I'm not going to go through any of the other ones, but it's clear that the authors are simply wrong (or at least, if they are correct, their reasoning is not evidence of that fact).
----
OK, I am going to go through some of the other ones.
1. Multiplication of four-digit numbers: tick, with chain-of-thought. https://chat.openai.com/share/baa9c362-22fd-4569-b30f-8c9d83...
2. Counting negations: tick, with chain-of-thought (see the parity sketch after this list). https://chat.openai.com/share/e5f6f928-0bf3-4e60-8a93-014e16...
3. Counting repeated greetings: tick, got this correct verbatim. https://chat.openai.com/share/a92d5d52-c555-45b9-b91f-0f0042...
4. Medical heart rate one: I believe ChatGPT was correct and the author of the paper was wrong here.
5. Elementary logic: this is what my first reproduction was, and it got it correct when verbatim and gave a proof with chain-of-thought. https://chat.openai.com/share/d2b4b63e-d585-413d-82c9-19595d...
6. Quantifiers: I agree that ChatGPT doesn't seem to understand quantifiers, and I know of no obvious way to rephrase the question to elicit that knowledge without begging the question (https://chat.openai.com/share/16a046fd-dd68-4c35-bdba-64b63c...). By the way, this mistake is pretty common in humans.
7. Quantifiers, part 2: in my reproduction it parsed the question wrongly, so I assume it was doomed from the start (https://chat.openai.com/share/764bf14a-a02c-4871-9c22-0be840...). Again, I'm perfectly happy to believe it simply can't do this; many humans can't do it either.
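On item 2, the underlying fact (assuming the task is the usual one of deciding whether a long stack of negations comes out true or false) is just that double negation cancels, so only the parity of the count matters. A one-line Lean 4 sketch of that fact, again my own and not the model's:

    -- Double negation cancels (classically), so an even number of negations
    -- gives back p and an odd number gives ¬ p.
    example (p : Prop) : ¬¬p ↔ p :=
      ⟨fun h => Classical.byContradiction h, fun hp hn => hn hp⟩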
----
I'll stop here, because we've hit a problem of reasoning about graph vertex colourings, where I myself would struggle to verify any answer given only as free text without drawing a diagram; that question seems to be grossly unfair.