Hacker News

> Every paper and comment I've seen that claims "GPT-4 can't reason" is either an example of bad prompting, bad/vague english, or taking advantage of the limitations of tokenization (like asking GPT-4 how many letters there are in some word).

The problem isn't bad prompting. The problem is lack of repetition. You can ask GPT-4 the same question 10 times (with the same configuration) and you'll get wildly different, nondeterministic responses. Sometimes it accidentally happens to be correct (IME much less than half the time). Even if it were 50%, would you say a coin flip reasons? Does an 8-ball reason?
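For what it's worth, the repeat-trial setup is easy to script. Here is a rough sketch using the OpenAI Python SDK (the model name, prompt, and tallying step are placeholders of mine, not anything from this thread) that just re-asks the same question and counts the distinct answers:

    # Rough sketch: ask the same question N times with the same settings and
    # see how much the answers vary. Prompt and model name are illustrative.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()          # expects OPENAI_API_KEY in the environment
    QUESTION = "put the exact question you are testing here"

    answers = []
    for _ in range(10):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": QUESTION}],
            temperature=0,     # even at temperature 0 the API is not fully deterministic
        )
        answers.append(resp.choices[0].message.content.strip())

    # How many distinct answers did "the same config" produce?
    for answer, count in Counter(answers).most_common():
        print(count, repr(answer[:80]))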



There is a huge difference between being correct 50% of the time (like a coin flip) and being correct 51% of the time: in the second case, you can run the model 10,000 times and the majority answer will be the correct one at least 97% of the time. A coin can't do that. Any paper evaluating GPT-4's responses should do repeat trials and build confidence intervals, like any other research. Anything else is just bad science.
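To sanity-check that 97% figure: assuming independent runs and a plain majority vote over right/wrong outcomes (a simplifying assumption, but the natural reading), a few lines of standard-library Python reproduce it:

    # Sketch: why a 51% per-run accuracy beats a coin flip once you aggregate.
    # Assumes independent runs and a simple majority vote over n repetitions.
    import math, random

    def p_majority_correct(p, n):
        # Normal approximation to P(more than n/2 of n independent runs are correct)
        mean, sd = n * p, math.sqrt(n * p * (1 - p))
        z = (n / 2 - mean) / sd
        return 0.5 * math.erfc(z / math.sqrt(2))

    def simulate(p, n, reps=1000):
        # Monte Carlo check of the same quantity
        hits = sum(sum(random.random() < p for _ in range(n)) > n / 2 for _ in range(reps))
        return hits / reps

    print(p_majority_correct(0.51, 10_000))  # ~0.977 -- the "at least 97%" above
    print(p_majority_correct(0.50, 10_000))  # ~0.5   -- a coin flip stays a coin flip
    print(simulate(0.51, 10_000))            # lands near 0.977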


Remind me, why do we need a computer program that gets it wrong 49% of the time and has to be run 10,000 times before you can trust its answer, for questions that can be answered correctly 100% of the time by a different program? And that's before taking into account that the 49%-wrong program costs millions to train and requires gigantic amounts of data and teams of 100+ people to create, which the alternative programs don't.

What, at the end of the day, is the use of a computer stripped of computer-like precision and recall (as in the ability to retrieve facts from memory)?

Why are people so excited about a piece of software that works only for some people, some of the time, like homeopathy or astrology?

And what does all that have to do with science?


Heck, ask ChatGPT if it can understand error reduction by iteration.

"How can I get my program that produces correct responses 66% of the time to always produce a correct response?"

I'm not paying for that answer. Note that it requires inverting the mathematics to prove your result is always correct (a rough sketch of that iteration math is below).

After asking GPT that, ask it to explain how it arrived at this conclusion step by step.
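For concreteness, here is roughly what that error-reduction-by-iteration math looks like for the 66% example, under the usual simplifying assumptions (independent runs, majority vote; the target error levels are just illustrative). Note that no finite number of runs ever gets you to literally "always correct":

    # Rough sketch: how many independent runs of a 66%-accurate program does a
    # majority vote need before the combined error drops below a target?
    import math

    def majority_error(p, n):
        # Exact P(the majority of n independent runs is wrong), n odd (no ties)
        return sum(
            math.comb(n, k) * (1 - p) ** k * p ** (n - k)
            for k in range(n // 2 + 1, n + 1)
        )

    p = 0.66
    for target in (1e-2, 1e-4, 1e-6):
        n = 1
        while majority_error(p, n) > target:
            n += 2                                  # keep n odd
        print(f"error < {target:g} needs n = {n} runs "
              f"(residual error {majority_error(p, n):.2e})")

    # The residual error shrinks geometrically but never reaches zero:
    # iteration buys "correct with high confidence", not "always correct".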



