Hacker News

> Every paper and comment I've seen that claims "GPT-4 can't reason" is either an example of bad prompting, bad/vague english, or taking advantage of the limitations of tokenization (like asking GPT-4 how many letters there are in some word).

The problem isn't bad prompting. The problem is lack of repetition. You can ask GPT-4 the same question 10 times (with the same configuration) and you'll get wildly different, nondeterministic responses. Sometimes it accidentally happens to be correct (IME much less than half the time). Even if it were 50%, would you say a coin flip reasons? Does an 8-ball reason?
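For what it's worth, the repeat-trial setup is easy to script. Here is a rough sketch using the OpenAI Python SDK (the model name, prompt, and tallying step are placeholders of mine, not anything from this thread) that just re-asks the same question and counts the distinct answers:

    # Rough sketch: ask the same question N times with the same settings and
    # see how much the answers vary. Prompt and model name are illustrative.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()          # expects OPENAI_API_KEY in the environment
    QUESTION = "put the exact question you are testing here"

    answers = []
    for _ in range(10):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": QUESTION}],
            temperature=0,     # even at temperature 0 the API is not fully deterministic
        )
        answers.append(resp.choices[0].message.content.strip())

    # How many distinct answers did "the same config" produce?
    for answer, count in Counter(answers).most_common():
        print(count, repr(answer[:80]))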



There is a huge difference between being correct 50% of the time (like a coin flip) and being correct 51% of the time: in the second case, you can run the model 10,000 times and the majority answer will be the correct one at least 97% of the time. A coin can't do that. Any paper evaluating GPT-4's responses should do repeat trials and build confidence intervals, like any other research. Anything else is just bad science.
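To sanity-check that 97% figure: assuming independent runs and a plain majority vote over right/wrong outcomes (a simplifying assumption, but the natural reading), a few lines of standard-library Python reproduce it:

    # Sketch: why a 51% per-run accuracy beats a coin flip once you aggregate.
    # Assumes independent runs and a simple majority vote over n repetitions.
    import math, random

    def p_majority_correct(p, n):
        # Normal approximation to P(more than n/2 of n independent runs are correct)
        mean, sd = n * p, math.sqrt(n * p * (1 - p))
        z = (n / 2 - mean) / sd
        return 0.5 * math.erfc(z / math.sqrt(2))

    def simulate(p, n, reps=1000):
        # Monte Carlo check of the same quantity
        hits = sum(sum(random.random() < p for _ in range(n)) > n / 2 for _ in range(reps))
        return hits / reps

    print(p_majority_correct(0.51, 10_000))  # ~0.977 -- the "at least 97%" above
    print(p_majority_correct(0.50, 10_000))  # ~0.5   -- a coin flip stays a coin flip
    print(simulate(0.51, 10_000))            # lands near 0.977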


Remind me, why do we need a computer program that gets it wrong 49% of the time and has to be run 10,000 times before you can trust its answer, for questions that can be answered correctly 100% of the time by a different program? And that's before taking into account that the 49%-wrong program costs millions to train and requires gigantic amounts of data and teams of 100+ people to create, which the alternative programs don't.

What, at the end of the day, is the use of a computer stripped of computer-like precision and recall (as in the ability to retrieve facts from memory)?

Why are people so excited about a piece of software that works only for some people, some of the time, like homeopathy or astrology?

And what does all that have to do with science?


Heck, ask ChatGPT if it can understand error reduction by iteration.

"How can I get my program that produces correct responses 66% of the time to always produce a correct response?"

I'm not paying for that answer. Note that it requires inverting the mathematics to prove your result is always correct (a rough sketch of that iteration math is below).

After asking GPT that, ask it to explain how it arrived at this conclusion step by step.
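For concreteness, here is roughly what that error-reduction-by-iteration math looks like for the 66% example, under the usual simplifying assumptions (independent runs, majority vote; the target error levels are just illustrative). Note that no finite number of runs ever gets you to literally "always correct":

    # Rough sketch: how many independent runs of a 66%-accurate program does a
    # majority vote need before the combined error drops below a target?
    import math

    def majority_error(p, n):
        # Exact P(the majority of n independent runs is wrong), n odd (no ties)
        return sum(
            math.comb(n, k) * (1 - p) ** k * p ** (n - k)
            for k in range(n // 2 + 1, n + 1)
        )

    p = 0.66
    for target in (1e-2, 1e-4, 1e-6):
        n = 1
        while majority_error(p, n) > target:
            n += 2                                  # keep n odd
        print(f"error < {target:g} needs n = {n} runs "
              f"(residual error {majority_error(p, n):.2e})")

    # The residual error shrinks geometrically but never reaches zero:
    # iteration buys "correct with high confidence", not "always correct".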



