Hacker News
Asking GPT-4 to solve simple few-step classical planning problems (twitter.com/rao2z)
9 points by YeGoblynQueenne on April 16, 2023 | 2 comments


The TL;DR seems to be that much of GPT-4's improvement on the more complex benchmarks that GPT-3 previously struggled with comes from either straight up consuming the benchmark answers and pattern matching against them, or from RLHF giving the LLM the answer. Just obfuscating the words of the same question actually caused GPT-4 to do worse than GPT-3.


Also see the discussion (sub-thread) on why just guessing SAT answers won't cut it in the long run:

https://twitter.com/rao2z/status/1553082695852298240



